13 Jul, 2011

1 commit


08 Jul, 2011

1 commit

  • GCC 4.6 complains about an array subscript is above array bounds when
    using the btree index to index into the agf_levels array. The only
    two indices passed in are 0 and 1, and we have an assert insuring that.

    Replace the trick of using the array index directly with using constants
    in the already existing branch for assigning the XFS_BTREE_LASTREC_UPDATE
    flag.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

25 May, 2011

1 commit

  • Blocks for the allocation btree are allocated from and released to
    the AGFL, and thus frequently reused. Even worse we do not have an
    easy way to avoid using an AGFL block when it is discarded due to
    the simple FILO list of free blocks, and thus can frequently stall
    on blocks that are currently undergoing a discard.

    Add a flag to the busy extent tracking structure to skip the discard
    for allocation btree blocks. In normal operation these blocks are
    reused frequently enough that there is no need to discard them
    anyway, but if they spill over to the allocation btree as part of a
    balance we "leak" blocks that we would otherwise discard. We could
    fix this by adding another flag and keeping these block in the
    rbtree even after they aren't busy any more so that we could discard
    them when they migrate out of the AGFL. Given that this would cause
    significant overhead I don't think it's worthwile for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

29 Apr, 2011

2 commits

  • Update the extent tree in case we have to reuse a busy extent, so that it
    always is kept uptodate. This is done by replacing the busy list searches
    with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
    in case of a reuse. This allows us to allow reusing metadata extents
    unconditionally, and thus avoid log forces especially for allocation btree
    blocks.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • While we need to make sure we do not reuse busy extents, there is no need
    to force out busy extents when moving them between the AGFL and the
    freespace btree as we still take care of that when doing the real allocation.

    To avoid the log force when just moving extents from the different free
    space tracking structures, move the busy search out of
    xfs_alloc_get_freelist into the callers that need it, and move the busy
    list insert from xfs_free_ag_extent which is used both by AGFL refills
    and real allocation to xfs_free_extent, which is only used by the latter.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

19 Oct, 2010

1 commit

  • The implementation os ->kill_root only differ by either simply
    zeroing out the now unused buffer in the btree cursor in the inode
    allocation btree or using xfs_btree_setbuf in the allocation btree.

    Initially both of them used xfs_btree_setbuf, but the use in the
    ialloc btree was removed early on because it interacted badly with
    xfs_trans_binval.

    In addition to zeroing out the buffer in the cursor xfs_btree_setbuf
    updates the bc_ra array in the btree cursor, and calls
    xfs_trans_brelse on the buffer previous occupying the slot.

    The bc_ra update should be done for the alloc btree updated too,
    although the lack of it does not cause serious problems. The
    xfs_trans_brelse call on the other hand is effectively a no-op in
    the end - it keeps decrementing the bli_recur refcount until it hits
    zero, and then just skips out because the buffer will always be
    dirty at this point. So removing it for the allocation btree is
    just fine.

    So unify the code and move it to xfs_btree.c. While we're at it
    also replace the call to xfs_btree_setbuf with a NULL bp argument in
    xfs_btree_del_cursor with a direct call to xfs_trans_brelse given
    that the cursor is beeing freed just after this and the state
    updates are superflous. After this xfs_btree_setbuf is only used
    with a non-NULL bp argument and can thus be simplified.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

27 Jul, 2010

2 commits

  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Dmapi support was never merged upstream, but we still have a lot of hooks
    bloating XFS for it, all over the fast pathes of the filesystem.

    This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
    support in mainline at least the namespace events can be done much saner
    in the VFS instead of the individual filesystem, so it's not like this
    is much help for future work.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

24 May, 2010

1 commit

  • When we free a metadata extent, we record it in the per-AG busy
    extent array so that it is not re-used before the freeing
    transaction hits the disk. This array is fixed size, so when it
    overflows we make further allocation transactions synchronous
    because we cannot track more freed extents until those transactions
    hit the disk and are completed. Under heavy mixed allocation and
    freeing workloads with large log buffers, we can overflow this array
    quite easily.

    Further, the array is sparsely populated, which means that inserts
    need to search for a free slot, and array searches often have to
    search many more slots that are actually used to check all the
    busy extents. Quite inefficient, really.

    To enable this aspect of extent freeing to scale better, we need
    a structure that can grow dynamically. While in other areas of
    XFS we have used radix trees, the extents being freed are at random
    locations on disk so are better suited to being indexed by an rbtree.

    So, use a per-AG rbtree indexed by block number to track busy
    extents. This incures a memory allocation when marking an extent
    busy, but should not occur too often in low memory situations. This
    should scale to an arbitrary number of extents so should not be a
    limitation for features such as in-memory aggregation of
    transactions.

    However, there are still situations where we can't avoid allocating
    busy extents (such as allocation from the AGFL). To minimise the
    overhead of such occurences, we need to avoid doing a synchronous
    log force while holding the AGF locked to ensure that the previous
    transactions are safely on disk before we use the extent. We can do
    this by marking the transaction doing the allocation as synchronous
    rather issuing a log force.

    Because of the locking involved and the ordering of transactions,
    the synchronous transaction provides the same guarantees as a
    synchronous log force because it ensures that all the prior
    transactions are already on disk when the synchronous transaction
    hits the disk. i.e. it preserves the free->allocate order of the
    extent correctly in recovery.

    By doing this, we avoid holding the AGF locked while log writes are
    in progress, hence reducing the length of time the lock is held and
    therefore we increase the rate at which we can allocate and free
    from the allocation group, thereby increasing overall throughput.

    The only problem with this approach is that when a metadata buffer is
    marked stale (e.g. a directory block is removed), then buffer remains
    pinned and locked until the log goes to disk. The issue here is that
    if that stale buffer is reallocated in a subsequent transaction, the
    attempt to lock that buffer in the transaction will hang waiting
    the log to go to disk to unlock and unpin the buffer. Hence if
    someone tries to lock a pinned, stale, locked buffer we need to
    push on the log to get it unlocked ASAP. Effectively we are trading
    off a guaranteed log force for a much less common trigger for log
    force to occur.

    Ideally we should not reallocate busy extents. That is a much more
    complex fix to the problem as it involves direct intervention in the
    allocation btree searches in many places. This is left to a future
    set of modifications.

    Finally, now that we track busy extents in allocated memory, we
    don't need the descriptors in the transaction structure to point to
    them. We can replace the complex busy chunk infrastructure with a
    simple linked list of busy extents. This allows us to remove a large
    chunk of code, making the overall change a net reduction in code
    size.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

16 Jan, 2010

1 commit

  • Start abstracting the perag references so that the indexing of the
    structures is not directly coded into all the places that uses the
    perag structures. This will allow us to separate the use of the
    perag structure and the way it is indexed and hence avoid the known
    deadlocks related to growing a busy filesystem.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

15 Dec, 2009

1 commit

  • Convert the old xfs tracing support that could only be used with the
    out of tree kdb and xfsidbg patches to use the generic event tracer.

    To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
    all xfs trace channels by:

    echo 1 > /sys/kernel/debug/tracing/events/xfs/enable

    or alternatively enable single events by just doing the same in one
    event subdirectory, e.g.

    echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable

    or set more complex filters, etc. In Documentation/trace/events.txt
    all this is desctribed in more detail. To reads the events do a

    cat /sys/kernel/debug/tracing/trace

    Compared to the last posting this patch converts the tracing mostly to
    the one tracepoint per callsite model that other users of the new
    tracing facility also employ. This allows a very fine-grained control
    of the tracing, a cleaner output of the traces and also enables the
    perf tool to use each tracepoint as a virtual performance counter,
    allowing us to e.g. count how often certain workloads git various
    spots in XFS. Take a look at

    http://lwn.net/Articles/346470/

    for some examples.

    Also the btree tracing isn't included at all yet, as it will require
    additional core tracing features not in mainline yet, I plan to
    deliver it later.

    And the really nice thing about this patch is that it actually removes
    many lines of code while adding this nice functionality:

    fs/xfs/Makefile | 8
    fs/xfs/linux-2.6/xfs_acl.c | 1
    fs/xfs/linux-2.6/xfs_aops.c | 52 -
    fs/xfs/linux-2.6/xfs_aops.h | 2
    fs/xfs/linux-2.6/xfs_buf.c | 117 +--
    fs/xfs/linux-2.6/xfs_buf.h | 33
    fs/xfs/linux-2.6/xfs_fs_subr.c | 3
    fs/xfs/linux-2.6/xfs_ioctl.c | 1
    fs/xfs/linux-2.6/xfs_ioctl32.c | 1
    fs/xfs/linux-2.6/xfs_iops.c | 1
    fs/xfs/linux-2.6/xfs_linux.h | 1
    fs/xfs/linux-2.6/xfs_lrw.c | 87 --
    fs/xfs/linux-2.6/xfs_lrw.h | 45 -
    fs/xfs/linux-2.6/xfs_super.c | 104 ---
    fs/xfs/linux-2.6/xfs_super.h | 7
    fs/xfs/linux-2.6/xfs_sync.c | 1
    fs/xfs/linux-2.6/xfs_trace.c | 75 ++
    fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++
    fs/xfs/linux-2.6/xfs_vnode.h | 4
    fs/xfs/quota/xfs_dquot.c | 110 ---
    fs/xfs/quota/xfs_dquot.h | 21
    fs/xfs/quota/xfs_qm.c | 40 -
    fs/xfs/quota/xfs_qm_syscalls.c | 4
    fs/xfs/support/ktrace.c | 323 ---------
    fs/xfs/support/ktrace.h | 85 --
    fs/xfs/xfs.h | 16
    fs/xfs/xfs_ag.h | 14
    fs/xfs/xfs_alloc.c | 230 +-----
    fs/xfs/xfs_alloc.h | 27
    fs/xfs/xfs_alloc_btree.c | 1
    fs/xfs/xfs_attr.c | 107 ---
    fs/xfs/xfs_attr.h | 10
    fs/xfs/xfs_attr_leaf.c | 14
    fs/xfs/xfs_attr_sf.h | 40 -
    fs/xfs/xfs_bmap.c | 507 +++------------
    fs/xfs/xfs_bmap.h | 49 -
    fs/xfs/xfs_bmap_btree.c | 6
    fs/xfs/xfs_btree.c | 5
    fs/xfs/xfs_btree_trace.h | 17
    fs/xfs/xfs_buf_item.c | 87 --
    fs/xfs/xfs_buf_item.h | 20
    fs/xfs/xfs_da_btree.c | 3
    fs/xfs/xfs_da_btree.h | 7
    fs/xfs/xfs_dfrag.c | 2
    fs/xfs/xfs_dir2.c | 8
    fs/xfs/xfs_dir2_block.c | 20
    fs/xfs/xfs_dir2_leaf.c | 21
    fs/xfs/xfs_dir2_node.c | 27
    fs/xfs/xfs_dir2_sf.c | 26
    fs/xfs/xfs_dir2_trace.c | 216 ------
    fs/xfs/xfs_dir2_trace.h | 72 --
    fs/xfs/xfs_filestream.c | 8
    fs/xfs/xfs_fsops.c | 2
    fs/xfs/xfs_iget.c | 111 ---
    fs/xfs/xfs_inode.c | 67 --
    fs/xfs/xfs_inode.h | 76 --
    fs/xfs/xfs_inode_item.c | 5
    fs/xfs/xfs_iomap.c | 85 --
    fs/xfs/xfs_iomap.h | 8
    fs/xfs/xfs_log.c | 181 +----
    fs/xfs/xfs_log_priv.h | 20
    fs/xfs/xfs_log_recover.c | 1
    fs/xfs/xfs_mount.c | 2
    fs/xfs/xfs_quota.h | 8
    fs/xfs/xfs_rename.c | 1
    fs/xfs/xfs_rtalloc.c | 1
    fs/xfs/xfs_rw.c | 3
    fs/xfs/xfs_trans.h | 47 +
    fs/xfs/xfs_trans_buf.c | 62 -
    fs/xfs/xfs_vnodeops.c | 8
    70 files changed, 2151 insertions(+), 2592 deletions(-)

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

19 Jan, 2009

1 commit


30 Oct, 2008

21 commits

  • structures.

    Always use the generic xfs_btree_block type instead of the short / long
    structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for
    the length of a short / long form block. The rationale for this is that we
    will grow more btree block header variants to support CRCs and other RAS
    information, and always accessing them through the same datatype with
    unions for the short / long form pointers makes implementing this much
    easier.

    SGI-PV: 988146

    SGI-Modid: xfs-linux-melb:xfs-kern:32300a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Donald Douwsma
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Replace the generic record / key / ptr addressing macros that use cpp
    token pasting with simpler macros that do the job for just one given btree
    type. The new macros lose the cur argument and thus can be used outside
    the core btree code, but also gain an xfs_mount * argument to allow for
    checking the CRC flag in the near future. Note that many of these macros
    aren't actually used in the kernel code, but only in userspace (mostly in
    xfs_repair).

    SGI-PV: 988146

    SGI-Modid: xfs-linux-melb:xfs-kern:32295a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Donald Douwsma
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Clean up the way the maximum and minimum records for the btree blocks are
    calculated. For the alloc and inobt btrees all the values are
    pre-calculated in xfs_mount_common, and we switch the current loop around
    the ugly generic macros that use cpp token pasting to generate type names
    to two small helpers in normal C code. For the bmbt and bmdr trees these
    helpers also exist, but can be called during runtime, too. Here we also
    kill various macros dealing with them and inline the logic into the
    get_minrecs / get_maxrecs / get_dmaxrecs methods in xfs_bmap_btree.c.

    Note that all these new helpers take an xfs_mount * argument which will be
    needed to determine the size of a btree block once we add support for
    extended btree blocks with CRCs and other RAS information.

    SGI-PV: 988146

    SGI-Modid: xfs-linux-melb:xfs-kern:32292a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Donald Douwsma
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Add methods to check whether two keys/records are in the righ order. This
    replaces the xfs_btree_check_key and xfs_btree_check_rec methods. For the
    callers from xfs_bmap.c just opencode the bmbt-specific asserts.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32208a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Not really much reason to make it generic given that it's so small, but
    this is the last non-method in xfs_alloc_btree.c and xfs_ialloc_btree.c,
    so it makes the whole btree implementation more structured.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32206a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree delete code generic. Based on a patch from David Chinner
    with lots of changes to follow the original btree implementations more
    closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32205a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • xfs_bmbt_killroot is a mostly generic implementation of moving from a real
    block based root to an inode based root. So move it to xfs_btree.c where
    it can use all the nice infrastructure there and make it pointer size
    agnostic

    The new name for it is xfs_btree_kill_iroot, following the old naming but
    making it clear we're dealing with the root in inode case here, and to
    avoid confusion with xfs_btree_new_root which is used for the not inode
    rooted case. I've also added a comment describing what it does and why
    it's named the way it is.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32203a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree insert code generic. Based on a patch from David Chinner
    with lots of changes to follow the original btree implementations more
    closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32202a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    Add a xfs_btree_new_root helper for the alloc and ialloc btrees. The bmap
    btree needs it's own version and is not converted.

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32200a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree split code generic. Based on a patch from David Chinner
    with lots of changes to follow the original btree implementations more
    closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32198a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree left shift code generic. Based on a patch from David
    Chinner with lots of changes to follow the original btree implementations
    more closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32197a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree right shift code generic. Based on a patch from David
    Chinner with lots of changes to follow the original btree implementations
    more closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32196a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    The most complicated part here is the lastrec tracking for the alloc
    btree. Most logic is in the update_lastrec method which has to do some
    hopefully good enough dirty magic to maintain it.

    [hch: split out from bigger patch and a rework of the lastrec

    logic]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32194a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    Note that there are many > 80 char lines introduced due to the
    xfs_btree_key casts. But the places where this happens is throw-away code
    once the whole btree code gets merged into a common implementation.

    The same is true for the temporary xfs_alloc_log_keys define to the new
    name. All old users will be gone after a few patches.

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32193a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32192a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32191a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    Because this is the first major generic btree routine this patch includes
    some infrastrucure, first a few routines to deal with a btree block that
    can be either in short or long form, second xfs_btree_read_buf_block,
    which is the new central routine to read a btree block given a cursor, and
    third the new xfs_btree_ptr_addr routine to calculate the address for a
    given btree pointer record.

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32190a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Add new helpers in xfs_btree.c to find the record, key and block pointer
    entries inside a btree block. To implement this genericly the
    ->get_maxrecs methods and two new xfs_btree_ops entries for the key and
    record sizes are used. Also add a big comment describing how the
    addressing inside a btree block works.

    Note that these helpers are unused until users are introduced in the next
    patches and this patch will thus cause some harmless compiler warnings.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32189a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Factor xfs_btree_maxrecs into a per-btree operation.

    The get_maxrecs method is based on a patch from Dave Chinner.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32188a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the existing bmap btree tracing generic so that it applies to all
    btree types.

    Some fragments lifted from a patch by Dave Chinner.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32187a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • xfs_btree_init_cursor contains close to little shared code for the
    different btrees and will get even more non-common code in the future.
    Split it up into one routine per btree type.

    Because xfs_btree_dup_cursor needs to call the init routine for a generic
    btree cursor add a new btree operation vector that contains a dup_cursor
    method that initializes a new cursor based on an existing one.

    The btree operations vector is based on an idea and code from Dave Chinner
    and will grow more entries later during this series.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32176a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     

14 Feb, 2008

1 commit


14 Jul, 2007

1 commit

  • When we have a couple of hundred transactions on the fly at once, they all
    typically modify the on disk superblock in some way.
    create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify
    free block counts.

    When these counts are modified in a transaction, they must eventually lock
    the superblock buffer and apply the mods. The buffer then remains locked
    until the transaction is committed into the incore log buffer. The result
    of this is that with enough transactions on the fly the incore superblock
    buffer becomes a bottleneck.

    The result of contention on the incore superblock buffer is that
    transaction rates fall - the more pressure that is put on the superblock
    buffer, the slower things go.

    The key to removing the contention is to not require the superblock fields
    in question to be locked. We do that by not marking the superblock dirty
    in the transaction. IOWs, we modify the incore superblock but do not
    modify the cached superblock buffer. In short, we do not log superblock
    modifications to critical fields in the superblock on every transaction.
    In fact we only do it just before we write the superblock to disk every
    sync period or just before unmount.

    This creates an interesting problem - if we don't log or write out the
    fields in every transaction, then how do the values get recovered after a
    crash? the answer is simple - we keep enough duplicate, logged information
    in other structures that we can reconstruct the correct count after log
    recovery has been performed.

    It is the AGF and AGI structures that contain the duplicate information;
    after recovery, we walk every AGI and AGF and sum their individual
    counters to get the correct value, and we do a transaction into the log to
    correct them. An optimisation of this is that if we have a clean unmount
    record, we know the value in the superblock is correct, so we can avoid
    the summation walk under normal conditions and so mount/recovery times do
    not change under normal operation.

    One wrinkle that was discovered during development was that the blocks
    used in the freespace btrees are never accounted for in the AGF counters.
    This was once a valid optimisation to make; when the filesystem is full,
    the free space btrees are empty and consume no space. Hence when it
    matters, the "accounting" is correct. But that means the when we do the
    AGF summations, we would not have a correct count and xfs_check would
    complain. Hence a new counter was added to track the number of blocks used
    by the free space btrees. This is an *on-disk format change*.

    As a result of this, lazy superblock counters are a mkfs option and at the
    moment on linux there is no way to convert an old filesystem. This is
    possible - xfs_db can be used to twiddle the right bits and then
    xfs_repair will do the format conversion for you. Similarly, you can
    convert backwards as well. At some point we'll add functionality to
    xfs_admin to do the bit twiddling easily....

    SGI-PV: 964999
    SGI-Modid: xfs-linux-melb:xfs-kern:28652a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     

28 Sep, 2006

3 commits


20 Jun, 2006

1 commit


02 Nov, 2005

1 commit