22 Jan, 2010

1 commit

  • Currently we define aliases for the buffer flags in various
    namespaces, which only adds confusion. Remove all but the XBF_
    flags to clean this up a bit.

    Note that we still abuse XFS_B_ASYNC/XBF_ASYNC for some non-buffer
    uses, but I'll clean that up later.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

16 Jan, 2010

2 commits

  • The use of an array for the per-ag structures requires reallocation
    of the array when growing the filesystem. This requires locking
    access to the array to avoid use after free situations, and the
    locking is difficult to get right. To avoid needing to reallocate an
    array, change the per-ag structures to an allocated object per ag
    and index them using a tree structure.

    The AGs are always densely indexed (hence the use of an array), but
    the number supported is 2^32 and lookups tend to be random and hence
    indexing needs to scale. A simple choice is a radix tree - it works
    well with this sort of index. This change also removes another
    large contiguous allocation from the mount/growfs path in XFS.

    The growing process now needs to change to only initialise the new
    AGs required for the extra space, and as such only needs to
    exclusively lock the tree for inserts. The rest of the code only
    needs to lock the tree while doing lookups, and hence this will
    remove all the deadlocks that currently occur on the m_perag_lock as
    it is now an innermost lock. The lock is also changed to a spinlock
    from a read/write lock as the hold time is now extremely short.

    To complete the picture, the per-ag structures will need to be
    reference counted to ensure that we don't free/modify them while
    they are still in use. This will be done in a subsequent patch.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Convert the remaining direct lookups of the per ag structures to use
    get/put accesses. Ensure that the loops across AGs and prior users
    of the interface balance gets and puts correctly.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
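
    The get/put discipline described in the two commits above can be
    sketched in userspace C. This is an illustration only: struct perag,
    perag_get, perag_put and perag_count are hypothetical names, a flat
    pointer table stands in for the radix tree, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-AG structure; not the kernel's type. */
struct perag {
	unsigned int agno;	/* allocation group number */
	int ref;		/* references currently held */
};

#define MAX_AGS 8

/* A flat pointer table stands in for the radix tree index (assumption). */
static struct perag *perag_index[MAX_AGS];

/* Look up an AG and take a reference; returns NULL if it does not exist. */
static struct perag *perag_get(unsigned int agno)
{
	struct perag *pag;

	if (agno >= MAX_AGS)
		return NULL;
	pag = perag_index[agno];
	if (pag)
		pag->ref++;
	return pag;
}

/* Drop a reference previously taken by perag_get(). */
static void perag_put(struct perag *pag)
{
	assert(pag != NULL && pag->ref > 0);
	pag->ref--;
}

/* Walk all AGs, pairing every get with a put, and count those present. */
static unsigned int perag_count(void)
{
	unsigned int agno, found = 0;

	for (agno = 0; agno < MAX_AGS; agno++) {
		struct perag *pag = perag_get(agno);

		if (!pag)
			continue;
		found++;
		perag_put(pag);
	}
	return found;
}
```

    Because each object is allocated separately, adding an AG only inserts
    a new pointer and never moves the existing objects, which is the
    property the array-based scheme lacked.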
     

12 Dec, 2009

1 commit

  • Remove our own STATIC_INLINE macro. For small functions inside
    implementation files just use STATIC and let gcc inline it, and for
    those in headers do the normal static inline - they are all small
    enough to be inlined for debug builds, too.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

30 Oct, 2009

1 commit

  • Commit bd169565993b39b9b4b102cdac8b13e0a259ce2f seems
    to have a slight regression where this code path:

    if (!--searchdistance) {
            /*
             * Not in range - save last search
             * location and allocate a new inode
             */
            ...
            goto newino;
    }

    doesn't free the temporary cursor (tcur) that got dup'd in
    this function.

    This leaks an item in the xfs_btree_cur zone, and it's caught
    on module unload:

    ===========================================================
    BUG xfs_btree_cur: Objects remaining on kmem_cache_close()
    -----------------------------------------------------------

    It seems like maybe a single free at the end of the function might
    be cleaner, but for now put a del_cursor right in this code block
    similar to the handling in the rest of the function.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Christoph Hellwig

    Eric Sandeen
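
    The "single free at the end of the function" alternative mentioned
    above could look roughly like the following userspace sketch. It is not
    the patch that was applied: struct cursor, dup_cursor, del_cursor and
    bounded_search are hypothetical stand-ins, and live_cursors plays the
    role of the leak check done at kmem_cache_close() time.

```c
#include <assert.h>
#include <stdlib.h>

static int live_cursors;	/* cursors allocated but not yet freed */

struct cursor {
	int pos;
};

/* Duplicate a cursor, as the search function does for its temporary. */
static struct cursor *dup_cursor(const struct cursor *cur)
{
	struct cursor *tcur = malloc(sizeof(*tcur));

	assert(tcur != NULL);
	*tcur = *cur;
	live_cursors++;
	return tcur;
}

static void del_cursor(struct cursor *cur)
{
	free(cur);
	live_cursors--;
}

/*
 * Step forward from cur->pos towards 'target', giving up once the
 * search distance is used up. Returns 1 if the target was reached.
 */
static int bounded_search(const struct cursor *cur, int target,
			  int searchdistance)
{
	struct cursor *tcur;
	int found = 0;

	assert(searchdistance > 0 && target >= cur->pos);
	tcur = dup_cursor(cur);

	while (tcur->pos != target) {
		if (!--searchdistance)
			goto out;	/* not in range, but tcur is still freed */
		tcur->pos++;
	}
	found = 1;
out:
	del_cursor(tcur);		/* one free covers every exit path */
	return found;
}
```

    With a single exit label, a newly added early-exit path cannot forget
    the temporary cursor, at the cost of one more goto target.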
     

02 Sep, 2009

8 commits

  • xfs_inobt_lookup is also used in xfs_itable.c, remove the STATIC modifier
    from its declaration to fix non-debug builds.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Felix Blyakher
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     
  • Don't search too far - abort if it is outside a certain radius and simply do
    a linear search for the first free inode. In AGs with a million inodes this
    can speed up allocation by 3-4x.

    [hch: ported to the new xfs_ialloc.c world order]

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Dave Chinner
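
    A minimal sketch of the idea, assuming the inode records are modelled
    as a plain array of free counts. SEARCH_RADIUS and find_free_record are
    hypothetical; the actual radius and the btree walk are not described
    in this message.

```c
#include <assert.h>

#define SEARCH_RADIUS 10	/* assumed value, for illustration only */

/*
 * Find a record with free inodes, starting near 'start'.
 * freecount[i] is the number of free inodes in record i.
 * Returns the record index, or -1 if no record has a free inode.
 */
static int find_free_record(const int *freecount, int nrecs, int start)
{
	int dist, i;

	assert(start >= 0 && start < nrecs);

	/* Bounded search: look left and right of 'start', nearest first. */
	for (dist = 0; dist <= SEARCH_RADIUS; dist++) {
		int left = start - dist;
		int right = start + dist;

		if (left >= 0 && freecount[left] > 0)
			return left;
		if (right < nrecs && freecount[right] > 0)
			return right;
	}

	/* Outside the radius: linear scan for the first free record. */
	for (i = 0; i < nrecs; i++)
		if (freecount[i] > 0)
			return i;
	return -1;
}
```

    The bounded phase keeps locality when a free inode is nearby, while
    the fallback avoids an ever-widening two-sided walk in a large, mostly
    full AG.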
     
  • Currently we have a xfs_inobt_lookup* variant for each comparison direction,
    and all these get all three fields of the inobt records passed, while the
    common case is just looking for the inode number and we have only marginally
    more callers than xfs_inobt_lookup* variants.

    So open-code a direct call to xfs_btree_lookup for the single case where we
    need all fields, and replace xfs_inobt_lookup* with a xfs_inobt_lookup that
    just takes the inode number and the direction for all other callers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     
  • Clarify the control flow in xfs_dialloc. Factor out a helper to go to the
    next node from the current one and improve the control flow by expanding
    composite if statements and using gotos.

    The xfs_ialloc_next_rec helper is borrowed from Dave Chinner's dynamic
    allocation policy patches.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     
  • Factor out a common helper from repeated debug checks in xfs_dialloc and
    xfs_difree.

    [hch: split out from Dave's dynamic allocation policy patches]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Dave Chinner
     
  • Both callers of xfs_inobt_update have the record in form of a
    xfs_inobt_rec_incore_t, so just pass a pointer to it instead of the
    individual variables.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     
  • Most callers of xfs_inobt_get_rec need to fill a xfs_inobt_rec_incore_t, and
    those who don't yet are fine with a xfs_inobt_rec_incore_t, instead of the
    three individual variables, too. So just change xfs_inobt_get_rec to write
    the output into a xfs_inobt_rec_incore_t directly.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     
  • Factor out code to initialize new inode clusters into a function of its own.
    This keeps xfs_ialloc_ag_alloc smaller and better structured and enables a
    future inode cluster initialization transaction. Also initialize the agno
    variable earlier in xfs_ialloc_ag_alloc to avoid repeated byte swaps.

    [hch: The original patch is from Dave from his unpublished inode create
    transaction patch series, with some modifications by me to apply stand-alone]

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Dave Chinner
     

29 Mar, 2009

1 commit


09 Feb, 2009

1 commit

  • xfs_ialloc_btree.h has a couple of macros that only obfuscate the code
    but don't provide any abstraction benefits. This patch removes those
    and cleans up the remaining definitions a little.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

16 Jan, 2009

1 commit


01 Dec, 2008

8 commits

  • Just pass down the XFS_IGET_* flags all the way down to xfs_imap instead
    of translating them mid-way.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • Most uses of struct xfs_imap are to map an inode to a buffer. To avoid
    copying around the inode location information we should just embed a
    struct xfs_imap into the xfs_inode. To make sure it doesn't bloat an
    inode the im_len is changed to a ushort, which is fine as that's what
    the users expect anyway.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • xfs_imap is the only caller of xfs_dilocate and doesn't add any significant
    value. Merge the two functions and document the various cases we have for
    inode cluster lookup in the new xfs_imap.

    Also remove the unused im_agblkno and im_ioffset fields from struct xfs_imap
    while we're at it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • We have removed the support for old-style inode items a while ago and
    xlog_recover_do_inode_trans is now only called for XFS_LI_INODE items.
    That means we can remove the call to xfs_imap there and with it the
    XFS_IMAP_LOOKUP that is set by all other callers. We can also mark
    xfs_imap static now.

    (First sent on October 21st)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • These names don't add any value at all over just using the numerical
    values.

    (First sent on October 9th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • Now that we have a separate xfs_icdinode_t for the in-core inode which
    gets logged there is no need anymore for the xfs_dinode vs xfs_dinode_core
    split - the fact that part of the structure gets logged through the inode
    log item and a small part not can better be described in a comment.

    All sizeof operations on the dinode_core either really wanted the
    icdinode and are switched to that one, or had already added the size
    of the agi unlinked list pointer. Later both will be replaced with
    helpers once we get the larger CRC-enabled dinode.

    Removing the data and attribute fork unions also has the advantage that
    xfs_dinode.h doesn't need to pull in every header under the sun.

    While we're at it also add some more comments describing the dinode
    structure.

    (First sent on October 7th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • xfs_ialloc_log_di is only used to log the full inode core + di_next_unlinked.
    That means all the offset magic is not necessary and we can simply use
    xfs_trans_log_buf directly. Also add a comment describing what we should do
    here instead.

    (First sent on October 7th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • Add a helper to read the AGI header and perform basic verification.
    Based on hunks from a larger patch from Dave Chinner.

    (First sent on July 23rd)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     

30 Oct, 2008

8 commits

  • Not really much reason to make it generic given that it's so small, but
    this is the last non-method in xfs_alloc_btree.c and xfs_ialloc_btree.c,
    so it makes the whole btree implementation more structured.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32206a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree delete code generic. Based on a patch from David Chinner
    with lots of changes to follow the original btree implementations more
    closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32205a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree insert code generic. Based on a patch from David Chinner
    with lots of changes to follow the original btree implementations more
    closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32202a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    The most complicated part here is the lastrec tracking for the alloc
    btree. Most logic is in the update_lastrec method which has to do some
    hopefully good enough dirty magic to maintain it.

    [hch: split out from bigger patch and a rework of the lastrec
    logic]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32194a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32192a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32191a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    Because this is the first major generic btree routine this patch includes
    some infrastructure: first, a few routines to deal with a btree block that
    can be either in short or long form; second, xfs_btree_read_buf_block,
    which is the new central routine to read a btree block given a cursor; and
    third, the new xfs_btree_ptr_addr routine to calculate the address for a
    given btree pointer record.

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32190a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • xfs_btree_init_cursor contains very little shared code for the
    different btrees and will get even more non-common code in the future.
    Split it up into one routine per btree type.

    Because xfs_btree_dup_cursor needs to call the init routine for a generic
    btree cursor add a new btree operation vector that contains a dup_cursor
    method that initializes a new cursor based on an existing one.

    The btree operations vector is based on an idea and code from Dave Chinner
    and will grow more entries later during this series.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32176a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     

29 Apr, 2008

1 commit

  • When we allocate new inode chunks, we initialise the generation numbers
    to zero. This works fine until we delete a chunk and then reallocate it,
    resulting in the same inode numbers but with a reset generation count.
    This can result in inode/generation pairs of different inodes occurring
    relatively close together.

    Given that the inode/gen pair makes up the "unique" portion of an NFS
    filehandle on XFS, this can result in file handles cached on clients being
    seen on the wire from the server but referring to a different file. This
    causes .... issues for NFS clients.

    Hence we need a unique generation number initialisation for each inode to
    prevent reuse of a small portion of the generation number space. Use a
    random number to initialise the generation number so we don't need to keep
    any new state on disk whilst making the new number difficult to guess from
    previous allocations.

    SGI-PV: 979416
    SGI-Modid: xfs-linux-melb:xfs-kern:31001a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
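
    A rough userspace sketch of random generation initialisation follows.
    Everything in it is an assumption made for illustration: struct
    inode_slot, init_chunk and next_random are hypothetical, the chunk size
    of 64 is assumed, and a small xorshift generator stands in for the
    kernel's random number source (it is deterministic and so does not
    provide the "difficult to guess" property). Whether one value is drawn
    per inode or per chunk is not stated above; this sketch draws one per
    inode.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64	/* assumed chunk size */

/* Hypothetical stand-in for the per-inode state that matters here. */
struct inode_slot {
	uint32_t gen;		/* generation number */
};

static uint32_t rng_state = 2463534242u;	/* any nonzero seed works */

/* Marsaglia xorshift32; a stand-in for a real random source. */
static uint32_t next_random(void)
{
	uint32_t x = rng_state;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	rng_state = x;
	return x;
}

/* Initialise a freshly allocated chunk with non-repeating generations. */
static void init_chunk(struct inode_slot *chunk)
{
	int i;

	for (i = 0; i < INODES_PER_CHUNK; i++)
		chunk[i].gen = next_random();
}
```

    Freeing and reallocating the same chunk then yields a different
    generation for the same inode number, instead of restarting at zero,
    without storing any extra state on disk.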
     

18 Apr, 2008

1 commit

  • At ENOSPC, we can get a filesystem shutdown due to cancelling a dirty
    transaction in xfs_mkdir or xfs_create. This is due to the initial
    allocation attempt not taking into account inode alignment and hence we
    can prepare the AGF freelist for allocation when it's not actually
    possible to do an allocation. This results in inode allocation returning
    ENOSPC with a dirty transaction, and hence we shut down the filesystem.

    Because the first allocation is an exact allocation attempt, we must tell
    the allocator that the alignment does not affect the allocation attempt.
    i.e. we will accept any extent alignment as long as the extent starts at
    the block we want. Unfortunately, this means that if the longest free
    extent is less than the length + alignment necessary for fallback
    allocation attempts but is long enough to attempt a non-aligned
    allocation, we will modify the free list.

    If we then have the exact allocation fail, all other allocation attempts
    will also fail due to the alignment constraint being taken into account.
    Hence the initial attempt needs to set the "alignment slop" field so that
    alignment, while not required, must be taken into account when determining
    if there is enough space left in the AG to do the allocation.

    That means if the exact allocation fails, we will not dirty the freelist
    if there is not enough space available for a subsequent allocation to
    succeed. Hence we get an ENOSPC error back to userspace without shutting
    down the filesystem.

    SGI-PV: 978886
    SGI-Modid: xfs-linux-melb:xfs-kern:30699a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     

10 Apr, 2008

1 commit


29 Feb, 2008

1 commit

  • the "ikeep" option is set rather than "noikeep".

    This regression was introduced in 970451.

    With no mount options specified, xfs_parseargs() does the following:

        int ikeep = 0;

        args->flags |= XFSMNT_BARRIER;
        args->flags2 |= XFSMNT2_COMPAT_IOSIZE;

        if (!options)
                goto done;

    It only sets the above two options by default, whereas before it also
    set XFSMNT_IDELETE by default.

    If options are specified, then

        if (!(args->flags & XFSMNT_DMAPI) && !ikeep)
                args->flags |= XFSMNT_IDELETE;

    is executed later on, which is skipped by the "goto done;" above.

    The solution is to invert the logic.

    SGI-PV: 977771
    SGI-Modid: xfs-linux-melb:xfs-kern:30590a

    Signed-off-by: Niv Sardi
    Signed-off-by: Barry Naujok
    Signed-off-by: Josef 'Jeff' Sipek
    Signed-off-by: Lachlan McIlroy

    Josef Jeff Sipek
     

14 Feb, 2008

1 commit


15 Oct, 2007

1 commit

  • Biggest bit is duplicating the dinode structure so we have one annotated for
    native endianness and one for disk endianness. The other significant change
    is that xfs_xlate_dinode_core is split into one helper per direction to
    allow for proper annotations, everything else is trivial.

    As a side note, splitting out the incore dinode means we can move it into
    xfs_inode.h in a later patch and severely improve on the include hell in
    xfs.

    SGI-PV: 968563
    SGI-Modid: xfs-linux-melb:xfs-kern:29476a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
     

14 Jul, 2007

1 commit

  • When we have a couple of hundred transactions on the fly at once, they all
    typically modify the on disk superblock in some way.
    create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify
    free block counts.

    When these counts are modified in a transaction, they must eventually lock
    the superblock buffer and apply the mods. The buffer then remains locked
    until the transaction is committed into the incore log buffer. The result
    of this is that with enough transactions on the fly the incore superblock
    buffer becomes a bottleneck.

    The result of contention on the incore superblock buffer is that
    transaction rates fall - the more pressure that is put on the superblock
    buffer, the slower things go.

    The key to removing the contention is to not require the superblock fields
    in question to be locked. We do that by not marking the superblock dirty
    in the transaction. IOWs, we modify the incore superblock but do not
    modify the cached superblock buffer. In short, we do not log superblock
    modifications to critical fields in the superblock on every transaction.
    In fact we only do it just before we write the superblock to disk every
    sync period or just before unmount.

    This creates an interesting problem - if we don't log or write out the
    fields in every transaction, then how do the values get recovered after a
    crash? The answer is simple - we keep enough duplicate, logged information
    in other structures that we can reconstruct the correct count after log
    recovery has been performed.

    It is the AGF and AGI structures that contain the duplicate information;
    after recovery, we walk every AGI and AGF and sum their individual
    counters to get the correct value, and we do a transaction into the log to
    correct them. An optimisation of this is that if we have a clean unmount
    record, we know the value in the superblock is correct, so we can avoid
    the summation walk under normal conditions and so mount/recovery times do
    not change under normal operation.

    One wrinkle that was discovered during development was that the blocks
    used in the freespace btrees are never accounted for in the AGF counters.
    This was once a valid optimisation to make; when the filesystem is full,
    the free space btrees are empty and consume no space. Hence when it
    matters, the "accounting" is correct. But that means that when we do the
    AGF summations, we would not have a correct count and xfs_check would
    complain. Hence a new counter was added to track the number of blocks used
    by the free space btrees. This is an *on-disk format change*.

    As a result of this, lazy superblock counters are a mkfs option and at the
    moment on Linux there is no tool to convert an old filesystem. Conversion
    itself is possible - xfs_db can be used to twiddle the right bits and then
    xfs_repair will do the format conversion for you. Similarly, you can
    convert backwards as well. At some point we'll add functionality to
    xfs_admin to do the bit twiddling easily....

    SGI-PV: 964999
    SGI-Modid: xfs-linux-melb:xfs-kern:28652a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     

10 Feb, 2007

1 commit

  • gcc-4.1 and later aggressively inline static functions, which
    increases XFS stack usage by ~15% in critical paths. Prevent this from
    occurring by adding noinline to the STATIC definition.

    Also uninline some functions that are too large to be inlined and were
    causing problems with CONFIG_FORCED_INLINING=y.

    Finally, clean up all the different users of inline, __inline and
    __inline__ and put them under one STATIC_INLINE macro. For debug kernels
    the STATIC_INLINE macro uninlines those functions.

    SGI-PV: 957159
    SGI-Modid: xfs-linux-melb:xfs-kern:27585a

    Signed-off-by: David Chinner
    Signed-off-by: David Chatterton
    Signed-off-by: Tim Shimmin

    David Chinner
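
    One plausible shape for the macros described in this commit is sketched
    below. STATIC and STATIC_INLINE are the names used above; the NOINLINE
    helper macro, the DEBUG switch and the exact definitions are
    assumptions, not the contents of the actual patch.

```c
#include <assert.h>

/* Assumed helper: expands to GCC's noinline attribute where available. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* File-local functions are kept out of line to bound stack usage. */
#define STATIC static NOINLINE

/* Small helpers are inlined, except in debug builds (assumed DEBUG). */
#ifdef DEBUG
#define STATIC_INLINE static NOINLINE
#else
#define STATIC_INLINE static inline
#endif

/* Example of a larger helper that should not be merged into callers. */
STATIC int big_helper(int x)
{
	assert(x >= 0);
	return x * 2 + 1;
}

/* Example of a tiny helper that is cheap to inline. */
STATIC_INLINE int tiny_helper(int x)
{
	return x + 1;
}
```

    Keeping large static functions out of line means their locals are not
    merged into the caller's stack frame, which is the stack growth this
    commit is guarding against.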