01 Dec, 2008

7 commits

  • Most uses of struct xfs_imap are to map an inode to a buffer. To avoid
    copying around the inode location information we should just embed a
    struct xfs_imap into the xfs_inode. To make sure it doesn't bloat an
    inode the im_len is changed to a ushort, which is fine as that's what
    the users expect anyway.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • xfs_imap is the only caller of xfs_dilocate and doesn't add any significant
    value. Merge the two functions and document the various cases we have for
    inode cluster lookup in the new xfs_imap.

    Also remove the unused im_agblkno and im_ioffset fields from struct xfs_imap
    while we're at it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • We removed support for old-style inode items a while ago and
    xlog_recover_do_inode_trans is now only called for XFS_LI_INODE items.
    That means we can remove the call to xfs_imap there and with it the
    XFS_IMAP_LOOKUP that is set by all other callers. We can also mark
    xfs_imap static now.

    (First sent on October 21st)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • These names don't add any value at all over just using the numerical
    values.

    (First sent on October 9th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • Now that we have a separate xfs_icdinode_t for the in-core inode which
    gets logged there is no need anymore for the xfs_dinode vs xfs_dinode_core
    split - the fact that part of the structure gets logged through the inode
    log item and a small part not can better be described in a comment.

    All sizeof operations on the dinode_core either really wanted the
    icdinode and are switched to that one, or had already added the size
    of the agi unlinked list pointer. Later both will be replaced with
    helpers once we get the larger CRC-enabled dinode.

    Removing the data and attribute fork unions also has the advantage that
    xfs_dinode.h doesn't need to pull in every header under the sun.

    While we're at it also add some more comments describing the dinode
    structure.

    (First sent on October 7th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • xfs_ialloc_log_di is only used to log the full inode core + di_next_unlinked.
    That means all the offset magic is not necessary and we can simply use
    xfs_trans_log_buf directly. Also add a comment describing what we should do
    here instead.

    (First sent on October 7th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • Add a helper to read the AGI header and perform basic verification.
    Based on hunks from a larger patch from Dave Chinner.

    (First sent on July 23rd)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
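
The first commit in this group (embedding a struct xfs_imap into the xfs_inode) can be pictured with the sketch below. It is a minimal illustration, not the kernel code: the `_sketch` types, their field widths other than the 16-bit length, and the `inode_set_location` helper are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_imap: where an inode lives on disk. */
struct imap_sketch {
    int64_t  im_blkno;    /* first basic block of the buffer holding the inode */
    uint16_t im_len;      /* buffer length in basic blocks, kept to 16 bits */
    uint16_t im_boffset;  /* byte offset of the inode within that buffer */
};

/* Hypothetical stand-in for the in-core inode with the map embedded. */
struct inode_sketch {
    uint64_t           i_ino;
    struct imap_sketch i_imap;  /* filled in once, then reused by callers */
};

/* Record the location once so later users read ip->i_imap instead of
 * passing a separate copy of the location around. */
static void inode_set_location(struct inode_sketch *ip, int64_t blkno,
                               unsigned int len, unsigned int boffset)
{
    /* The narrow fields only work if callers never need more than 16 bits. */
    assert(len <= UINT16_MAX && boffset <= UINT16_MAX);
    ip->i_imap.im_blkno = blkno;
    ip->i_imap.im_len = (uint16_t)len;
    ip->i_imap.im_boffset = (uint16_t)boffset;
}
```

Keeping the length and offset at 16 bits each is what stops the embedded map from growing the inode by more than one block number plus one extra word.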
     

30 Oct, 2008

8 commits

  • Not really much reason to make it generic given that it's so small, but
    this is the last non-method in xfs_alloc_btree.c and xfs_ialloc_btree.c,
    so it makes the whole btree implementation more structured.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32206a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree delete code generic. Based on a patch from David Chinner
    with lots of changes to follow the original btree implementations more
    closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32205a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Make the btree insert code generic. Based on a patch from David Chinner
    with lots of changes to follow the original btree implementations more
    closely. While this loses some of the generic helper routines for
    inserting/moving/removing records it also solves some of the one off bugs
    in the original code and makes it easier to verify.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32202a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    The most complicated part here is the lastrec tracking for the alloc
    btree. Most logic is in the update_lastrec method which has to do some
    hopefully good enough dirty magic to maintain it.

    [hch: split out from bigger patch and a rework of the lastrec logic]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32194a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32192a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32191a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    Because this is the first major generic btree routine this patch includes
    some infrastructure, first a few routines to deal with a btree block that
    can be either in short or long form, second xfs_btree_read_buf_block,
    which is the new central routine to read a btree block given a cursor, and
    third the new xfs_btree_ptr_addr routine to calculate the address for a
    given btree pointer record.

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32190a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • xfs_btree_init_cursor contains very little shared code for the
    different btrees and will get even more non-common code in the future.
    Split it up into one routine per btree type.

    Because xfs_btree_dup_cursor needs to call the init routine for a generic
    btree cursor add a new btree operation vector that contains a dup_cursor
    method that initializes a new cursor based on an existing one.

    The btree operations vector is based on an idea and code from Dave Chinner
    and will grow more entries later during this series.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32176a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
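
The last commit in this group describes a btree operations vector with a dup_cursor method, so that generic code can duplicate a cursor without knowing the btree type. A minimal sketch of that pattern follows. All `_sketch` names, the `alloc_*` functions, and the cursor fields are hypothetical and only illustrate the dispatch, not the real XFS structures.

```c
#include <assert.h>
#include <stdlib.h>

struct btree_cur_sketch;

/* Operations vector: one entry here, more could be added later. */
struct btree_ops_sketch {
    struct btree_cur_sketch *(*dup_cursor)(struct btree_cur_sketch *cur);
};

struct btree_cur_sketch {
    const struct btree_ops_sketch *ops;  /* per-btree-type methods */
    int nlevels;                         /* example of copied state */
};

static struct btree_cur_sketch *alloc_init_cursor(int nlevels);

/* Type-specific dup: builds a new cursor from an existing one. */
static struct btree_cur_sketch *alloc_dup_cursor(struct btree_cur_sketch *cur)
{
    return alloc_init_cursor(cur->nlevels);
}

static const struct btree_ops_sketch alloc_ops = {
    .dup_cursor = alloc_dup_cursor,
};

/* One init routine per btree type instead of a shared one. */
static struct btree_cur_sketch *alloc_init_cursor(int nlevels)
{
    struct btree_cur_sketch *cur = calloc(1, sizeof(*cur));

    if (!cur)
        return NULL;
    cur->ops = &alloc_ops;
    cur->nlevels = nlevels;
    return cur;
}

/* Generic code: no switch on the btree type, just a method call. */
static struct btree_cur_sketch *btree_dup_cursor(struct btree_cur_sketch *cur)
{
    assert(cur && cur->ops && cur->ops->dup_cursor);
    return cur->ops->dup_cursor(cur);
}
```

A second btree type would supply its own init routine and its own ops table, and `btree_dup_cursor` would stay unchanged.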
     

29 Apr, 2008

1 commit

  • When we allocate new inode chunks, we initialise the generation numbers
    to zero. This works fine until we delete a chunk and then reallocate it,
    resulting in the same inode numbers but with a reset generation count.
    This can result in inode/generation pairs of different inodes occurring
    relatively close together.

    Given that the inode/gen pair makes up the "unique" portion of an NFS
    filehandle on XFS, this can result in file handles cached on clients being
    seen on the wire from the server but referring to a different file. This
    causes .... issues for NFS clients.

    Hence we need a unique generation number initialisation for each inode to
    prevent reuse of a small portion of the generation number space. Use a
    random number to initialise the generation number so we don't need to keep
    any new state on disk whilst making the new number difficult to guess from
    previous allocations.

    SGI-PV: 979416
    SGI-Modid: xfs-linux-melb:xfs-kern:31001a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
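
The idea of seeding generation numbers from a random source instead of zero can be sketched as below. This is an illustration only: the function names, the callback type, and the choice to draw one value per chunk (rather than one per inode) are assumptions, and `rand()` merely stands in for whatever random source the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Random source passed in as a callback so it can be swapped out. */
typedef uint32_t (*rng_fn)(void);

/* Stand-in random source based on the C library. */
static uint32_t libc_rng(void)
{
    return (uint32_t)rand();
}

/* Deterministic source, useful only for checking the sketch. */
static uint32_t fixed_rng(void)
{
    return 0xC0FFEEu;
}

/* Initialise the generation numbers of a freshly allocated chunk.
 * A reallocated chunk gets a new starting value instead of restarting
 * at zero, and no extra state has to be stored on disk. */
static void init_chunk_generations(uint32_t *gen, size_t ninodes, rng_fn rng)
{
    uint32_t start;

    assert(gen != NULL && rng != NULL);
    start = rng();  /* assumption: one draw per chunk */
    for (size_t i = 0; i < ninodes; i++)
        gen[i] = start;
}
```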
     

18 Apr, 2008

1 commit

  • At ENOSPC, we can get a filesystem shutdown due to cancelling a dirty
    transaction in xfs_mkdir or xfs_create. This is due to the initial
    allocation attempt not taking into account inode alignment and hence we
    can prepare the AGF freelist for allocation when it's not actually
    possible to do an allocation. This results in inode allocation returning
    ENOSPC with a dirty transaction, and hence we shut down the filesystem.

    Because the first allocation is an exact allocation attempt, we must tell
    the allocator that the alignment does not affect the allocation attempt.
    i.e. we will accept any extent alignment as long as the extent starts at
    the block we want. Unfortunately, this means that if the longest free
    extent is less than the length + alignment necessary for fallback
    allocation attempts but is long enough to attempt a non-aligned
    allocation, we will modify the free list.

    If we then have the exact allocation fail, all other allocation attempts
    will also fail due to the alignment constraint being taken into account.
    Hence the initial attempt needs to set the "alignment slop" field so that
    alignment, while not required, must be taken into account when determining
    if there is enough space left in the AG to do the allocation.

    That means if the exact allocation fails, we will not dirty the freelist
    if there is not enough space available for a subsequent allocation to
    succeed. Hence we get an ENOSPC error back to userspace without shutting
    down the filesystem.

    SGI-PV: 978886
    SGI-Modid: xfs-linux-melb:xfs-kern:30699a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
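
The role of the "alignment slop" field can be shown with a small space check. The sketch below is a simplified assumption of how such a check might look: the struct, its field names, and the exact formula are illustrative and are not taken from the commit itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the allocation arguments. */
struct alloc_args_sketch {
    uint32_t minlen;        /* minimum extent length wanted */
    uint32_t alignment;     /* 1 means any alignment is accepted */
    uint32_t minalignslop;  /* extra blocks later aligned attempts would need */
};

/* Decide, before touching the freelist, whether the longest free extent
 * could satisfy this attempt and the aligned fallbacks that follow it. */
static bool ag_has_room(const struct alloc_args_sketch *args, uint32_t longest)
{
    assert(args->alignment >= 1);
    return (uint64_t)args->minlen + args->alignment +
           args->minalignslop - 1 <= longest;
}
```

With `minlen = 8`, `alignment = 1` and no slop, a longest free extent of 8 blocks passes the check, so the freelist would be modified even though an aligned fallback could never succeed. Setting the slop to 7 makes the same check fail up front, which is the behaviour the commit message describes: ENOSPC is returned without a dirty transaction.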
     

10 Apr, 2008

1 commit


29 Feb, 2008

1 commit

  • the "ikeep" option is set rather than "noikeep".

    This regression was introduced in 970451.

    With no mount options specified, xfs_parseargs() does the following:

        int ikeep = 0;

        args->flags |= XFSMNT_BARRIER;
        args->flags2 |= XFSMNT2_COMPAT_IOSIZE;

        if (!options)
                goto done;

    It only sets the above two options by default, whereas before it also
    set XFSMNT_IDELETE by default.

    If options are specified, then

        if (!(args->flags & XFSMNT_DMAPI) && !ikeep)
                args->flags |= XFSMNT_IDELETE;

    is executed later on which is skipped by the "goto done;" above.

    The solution is to invert the logic.

    SGI-PV: 977771
    SGI-Modid: xfs-linux-melb:xfs-kern:30590a

    Signed-off-by: Niv Sardi
    Signed-off-by: Barry Naujok
    Signed-off-by: Josef 'Jeff' Sipek
    Signed-off-by: Lachlan McIlroy

    Josef Jeff Sipek
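
One way to read "invert the logic" is to make the flag represent the non-default behaviour, so that the early exit for a missing option string leaves the default intact. The sketch below assumes that reading. The flag names, the parser, and its option handling are hypothetical and much simpler than xfs_parseargs().

```c
#include <assert.h>
#include <string.h>

#define SK_BARRIER 0x1u  /* hypothetical default-on flag */
#define SK_IKEEP   0x2u  /* hypothetical flag: set only when "ikeep" is asked for */

/* Parse a comma-separated option string into flags. */
static unsigned int parse_mount_flags(const char *options)
{
    unsigned int flags = SK_BARRIER;

    /* Early exit is now harmless: "no flag" already means noikeep. */
    if (!options)
        return flags;

    while (*options) {
        size_t len = strcspn(options, ",");

        if (len == 5 && strncmp(options, "ikeep", 5) == 0)
            flags |= SK_IKEEP;
        else if (len == 7 && strncmp(options, "noikeep", 7) == 0)
            flags &= ~SK_IKEEP;
        /* Other options are ignored in this sketch. */

        options += len;
        if (*options == ',')
            options++;
    }
    return flags;
}
```

Because the default no longer depends on code that runs after option parsing, skipping that code cannot change the default.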
     

14 Feb, 2008

1 commit


15 Oct, 2007

1 commit

  • Biggest bit is duplicating the dinode structure so we have one annotated for
    native endianness and one for disk endianness. The other significant change
    is that xfs_xlate_dinode_core is split into one helper per direction to
    allow for proper annotations, everything else is trivial.

    As a side note, splitting out the incore dinode means we can move it into
    xfs_inode.h in a later patch and severely improve on the include hell in
    xfs.

    SGI-PV: 968563
    SGI-Modid: xfs-linux-melb:xfs-kern:29476a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
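
The split into one structure per endianness and one helper per direction can be sketched in portable C. The kernel relies on annotated integer types checked by a separate tool; this example swaps in a wrapper struct to get a similar "cannot mix them up" effect from the compiler. All type names, field choices, and helper names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian 32-bit value as stored on disk. A distinct type, so it
 * cannot be assigned to or from a plain uint32_t by accident. */
typedef struct { uint8_t b[4]; } be32_sketch;

/* On-disk layout: every field is in disk byte order. */
struct dinode_disk_sketch {
    be32_sketch di_uid;
    be32_sketch di_gen;
};

/* In-core copy: every field is in native byte order. */
struct dinode_core_sketch {
    uint32_t di_uid;
    uint32_t di_gen;
};

static be32_sketch cpu_to_be32_sketch(uint32_t v)
{
    be32_sketch out = {{ (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                         (uint8_t)(v >> 8),  (uint8_t)v }};
    return out;
}

static uint32_t be32_to_cpu_sketch(be32_sketch v)
{
    return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
           ((uint32_t)v.b[2] << 8)  |  (uint32_t)v.b[3];
}

/* One helper per direction, so each side has a single, checkable type. */
static void dinode_from_disk(struct dinode_core_sketch *to,
                             const struct dinode_disk_sketch *from)
{
    assert(to && from);
    to->di_uid = be32_to_cpu_sketch(from->di_uid);
    to->di_gen = be32_to_cpu_sketch(from->di_gen);
}

static void dinode_to_disk(struct dinode_disk_sketch *to,
                           const struct dinode_core_sketch *from)
{
    assert(to && from);
    to->di_uid = cpu_to_be32_sketch(from->di_uid);
    to->di_gen = cpu_to_be32_sketch(from->di_gen);
}
```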
     

14 Jul, 2007

1 commit

  • When we have a couple of hundred transactions on the fly at once, they all
    typically modify the on disk superblock in some way.
    create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify
    free block counts.

    When these counts are modified in a transaction, they must eventually lock
    the superblock buffer and apply the mods. The buffer then remains locked
    until the transaction is committed into the incore log buffer. The result
    of this is that with enough transactions on the fly the incore superblock
    buffer becomes a bottleneck.

    The result of contention on the incore superblock buffer is that
    transaction rates fall - the more pressure that is put on the superblock
    buffer, the slower things go.

    The key to removing the contention is to not require the superblock fields
    in question to be locked. We do that by not marking the superblock dirty
    in the transaction. IOWs, we modify the incore superblock but do not
    modify the cached superblock buffer. In short, we do not log superblock
    modifications to critical fields in the superblock on every transaction.
    In fact we only do it just before we write the superblock to disk every
    sync period or just before unmount.

    This creates an interesting problem - if we don't log or write out the
    fields in every transaction, then how do the values get recovered after a
    crash? The answer is simple - we keep enough duplicate, logged information
    in other structures that we can reconstruct the correct count after log
    recovery has been performed.

    It is the AGF and AGI structures that contain the duplicate information;
    after recovery, we walk every AGI and AGF and sum their individual
    counters to get the correct value, and we do a transaction into the log to
    correct them. An optimisation of this is that if we have a clean unmount
    record, we know the value in the superblock is correct, so we can avoid
    the summation walk under normal conditions and so mount/recovery times do
    not change under normal operation.

    One wrinkle that was discovered during development was that the blocks
    used in the freespace btrees are never accounted for in the AGF counters.
    This was once a valid optimisation to make; when the filesystem is full,
    the free space btrees are empty and consume no space. Hence when it
    matters, the "accounting" is correct. But that means that when we do the
    AGF summations, we would not have a correct count and xfs_check would
    complain. Hence a new counter was added to track the number of blocks used
    by the free space btrees. This is an *on-disk format change*.

    As a result of this, lazy superblock counters are a mkfs option and at the
    moment on Linux there is no easy way to convert an old filesystem. It is
    possible, though - xfs_db can be used to twiddle the right bits and then
    xfs_repair will do the format conversion for you. Similarly, you can
    convert backwards as well. At some point we'll add functionality to
    xfs_admin to do the bit twiddling easily....

    SGI-PV: 964999
    SGI-Modid: xfs-linux-melb:xfs-kern:28652a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
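
The recovery step described above, summing the per-AG counters unless the log shows a clean unmount, can be sketched as follows. The structures and field names are hypothetical, and the way free blocks are combined (free extents plus free space btree blocks) is an assumption made for the example; the real accounting may include further terms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG counters taken from the AGF and AGI headers. */
struct ag_counters_sketch {
    uint64_t agf_freeblks;   /* free blocks recorded in the AGF */
    uint64_t agf_btreeblks;  /* blocks used by the free space btrees */
    uint64_t agi_count;      /* allocated inodes recorded in the AGI */
    uint64_t agi_freecount;  /* free inodes recorded in the AGI */
};

/* Hypothetical superblock summary counters. */
struct sb_counters_sketch {
    uint64_t icount;
    uint64_t ifree;
    uint64_t fdblocks;
};

/* Walk every AG and rebuild the global counters from the logged copies. */
static struct sb_counters_sketch
rebuild_sb_counters(const struct ag_counters_sketch *ag, size_t agcount)
{
    struct sb_counters_sketch sb = { 0, 0, 0 };

    assert(ag != NULL || agcount == 0);
    for (size_t i = 0; i < agcount; i++) {
        sb.icount += ag[i].agi_count;
        sb.ifree += ag[i].agi_freecount;
        /* Assumption: btree blocks count towards the free total. */
        sb.fdblocks += ag[i].agf_freeblks + ag[i].agf_btreeblks;
    }
    return sb;
}

/* After a clean unmount the on-disk values are trusted and the walk is
 * skipped, so normal mount times do not change. */
static struct sb_counters_sketch
mount_counters(struct sb_counters_sketch on_disk, bool clean_unmount,
               const struct ag_counters_sketch *ag, size_t agcount)
{
    if (clean_unmount)
        return on_disk;
    return rebuild_sb_counters(ag, agcount);
}
```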
     

10 Feb, 2007

1 commit

  • gcc-4.1 and more recent aggressively inline static functions which
    increases XFS stack usage by ~15% in critical paths. Prevent this from
    occurring by adding noinline to the STATIC definition.

    Also uninline some functions that are too large to be inlined and were
    causing problems with CONFIG_FORCED_INLINING=y.

    Finally, clean up all the different users of inline, __inline and
    __inline__ and put them under one STATIC_INLINE macro. For debug kernels
    the STATIC_INLINE macro uninlines those functions.

    SGI-PV: 957159
    SGI-Modid: xfs-linux-melb:xfs-kern:27585a

    Signed-off-by: David Chinner
    Signed-off-by: David Chatterton
    Signed-off-by: Tim Shimmin

    David Chinner
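
A rough sketch of the macro arrangement described above is shown here. The macro names carry a `_SKETCH` suffix because they are illustrative, not the kernel's definitions; `DEBUG_SKETCH` is a made-up switch, and the `noinline` attribute is the GCC/Clang spelling, with an empty fallback for other compilers.

```c
#include <assert.h>

#if defined(__GNUC__)
#define NOINLINE_SKETCH __attribute__((noinline))
#else
#define NOINLINE_SKETCH  /* no portable equivalent, leave it to the compiler */
#endif

/* Ordinary file-local functions: never inlined, so each call keeps its
 * own stack frame instead of bloating the caller's. */
#define STATIC_SKETCH static NOINLINE_SKETCH

/* Small helpers: inlined in normal builds, real functions in debug builds. */
#ifdef DEBUG_SKETCH
#define STATIC_INLINE_SKETCH static NOINLINE_SKETCH
#else
#define STATIC_INLINE_SKETCH static inline
#endif

STATIC_SKETCH int sum_bytes(const unsigned char *buf, int len)
{
    int sum = 0;

    assert(buf != 0 && len >= 0);
    for (int i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

STATIC_INLINE_SKETCH int twice(int x)
{
    return 2 * x;
}
```

Routing every variant of `inline` through one macro means the inlining policy can be changed in a single place.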
     

28 Sep, 2006

3 commits


20 Jun, 2006

1 commit


09 Jun, 2006

1 commit


11 Apr, 2006

1 commit


29 Mar, 2006

2 commits


14 Mar, 2006

1 commit


02 Nov, 2005

4 commits


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds