27 Jul, 2007

1 commit


20 Jul, 2007

5 commits

  • Slab destructors have not been supported since Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub since then, and slob never supported
    them either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
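
    A minimal sketch of the callsite conversion (the cache name, object type
    and flags here are hypothetical; the point is only the dropped trailing
    dtor argument):

        #include <linux/init.h>
        #include <linux/errno.h>
        #include <linux/slab.h>

        struct my_obj { int x; };               /* hypothetical object type */
        static struct kmem_cache *my_cachep;

        static int __init my_cache_init(void)
        {
                /*
                 * Before this change the call carried one more argument, the
                 * destructor pointer, which was almost always NULL anyway.
                 */
                my_cachep = kmem_cache_create("my_cache", sizeof(struct my_obj),
                                              0, SLAB_HWCACHE_ALIGN, NULL);
                return my_cachep ? 0 : -ENOMEM;
        }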
     
  • * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
    [XFS] Fix inode size update before data write in xfs_setattr
    [XFS] Allow punching holes to free space when at ENOSPC
    [XFS] Implement ->page_mkwrite in XFS.
    [FS] Implement block_page_mkwrite.

    Manually fix up conflict with Nick's VM fault handling patches in
    fs/xfs/linux-2.6/xfs_file.c

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Change the ->fault prototype. We now return an int, which contains the
    VM_FAULT_xxx code in the low byte and the FAULT_RET_xxx code in the next
    byte. The FAULT_RET_xxx code tells the VM whether a page was found,
    whether it has been locked, and potentially other things. This is not
    quite the way Linus wanted it yet, but that's changed in the next patch
    (which requires changes to arch code).

    This means we no longer set VM_CAN_INVALIDATE in the vma to indicate
    that a page comes back locked. That in turn requires filemap_nopage to
    go away (because we can no longer remain backward compatible without
    that flag), but we were going to remove it anyway.

    struct fault_data is renamed to struct vm_fault, as Linus asked. The
    address is now a void __user * that we should firmly encourage drivers
    not to use without a really good reason.

    The page is now returned via a page pointer in the vm_fault struct.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
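
    Roughly, the renamed structure and the new hook have the shape below
    (the field comments are illustrative, not copied from the header):

        struct vm_fault {
                unsigned int flags;             /* FAULT_FLAG_xxx flags */
                pgoff_t pgoff;                  /* logical page offset based on the vma */
                void __user *virtual_address;   /* faulting address; drivers should
                                                   have a really good reason to use it */
                struct page *page;              /* handlers hand the page back here */
        };

        /* new vm_operations_struct member, replacing ->nopage and ->populate: */
        int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
        /* return value: VM_FAULT_xxx code in the low byte, FAULT_RET_xxx above it */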
     
  • Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
    the virtual address -> file offset differently from linear mappings.

    ->populate is a layering violation because the filesystem/pagecache code
    should not need to know anything about the virtual memory mapping. The
    hitch here is that the ->nopage handler didn't pass down enough
    information (i.e. the pgoff). But it is more logical to pass the pgoff
    rather than have the ->nopage function calculate it itself anyway
    (because that's a similar layering violation).

    Having the populate handler install the pte itself is likewise a nasty thing
    to be doing.

    This patch introduces a new fault handler that replaces ->nopage and
    ->populate and (later) ->nopfn. Most of the old mechanism is still in place
    so there is a lot of duplication and nice cleanups that can be removed if
    everyone switches over.

    The rationale for doing this in the first place is that nonlinear mappings are
    subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
    to duplicate the synchronisation logic rather than just consolidate the two.

    After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
    pagecache. Seems like a fringe functionality anyway.

    NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
    users have hit mainline yet.

    [akpm@linux-foundation.org: cleanup]
    [randy.dunlap@oracle.com: doc. fixes for readahead]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Randy Dunlap
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
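
    A minimal sketch of what a ->fault implementation looks like under this
    scheme (my_get_uptodate_page() is a hypothetical helper that finds or
    reads the pagecache page; the exact return-code conventions were still
    being refined in the follow-up patches):

        static int my_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
        {
                struct address_space *mapping = vma->vm_file->f_mapping;
                struct page *page;

                /* pgoff is handed to us; no ->nopage-style address arithmetic */
                page = my_get_uptodate_page(mapping, vmf->pgoff);
                if (!page)
                        return VM_FAULT_SIGBUS;

                vmf->page = page;       /* the page goes back via the struct */
                return VM_FAULT_MINOR;
        }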
     
  • Fix the race between invalidate_inode_pages and do_no_page.

    Andrea Arcangeli identified a subtle race between invalidation of pages from
    pagecache with userspace mappings, and do_no_page.

    The issue is that invalidation has to shoot down all mappings to the page,
    before it can be discarded from the pagecache. Between shooting down ptes to
    a particular page, and actually dropping the struct page from the pagecache,
    do_no_page from any process might fault on that page and establish a new
    mapping to the page just before it gets discarded from the pagecache.

    The most common case where such invalidation is used is in file truncation.
    This case was catered for by doing a sort of open-coded seqlock between the
    file's i_size, and its truncate_count.

    Truncation will decrease i_size, then increment truncate_count before
    unmapping userspace pages; do_no_page will read truncate_count, then find the
    page if it is within i_size, and then check truncate_count under the page
    table lock and back out and retry if it had subsequently been changed (ptl
    will serialise against unmapping, and ensure a potentially updated
    truncate_count is actually visible).

    Complexity and documentation issues aside, the locking protocol fails in the
    case where we would like to invalidate pagecache inside i_size. do_no_page
    can come in anytime and filemap_nopage is not aware of the invalidation in
    progress (as it is when it is outside i_size). The end result is that
    dangling (->mapping == NULL) pages that appear to be from a particular file
    may be mapped into userspace with nonsense data. Valid mappings to the same
    place will see a different page.

    Andrea implemented two working fixes, one using a real seqlock, another using
    a page->flags bit. He also proposed using the page lock in do_no_page, but
    that was initially considered too heavyweight. However, it is not a global or
    per-file lock, and the page cacheline is modified in do_no_page to increment
    _count and _mapcount anyway, so a further modification should not be a large
    performance hit. Scalability is not an issue.

    This patch implements this latter approach. ->nopage implementations return
    with the page locked if it is possible for their underlying file to be
    invalidated (in that case, they must set a special vm_flags bit to indicate
    so). do_no_page only unlocks the page after setting up the mapping
    completely. Invalidation is excluded because it holds the page lock
    during invalidation of each page (and ensures that the page is not
    mapped while holding the lock).

    This also allows significant simplifications in do_no_page, because we have
    the page locked in the right place in the pagecache from the start.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
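
    The resulting ordering, in outline (hypothetical helper names; this is
    not the actual mm/memory.c code):

        struct page *page;
        int type;

        /*
         * ->nopage (with the new vm_flags bit set) returns the page locked;
         * do_no_page keeps it locked until the pte exists. Invalidation also
         * holds the page lock while unmapping and removing the page, so it
         * cannot slip in between.
         */
        page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &type);

        if (page->mapping) {                    /* still in the pagecache */
                my_install_pte(vma, address, page);  /* hypothetical; pte keeps the ref */
                unlock_page(page);
        } else {                                /* raced with invalidation */
                unlock_page(page);
                page_cache_release(page);       /* drop it and let the fault retry */
        }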
     

19 Jul, 2007

3 commits

  • When changing the file size by a truncate() call, we log the change in the
    inode size. However, we do not flush any outstanding data that might not
    have been written to disk, thereby violating the data/inode size update
    order. This can leave files full of NULLs on crash.

    Hence if we are truncating the file, flush any unwritten data that may
    lie between the current on-disk inode size and the new inode size that
    is being logged, to ensure that ordering is preserved.

    SGI-PV: 966308
    SGI-Modid: xfs-linux-melb:xfs-kern:29174a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • Make the free file space transaction able to dip into the reserved blocks
    to ensure that we can successfully free blocks when the filesystem is at
    ENOSPC.

    SGI-PV: 967788
    SGI-Modid: xfs-linux-melb:xfs-kern:29167a

    Signed-off-by: David Chinner
    Signed-off-by: Vlad Apostolov
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • Hook XFS up to ->page_mkwrite to ensure that we know about mmap pages
    being written to. This allows us to do correct delayed allocation and
    ENOSPC checking, as well as remap unwritten extents so that they get
    converted correctly during writeback. This is done via the generic
    block_page_mkwrite code.

    SGI-PV: 940392
    SGI-Modid: xfs-linux-melb:xfs-kern:29149a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
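
    Sketch of the wiring (the my_* names are hypothetical, and my_get_block
    stands in for the filesystem's own get_block_t; the real hookup lives in
    fs/xfs/linux-2.6/):

        /* ->page_mkwrite simply defers to the new generic helper, passing the
         * filesystem's get_block routine so blocks can be reserved or
         * delayed-allocated at mmap-write time. */
        static int my_page_mkwrite(struct vm_area_struct *vma, struct page *page)
        {
                return block_page_mkwrite(vma, page, my_get_block);
        }

        static struct vm_operations_struct my_file_vm_ops = {
                .fault        = filemap_fault,
                .page_mkwrite = my_page_mkwrite,
        };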
     

18 Jul, 2007

3 commits

  • Currently the export_operations structure and the helpers related to it
    live in fs.h. fs.h is already far too large and there are very few places
    needing the export bits, so split them off into a separate header.

    [akpm@linux-foundation.org: fix cifs build]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Neil Brown
    Cc: Steven French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Currently, the freezer treats all tasks as freezable, except for the kernel
    threads that explicitly set the PF_NOFREEZE flag for themselves. This
    approach is problematic, since it requires every kernel thread to either
    set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
    care for the freezing of tasks at all.

    It seems better to only require the kernel threads that want to or need to
    be frozen to use some freezer-related code and to remove any
    freezer-related code from the other (nonfreezable) kernel threads, which is
    done in this patch.

    The patch causes all kernel threads to be nonfreezable by default (ie. to
    have PF_NOFREEZE set by default) and introduces the set_freezable()
    function that should be called by the freezable kernel threads in order to
    unset PF_NOFREEZE. It also makes all of the currently freezable kernel
    threads call set_freezable(), so it shouldn't introduce any (intentional)
    change of behaviour. Additionally, it updates the documentation to
    describe the freezing of tasks more accurately.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Cc: Pavel Machek
    Cc: Oleg Nesterov
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
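
    A sketch of what a freezable kernel thread looks like under the new
    scheme (the thread itself and its work are hypothetical):

        #include <linux/sched.h>
        #include <linux/kthread.h>
        #include <linux/freezer.h>

        static int my_kthread(void *unused)
        {
                set_freezable();        /* opt back in: clear PF_NOFREEZE */

                while (!kthread_should_stop()) {
                        try_to_freeze();        /* park here while tasks are frozen */
                        /* ... do the actual work ... */
                        schedule_timeout_interruptible(HZ);
                }
                return 0;
        }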
     
  • I can never remember what the function to register to receive VM pressure
    is called. I have to trace down from __alloc_pages() to find it.

    It's called "set_shrinker()", and it needs Your Help.

    1) Don't hide struct shrinker. It contains no magic.
    2) Don't allocate "struct shrinker". It's not helpful.
    3) Call them "register_shrinker" and "unregister_shrinker".
    4) Call the function "shrink" not "shrinker".
    5) Reduce the 17 lines of waffly comments to 13, but document it properly.

    Signed-off-by: Rusty Russell
    Cc: David Chinner
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
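
    Usage after this change looks roughly like this (the my_cache_* pieces
    are hypothetical):

        static int my_cache_shrink(int nr_to_scan, gfp_t gfp_mask)
        {
                if (nr_to_scan)
                        my_cache_free_objects(nr_to_scan);      /* hypothetical */
                return my_cache_object_count();                 /* objects remaining */
        }

        static struct shrinker my_shrinker = {
                .shrink = my_cache_shrink,
                .seeks  = DEFAULT_SEEKS,
        };

        /* at module init / exit: */
        static int __init my_init(void)  { register_shrinker(&my_shrinker);   return 0; }
        static void __exit my_exit(void) { unregister_shrinker(&my_shrinker); }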
     

14 Jul, 2007

28 commits

  • SGI-PV: 967035
    SGI-Modid: xfs-linux-melb:xfs-kern:29026a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • * 32bit struct xfs_fsop_bulkreq has different size and layout of
    members, no matter the alignment. Move the code out of the #else
    branch (why was it there in the first place?). Define _32 variants of
    the ioctl constants.
    * 32bit struct xfs_bstat is different because of time_t, and on
    i386 also because of different padding. Make xfs_bulkstat_one() accept
    a custom "output formatter" in the private_data argument; in the compat
    case this formatter (xfs_bulkstat_one_compat()) takes care of the
    different layout.
    * i386 struct xfs_inogrp has different padding.
    Add a similar "output formatter" mechanism to xfs_inumbers().

    SGI-PV: 967354
    SGI-Modid: xfs-linux-melb:xfs-kern:29102a

    Signed-off-by: Michal Marek
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Michal Marek
     
  • 32bit struct xfs_fsop_handlereq has different size and offsets (due to
    pointers). TODO: case XFS_IOC_{FSSETDM,ATTRLIST,ATTRMULTI}_BY_HANDLE still
    not handled.

    SGI-PV: 967354
    SGI-Modid: xfs-linux-melb:xfs-kern:29101a

    Signed-off-by: Michal Marek
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Michal Marek
     
  • i386 struct xfs_fsop_geom_v1 has no padding after the last member, so the
    size is different.

    SGI-PV: 967354
    SGI-Modid: xfs-linux-melb:xfs-kern:29100a

    Signed-off-by: Michal Marek
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Michal Marek
     
  • Remove the hardcoded "fnames" for tracing, and just embed them in tracing
    macros via __FUNCTION__. Kills a lot of #ifdefs too.

    SGI-PV: 967353
    SGI-Modid: xfs-linux-melb:xfs-kern:29099a

    Signed-off-by: Eric Sandeen
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Eric Sandeen
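
    The general idea (a hypothetical macro, not the actual XFS tracing code)
    is that the function name is captured by the macro itself instead of
    being passed in as a hardcoded string by every caller:

        #define my_trace(fmt, args...) \
                printk(KERN_DEBUG "%s: " fmt, __FUNCTION__, ## args)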
     
  • Avoid using a special "zero inode" as the parent of the quota inode as
    this can confuse the filestreams code into thinking the quota inode has a
    parent. We do not want the quota inode to follow filestreams allocation
    rules, so pass a NULL as the parent inode and detect this condition when
    doing stream associations.

    SGI-PV: 964469
    SGI-Modid: xfs-linux-melb:xfs-kern:29098a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • In media spaces, video is often stored in a frame-per-file format. When
    dealing with uncompressed realtime HD video streams in this format, it is
    crucial that files do not get fragmented and that multiple files are
    placed contiguously on disk.

    When multiple streams are being ingested and played out at the same time,
    it is critical that the filesystem does not cross the streams and
    interleave them together as this creates seek and readahead cache miss
    latency and prevents both ingest and playout from meeting frame rate
    targets.

    This patch set introduces a "stream of files" concept in the allocator to
    place all the data from a single stream contiguously on disk so that RAID
    array readahead can be used effectively. Each additional stream gets
    placed in different allocation groups within the filesystem, thereby
    ensuring that we don't cross any streams. When an AG fills up, we select
    a new AG for the stream that is not in use.

    The core of the functionality is the stream tracking - each inode that we
    create in a directory needs to be associated with the directory's stream.
    Hence every time we create a file, we look up the directory's stream
    object and associate the new file with that object.

    Once we have a stream object for a file, we use the AG that the stream
    object points to for allocations. If we can't allocate in that AG (e.g. it
    is full) we move the entire stream to another AG. Other inodes in the same
    stream are moved to the new AG on their next allocation (i.e. lazy
    update).

    Stream objects are kept in a cache and hold a reference on the inode.
    Hence the inode cannot be reclaimed while there is an outstanding stream
    reference. This means that on unlink we need to remove the stream
    association and we also need to flush all the associations on certain
    events that want to reclaim all unreferenced inodes (e.g. filesystem
    freeze).

    SGI-PV: 964469
    SGI-Modid: xfs-linux-melb:xfs-kern:29096a

    Signed-off-by: David Chinner
    Signed-off-by: Barry Naujok
    Signed-off-by: Donald Douwsma
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin
    Signed-off-by: Vlad Apostolov

    David Chinner
     
  • Appease gcc with regard to "warning: 'rtx' is used uninitialized in
    this function".

    SGI-PV: 907752
    SGI-Modid: xfs-linux-melb:xfs-kern:29007a

    Signed-off-by: Andrew Morton
    Signed-off-by: Tim Shimmin

    Andrew Morton
     
  • A check for file_count is always a bad idea. Linux has the ->release
    method to deal with cleanups on last close and ->flush is only for the
    very rare case where we want to perform an operation on every drop of a
    reference to a file struct.

    This patch gets rid of vop_close and surrounding code in favour of simply
    doing the page flushing from ->release.

    SGI-PV: 966562
    SGI-Modid: xfs-linux-melb:xfs-kern:28952a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
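
    The pattern being removed versus the preferred one, sketched with
    hypothetical foo_ names:

        /* bad: guessing "last close" from the reference count in ->flush */
        static int foo_flush(struct file *file, fl_owner_t id)
        {
                if (file_count(file) > 1)
                        return 0;
                return foo_flush_pages(file);           /* hypothetical */
        }

        /* good: ->release is only called when the last reference to the file
         * struct is dropped, so no count check is needed */
        static int foo_release(struct inode *inode, struct file *file)
        {
                return foo_flush_pages(file);
        }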
     
  • SGI-PV: 966576
    SGI-Modid: xfs-linux-melb:xfs-kern:28950a

    Signed-off-by: Vignesh Babu
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Vignesh Babu
     
  • SGI-PV: 966505
    SGI-Modid: xfs-linux-melb:xfs-kern:28947a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
     
  • SGI-PV: 964547
    SGI-Modid: xfs-linux-melb:xfs-kern:28945a

    Signed-off-by: David Chinner
    Signed-off-by: Nathan Scott
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • xfs_count_bits() is only called once, and its result is then compared to
    0. IOW, what the caller really wants to know is whether the bitmap is
    empty, which can certainly be done more simply.

    SGI-PV: 966503
    SGI-Modid: xfs-linux-melb:xfs-kern:28944a

    Signed-off-by: Eric Sandeen
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Eric Sandeen
     
  • SGI-PV: 966502
    SGI-Modid: xfs-linux-melb:xfs-kern:28943a

    Signed-off-by: Jesper Juhl
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Jesper Juhl
     
  • SGI-PV: 966145
    SGI-Modid: xfs-linux-melb:xfs-kern:28889a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
     
  • The remount readonly path can fail to writeback properly because we still
    have active transactions after calling xfs_quiesce_fs(). Further
    investigation shows that this path is broken in the same ways that the xfs
    freeze path was broken so fix it the same way.

    SGI-PV: 964464
    SGI-Modid: xfs-linux-melb:xfs-kern:28869a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • SGI-PV: 966004
    SGI-Modid: xfs-linux-melb:xfs-kern:28866a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • During delayed allocation extent conversion or unwritten extent
    conversion, we need to reserve some blocks for the transaction
    reservation. We need to reserve these blocks in case a btree split
    occurs and we need to allocate some blocks.

    Unfortunately, we've only ever reserved the number of data blocks we are
    allocating, so in both the unwritten and delalloc cases we can get
    ENOSPC on the transaction reservation. This is bad because in both cases
    we cannot report the failure to the writing application.

    The fix is two-fold:

    1 - leverage the reserved block infrastructure XFS already
    has to reserve a small pool of blocks by default to allow
    specially marked transactions to dip into when we are at
    ENOSPC.
    Default setting is min(5%, 1024 blocks).

    2 - convert critical transaction reservations to be allowed
    to dip into this pool. Spots changed are delalloc
    conversion, unwritten extent conversion and growing a
    filesystem at ENOSPC.
    This also allows growing the filesystem to succeed at ENOSPC.

    SGI-PV: 964468
    SGI-Modid: xfs-linux-melb:xfs-kern:28865a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • When we are unmounting the filesystem, we flush all the inodes to disk.
    Unfortunately, if we have an inode cluster that has just been freed and
    marked stale sitting in an incore log buffer (i.e. hasn't been flushed to
    disk), it will be holding all the flush locks on the inodes in that
    cluster.

    xfs_iflush_all(), which is called during unmount, walks all the inodes
    trying to reclaim them, and in doing so calls xfs_finish_reclaim() on
    each inode. If the inode is dirty, it grabs the flush lock and flushes
    it. Unfortunately, we find dirty inodes that already have their flush
    lock held and so we sleep.

    At this point in the unmount process, we are running single-threaded.
    There is nothing more that can push on the log to force the transaction
    holding the inode flush locks to disk and hence we deadlock.

    The fix is to issue a log force before flushing the inodes on unmount so
    that all the flush locks will be released before we start flushing the
    inodes.

    SGI-PV: 964538
    SGI-Modid: xfs-linux-melb:xfs-kern:28862a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • SGI-PV: 963528
    SGI-Modid: xfs-linux-melb:xfs-kern:28856a

    Signed-off-by: Tim Shimmin
    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig

    Tim Shimmin
     
  • If we have multiple unwritten extents within a single page, we fail to
    tell the I/O completion construction handlers that we need a new handle
    for the second and subsequent blocks in the page. While we still issue
    the I/O correctly, we do not have the correct ranges recorded in the
    ioend structures, and hence when we go to convert the unwritten extents
    we screw it up.

    Make sure we start a new ioend every time the mapping changes so that we
    convert the correct ranges on I/O completion.

    SGI-PV: 964647
    SGI-Modid: xfs-linux-melb:xfs-kern:28797a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • With the per-cpu superblock counters, batch updates are no longer atomic
    across the entire batch of changes. This is not an issue if each
    individual change in the batch is applied atomically. Unfortunately, free
    block count changes are not applied atomically, and they are applied in a
    manner guaranteed to cause problems.

    Essentially, the free block count reservation that the transaction took
    initially is returned to the incore counters before a second delta takes
    away what is used. Because these two operations are not atomic, we can
    race with another thread that uses the returned transaction reservation
    before the transaction takes the space away again, and we can then get
    ENOSPC reported in a spot where we don't have an ENOSPC condition, nor
    should we ever see one there.

    Fix it up by rolling the two deltas into the one so it can be applied
    safely (i.e. atomically) to the incore counters.

    SGI-PV: 964465
    SGI-Modid: xfs-linux-melb:xfs-kern:28796a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
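
    In other words (toy variables only, not the xfs counter code):

        long free_blocks, reservation, blocks_used;

        /* racy: two separate deltas, with a window in between */
        free_blocks += reservation;     /* another thread can consume these... */
        free_blocks -= blocks_used;     /* ...and this then fails with ENOSPC  */

        /* safe: one combined delta, applied atomically to the incore counter */
        free_blocks += reservation - blocks_used;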
     
  • SGI-PV: 965636
    SGI-Modid: xfs-linux-melb:xfs-kern:28777a

    Signed-off-by: David Chinner
    Signed-off-by: Olaf Weber
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • Currently we do not wait on extent conversion to occur, and hence we can
    return to userspace from a synchronous direct I/O write without having
    completed all the actions in the write. Hence a read after the write may
    see zeroes (unwritten extent) rather than the data that was written.

    Block the I/O completion by triggering a synchronous workqueue flush to
    ensure that the conversion has occurred before we return to userspace.

    SGI-PV: 964092
    SGI-Modid: xfs-linux-melb:xfs-kern:28775a

    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • SGI-PV: 965630
    SGI-Modid: xfs-linux-melb:xfs-kern:28774a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • When processing multiple extent maps, xfs_bmapi needs to keep track of the
    extent behind the one it is currently working on to be able to trim extent
    ranges correctly. Failing to update the previous pointer can result in
    corrupted extent lists in memory and this will result in panics or assert
    failures.

    Update the previous pointer correctly when we move to the next extent to
    process.

    SGI-PV: 965631
    SGI-Modid: xfs-linux-melb:xfs-kern:28773a

    Signed-off-by: David Chinner
    Signed-off-by: Vlad Apostolov
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • SGI-PV: 964999
    SGI-Modid: xfs-linux-melb:xfs-kern:28653a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
     
  • When we have a couple of hundred transactions on the fly at once, they all
    typically modify the on disk superblock in some way.
    create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify
    free block counts.

    When these counts are modified in a transaction, they must eventually lock
    the superblock buffer and apply the mods. The buffer then remains locked
    until the transaction is committed into the incore log buffer. The result
    of this is that with enough transactions on the fly the incore superblock
    buffer becomes a bottleneck.

    The result of contention on the incore superblock buffer is that
    transaction rates fall - the more pressure that is put on the superblock
    buffer, the slower things go.

    The key to removing the contention is to not require the superblock fields
    in question to be locked. We do that by not marking the superblock dirty
    in the transaction. IOWs, we modify the incore superblock but do not
    modify the cached superblock buffer. In short, we do not log superblock
    modifications to critical fields in the superblock on every transaction.
    In fact we only do it just before we write the superblock to disk every
    sync period or just before unmount.

    This creates an interesting problem - if we don't log or write out the
    fields in every transaction, then how do the values get recovered after a
    crash? The answer is simple - we keep enough duplicate, logged information
    in other structures that we can reconstruct the correct count after log
    recovery has been performed.

    It is the AGF and AGI structures that contain the duplicate information;
    after recovery, we walk every AGI and AGF and sum their individual
    counters to get the correct value, and we do a transaction into the log to
    correct them. An optimisation of this is that if we have a clean unmount
    record, we know the value in the superblock is correct, so we can avoid
    the summation walk under normal conditions and so mount/recovery times do
    not change under normal operation.

    One wrinkle that was discovered during development was that the blocks
    used in the freespace btrees are never accounted for in the AGF counters.
    This was once a valid optimisation to make; when the filesystem is full,
    the free space btrees are empty and consume no space. Hence when it
    matters, the "accounting" is correct. But that means the when we do the
    AGF summations, we would not have a correct count and xfs_check would
    complain. Hence a new counter was added to track the number of blocks used
    by the free space btrees. This is an *on-disk format change*.

    As a result of this, lazy superblock counters are a mkfs option and at the
    moment on linux there is no way to convert an old filesystem. This is
    possible - xfs_db can be used to twiddle the right bits and then
    xfs_repair will do the format conversion for you. Similarly, you can
    convert backwards as well. At some point we'll add functionality to
    xfs_admin to do the bit twiddling easily....

    SGI-PV: 964999
    SGI-Modid: xfs-linux-melb:xfs-kern:28652a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    David Chinner
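
    The recovery-time summation described above, in outline (illustrative
    only; the field names follow the per-AG structures, including the new
    btree-blocks counter):

        xfs_agnumber_t  agno;
        __uint64_t      icount = 0, ifree = 0, fdblocks = 0;

        for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
                xfs_perag_t *pag = &mp->m_perag[agno];

                icount   += pag->pagi_count;            /* allocated inodes */
                ifree    += pag->pagi_freecount;        /* free inodes */
                fdblocks += pag->pagf_freeblks          /* free extents */
                          + pag->pagf_flcount           /* AGFL blocks */
                          + pag->pagf_btreeblks;        /* new: freespace btree blocks */
        }

        mp->m_sb.sb_icount   = icount;
        mp->m_sb.sb_ifree    = ifree;
        mp->m_sb.sb_fdblocks = fdblocks;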