Doug / smarc-fsl-linux-kernel | Embedian Git Server

30 May, 2012

1 commit

77ba78776 xfs: switch to proper __bitwise type for KM_... flags ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-05-30 11:28:32 +0800

15 May, 2012

4 commits

ad1e95c54 xfs: clean up xfs_bit.h includes ... Browse Code »

With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.

Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:21:00 +0800
4ecbfe637 xfs: clean up busy extent naming ... Browse Code »

Now that the busy extent tracking has been moved out of the
allocation files, clean up the namespace it uses to
"xfs_extent_busy" rather than a mix of "xfs_busy" and
"xfs_alloc_busy".

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:56 +0800
efc27b525 xfs: move busy extent handling to it's own file ... Browse Code »

To make it easier to handle userspace code merges, move all the busy
extent handling out of the allocation code and into it's own file.
The userspace code does not need the busy extent code, so this
simplifies the merging of the kernel code into the userspace
xfsprogs library.

Because the busy extent code has been almost completely rewritten
over the past couple of years, also update the copyright on this new
file to include the authors that made all those changes.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:55 +0800
60a34607b xfs: move xfsagino_t to xfs_types.h ... Browse Code »

Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:54 +0800

23 Feb, 2012

1 commit

9006fb91c xfs: split and cleanup xfs_log_reserve ... Browse Code »

Split the log regrant case out of xfs_log_reserve into a separate function,
and merge xlog_grant_log_space and xlog_regrant_write_log_space into their
respective callers. Also replace the XFS_LOG_PERM_RESERV flag, which easily
got misused before the previous cleanups with a simple boolean parameter.

Signed-off-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2012-02-23 12:37:04 +0800

14 Feb, 2012

1 commit

f65020a83 XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended ... Browse Code »

It looks to me like the two ASSERT()s in xfs_trans_add_item() really
want to do a compare (==) rather than assignment (=).
This patch changes it from the latter to the former.

Signed-off-by: Jesper Juhl
Signed-off-by: Ben Myers

(cherry picked from commit 05293485a0b6b1f803e8a3c0ff188c38f6969985)

Jesper Juhl
2012-02-14 07:09:21 +0800

09 Dec, 2011

3 commits

b39342134 xfs: remove the lid_size field in struct log_item_desc ... Browse Code »

Outside the now removed nodelaylog code this field is only used for
asserts and can be safely removed now.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2011-12-09 03:53:30 +0800
0244b9603 xfs: cleanup the transaction commit path a bit ... Browse Code »

Now that the nodelaylog mode is gone we can simplify the transaction commit
path a bit by removing the xfs_trans_commit_cil routine. Restoring the
process flags is merged into xfs_trans_commit which already does it for
the error path, and allocating the log vectors is merged into
xlog_cil_format_items, which already fills them with data, thus avoiding
one loop over all log items.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2011-12-09 03:53:30 +0800
93b8a5854 xfs: remove the deprecated nodelaylog option ... Browse Code »

The delaylog mode has been the default for a long time, and the nodelaylog
option has been scheduled for removal in Linux 3.3. Remove it and code
only used by it now that we have opened the 3.3 window.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2011-12-09 02:30:32 +0800

12 Oct, 2011

2 commits

ddc3415ab xfs: simplify xfs_trans_ijoin* again ... Browse Code »

There is no reason to keep a reference to the inode even if we unlock
it during transaction commit because we never drop a reference between
the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
back into xfs_trans_ijoin - the third argument decides if an unlock
is needed now.

I'm actually starting to wonder if allowing inodes to be unlocked
at transaction commit really is worth the effort. The only real
benefit is that they can be unlocked earlier when commiting a
synchronous transactions, but that could be solved by doing the
log force manually after the unlock, too.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:15:08 +0800
b10370585 xfs: unlock the inode before log force in xfs_fsync ... Browse Code »

Only read the LSN we need to push to with the ilock held, and then release
it before we do the log force to improve concurrency.

This also removes the only direct caller of _xfs_trans_commit, thus
allowing it to be merged into the plain xfs_trans_commit again.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 10:15:08 +0800

21 Jul, 2011

1 commit

1d8c95a36 xfs: use a cursor for bulk AIL insertion ... Browse Code »

Delayed logging can insert tens of thousands of log items into the
AIL at the same LSN. When the committing of log commit records
occur, we can get insertions occurring at an LSN that is not at the
end of the AIL. If there are thousands of items in the AIL on the
tail LSN, each insertion has to walk the AIL to find the correct
place to insert the new item into the AIL. This can consume large
amounts of CPU time and block other operations from occurring while
the traversals are in progress.

To avoid this repeated walk, use a AIL cursor to record
where we should be inserting the new items into the AIL without
having to repeat the walk. The cursor infrastructure already
provides this functionality for push walks, so is a simple extension
of existing code. While this will not avoid the initial walk, it
will avoid repeating it tens of thousands of times during a single
checkpoint commit.

This version includes logic improvements from Christoph Hellwig.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2011-07-21 07:37:20 +0800

11 Jul, 2011

1 commit

b2ce39740 Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc" ... Browse Code »

This reverts commit 7a249cf83da1813cfa71cfe1e265b40045eceb47.

That commit created a situation that could lead to a filesystem
hang. As Dave Chinner pointed out, xfs_trans_alloc() could hold a
reference to m_active_trans (i.e., keep it non-zero) and then wait
for SB_FREEZE_TRANS to complete. Meanwhile a filesystem freeze
request could set SB_FREEZE_TRANS and then wait for m_active_trans
to drop to zero. Nobody benefits from this sequence of events...

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Alex Elder
2011-07-11 23:21:03 +0800

08 Jul, 2011

1 commit

7a249cf83 xfs: fix filesystsem freeze race in xfs_trans_alloc ... Browse Code »

As pointed out by Jan xfs_trans_alloc can race with a concurrent filesystem
freeze when it sleeps during the memory allocation. Fix this by moving the
wait_for_freeze call after the memory allocation. This means moving the
freeze into the low-level _xfs_trans_alloc helper, which thus grows a new
argument. Also fix up some comments in that area while at it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Alex Elder
Reviewed-by: Dave Chinner

Christoph Hellwig
2011-07-08 20:34:42 +0800

07 Jul, 2011

1 commit

1316d4da3 xfs: unpin stale inodes directly in IOP_COMMITTED ... Browse Code »

When inodes are marked stale in a transaction, they are treated
specially when the inode log item is being inserted into the AIL.
It tries to avoid moving the log item forward in the AIL due to a
race condition with the writing the underlying buffer back to disk.
The was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes
in the AIL").

To avoid moving the item forward, we return a LSN smaller than the
commit_lsn of the completing transaction, thereby trying to trick
the commit code into not moving the inode forward at all. I'm not
sure this ever worked as intended - it assumes the inode is already
in the AIL, but I don't think the returned LSN would have been small
enough to prevent moving the inode. It appears that the reason it
worked is that the lower LSN of the inodes meant they were inserted
into the AIL and flushed before the inode buffer (which was moved to
the commit_lsn of the transaction).

The big problem is that with delayed logging, the returning of the
different LSN means insertion takes the slow, non-bulk path. Worse
yet is that insertion is to a position -before- the commit_lsn so it
is doing a AIL traversal on every insertion, and has to walk over
all the items that have already been inserted into the AIL. It's
expensive.

To compound the matter further, with delayed logging inodes are
likely to go from clean to stale in a single checkpoint, which means
they aren't even in the AIL at all when we come across them at AIL
insertion time. Hence these were all getting inserted into the AIL
when they simply do not need to be as inodes marked XFS_ISTALE are
never written back.

Transactional/recovery integrity is maintained in this case by the
other items in the unlink transaction that were modified (e.g. the
AGI btree blocks) and committed in the same checkpoint.

So to fix this, simply unpin the stale inodes directly in
xfs_inode_item_committed() and return -1 to indicate that the AIL
insertion code does not need to do any further processing of these
inodes.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2011-07-07 04:44:40 +0800

25 May, 2011

1 commit

e84661aa8 xfs: add online discard support ... Browse Code »

Now that we have reliably tracking of deleted extents in a
transaction we can easily implement "online" discard support
which calls blkdev_issue_discard once a transaction commits.

The actual discard is a two stage operation as we first have
to mark the busy extent as not available for reuse before we
can start the actual discard. Note that we don't bother
supporting discard for the non-delaylog mode.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-05-25 00:17:13 +0800

29 Apr, 2011

1 commit

8a072a4d4 xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy ... Browse Code »

Instead of finding the per-ag and then taking and releasing the pagb_lock
for every single busy extent completed sort the list of busy extents and
only switch betweens AGs where nessecary. This becomes especially important
with the online discard support which will hit this lock more often.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-04-29 02:18:09 +0800

28 Jan, 2011

2 commits

c6f990d1f xfs: handle CIl transaction commit failures correctly ... Browse Code »

Failure to commit a transaction into the CIL is not handled
correctly. This currently can only happen when racing with a
shutdown and requires an explicit shutdown check, so it rare and can
be avoided. Remove the shutdown check and make the CIL commit a void
function to indicate it will always succeed, thereby removing the
incorrectly handled failure case.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:05:36 +0800
e34a314c5 xfs: fix efi item leak on forced shutdown ... Browse Code »

After test 139, kmemleak shows:

unreferenced object 0xffff880078b405d8 (size 400):
comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
hex dump (first 32 bytes):
60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff `..y....`..y....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[] kmemleak_alloc+0x2d/0x60
[] kmem_cache_alloc+0x13f/0x2b0
[] kmem_zone_alloc+0x77/0xf0
[] kmem_zone_zalloc+0x1e/0x50
[] xfs_efi_init+0x4b/0xb0
[] xfs_trans_get_efi+0x58/0x90
[] xfs_bmap_finish+0x8b/0x1d0
[] xfs_itruncate_finish+0x2c4/0x5d0
[] xfs_setattr+0x8df/0xa70
[] xfs_vn_setattr+0x1b/0x20
[] notify_change+0x170/0x2e0
[] do_truncate+0x66/0xa0
[] sys_ftruncate+0xdb/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

The cause of the leak is that the "remove" parameter of IOP_UNPIN()
is never set when a CIL push is aborted. This means that the EFI
item is never freed if it was in the push being cancelled. The
problem is specific to delayed logging, but has uncovered a couple
of problems with the handling of IOP_UNPIN(remove).

Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
in the CIL commit failure path or the iclog write failure path
because for delayed loging we have no transaction context. Hence we
must only call xfs_trans_del_item() if the log item being unpinned
has an active log item descriptor.

Secondly, xfs_trans_uncommit() does not handle log item descriptor
freeing during the traversal of log items on a transaction. It can
reference a freed log item descriptor when unpinning an EFI item.
Hence it needs to use a safe list traversal method to allow items to
be removed from the transaction during IOP_UNPIN().

Signed-off-by: Dave Chinner
Reviewed-by: Alex Elder

Dave Chinner
2011-01-28 23:01:33 +0800

12 Jan, 2011

1 commit

1884bd835 xfs: fix an assignment within an ASSERT() ... Browse Code »

In fs/xfs/xfs_trans.c::xfs_trans_unreserve_and_mod_sb() at the out:
label we have this:
ASSERT(error = 0);
I believe a comparison was intended, not an assignment. If I'm
right, the patch below fixes that up.

Signed-off-by: Jesper Juhl
Signed-off-by: Alex Elder

Jesper Juhl
2011-01-12 12:29:13 +0800

20 Dec, 2010

1 commit

0e57f6a36 xfs: bulk AIL insertion during transaction commit ... Browse Code »

When inserting items into the AIL from the transaction committed
callbacks, we take the AIL lock for every single item that is to be
inserted. For a CIL checkpoint commit, this can be tens of thousands
of individual inserts, yet almost all of the items will be inserted
at the same point in the AIL because they have the same index.

To reduce the overhead and contention on the AIL lock for such
operations, introduce a "bulk insert" operation which allows a list
of log items with the same LSN to be inserted in a single operation
via a list splice. To do this, we need to pre-sort the log items
being committed into a temporary list for insertion.

The complexity is that not every log item will end up with the same
LSN, and not every item is actually inserted into the AIL. Items
that don't match the commit LSN will be inserted and unpinned as per
the current one-at-a-time method (relatively rare), while items that
are not to be inserted will be unpinned and freed immediately. Items
that are to be inserted at the given commit lsn are placed in a
temporary array and inserted into the AIL in bulk each time the
array fills up.

As a result of this, we trade off AIL hold time for a significant
reduction in traffic. lock_stat output shows that the worst case
hold time is unchanged, but contention from AIL inserts drops by an
order of magnitude and the number of lock traversal decreases
significantly.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-12-20 09:02:19 +0800

19 Oct, 2010

4 commits

4957a449a xfs: fix the xfs_trans_committed ... Browse Code »

Use the correct prototype for xfs_trans_committed instead of casting it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:08:02 +0800
dfe188d42 xfs: remove unused t_callback field in struct xfs_trans ... Browse Code »

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:08:01 +0800
1b0407125 xfs: do not use xfs_mod_incore_sb_batch for per-cpu counters ... Browse Code »

Update the per-cpu counters manually in xfs_trans_unreserve_and_mod_sb
and remove support for per-cpu counters from xfs_mod_incore_sb_batch
to simplify it. And added benefit is that we don't have to take
m_sb_lock for transactions that only modify per-cpu counters.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:08:00 +0800
96540c785 xfs: do not use xfs_mod_incore_sb for per-cpu counters ... Browse Code »

Export xfs_icsb_modify_counters and always use it for modifying
the per-cpu counters. Remove support for per-cpu counters from
xfs_mod_incore_sb to simplify it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-10-19 04:07:59 +0800

24 Aug, 2010

1 commit

d17c701ce xfs: unlock items before allowing the CIL to commit ... Browse Code »

When we commit a transaction using delayed logging, we need to
unlock the items in the transaciton before we unlock the CIL context
and allow it to be checkpointed. If we unlock them after we release
the CIl context lock, the CIL can checkpoint and complete before
we free the log items. This breaks stale buffer item unlock and
unpin processing as there is an implicit assumption that the unlock
will occur before the unpin.

Also, some log items need to store the LSN of the transaction commit
in the item (inodes and EFIs) and so can race with other transaction
completions if we don't prevent the CIL from checkpointing before
the unlock occurs.

Cc:
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-08-24 09:42:52 +0800

27 Jul, 2010

6 commits

438697064 xfs: fix xfs_trans_add_item() lockdep warnings ... Browse Code »

xfs_trans_add_item() is called with ip->i_ilock held, which means it
is unsafe for memory reclaim to recurse back into the filesystem
(ilock is required in writeback). Hence the allocation needs to be
KM_NOFS to avoid recursion.

Lockdep report indicating memory allocation being called with the
ip->i_ilock held is as follows:

[ 1749.866796] =================================
[ 1749.867788] [ INFO: inconsistent lock state ]
[ 1749.868327] 2.6.35-rc3-dgc+ #25
[ 1749.868741] ---------------------------------
[ 1749.868741] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
[ 1749.868741] dd/2835 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1749.868741] (&(&ip->i_lock)->mr_lock){++++?.}, at: [] xfs_ilock+0x10b/0x190
[ 1749.868741] {IN-RECLAIM_FS-W} state was registered at:
[ 1749.868741] [] __lock_acquire+0x437/0x1450
[ 1749.868741] [] lock_acquire+0xa6/0x160
[ 1749.868741] [] down_write_nested+0x65/0xb0
[ 1749.868741] [] xfs_ilock+0x10b/0x190
[ 1749.868741] [] xfs_reclaim_inode+0x99/0x310
[ 1749.868741] [] xfs_inode_ag_walk+0x8b/0x150
[ 1749.868741] [] xfs_inode_ag_iterator+0x8b/0xf0
[ 1749.868741] [] xfs_reclaim_inode_shrink+0x88/0x90
[ 1749.868741] [] shrink_slab+0x137/0x1a0
[ 1749.868741] [] balance_pgdat+0x421/0x6a0
[ 1749.868741] [] kswapd+0x11d/0x320
[ 1749.868741] [] kthread+0x96/0xa0
[ 1749.868741] [] kernel_thread_helper+0x4/0x10
[ 1749.868741] irq event stamp: 4234335
[ 1749.868741] hardirqs last enabled at (4234335): [] kmem_cache_free+0x115/0x220
[ 1749.868741] hardirqs last disabled at (4234334): [] kmem_cache_free+0x3d/0x220
[ 1749.868741] softirqs last enabled at (4233112): [] __do_softirq+0x142/0x260
[ 1749.868741] softirqs last disabled at (4233095): [] call_softirq+0x1c/0x50
[ 1749.868741]
[ 1749.868741] other info that might help us debug this:
[ 1749.868741] 2 locks held by dd/2835:
[ 1749.868741] #0: (&(&ip->i_iolock)->mr_lock#2){+.+.+.}, at: [] xfs_ilock_nowait+0xed/0x200
[ 1749.868741] #1: (&(&ip->i_lock)->mr_lock){++++?.}, at: [] xfs_ilock+0x10b/0x190
[ 1749.868741]
[ 1749.868741] stack backtrace:
[ 1749.868741] Pid: 2835, comm: dd Not tainted 2.6.35-rc3-dgc+ #25
[ 1749.868741] Call Trace:
[ 1749.868741] [] print_usage_bug+0x18a/0x190
[ 1749.868741] [] ? save_stack_trace+0x2f/0x50
[ 1749.868741] [] ? check_usage_backwards+0x0/0xf0
[ 1749.868741] [] mark_lock+0x331/0x400
[ 1749.868741] [] mark_held_locks+0x67/0x90
[ 1749.868741] [] lockdep_trace_alloc+0xa1/0xe0
[ 1749.868741] [] kmem_cache_alloc+0x39/0x1e0
[ 1749.868741] [] kmem_zone_alloc+0x94/0xe0
[ 1749.868741] [] kmem_zone_zalloc+0x1e/0x50
[ 1749.868741] [] xfs_trans_add_item+0x72/0xb0
[ 1749.868741] [] xfs_trans_ijoin+0xa1/0xd0
[ 1749.868741] [] xfs_itruncate_finish+0x312/0x5d0
[ 1749.868741] [] xfs_free_eofblocks+0x227/0x280
[ 1749.868741] [] xfs_release+0x138/0x190
[ 1749.868741] [] xfs_file_release+0x15/0x20
[ 1749.868741] [] fput+0x13f/0x260
[ 1749.868741] [] filp_close+0x52/0x80
[ 1749.868741] [] sys_close+0xb9/0x120
[ 1749.868741] [] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Alex Elder
Signed-off-by: Dave Chinner

Dave Chinner
2010-07-27 02:16:49 +0800
898621d5a xfs: simplify inode to transaction joining ... Browse Code »

Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
joining it to a transaction via xfs_trans_ijoin.

This patches instead makes xfs_trans_ijoin usable on it's own by doing
an implicity xfs_trans_ihold, which also allows us to drop the third
argument. For the case where we want to hold a reference on the inode
a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
the inode for needing an xfs_iput. In addition to the cleaner interface
to the caller this also simplifies the implementation.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:36 +0800
9412e3181 xfs: merge iop_unpin_remove into iop_unpin ... Browse Code »

The unpin_remove item operation instances always share most of the
implementation with the respective unpin implementation. So instead
of keeping two different entry points add a remove flag to the unpin
operation and share the code more easily.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:34 +0800
e98c414f9 xfs: simplify log item descriptor tracking ... Browse Code »

Currently we track log item descriptor belonging to a transaction using a
complex opencoded chunk allocator. This code has been there since day one
and seems to work around the lack of an efficient slab allocator.

This patch replaces it with dynamically allocated log item descriptors
from a dedicated slab pool, linked to the transaction by a linked list.

This allows to greatly simplify the log item descriptor tracking to the
point where it's just a couple hundred lines in xfs_trans.c instead of
a separate file. The external API has also been simplified while we're
at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
delete items from a transaction have been simplified to the bare minium,
and the xfs_trans_find_item function is replaced with a direct dereference
of the li_desc field. All debug code walking the list of log items in
a transaction is down to a simple list_for_each_entry.

Note that we could easily use a singly linked list here instead of the
double linked list from list.h as the fastpath only does deletion from
sequential traversal. But given that we don't have one available as
a library function yet I use the list.h functions for simplicity.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:34 +0800
3400777ff xfs: remove unneeded #include statements ... Browse Code »

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800
288699fec xfs: drop dmapi hooks ... Browse Code »

Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast pathes of the filesystem.

This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-07-27 02:16:33 +0800

29 May, 2010

1 commit

025101dca xfs: cleanup log reservation calculactions ... Browse Code »

Instead of having small helper functions calling big macros do the
calculations for the log reservations directly in the functions.
These are mostly 1:1 from the macros execept that the macros kept
the quota calculations in their callers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2010-05-29 03:58:30 +0800

24 May, 2010

2 commits

71e330b59 xfs: Introduce delayed logging core code ... Browse Code »

The delayed logging code only changes in-memory structures and as
such can be enabled and disabled with a mount option. Add the mount
option and emit a warning that this is an experimental feature that
should not be used in production yet.

We also need infrastructure to track committed items that have not
yet been written to the log. This is what the Committed Item List
(CIL) is for.

The log item also needs to be extended to track the current log
vector, the associated memory buffer and it's location in the Commit
Item List. Extend the log item and log vector structures to enable
this tracking.

To maintain the current log format for transactions with delayed
logging, we need to introduce a checkpoint transaction and a context
for tracking each checkpoint from initiation to transaction
completion. This includes adding a log ticket for tracking space
log required/used by the context checkpoint.

To track all the changes we need an io vector array per log item,
rather than a single array for the entire transaction. Using the new
log vector structure for this requires two passes - the first to
allocate the log vector structures and chain them together, and the
second to fill them out. This log vector chain can then be passed
to the CIL for formatting, pinning and insertion into the CIL.

Formatting of the log vector chain is relatively simple - it's just
a loop over the iovecs on each log vector, but it is made slightly
more complex because we re-write the iovec after the copy to point
back at the memory buffer we just copied into.

This code also needs to pin log items. If the log item is not
already tracked in this checkpoint context, then it needs to be
pinned. Otherwise it is already pinned and we don't need to pin it
again.

The only other complexity is calculating the amount of new log space
the formatting has consumed. This needs to be accounted to the
transaction in progress, and the accounting is made more complex
becase we need also to steal space from it for log metadata in the
checkpoint transaction. Calculate all this at insert time and update
all the tickets, counters, etc correctly.

Once we've formatted all the log items in the transaction, attach
the busy extents to the checkpoint context so the busy extents live
until checkpoint completion and can be processed at that point in
time. Transactions can then be freed at this point in time.

Now we need to issue checkpoints - we are tracking the amount of log space
used by the items in the CIL, so we can trigger background checkpoints when the
space usage gets to a certain threshold. Otherwise, checkpoints need ot be
triggered when a log synchronisation point is reached - a log force event.

Because the log write code already handles chained log vectors, writing the
transaction is trivial, too. Construct a transaction header, add it
to the head of the chain and write it into the log, then issue a
commit record write. Then we can release the checkpoint log ticket
and attach the context to the log buffer so it can be called during
Io completion to complete the checkpoint.

We also need to allow for synchronising multiple in-flight
checkpoints. This is needed for two things - the first is to ensure
that checkpoint commit records appear in the log in the correct
sequence order (so they are replayed in the correct order). The
second is so that xfs_log_force_lsn() operates correctly and only
flushes and/or waits for the specific sequence it was provided with.

To do this we need a wait variable and a list tracking the
checkpoint commits in progress. We can walk this list and wait for
the checkpoints to change state or complete easily, an this provides
the necessary synchronisation for correct operation in both cases.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-05-24 23:38:03 +0800
ed3b4d6cd xfs: Improve scalability of busy extent tracking ... Browse Code »

When we free a metadata extent, we record it in the per-AG busy
extent array so that it is not re-used before the freeing
transaction hits the disk. This array is fixed size, so when it
overflows we make further allocation transactions synchronous
because we cannot track more freed extents until those transactions
hit the disk and are completed. Under heavy mixed allocation and
freeing workloads with large log buffers, we can overflow this array
quite easily.

Further, the array is sparsely populated, which means that inserts
need to search for a free slot, and array searches often have to
search many more slots that are actually used to check all the
busy extents. Quite inefficient, really.

To enable this aspect of extent freeing to scale better, we need
a structure that can grow dynamically. While in other areas of
XFS we have used radix trees, the extents being freed are at random
locations on disk so are better suited to being indexed by an rbtree.

So, use a per-AG rbtree indexed by block number to track busy
extents. This incures a memory allocation when marking an extent
busy, but should not occur too often in low memory situations. This
should scale to an arbitrary number of extents so should not be a
limitation for features such as in-memory aggregation of
transactions.

However, there are still situations where we can't avoid allocating
busy extents (such as allocation from the AGFL). To minimise the
overhead of such occurences, we need to avoid doing a synchronous
log force while holding the AGF locked to ensure that the previous
transactions are safely on disk before we use the extent. We can do
this by marking the transaction doing the allocation as synchronous
rather issuing a log force.

Because of the locking involved and the ordering of transactions,
the synchronous transaction provides the same guarantees as a
synchronous log force because it ensures that all the prior
transactions are already on disk when the synchronous transaction
hits the disk. i.e. it preserves the free->allocate order of the
extent correctly in recovery.

By doing this, we avoid holding the AGF locked while log writes are
in progress, hence reducing the length of time the lock is held and
therefore we increase the rate at which we can allocate and free
from the allocation group, thereby increasing overall throughput.

The only problem with this approach is that when a metadata buffer is
marked stale (e.g. a directory block is removed), then buffer remains
pinned and locked until the log goes to disk. The issue here is that
if that stale buffer is reallocated in a subsequent transaction, the
attempt to lock that buffer in the transaction will hang waiting
the log to go to disk to unlock and unpin the buffer. Hence if
someone tries to lock a pinned, stale, locked buffer we need to
push on the log to get it unlocked ASAP. Effectively we are trading
off a guaranteed log force for a much less common trigger for log
force to occur.

Ideally we should not reallocate busy extents. That is a much more
complex fix to the problem as it involves direct intervention in the
allocation btree searches in many places. This is left to a future
set of modifications.

Finally, now that we track busy extents in allocated memory, we
don't need the descriptors in the transaction structure to point to
them. We can replace the complex busy chunk infrastructure with a
simple linked list of busy extents. This allows us to remove a large
chunk of code, making the overall change a net reduction in code
size.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder

Dave Chinner
2010-05-24 23:34:00 +0800

19 May, 2010

4 commits

b1c1b5b61 xfs: Clean up xfs_trans_committed code after factoring ... Browse Code »

Now that the code has been factored, clean up all the remaining
style cruft, simplify the code and re-order functions so that it
doesn't need forward declarations.

Also move the remaining functions that require forward declarations
(xfs_trans_uncommit, xfs_trans_free) so that all the forward
declarations can be removed from the file.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-05-19 22:58:09 +0800
8e646a55a xfs: update and factor xfs_trans_committed() ... Browse Code »

The function header to xfs-trans_committed has long had this
comment:

* THIS SHOULD BE REWRITTEN TO USE xfs_trans_next_item()

To prepare for different methods of committing items, convert the
code to use xfs_trans_next_item() and factor the code into smaller,
more digestible chunks.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-05-19 22:58:09 +0800
a3ccd2ca4 xfs: clean up xfs_trans_commit logic even more ... Browse Code »

> +shut_us_down:
> + shutdown = XFS_FORCED_SHUTDOWN(mp) ? EIO : 0;
> + if (!(tp->t_flags & XFS_TRANS_DIRTY) || shutdown) {
> + xfs_trans_unreserve_and_mod_sb(tp);
> + /*

This whole area in _xfs_trans_commit is still a complete mess.

So while touching this code, unravel this mess as well to make the
whole flow of the function simpler and clearer.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner

Christoph Hellwig
2010-05-19 22:58:08 +0800
0924378a6 xfs: split out iclog writing from xfs_trans_commit() ... Browse Code »

Split the the part of xfs_trans_commit() that deals with writing the
transaction into the iclog into a separate function. This isolates the
physical commit process from the logical commit operation and makes
it easier to insert different transaction commit paths without affecting
the existing algorithm adversely.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig

Dave Chinner
2010-05-19 22:58:08 +0800