27 Jul, 2010
8 commits
-
The b_strat callback is used by xfs_buf_iostrategy to perform additional
checks before submitting a buffer. It is used in xfs_bwrite and when
writing out delayed buffers. In xfs_bwrite it we can de-virtualize the
call easily as b_strat is set a few lines above the call to
xfs_buf_iostrategy. For the delayed buffers the rationale is a bit
more complicated:- there are three callers of xfs_buf_delwri_queue, which places buffers
on the delwri list:
(1) xfs_bdwrite - this sets up b_strat, so it's fine
(2) xfs_buf_iorequest. None of the callers can have XBF_DELWRI set:
- xlog_bdstrat is only used for log buffers, which are never delwri
- _xfs_buf_read explicitly clears the delwri flag
- xfs_buf_iodone_work retries log buffers only
- xfsbdstrat - only used for reads, superblock writes without the
delwri flag, log I/O and file zeroing with explicitly allocated
buffers.
- xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is
not set
(3) xfs_buf_unlock
- only puts the buffer on the delwri list if the DELWRI flag is
already set. The DELWRI flag is only ever set in xfs_bwrite,
xfs_buf_iodone_callbacks, or xfs_trans_log_buf. For
xfs_buf_iodone_callbacks and xfs_trans_log_buf we require
an initialized buf item, which means b_strat was set to
xfs_bdstrat_cb in xfs_buf_item_init.Conclusion: we can just get rid of the callback and replace it with
explicit calls to xfs_bdstrat_cb.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner -
By making this member a void pointer we can get rid of a lot of pointless
casts.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner -
Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode
them in their only callers, just like we did for the inode pinning a while
ago. Also remove duplicate trace points - the bufitem tracepoints cover
all the information that is present in a buffer tracepoint.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner -
Stop the function pointer casting madness and give all the li_cb instances
correct prototype.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner -
Stop the function pointer casting madness and give all the xfs_item_ops the
correct prototypes.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner -
The unpin_remove item operation instances always share most of the
implementation with the respective unpin implementation. So instead
of keeping two different entry points add a remove flag to the unpin
operation and share the code more easily.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner -
Currently we track log item descriptor belonging to a transaction using a
complex opencoded chunk allocator. This code has been there since day one
and seems to work around the lack of an efficient slab allocator.This patch replaces it with dynamically allocated log item descriptors
from a dedicated slab pool, linked to the transaction by a linked list.This allows to greatly simplify the log item descriptor tracking to the
point where it's just a couple hundred lines in xfs_trans.c instead of
a separate file. The external API has also been simplified while we're
at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
delete items from a transaction have been simplified to the bare minium,
and the xfs_trans_find_item function is replaced with a direct dereference
of the li_desc field. All debug code walking the list of log items in
a transaction is down to a simple list_for_each_entry.Note that we could easily use a singly linked list here instead of the
double linked list from list.h as the fastpath only does deletion from
sequential traversal. But given that we don't have one available as
a library function yet I use the list.h functions for simplicity.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner -
Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast pathes of the filesystem.This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
24 May, 2010
3 commits
-
With delayed logging, we can get inode allocation buffers in the
same transaction inode unlink buffers. We don't currently mark inode
allocation buffers in the log, so inode unlink buffers take
precedence over allocation buffers.The result is that when they are combined into the same checkpoint,
only the unlinked inode chain fields are replayed, resulting in
uninitialised inode buffers being detected when the next inode
modification is replayed.To fix this, we need to ensure that we do not set the inode buffer
flag in the buffer log item format flags if the inode allocation has
not already hit the log. To avoid requiring a change to log
recovery, we really need to make this a modification that relies
only on in-memory sate.We can do this by checking during buffer log formatting (while the
CIL cannot be flushed) if we are still in the same sequence when we
commit the unlink transaction as the inode allocation transaction.
If we are, then we do not add the inode buffer flag to the buffer
log format item flags. This means the entire buffer will be
replayed, not just the unlinked fields. We do this while
CIL flusheѕ are locked out to ensure that we don't race with the
sequence numbers changing and hence fail to put the inode buffer
flag in the buffer format flags when we really need to.Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder -
Clean up the buffer log format (XFS_BLI_*) flags because they have a
polluted namespace. They XFS_BLI_ prefix is used for both in-memory
and on-disk flag feilds, but have overlapping values for different
flags. Rename the buffer log format flags to use the XFS_BLF_*
prefix to avoid confusing them with the in-memory XFS_BLI_* prefixed
flags.Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder -
The buffer log item reference counts used to take referenceѕ for every
transaction, similar to the pin counting. This is symmetric (like the
pin/unpin) with respect to transaction completion, but with dleayed logging
becomes assymetric as the pinning becomes assymetric w.r.t. transaction
completion.To make both cases the same, allow the buffer pinning to take a reference to
the buffer log item and always drop the reference the transaction has on it
when being unlocked. This is balanced correctly because the unpin operation
always drops a reference to the log item. Hence reference counting becomes
symmetric w.r.t. item pinning as well as w.r.t active transactions and as a
result the reference counting model remain consistent between normal and
delayed logging.Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Alex Elder
19 May, 2010
2 commits
-
The staleness of a object being unpinned can be directly derived
from the object itself - there is no need to extract it from the
object then pass it as a parameter into IOP_UNPIN().This means we can kill the XFS_LID_BUF_STALE flag - it is set,
checked and cleared in the same places XFS_BLI_STALE flag in the
xfs_buf_log_item so it is now redundant and hence safe to remove.Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig -
Each log item type does manual initialisation of the log item.
Delayed logging introduces new fields that need initialisation, so
factor all the open coded initialisation into a common function
first.Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
02 Feb, 2010
1 commit
-
All buffers logged into the AIL are marked as delayed write.
When the AIL needs to push the buffer out, it issues an async write of the
buffer. This means that IO patterns are dependent on the order of
buffers in the AIL.Instead of flushing the buffer, promote the buffer in the delayed
write list so that the next time the xfsbufd is run the buffer will
be flushed by the xfsbufd. Return the state to the xfsaild that the
buffer was promoted so that the xfsaild knows that it needs to cause
the xfsbufd to run to flush the buffers that were promoted.Using the xfsbufd for issuing the IO allows us to dispatch all
buffer IO from the one queue. This means that we can make much more
enlightened decisions on what order to flush buffers to disk as
we don't have multiple places issuing IO. Optimisations to xfsbufd
will be in a future patch.Version 2
- kill XFS_ITEM_FLUSHING as it is now unused.Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
22 Jan, 2010
1 commit
-
This macro only obsfucates the log item type assignments, so kill it.
Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder
15 Dec, 2009
1 commit
-
Convert the old xfs tracing support that could only be used with the
out of tree kdb and xfsidbg patches to use the generic event tracer.To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
all xfs trace channels by:echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
or alternatively enable single events by just doing the same in one
event subdirectory, e.g.echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
or set more complex filters, etc. In Documentation/trace/events.txt
all this is desctribed in more detail. To reads the events do acat /sys/kernel/debug/tracing/trace
Compared to the last posting this patch converts the tracing mostly to
the one tracepoint per callsite model that other users of the new
tracing facility also employ. This allows a very fine-grained control
of the tracing, a cleaner output of the traces and also enables the
perf tool to use each tracepoint as a virtual performance counter,
allowing us to e.g. count how often certain workloads git various
spots in XFS. Take a look athttp://lwn.net/Articles/346470/
for some examples.
Also the btree tracing isn't included at all yet, as it will require
additional core tracing features not in mainline yet, I plan to
deliver it later.And the really nice thing about this patch is that it actually removes
many lines of code while adding this nice functionality:fs/xfs/Makefile | 8
fs/xfs/linux-2.6/xfs_acl.c | 1
fs/xfs/linux-2.6/xfs_aops.c | 52 -
fs/xfs/linux-2.6/xfs_aops.h | 2
fs/xfs/linux-2.6/xfs_buf.c | 117 +--
fs/xfs/linux-2.6/xfs_buf.h | 33
fs/xfs/linux-2.6/xfs_fs_subr.c | 3
fs/xfs/linux-2.6/xfs_ioctl.c | 1
fs/xfs/linux-2.6/xfs_ioctl32.c | 1
fs/xfs/linux-2.6/xfs_iops.c | 1
fs/xfs/linux-2.6/xfs_linux.h | 1
fs/xfs/linux-2.6/xfs_lrw.c | 87 --
fs/xfs/linux-2.6/xfs_lrw.h | 45 -
fs/xfs/linux-2.6/xfs_super.c | 104 ---
fs/xfs/linux-2.6/xfs_super.h | 7
fs/xfs/linux-2.6/xfs_sync.c | 1
fs/xfs/linux-2.6/xfs_trace.c | 75 ++
fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++
fs/xfs/linux-2.6/xfs_vnode.h | 4
fs/xfs/quota/xfs_dquot.c | 110 ---
fs/xfs/quota/xfs_dquot.h | 21
fs/xfs/quota/xfs_qm.c | 40 -
fs/xfs/quota/xfs_qm_syscalls.c | 4
fs/xfs/support/ktrace.c | 323 ---------
fs/xfs/support/ktrace.h | 85 --
fs/xfs/xfs.h | 16
fs/xfs/xfs_ag.h | 14
fs/xfs/xfs_alloc.c | 230 +-----
fs/xfs/xfs_alloc.h | 27
fs/xfs/xfs_alloc_btree.c | 1
fs/xfs/xfs_attr.c | 107 ---
fs/xfs/xfs_attr.h | 10
fs/xfs/xfs_attr_leaf.c | 14
fs/xfs/xfs_attr_sf.h | 40 -
fs/xfs/xfs_bmap.c | 507 +++------------
fs/xfs/xfs_bmap.h | 49 -
fs/xfs/xfs_bmap_btree.c | 6
fs/xfs/xfs_btree.c | 5
fs/xfs/xfs_btree_trace.h | 17
fs/xfs/xfs_buf_item.c | 87 --
fs/xfs/xfs_buf_item.h | 20
fs/xfs/xfs_da_btree.c | 3
fs/xfs/xfs_da_btree.h | 7
fs/xfs/xfs_dfrag.c | 2
fs/xfs/xfs_dir2.c | 8
fs/xfs/xfs_dir2_block.c | 20
fs/xfs/xfs_dir2_leaf.c | 21
fs/xfs/xfs_dir2_node.c | 27
fs/xfs/xfs_dir2_sf.c | 26
fs/xfs/xfs_dir2_trace.c | 216 ------
fs/xfs/xfs_dir2_trace.h | 72 --
fs/xfs/xfs_filestream.c | 8
fs/xfs/xfs_fsops.c | 2
fs/xfs/xfs_iget.c | 111 ---
fs/xfs/xfs_inode.c | 67 --
fs/xfs/xfs_inode.h | 76 --
fs/xfs/xfs_inode_item.c | 5
fs/xfs/xfs_iomap.c | 85 --
fs/xfs/xfs_iomap.h | 8
fs/xfs/xfs_log.c | 181 +----
fs/xfs/xfs_log_priv.h | 20
fs/xfs/xfs_log_recover.c | 1
fs/xfs/xfs_mount.c | 2
fs/xfs/xfs_quota.h | 8
fs/xfs/xfs_rename.c | 1
fs/xfs/xfs_rtalloc.c | 1
fs/xfs/xfs_rw.c | 3
fs/xfs/xfs_trans.h | 47 +
fs/xfs/xfs_trans_buf.c | 62 -
fs/xfs/xfs_vnodeops.c | 8
70 files changed, 2151 insertions(+), 2592 deletions(-)Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder
22 Dec, 2008
1 commit
-
Code does nothing so remove it.
Reviewed-by: Christoph Hellwig
Signed-off-by: Lachlan McIlroy
11 Dec, 2008
1 commit
-
Replace the b_fspriv pointer and it's ugly accessors with a properly types
xfs_mount pointer. Also switch log reocvery over to it instead of using
b_fspriv for the mount pointer.Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Lachlan McIlroy
30 Oct, 2008
3 commits
-
Change all the remaining AIL API functions that are passed struct
xfs_mount pointers to pass pointers directly to the struct xfs_ail being
used. With this conversion, all external access to the AIL is via the
struct xfs_ail. Hence the operation and referencing of the AIL is almost
entirely independent of the xfs_mount that is using it - it is now much
more tightly tied to the log and the items it is tracking in the log than
it is tied to the xfs_mount.SGI-PV: 988143
SGI-Modid: xfs-linux-melb:xfs-kern:32353a
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy
Signed-off-by: Christoph Hellwig -
Add an xfs_ail pointer to log items so that the log items can reference
the AIL directly during callbacks without needed a struct xfs_mount.SGI-PV: 988143
SGI-Modid: xfs-linux-melb:xfs-kern:32352a
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy
Signed-off-by: Christoph Hellwig -
Bring the ail lock inside the struct xfs_ail. This means the AIL can be
entirely manipulated via the struct xfs_ail rather than needing both the
struct xfs_mount and the struct xfs_ail.SGI-PV: 988143
SGI-Modid: xfs-linux-melb:xfs-kern:32350a
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy
Signed-off-by: Christoph Hellwig
17 Sep, 2008
1 commit
-
We have a use-after-free issue where log completions access buffers via
the buffer log item and the buffer has already been freed. Fix this by
taking a reference on the buffer when attaching the buffer log item and
release the hold when the buffer log item is detached and we no longer
need the buffer. Also create a new function xfs_buf_item_free() to combine
some common code.SGI-PV: 985757
SGI-Modid: xfs-linux-melb:xfs-kern:32025a
Signed-off-by: Lachlan McIlroy
Signed-off-by: Christoph Hellwig
13 Aug, 2008
2 commits
-
Use KM_NOFS to prevent recursion back into the filesystem which can cause
deadlocks.In the case of xfs_iread() we hold the lock on the inode cluster buffer
while allocating memory for the trace buffers. If we recurse back into XFS
to flush data that may require a transaction to allocate extents which
needs log space. This can deadlock with the xfsaild thread which can't
push the tail of the log because it is trying to get the inode cluster
buffer lock.SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31838a
Signed-off-by: Lachlan McIlroy
Signed-off-by: David Chinner -
The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
completion. Change it to a completion and remove the last user of the
sema_t from XFS.SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31815a
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy
28 Jul, 2008
1 commit
-
kmem_free() function takes (ptr, size) arguments but doesn't actually use
second one.This patch removes size argument from all callsites.
SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31050aSigned-off-by: Denys Vlasenko
Signed-off-by: David Chinner
Signed-off-by: Lachlan McIlroy
18 Apr, 2008
1 commit
-
xfs_bawrite() can return immediate error status on async writes. Unlike
xfsbdstrat() we don't ever check the error on the buffer after the call,
so we currently do not catch errors at all here. Ensure we catch and
propagate or warn to the syslog about up-front async write errors.SGI-PV: 980084
SGI-Modid: xfs-linux-melb:xfs-kern:30824aSigned-off-by: David Chinner
Signed-off-by: Niv Sardi
Signed-off-by: Lachlan McIlroy
07 Feb, 2008
1 commit
-
SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29739aSigned-off-by: Donald Douwsma
Signed-off-by: Eric Sandeen
Signed-off-by: Tim Shimmin
15 Oct, 2007
1 commit
-
One of the perpetual scaling problems XFS has is indexing it's incore
inodes. We currently uses hashes and the default hash sizes chosen can
only ever be a tradeoff between memory consumption and the maximum
realistic size of the cache.As a result, anyone who has millions of inodes cached on a filesystem
needs to tunes the size of the cache via the ihashsize mount option to
allow decent scalability with inode cache operations.A further problem is the separate inode cluster hash, whose size is based
on the ihashsize but is smaller, and so under certain conditions (sparse
cluster cache population) this can become a limitation long before the
inode hash is causing issues.The following patchset removes the inode hash and cluster hash and
replaces them with radix trees to avoid the scalability limitations of the
hashes. It also reduces the size of the inodes by 3 pointers....SGI-PV: 969561
SGI-Modid: xfs-linux-melb:xfs-kern:29481aSigned-off-by: David Chinner
Signed-off-by: Christoph Hellwig
Signed-off-by: Tim Shimmin
14 Jul, 2007
1 commit
-
xfs_count_bits is only called once, and is then compared to 0. IOW, what
it really wants to know is, is the bitmap empty. This can be done more
simply, certainly.SGI-PV: 966503
SGI-Modid: xfs-linux-melb:xfs-kern:28944aSigned-off-by: Eric Sandeen
Signed-off-by: David Chinner
Signed-off-by: Tim Shimmin
10 Feb, 2007
1 commit
-
gcc-4.1 and more recent aggressively inline static functions which
increases XFS stack usage by ~15% in critical paths. Prevent this from
occurring by adding noinline to the STATIC definition.Also uninline some functions that are too large to be inlined and were
causing problems with CONFIG_FORCED_INLINING=y.Finally, clean up all the different users of inline, __inline and
__inline__ and put them under one STATIC_INLINE macro. For debug kernels
the STATIC_INLINE macro uninlines those functions.SGI-PV: 957159
SGI-Modid: xfs-linux-melb:xfs-kern:27585aSigned-off-by: David Chinner
Signed-off-by: David Chatterton
Signed-off-by: Tim Shimmin
28 Sep, 2006
2 commits
-
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26747aSigned-off-by: Eric Sandeen
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin -
SGI-PV: 955302
SGI-Modid: xfs-linux-melb:xfs-kern:26746aSigned-off-by: Eric Sandeen
Signed-off-by: Nathan Scott
Signed-off-by: Tim Shimmin
20 Jun, 2006
1 commit
-
pure bloat.
SGI-PV: 952969
SGI-Modid: xfs-linux-melb:xfs-kern:26251aSigned-off-by: Nathan Scott
09 Jun, 2006
2 commits
-
interface.
SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26103aSigned-off-by: Nathan Scott
-
shutdown vop flags consistent with sync vop flags declarations too.
SGI-PV: 939911
SGI-Modid: xfs-linux-melb:xfs-kern:26096aSigned-off-by: Nathan Scott
29 Mar, 2006
1 commit
-
these typos.
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25539aSigned-off-by: Nathan Scott
02 Nov, 2005
2 commits
-
boilerplate.
SGI-PV: 913862
SGI-Modid: xfs-linux:xfs-kern:23903aSigned-off-by: Nathan Scott
-
SGI-PV: 943122
SGI-Modid: xfs-linux:xfs-kern:23901aSigned-off-by: Nathan Scott
02 Sep, 2005
1 commit
-
SGI-PV: 931456
SGI-Modid: xfs-linux:xfs-kern:23155aSigned-off-by: Tim Shimmin
Signed-off-by: Nathan Scott
21 Jun, 2005
1 commit
-
SGI-PV: 936255
SGI-Modid: xfs-linux:xfs-kern:192760aSigned-off-by: Christoph Hellwig
Signed-off-by: Nathan Scott