Eric Lee / smarc-fsl-linux-kernel

12 Jan, 2017

1 commit

f380ee72a xfs: provide helper for counting extents from if_bytes ... Browse Code »

commit 5d829300bee000980a09ac2ccb761cb25867b67c upstream.

The open-coded pattern:

ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)

is all over the xfs code; provide a new helper
xfs_iext_count(ifp) to count the number of inline extents
in an inode fork.

[dchinner: pick up several missed conversions]

Signed-off-by: Eric Sandeen
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Cc: Christoph Hellwig
Signed-off-by: Greg Kroah-Hartman

Eric Sandeen
2017-01-12 18:39:39 +0800

06 Oct, 2016

1 commit

f7ca35227 xfs: create a separate cow extent size hint for the allocator ... Browse Code »

Create a per-inode extent size allocator hint for copy-on-write. This
hint is separate from the existing extent size hint so that CoW can
take advantage of the fragmentation-reducing properties of extent size
hints without disabling delalloc for regular writes.

The extent size hint that's fed to the allocator during a copy on
write operation is the greater of the cowextsize and regular extsize
hint.

During reflink, if we're sharing the entire source file to the entire
destination file and the destination file doesn't already have a
cowextsize hint, propagate the source file's cowextsize hint to the
destination file.

Furthermore, zero the bulkstat buffer prior to setting the fields
so that we don't copy kernel memory contents into userspace.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig

Darrick J. Wong
2016-10-06 07:26:26 +0800

22 Jul, 2016

1 commit

b1c5ebb21 xfs: allocate log vector buffers outside CIL context lock ... Browse Code »

One of the problems we currently have with delayed logging is that
under serious memory pressure we can deadlock memory reclaim. THis
occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
inodes and issues a log force to unpin inodes that are dirty in the
CIL.

The CIL is pushed, but this will only occur once it gets the CIL
context lock to ensure that all committing transactions are complete
and no new transactions start being committed to the CIL while the
push switches to a new context.

The deadlock occurs when the CIL context lock is held by a
committing process that is doing memory allocation for log vector
buffers, and that allocation is then blocked on memory reclaim
making progress. Memory reclaim, however, is blocked waiting for
a log force to make progress, and so we effectively deadlock at this
point.

To solve this problem, we have to move the CIL log vector buffer
allocation outside of the context lock so that memory reclaim can
always make progress when it needs to force the log. The problem
with doing this is that a CIL push can take place while we are
determining if we need to allocate a new log vector buffer for
an item and hence the current log vector may go away without
warning. That means we canot rely on the existing log vector being
present when we finally grab the context lock and so we must have a
replacement buffer ready to go at all times.

To ensure this, introduce a "shadow log vector" buffer that is
always guaranteed to be present when we gain the CIL context lock
and format the item. This shadow buffer may or may not be used
during the formatting, but if the log item does not have an existing
log vector buffer or that buffer is too small for the new
modifications, we swap it for the new shadow buffer and format
the modifications into that new log vector buffer.

The result of this is that for any object we modify more than once
in a given CIL checkpoint, we double the memory required
to track dirty regions in the log. For single modifications then
we consume the shadow log vectorwe allocate on commit, and that gets
consumed by the checkpoint. However, if we make multiple
modifications, then the second transaction commit will allocate a
shadow log vector and hence we will end up with double the memory
usage as only one of the log vectors is consumed by the CIL
checkpoint. The remaining shadow vector will be freed when th elog
item is freed.

This can probably be optimised in future - access to the shadow log
vector is serialised by the object lock (as opposited to the active
log vector, which is controlled by the CIL context lock) and so we
can probably free shadow log vector from some objects when the log
item is marked clean on removal from the AIL.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Dave Chinner
2016-07-22 07:52:35 +0800

20 May, 2016

1 commit

2a4ad5894 Merge branch 'xfs-4.7-misc-fixes' into for-next Browse Code »

Dave Chinner
2016-05-20 08:33:17 +0800

06 Apr, 2016

2 commits

6e3e6d55e xfs: mute some sparse warnings ... Browse Code »

These three warnings are fixed:

fs/xfs/xfs_inode.c:1033:44: warning: Using plain integer as NULL pointer
fs/xfs/xfs_inode_item.c:525:20: warning: context imbalance in 'xfs_inode_item_push' - unexpected unlock
fs/xfs/xfs_dquot.c:696:1: warning: symbol 'xfs_dq_get_next_id' was not declared. Should it be static?

Signed-off-by: Eryu Guan
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Eryu Guan
2016-04-06 07:47:21 +0800
30ee052e1 xfs: optimize inline symlinks ... Browse Code »

By overallocating the in-core inode fork data buffer and zero
terminating the link target in xfs_init_local_fork we can avoid
the memory allocation in ->follow_link.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2016-04-06 05:53:29 +0800

09 Feb, 2016

8 commits

c19b3b05a xfs: mode di_mode to vfs inode ... Browse Code »

Move the di_mode value from the xfs_icdinode to the VFS inode, reducing
the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole
in the structure.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800
83e06f21b xfs: move di_changecount to VFS inode ... Browse Code »

We can store the di_changecount in the i_version field of the VFS
inode and remove another 8 bytes from the xfs_icdinode.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800
9e9a2674e xfs: move inode generation count to VFS inode ... Browse Code »

Pull another 4 bytes out of the xfs_icdinode.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800
54d7b5c1d xfs: use vfs inode nlink field everywhere ... Browse Code »

The VFS tracks the inode nlink just like the xfs_icdinode. We can
remove the variable from the icdinode and use the VFS inode variable
everywhere, reducing the size of the xfs_icdinode by a further 4
bytes.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800
faeb4e471 xfs: move v1 inode conversion to xfs_inode_from_disk ... Browse Code »

So we don't have to carry an di_onlink variable around anymore, move
the inode conversion from v1 inode format to v2 inode format into
xfs_inode_from_disk(). This means we can remove the di_onlink fields
from the struct xfs_icdinode.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800
93f958f9c xfs: cull unnecessary icdinode fields ... Browse Code »

Now that the struct xfs_icdinode is not directly related to the
on-disk format, we can cull things in it we really don't need to
store:

- magic number never changes
- padding is not necessary
- next_unlinked is never used
- inode number is redundant
- uuid is redundant
- lsn is accessed directly from dinode
- inode CRC is only accessed directly from dinode

Hence we can remove these from the struct xfs_icdinode and redirect
the code that uses them to the xfs_dinode appripriately. This
reduces the size of the struct icdinode from 152 bytes to 88 bytes,
and removes a fair chunk of unnecessary code, too.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800
3987848c7 xfs: remove timestamps from incore inode ... Browse Code »

The struct xfs_inode has two copies of the current timestamps in it,
one in the vfs inode and one in the struct xfs_icdinode. Now that we
no longer log the struct xfs_icdinode directly, we don't need to
keep the timestamps in this structure. instead we can copy them
straight out of the VFS inode when formatting the inode log item or
the on-disk inode.

This reduces the struct xfs_inode in size by 24 bytes.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800
f8d55aa05 xfs: introduce inode log format object ... Browse Code »

We currently carry around and log an entire inode core in the
struct xfs_inode. A lot of the information in the inode core is
duplicated in the VFS inode, but we cannot remove this duplication
of infomration because the inode core is logged directly in
xfs_inode_item_format().

Add a new function xfs_inode_item_format_core() that copies the
inode core data into a struct xfs_icdinode that is pulled directly
from the log vector buffer. This means we no longer directly
copy the inode core, but copy the structures one member at a time.
This will be slightly less efficient than copying, but will allow us
to remove duplicate and unnecessary items from the struct xfs_inode.

To enable us to do this, call the new structure a xfs_log_dinode,
so that we know it's different to the physical xfs_dinode and the
in-core xfs_icdinode.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2016-02-09 13:54:58 +0800

03 Nov, 2015

1 commit

fc0561cef xfs: optimise away log forces on timestamp updates for fdatasync ... Browse Code »

xfs: timestamp updates cause excessive fdatasync log traffic

Sage Weil reported that a ceph test workload was writing to the
log on every fdatasync during an overwrite workload. Event tracing
showed that the only metadata modification being made was the
timestamp updates during the write(2) syscall, but fdatasync(2)
is supposed to ignore them. The key observation was that the
transactions in the log all looked like this:

INODE: #regs: 4 ino: 0x8b flags: 0x45 dsize: 32

And contained a flags field of 0x45 or 0x85, and had data and
attribute forks following the inode core. This means that the
timestamp updates were triggering dirty relogging of previously
logged parts of the inode that hadn't yet been flushed back to
disk.

There are two parts to this problem. The first is that XFS relogs
dirty regions in subsequent transactions, so it carries around the
fields that have been dirtied since the last time the inode was
written back to disk, not since the last time the inode was forced
into the log.

The second part is that on v5 filesystems, the inode change count
update during inode dirtying also sets the XFS_ILOG_CORE flag, so
on v5 filesystems this makes a timestamp update dirty the entire
inode.

As a result when fdatasync is run, it looks at the dirty fields in
the inode, and sees more than just the timestamp flag, even though
the only metadata change since the last fdatasync was just the
timestamps. Hence we force the log on every subsequent fdatasync
even though it is not needed.

To fix this, add a new field to the inode log item that tracks
changes since the last time fsync/fdatasync forced the log to flush
the changes to the journal. This flag is updated when we dirty the
inode, but we do it before updating the change count so it does not
carry the "core dirty" flag from timestamp updates. The fields are
zeroed when the inode is marked clean (due to writeback/freeing) or
when an fsync/datasync forces the log. Hence if we only dirty the
timestamps on the inode between fsync/fdatasync calls, the fdatasync
will not trigger another log force.

Over 100 runs of the test program:

Ext4 baseline:
runtime: 1.63s +/- 0.24s
avg lat: 1.59ms +/- 0.24ms
iops: ~2000

XFS, vanilla kernel:
runtime: 2.45s +/- 0.18s
avg lat: 2.39ms +/- 0.18ms
log forces: ~400/s
iops: ~1000

XFS, patched kernel:
runtime: 1.49s +/- 0.26s
avg lat: 1.46ms +/- 0.25ms
log forces: ~30/s
iops: ~1500

Reported-by: Sage Weil
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Dave Chinner
2015-11-03 10:14:59 +0800

19 Aug, 2015

1 commit

146e54b71 xfs: add helper to conditionally remove items from the AIL ... Browse Code »

Several areas of code duplicate a pattern where we take the AIL lock,
check whether an item is in the AIL and remove it if so. Create a new
helper for this pattern and use it where appropriate.

Signed-off-by: Brian Foster

Brian Foster
2015-08-19 08:01:08 +0800

28 Nov, 2014

3 commits

bb58e6188 xfs: move most of xfs_sb.h to xfs_format.h ... Browse Code »

More on-disk format consolidation.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2014-11-28 11:27:09 +0800
4fb6e8ade xfs: merge xfs_ag.h into xfs_format.h ... Browse Code »

More on-disk format consolidation. A few declarations that weren't on-disk
format related move into better suitable spots.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2014-11-28 11:25:04 +0800
6d3ebaae7 xfs: merge xfs_dinode.h into xfs_format.h ... Browse Code »

More consolidatation for the on-disk format defintions. Note that the
XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related
to the on disk format, but depends on a CONFIG_ option.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2014-11-28 11:24:06 +0800

03 Oct, 2014

1 commit

52177937e xfs: xfs_iflush_done checks the wrong log item callback ... Browse Code »

Commit 3013683 ("xfs: remove all the inodes on a buffer from the AIL
in bulk") made the xfs inode flush callback more efficient by
combining all the inode writes on the buffer and the deletions of
the inode log item from AIL.

The initial loop in this patch should be looping through all
the log items on the buffer to see which items have
xfs_iflush_done as their callback function. But currently,
only the log item passed to the function has its callback
compared to xfs_iflush_done. If the log item pointer passed to
the function does have the xfs_iflush_done callback function,
then all the log items on the buffer are removed from the
li_bio_list on the buffer b_fspriv and could be removed from
the AIL even though they may have not been written yet.

This problem is masked by the fact that currently all inodes on a
buffer will have the same calback function - either xfs_iflush_done
or xfs_istale_done - and hence the bug cannot manifest in any way.
Still, we need to remove the landmine so that if we add new
callbacks in future this doesn't cause us problems.

Signed-off-by: Mark Tinguely
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Mark Tinguely
2014-10-03 07:09:50 +0800

25 Jun, 2014

1 commit

2451337dd xfs: global error sign conversion ... Browse Code »

Convert all the errors the core XFs code to negative error signs
like the rest of the kernel and remove all the sign conversion we
do in the interface layers.

Errors for conversion (and comparison) found via searches like:

$ git grep " E" fs/xfs
$ git grep "return E" fs/xfs
$ git grep " E[A-Z].*;$" fs/xfs

Negation points found via searches like:

$ git grep "= -[a-z,A-Z]" fs/xfs
$ git grep "return -[a-z,A-D,F-Z]" fs/xfs
$ git grep " -[a-z].*;" fs/xfs

[ with some bits I missed from Brian Foster ]

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Dave Chinner
2014-06-25 12:58:08 +0800

20 May, 2014

1 commit

263997a68 xfs: turn NLINK feature on by default ... Browse Code »

mkfs has turned on the XFS_SB_VERSION_NLINKBIT feature bit by
default since November 2007. It's about time we simply made the
kernel code turn it on by default and so always convert v1 inodes to
v2 inodes when reading them in from disk or allocating them. This
This removes needless version checks and modification when bumping
link counts on inodes, and will take code out of a few common code
paths.

text data bss dec hex filename
783251 100867 616 884734 d7ffe fs/xfs/xfs.o.orig
782664 100867 616 884147 d7db3 fs/xfs/xfs.o.patched

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Dave Chinner

Dave Chinner
2014-05-20 05:46:40 +0800

13 Dec, 2013

6 commits

2f251293b xfs: remove the inode log format from the inode log item ... Browse Code »

No need to keep the inode log format around all the time, we can
easily generate it at iop_format time.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:34:05 +0800
da7765031 xfs: format logged extents directly into the CIL ... Browse Code »

With the new iop_format scheme there is no need to have a temporary buffer
to format logged extents into, we can do so directly into the CIL. This
also allows to remove the shortcut for big endian systems that probably
hasn't gotten a lot of test coverage for a long time.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:34:04 +0800
bde7cff67 xfs: format log items write directly into the linear CIL buffer ... Browse Code »

Instead of setting up pointers to memory locations in iop_format which then
get copied into the CIL linear buffer after return move the copy into
the individual inode items. This avoids the need to always have a memory
block in the exact same layout that gets written into the log around, and
allow the log items to be much more flexible in their in-memory layouts.

The only caveat is that we need to properly align the data for each
iovec so that don't have structures misaligned in subsequent iovecs.

Note that all log item format routines now need to be careful to modify
the copy of the item that was placed into the CIL after calls to
xlog_copy_iovec instead of the in-memory copy.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:34:02 +0800
1234351cb xfs: introduce xlog_copy_iovec ... Browse Code »

Add a helper to abstract out filling the log iovecs in the log item
format handlers. This will allow us to change the way we do the log
item formatting more easily.

The copy in the name is a bit confusing for now as it just assigns a
pointer and lets the CIL code perform the copy, but that will change
soon.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:00:43 +0800
3de559fbd xfs: refactor xfs_inode_item_format ... Browse Code »

Split out a function to handle the data and attr fork, as well as a
helper for the really old v1 inodes.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:00:43 +0800
ce9641d6c xfs: refactor xfs_inode_item_size ... Browse Code »

Split out two helpers to size the data and attribute to make the
function more readable.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:00:43 +0800

24 Oct, 2013

2 commits

a4fbe6ab1 xfs: decouple inode and bmap btree header files ... Browse Code »

Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in it's definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

The enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Dave Chinner
2013-10-24 05:28:49 +0800
239880ef6 xfs: decouple log and transaction headers ... Browse Code »

xfs_trans.h has a dependency on xfs_log.h for a couple of
structures. Most code that does transactions doesn't need to know
anything about the log, but this dependency means that they have to
include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
files and clean up the includes to be in dependency order.

In doing this, remove the direct include of xfs_trans_reserve.h from
xfs_trans.h so that we remove the dependency between xfs_trans.h and
xfs_mount.h. Hence the xfs_trans.h include can be moved to the
indicate the actual dependencies other header files have on it.

Note that these are kernel only header files, so this does not
translate to any userspace changes at all.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Dave Chinner
2013-10-24 05:17:44 +0800

14 Aug, 2013

1 commit

166d13688 xfs: return log item size in IOP_SIZE ... Browse Code »

To begin optimising the CIL commit process, we need to have IOP_SIZE
return both the number of vectors and the size of the data pointed
to by the vectors. This enables us to calculate the size ofthe
memory allocation needed before the formatting step and reduces the
number of memory allocations per item by one.

While there, kill the IOP_SIZE macro.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2013-08-14 05:10:21 +0800

22 Apr, 2013

1 commit

93848a999 xfs: add version 3 inode format with CRCs ... Browse Code »

Add a new inode version with a larger core. The primary objective is
to allow for a crc of the inode, and location information (uuid and ino)
to verify it was written in the right place. We also extend it by:

a creation time (for Samba);
a changecount (for NFSv4);
a flush sequence (in LSN format for recovery);
an additional inode flags field; and
some additional padding.

These additional fields are not implemented yet, but already laid
out in the structure.

[dchinner@redhat.com] Added LSN and flags field, some factoring and rework to
capture all the necessary information in the crc calculation.

Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Christoph Hellwig
2013-04-22 04:03:33 +0800

18 Dec, 2012

1 commit

ec47eb6b0 xfs remove the XFS_TRANS_DEBUG routines ... Browse Code »

Remove the XFS_TRANS_DEBUG routines. They are no longer appropriate
and have not been used in years

Signed-off-by: Mark Tinguely
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Mark Tinguely
2012-12-18 06:29:00 +0800

22 Jun, 2012

1 commit

9a3a5dab6 xfs: check for stale inode before acquiring iflock on push ... Browse Code »

An inode in the AIL can be flush locked and marked stale if
a cluster free transaction occurs at the right time. The
inode item is then marked as flushing, which causes xfsaild
to spin and leaves the filesystem stalled. This is
reproduced by running xfstests 273 in a loop for an
extended period of time.

Check for stale inodes before the flush lock. This marks
the inode as pinned, leads to a log flush and allows the
filesystem to proceed.

Signed-off-by: Brian Foster
Reviewed-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Brian Foster
2012-06-22 03:20:06 +0800

15 May, 2012

5 commits

ad1e95c54 xfs: clean up xfs_bit.h includes ... Browse Code »

With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.

Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:21:00 +0800
60a34607b xfs: move xfsagino_t to xfs_types.h ... Browse Code »

Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:54 +0800
04913fdd9 xfs: pass shutdown method into xfs_trans_ail_delete_bulk ... Browse Code »

xfs_trans_ail_delete_bulk() can be called from different contexts so
if the item is not in the AIL we need different shutdown for each
context. Pass in the shutdown method needed so the correct action
can be taken.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:33 +0800
43ff2122e xfs: on-stack delayed write buffer lists ... Browse Code »

Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
and write back the buffers per-process instead of by waking up xfsbufd.

This is now easily doable given that we have very few places left that write
delwri buffers:

- log recovery:
Only done at mount time, and already forcing out the buffers
synchronously using xfs_flush_buftarg

- quotacheck:
Same story.

- dquot reclaim:
Writes out dirty dquots on the LRU under memory pressure. We might
want to look into doing more of this via xfsaild, but it's already
more optimal than the synchronous inode reclaim that writes each
buffer synchronously.

- xfsaild:
This is the main beneficiary of the change. By keeping a local list
of buffers to write we reduce latency of writing out buffers, and
more importably we can remove all the delwri list promotions which
were hitting the buffer cache hard under sustained metadata loads.

The implementation is very straight forward - xfs_buf_delwri_queue now gets
a new list_head pointer that it adds the delwri buffers to, and all callers
need to eventually submit the list using xfs_buf_delwi_submit or
xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are
skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
list. The biggest change to pass down the buffer list was done to the AIL
pushing. Now that we operate on buffers the trylock, push and pushbuf log
item methods are merged into a single push routine, which tries to lock the
item, and if possible add the buffer that needs writeback to the buffer list.
This leads to much simpler code than the previous split but requires the
individual IOP_PUSH instances to unlock and reacquire the AIL around calls
to blocking routines.

Given that xfsailds now also handle writing out buffers, the conditions for
log forcing and the sleep times needed some small changes. The most
important one is that we consider an AIL busy as long we still have buffers
to push, and the other one is that we do increment the pushed LSN for
buffers that are under flushing at this moment, but still count them towards
the stuck items for restart purposes. Without this we could hammer on stuck
items without ever forcing the log and not make progress under heavy random
delete workloads on fast flash storage devices.

[ Dave Chinner:
- rebase on previous patches.
- improved comments for XBF_DELWRI_Q handling
- fix XBF_ASYNC handling in queue submission (test 106 failure)
- rename delwri submit function buffer list parameters for clarity
- xfs_efd_item_push() should return XFS_ITEM_PINNED ]

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Christoph Hellwig
2012-05-15 05:20:31 +0800
4c46819a8 xfs: do not write the buffer from xfs_iflush ... Browse Code »

Instead of writing the buffer directly from inside xfs_iflush return it to
the caller and let the caller decide what to do with the buffer. Also
remove the pincount check in xfs_iflush that all non-blocking callers already
implement and the now unused flags parameter.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Christoph Hellwig
2012-05-15 05:20:28 +0800

14 Mar, 2012

1 commit

8f639ddea xfs: reimplement fdatasync support ... Browse Code »

Add an in-memory only flag to say we logged timestamps only, and use it to
check if fdatasync can optimize away the log force.

Reviewed-by: Dave Chinner
Signed-off-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Christoph Hellwig
2012-03-14 06:18:14 +0800