Eric Lee / smarc-fsl-linux-kernel

22 Jul, 2016

1 commit

b1c5ebb21 xfs: allocate log vector buffers outside CIL context lock ... Browse Code »

One of the problems we currently have with delayed logging is that
under serious memory pressure we can deadlock memory reclaim. THis
occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
inodes and issues a log force to unpin inodes that are dirty in the
CIL.

The CIL is pushed, but this will only occur once it gets the CIL
context lock to ensure that all committing transactions are complete
and no new transactions start being committed to the CIL while the
push switches to a new context.

The deadlock occurs when the CIL context lock is held by a
committing process that is doing memory allocation for log vector
buffers, and that allocation is then blocked on memory reclaim
making progress. Memory reclaim, however, is blocked waiting for
a log force to make progress, and so we effectively deadlock at this
point.

To solve this problem, we have to move the CIL log vector buffer
allocation outside of the context lock so that memory reclaim can
always make progress when it needs to force the log. The problem
with doing this is that a CIL push can take place while we are
determining if we need to allocate a new log vector buffer for
an item and hence the current log vector may go away without
warning. That means we canot rely on the existing log vector being
present when we finally grab the context lock and so we must have a
replacement buffer ready to go at all times.

To ensure this, introduce a "shadow log vector" buffer that is
always guaranteed to be present when we gain the CIL context lock
and format the item. This shadow buffer may or may not be used
during the formatting, but if the log item does not have an existing
log vector buffer or that buffer is too small for the new
modifications, we swap it for the new shadow buffer and format
the modifications into that new log vector buffer.

The result of this is that for any object we modify more than once
in a given CIL checkpoint, we double the memory required
to track dirty regions in the log. For single modifications then
we consume the shadow log vectorwe allocate on commit, and that gets
consumed by the checkpoint. However, if we make multiple
modifications, then the second transaction commit will allocate a
shadow log vector and hence we will end up with double the memory
usage as only one of the log vectors is consumed by the CIL
checkpoint. The remaining shadow vector will be freed when th elog
item is freed.

This can probably be optimised in future - access to the shadow log
vector is serialised by the object lock (as opposited to the active
log vector, which is controlled by the CIL context lock) and so we
can probably free shadow log vector from some objects when the log
item is marked clean on removal from the AIL.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner

Dave Chinner
2016-07-22 07:52:35 +0800

28 Nov, 2014

2 commits

bb58e6188 xfs: move most of xfs_sb.h to xfs_format.h ... Browse Code »

More on-disk format consolidation.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2014-11-28 11:27:09 +0800
4fb6e8ade xfs: merge xfs_ag.h into xfs_format.h ... Browse Code »

More on-disk format consolidation. A few declarations that weren't on-disk
format related move into better suitable spots.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2014-11-28 11:25:04 +0800

13 Dec, 2013

4 commits

ffda4e83a xfs: remove the quotaoff log format from the quotaoff log item ... Browse Code »

This one doesn't save a whole lot of memory, but still makes the
code simpler.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:34:08 +0800
ce8e96293 xfs: remove the dquot log format from the dquot log item ... Browse Code »

No need to keep the dquot log format around all the time, we can
easily generate it at iop_format time.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:34:07 +0800
bde7cff67 xfs: format log items write directly into the linear CIL buffer ... Browse Code »

Instead of setting up pointers to memory locations in iop_format which then
get copied into the CIL linear buffer after return move the copy into
the individual inode items. This avoids the need to always have a memory
block in the exact same layout that gets written into the log around, and
allow the log items to be much more flexible in their in-memory layouts.

The only caveat is that we need to properly align the data for each
iovec so that don't have structures misaligned in subsequent iovecs.

Note that all log item format routines now need to be careful to modify
the copy of the item that was placed into the CIL after calls to
xlog_copy_iovec instead of the in-memory copy.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:34:02 +0800
1234351cb xfs: introduce xlog_copy_iovec ... Browse Code »

Add a helper to abstract out filling the log iovecs in the log item
format handlers. This will allow us to change the way we do the log
item formatting more easily.

The copy in the name is a bit confusing for now as it just assigns a
pointer and lets the CIL code perform the copy, but that will change
soon.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Christoph Hellwig
2013-12-13 08:00:43 +0800

24 Oct, 2013

2 commits

a4fbe6ab1 xfs: decouple inode and bmap btree header files ... Browse Code »

Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in it's definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

The enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Dave Chinner
2013-10-24 05:28:49 +0800
239880ef6 xfs: decouple log and transaction headers ... Browse Code »

xfs_trans.h has a dependency on xfs_log.h for a couple of
structures. Most code that does transactions doesn't need to know
anything about the log, but this dependency means that they have to
include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
files and clean up the includes to be in dependency order.

In doing this, remove the direct include of xfs_trans_reserve.h from
xfs_trans.h so that we remove the dependency between xfs_trans.h and
xfs_mount.h. Hence the xfs_trans.h include can be moved to the
indicate the actual dependencies other header files have on it.

Note that these are kernel only header files, so this does not
translate to any userspace changes at all.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Dave Chinner
2013-10-24 05:17:44 +0800

10 Sep, 2013

1 commit

a30b03679 xfs: fix some minor sparse warnings ... Browse Code »

A couple of simple locking annotations and 0 vs NULL warnings.
Nothing that changes any code behaviour, just removes build noise.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Ben Myers

Dave Chinner
2013-09-10 06:43:05 +0800

14 Aug, 2013

1 commit

166d13688 xfs: return log item size in IOP_SIZE ... Browse Code »

To begin optimising the CIL commit process, we need to have IOP_SIZE
return both the number of vectors and the size of the data pointed
to by the vectors. This enables us to calculate the size ofthe
memory allocation needed before the formatting step and reduces the
number of memory allocations per item by one.

While there, kill the IOP_SIZE macro.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2013-08-14 05:10:21 +0800

13 Aug, 2013

1 commit

6ca1c9063 xfs: separate dquot on disk format definitions out of xfs_quota.h ... Browse Code »

The on disk format definitions of the on-disk dquot, log formats and
quota off log formats are all intertwined with other definitions for
quotas. Separate them out into their own header file so they can
easily be shared with userspace.

Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2013-08-13 05:09:52 +0800

15 May, 2012

5 commits

ad1e95c54 xfs: clean up xfs_bit.h includes ... Browse Code »

With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.

Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:21:00 +0800
60a34607b xfs: move xfsagino_t to xfs_types.h ... Browse Code »

Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:54 +0800
04913fdd9 xfs: pass shutdown method into xfs_trans_ail_delete_bulk ... Browse Code »

xfs_trans_ail_delete_bulk() can be called from different contexts so
if the item is not in the AIL we need different shutdown for each
context. Pass in the shutdown method needed so the correct action
can be taken.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-05-15 05:20:33 +0800
43ff2122e xfs: on-stack delayed write buffer lists ... Browse Code »

Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
and write back the buffers per-process instead of by waking up xfsbufd.

This is now easily doable given that we have very few places left that write
delwri buffers:

- log recovery:
Only done at mount time, and already forcing out the buffers
synchronously using xfs_flush_buftarg

- quotacheck:
Same story.

- dquot reclaim:
Writes out dirty dquots on the LRU under memory pressure. We might
want to look into doing more of this via xfsaild, but it's already
more optimal than the synchronous inode reclaim that writes each
buffer synchronously.

- xfsaild:
This is the main beneficiary of the change. By keeping a local list
of buffers to write we reduce latency of writing out buffers, and
more importably we can remove all the delwri list promotions which
were hitting the buffer cache hard under sustained metadata loads.

The implementation is very straight forward - xfs_buf_delwri_queue now gets
a new list_head pointer that it adds the delwri buffers to, and all callers
need to eventually submit the list using xfs_buf_delwi_submit or
xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are
skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
list. The biggest change to pass down the buffer list was done to the AIL
pushing. Now that we operate on buffers the trylock, push and pushbuf log
item methods are merged into a single push routine, which tries to lock the
item, and if possible add the buffer that needs writeback to the buffer list.
This leads to much simpler code than the previous split but requires the
individual IOP_PUSH instances to unlock and reacquire the AIL around calls
to blocking routines.

Given that xfsailds now also handle writing out buffers, the conditions for
log forcing and the sleep times needed some small changes. The most
important one is that we consider an AIL busy as long we still have buffers
to push, and the other one is that we do increment the pushed LSN for
buffers that are under flushing at this moment, but still count them towards
the stuck items for restart purposes. Without this we could hammer on stuck
items without ever forcing the log and not make progress under heavy random
delete workloads on fast flash storage devices.

[ Dave Chinner:
- rebase on previous patches.
- improved comments for XBF_DELWRI_Q handling
- fix XBF_ASYNC handling in queue submission (test 106 failure)
- rename delwri submit function buffer list parameters for clarity
- xfs_efd_item_push() should return XFS_ITEM_PINNED ]

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Christoph Hellwig
2012-05-15 05:20:31 +0800
fe7257fd4 xfs: do not write the buffer from xfs_qm_dqflush ... Browse Code »

Instead of writing the buffer directly from inside xfs_qm_dqflush return it
to the caller and let the caller decide what to do with the buffer. Also
remove the pincount check in xfs_qm_dqflush that all non-blocking callers
already implement and the now unused flags parameter and the XFS_DQ_IS_DIRTY
check that all callers already perform.

[ Dave Chinner: fixed build error cause by missing '{'. ]

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Christoph Hellwig
2012-05-15 05:20:29 +0800

13 Dec, 2011

2 commits

800b484ec xfs: cleanup dquot locking helpers ... Browse Code »

Mark the trivial lock wrappers as inline, and make the naming consistent
for all of them.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2011-12-13 07:28:20 +0800
fdedf28b9 xfs: untangle SYNC_WAIT and SYNC_TRYLOCK meanings for xfs_qm_dqflush ... Browse Code »

Only skip pinned dquots if SYNC_TRYLOCK is specified, and adjust the callers
to keep the behaviour unchanged.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2011-12-13 06:31:01 +0800

09 Dec, 2011

1 commit

b39342134 xfs: remove the lid_size field in struct log_item_desc ... Browse Code »

Outside the now removed nodelaylog code this field is only used for
asserts and can be safely removed now.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Signed-off-by: Ben Myers

Christoph Hellwig
2011-12-09 03:53:30 +0800

09 Nov, 2011

1 commit

272e42b21 xfs: constify xfs_item_ops ... Browse Code »

The log item ops aren't nessecarily the biggest exploit vector, but marking
them const is easy enough. Also remove the unused xfs_item_ops_t typedef
while we're at it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Alex Elder

Christoph Hellwig
2011-11-09 00:48:23 +0800

12 Oct, 2011

1 commit

17b38471c xfs: force the log if we encounter pinned buffers in .iop_pushbuf ... Browse Code »
1

We need to check for pinned buffers even in .iop_pushbuf given that inode
items flush into the same buffers that may be pinned directly due operations
on the unlinked inode list operating directly on buffers. To do this add a
return value to .iop_pushbuf that tells the AIL push about this and use
the existing log force mechanisms to unpin it.

Signed-off-by: Christoph Hellwig
Reported-by: Stefan Priebe
Tested-by: Stefan Priebe
Reviewed-by: Dave Chinner
Signed-off-by: Alex Elder

Christoph Hellwig
2011-10-12 00:02:48 +0800

13 Aug, 2011

1 commit

c59d87c46 xfs: remove subdirectories ... Browse Code »

Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
annoying subdirectories in the XFS source code. Besides the large
amount of file rename the only changes are to the Makefile, a few
files including headers with the subdirectory prefix, and the binary
sysctl compat code that includes a header under fs/xfs/ from
kernel/.

Signed-off-by: Christoph Hellwig
Signed-off-by: Alex Elder

Christoph Hellwig
2011-08-13 05:21:35 +0800