Doug / smarc-fsl-linux-kernel | Embedian Git Server

Commit 133eeb1747c33b6d75483c074b27d4e5e02286dc

Authored by Dave Chinner 2013-06-27 14:04:48 +0800

Committed by Ben Myers 2013-06-28 02:27:37 +0800

Exists in smarc-imx_3.14.28_1.0.0_ga and in 1 other branch

xfs: don't use speculative prealloc for small files

Dedicated small file workloads have been seeing significant free
space fragmentation causing premature inode allocation failure
when large inode sizes are in use. A particular test case showed
that a workload that runs to a real ENOSPC on 256 byte inodes would
fail inode allocation with ENOSPC at about 80% full with 512 byte
inodes, and at about 50% full with 1024 byte inodes.

The same workload, when run with -o allocsize=4096 on 1024 byte
inodes would run to being 100% full before giving ENOSPC. That is,
no freespace fragmentation at all.

The issue was caused by the specific IO pattern the application had
- the framework it was using did not support direct IO, and so it
was emulating it by using fadvise(DONT_NEED). The result was that
the data was getting written back before the speculative prealloc
had been trimmed from memory by the close(), and so small single
block files were being allocated with 2 blocks, and then having one
truncated away. The result was lots of small 4k free space extents,
and hence each new 8k allocation would take another 8k from
contiguous free space and turn it into 4k of allocated space and 4k
of free space.
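
The fragmentation pattern described above can be reproduced in a toy model. The following is a minimal sketch, not XFS code: it assumes a first-fit allocator over a 1024-block bitmap, and it assumes each file's extra block is trimmed only after the next file has allocated, which stands in for writeback racing ahead of close(). The names NBLOCKS, find_free and percent_full_at_chunk_failure are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 1024	/* toy filesystem: 1024 blocks of 4k each (hypothetical) */

static bool used[NBLOCKS];

/* First-fit: find `len` contiguous free blocks starting at a multiple of `align`. */
static int find_free(int len, int align)
{
	for (int start = 0; start + len <= NBLOCKS; start += align) {
		int i = 0;

		while (i < len && !used[start + i])
			i++;
		if (i == len)
			return start;
	}
	return -1;
}

/*
 * Write single-block files until an aligned run of `chunk_blocks` free
 * blocks can no longer be found. Each file allocates `alloc_blocks`
 * (one data block plus any speculative prealloc). The prealloc of a file
 * is trimmed only after the next file has allocated.
 * Returns the percentage of blocks holding data at the point of failure.
 */
static int percent_full_at_chunk_failure(int alloc_blocks, int chunk_blocks)
{
	int data_blocks = 0;
	int prev = -1;	/* start of the previous file's allocation */

	assert(alloc_blocks >= 1 && chunk_blocks >= 1);
	memset(used, 0, sizeof(used));

	while (find_free(chunk_blocks, chunk_blocks) >= 0) {
		int start = find_free(alloc_blocks, 1);

		if (start < 0)
			break;
		for (int i = 0; i < alloc_blocks; i++)
			used[start + i] = true;
		data_blocks++;

		/* Trim the previous file's prealloc, keeping its data block. */
		if (prev >= 0)
			for (int i = 1; i < alloc_blocks; i++)
				used[prev + i] = false;
		prev = start;
	}
	return data_blocks * 100 / NBLOCKS;
}
```

Calling percent_full_at_chunk_failure(2, 16) models 8k allocations with a 64k inode chunk and stops at roughly half full, while percent_full_at_chunk_failure(1, 16) models allocsize=4096 and runs until the toy filesystem is nearly full. The model is too simple to reproduce the 80% figure for 512 byte inodes, since a real allocator reuses and merges some of the holes.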

Hence inode allocation, which requires contiguous, aligned
allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
(1024 byte inodes) can fail to find sufficiently large freespace and
hence fail while there is still lots of free space available.
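
The 16k, 32k and 64k figures follow from XFS allocating inodes in chunks of 64. A minimal sketch of the arithmetic, with hypothetical helper names and an assumed 4k filesystem block size in the usage note:

```c
#include <assert.h>

#define INODES_PER_CHUNK 64u	/* XFS allocates inodes 64 at a time */

/* Bytes of contiguous, aligned free space one inode chunk needs. */
static unsigned int inode_chunk_bytes(unsigned int inode_size)
{
	return INODES_PER_CHUNK * inode_size;
}

/* The same requirement in filesystem blocks, rounded up. */
static unsigned int inode_chunk_blocks(unsigned int inode_size,
				       unsigned int block_size)
{
	assert(block_size > 0);
	return (inode_chunk_bytes(inode_size) + block_size - 1) / block_size;
}
```

With 4k blocks, inode_chunk_blocks(1024, 4096) is 16, so a filesystem whose free space consists only of isolated 4k extents cannot satisfy the request however much total free space it has.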

There's a simple fix for this, and one that has precedent in the
allocator code already - don't do speculative allocation unless the
size of the file is larger than a certain size. In this case, that
size is the minimum default preallocation size:
mp->m_writeio_blocks. And to keep with the concept of being nice to
people when the files are still relatively small, cap the prealloc
to mp->m_writeio_blocks until the file goes over a stripe unit in
size, at which point we'll fall back to the current behaviour based
on the last extent size.

This will effectively turn off speculative prealloc for very small
files, keep preallocation low for small files, and behave as it
currently does for any file larger than a stripe unit. This
completely avoids the freespace fragmentation problem this
particular IO pattern was causing.
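
The three size tiers described above can be restated as a small userspace sketch. This is not the kernel code: the enum, the function name and its byte-valued parameters are hypothetical stand-ins for XFS_ISIZE(ip), XFS_FSB_TO_B(mp, mp->m_writeio_blocks), XFS_FSB_TO_B(mp, mp->m_dalign) and the XFS_MOUNT_DFLT_IOSIZE flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum prealloc_policy {
	PREALLOC_NONE,		/* very small file: no speculative prealloc */
	PREALLOC_MINIMUM,	/* small file: cap at the minimum prealloc */
	PREALLOC_DYNAMIC,	/* existing behaviour, based on last extent size */
	PREALLOC_FIXED		/* allocsize mount option in effect */
};

static enum prealloc_policy
choose_prealloc_policy(uint64_t isize, uint64_t min_prealloc_bytes,
		       uint64_t stripe_unit_bytes, bool fixed_allocsize)
{
	/* This sketch assumes the minimum prealloc is never zero. */
	assert(min_prealloc_bytes > 0);

	/* A fixed allocsize bypasses the dynamic heuristics entirely. */
	if (fixed_allocsize)
		return PREALLOC_FIXED;

	/* Likely the only write this file will ever see. */
	if (isize < min_prealloc_bytes)
		return PREALLOC_NONE;

	/* Still small: stay at the minimum until past a stripe unit. */
	if (isize < stripe_unit_bytes)
		return PREALLOC_MINIMUM;

	return PREALLOC_DYNAMIC;
}
```

For example, with an illustrative 64k minimum and a 256k stripe unit, a 4k file gets PREALLOC_NONE, a 100k file gets PREALLOC_MINIMUM, and a 1M file gets PREALLOC_DYNAMIC. If no stripe unit is configured (stripe_unit_bytes of 0), the middle tier never applies in this sketch.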

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

Showing 1 changed file with 13 additions and 0 deletions

fs/xfs/xfs_iomap.c


@@ -284,6 +284,15 @@
 		return 0;
 
 	/*
+	 * If the file is smaller than the minimum prealloc and we are using
+	 * dynamic preallocation, don't do any preallocation at all as it is
+	 * likely this is the only write to the file that is going to be done.
+	 */
+	if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) &&
+	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_writeio_blocks))
+		return 0;
+
+	/*
 	 * If there are any real blocks past eof, then don't
 	 * do any speculative allocation.
 	 */
@@ -343,6 +352,10 @@
 
 	/* if we are using a specific prealloc size, return now */
 	if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)
+		return 0;
+
+	/* If the file is small, then use the minimum prealloc */
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign))
 		return 0;
 
 	/*