Commit 309c848002052edbec650075a1eb098b17c17f35

Authored by Dave Chinner
Committed by Alex Elder
1 parent 90810b9e82

xfs: delayed alloc blocks beyond EOF are valid after writeback

There is an assumption in parts of XFS that flushing a dirty
file will make all the delayed allocation blocks disappear from an
inode. That is, after calling xfs_flush_pages(),
ip->i_delayed_blks will be zero.

This is an invalid assumption, as we may have speculative
preallocation beyond EOF, and those blocks are recorded in
ip->i_delayed_blks. A flush of the dirty pages of an inode will not
change the state of these blocks beyond EOF, so a non-zero
delalloc block count after a flush is valid.

The bmap code has an invalid ASSERT() that needs to be removed, and
the swapext code has a bug in that while it swaps the data forks
around, it fails to swap the i_delayed_blks counter associated with
the fork and hence can get the block accounting wrong.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
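
The reasoning in the commit message can be illustrated with a small userspace model. This is only a sketch: `struct toy_file`, `toy_flush_pages()`, `toy_release()` and `toy_delayed_blks()` are hypothetical names invented here and are not XFS interfaces. The model assumes the delalloc count splits cleanly into blocks backing dirty data within EOF and speculative preallocation beyond EOF.

```c
#include <stdint.h>

/* Hypothetical toy model, not real XFS structures: block counts only. */
struct toy_file {
	uint64_t delalloc_within_eof;	/* delalloc blocks backing dirty data */
	uint64_t delalloc_beyond_eof;	/* speculative preallocation past EOF */
};

/* Rough analogue of ip->i_delayed_blks: every delalloc block on the inode. */
static uint64_t toy_delayed_blks(const struct toy_file *f)
{
	return f->delalloc_within_eof + f->delalloc_beyond_eof;
}

/*
 * Rough analogue of flushing dirty pages: blocks under dirty data are
 * converted to real extents, blocks beyond EOF are left untouched.
 */
static void toy_flush_pages(struct toy_file *f)
{
	f->delalloc_within_eof = 0;
}

/* Rough analogue of release/inactivation: speculative blocks are dropped. */
static void toy_release(struct toy_file *f)
{
	f->delalloc_beyond_eof = 0;
}
```

In this model, a file with both kinds of blocks still reports a non-zero count after `toy_flush_pages()`, and the count only reaches zero once `toy_release()` has also run. That is the situation the removed ASSERT() did not allow for.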

Showing 2 changed files with 20 additions and 2 deletions

@@ -5471,8 +5471,13 @@
 			if (error)
 				goto out_unlock_iolock;
 		}
-
-		ASSERT(ip->i_delayed_blks == 0);
+		/*
+		 * even after flushing the inode, there can still be delalloc
+		 * blocks on the inode beyond EOF due to speculative
+		 * preallocation. These are not removed until the release
+		 * function is called or the inode is inactivated. Hence we
+		 * cannot assert here that ip->i_delayed_blks == 0.
+		 */
 	}
 
 	lock = xfs_ilock_map_shared(ip);
@@ -377,6 +377,19 @@
 	ip->i_d.di_format = tip->i_d.di_format;
 	tip->i_d.di_format = tmp;
 
+	/*
+	 * The extents in the source inode could still contain speculative
+	 * preallocation beyond EOF (e.g. the file is open but not modified
+	 * while defrag is in progress). In that case, we need to copy over the
+	 * number of delalloc blocks the data fork in the source inode is
+	 * tracking beyond EOF so that when the fork is truncated away when the
+	 * temporary inode is unlinked we don't underrun the i_delayed_blks
+	 * counter on that inode.
+	 */
+	ASSERT(tip->i_delayed_blks == 0);
+	tip->i_delayed_blks = ip->i_delayed_blks;
+	ip->i_delayed_blks = 0;
+
 	ilf_fields = XFS_ILOG_CORE;
 
 	switch(ip->i_d.di_format) {
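
The accounting problem fixed in the swapext hunk can be sketched in a small userspace model. Everything here is an assumption made for illustration: `struct toy_inode` and the `toy_*` functions are hypothetical and greatly simplified, and only the field name `i_delayed_blks` is taken from the patch. The model treats the data fork as a bare count of delalloc blocks, and truncation as subtracting that count from the inode's counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for an inode. */
struct toy_inode {
	uint64_t fork_delalloc;		/* delalloc blocks held by the data fork */
	uint64_t i_delayed_blks;	/* per-inode delalloc block counter */
};

/* Swap only the data forks; the counters stay behind (pre-fix behaviour). */
static void toy_swap_forks(struct toy_inode *ip, struct toy_inode *tip)
{
	uint64_t tmp = ip->fork_delalloc;

	ip->fork_delalloc = tip->fork_delalloc;
	tip->fork_delalloc = tmp;
}

/* Swap the forks and move the counter with the fork, as the patch does. */
static void toy_swap_forks_fixed(struct toy_inode *ip, struct toy_inode *tip)
{
	toy_swap_forks(ip, tip);

	assert(tip->i_delayed_blks == 0);
	tip->i_delayed_blks = ip->i_delayed_blks;
	ip->i_delayed_blks = 0;
}

/*
 * Truncate the data fork away. Returns 0 on success, or -1 if doing so
 * would underrun the inode's delalloc counter.
 */
static int toy_truncate_fork(struct toy_inode *ip)
{
	if (ip->i_delayed_blks < ip->fork_delalloc)
		return -1;

	ip->i_delayed_blks -= ip->fork_delalloc;
	ip->fork_delalloc = 0;
	return 0;
}
```

With a source inode holding 8 speculative delalloc blocks and an empty temporary inode, `toy_swap_forks()` followed by `toy_truncate_fork()` on the temporary inode reports an underrun, while `toy_swap_forks_fixed()` lets the truncation succeed and leaves both counters at zero.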