Commit 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0

Authored by Darrick J. Wong
Committed by Linus Torvalds
1 parent 7d311cdab6

mm: only enforce stable page writes if the backing device requires it

Create a helper function that checks whether a backing device requires
stable page writes and, if so, performs the necessary wait.  Then, make it so
that all points in the memory manager that handle making pages writable
use the helper function.  This should provide stable page write support
to most filesystems, while eliminating unnecessary waiting for devices
that don't require the feature.

Before this patchset, all filesystems would block, regardless of whether
or not it was necessary.  ext3 would wait, but still generate occasional
checksum errors.  The network filesystems were left to do their own
thing, so they'd wait too.

After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it.  ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait.  Network filesystems haven't been touched, so either they
provide their own stable page guarantees or they don't block at all.
The blocking behavior is back to what it was before 3.0 if you don't
have a disk requiring stable page writes.

Here's the result of using dbench to test latency on ext2:

3.8.0-rc3:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        109347     0.028    59.817
 ReadX         347180     0.004     3.391
 Flush          15514    29.828   287.283

Throughput 57.429 MB/sec  4 clients  4 procs  max_latency=287.290 ms

3.8.0-rc3 + patches:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        105556     0.029     4.273
 ReadX         335004     0.005     4.112
 Flush          14982    30.540   298.634

Throughput 55.4496 MB/sec  4 clients  4 procs  max_latency=298.650 ms

As you can see, the maximum write (WriteX) latency drops considerably with
this patch applied, from 59.817 ms to 4.273 ms.  The other filesystems
(ext3/ext4/xfs/btrfs) behave similarly, but see the cover letter for those
results.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 7 changed files with 27 additions and 5 deletions

@@ -2359,7 +2359,7 @@
 	if (unlikely(ret < 0))
 		goto out_unlock;
 	set_page_dirty(page);
-	wait_on_page_writeback(page);
+	wait_for_stable_page(page);
 	return 0;
 out_unlock:
 	unlock_page(page);
@@ -4968,7 +4968,7 @@
 					0, len, NULL,
 					ext4_bh_unmapped)) {
 			/* Wait so that we don't change page under IO */
-			wait_on_page_writeback(page);
+			wait_for_stable_page(page);
 			ret = VM_FAULT_LOCKED;
 			goto out;
 		}
@@ -483,7 +483,7 @@
 	gfs2_holder_uninit(&gh);
 	if (ret == 0) {
 		set_page_dirty(page);
-		wait_on_page_writeback(page);
+		wait_for_stable_page(page);
 	}
 	sb_end_pagefault(inode->i_sb);
 	return block_page_mkwrite_return(ret);
@@ -126,7 +126,7 @@
 	nilfs_transaction_commit(inode->i_sb);
 
 mapped:
-	wait_on_page_writeback(page);
+	wait_for_stable_page(page);
 out:
 	sb_end_pagefault(inode->i_sb);
 	return block_page_mkwrite_return(ret);
include/linux/pagemap.h
@@ -414,6 +414,7 @@
 }
 
 extern void end_page_writeback(struct page *page);
+void wait_for_stable_page(struct page *page);
 
 /*
  * Add an arbitrary waiter to a page's wait queue
@@ -1728,6 +1728,7 @@
 	 * see the dirty page and writeprotect it again.
 	 */
 	set_page_dirty(page);
+	wait_for_stable_page(page);
 out:
 	sb_end_pagefault(inode->i_sb);
 	return ret;
@@ -2274,7 +2275,7 @@
 		return NULL;
 	}
 found:
-	wait_on_page_writeback(page);
+	wait_for_stable_page(page);
 	return page;
 }
 EXPORT_SYMBOL(grab_cache_page_write_begin);
@@ -2290,4 +2290,24 @@
 	return radix_tree_tagged(&mapping->page_tree, tag);
 }
 EXPORT_SYMBOL(mapping_tagged);
+
+/**
+ * wait_for_stable_page() - wait for writeback to finish, if necessary.
+ * @page:	The page to wait on.
+ *
+ * This function determines if the given page is related to a backing device
+ * that requires page contents to be held stable during writeback.  If so, then
+ * it will wait for any pending writeback to complete.
+ */
+void wait_for_stable_page(struct page *page)
+{
+	struct address_space *mapping = page_mapping(page);
+	struct backing_dev_info *bdi = mapping->backing_dev_info;
+
+	if (!bdi_cap_stable_pages_required(bdi))
+		return;
+
+	wait_on_page_writeback(page);
+}
+EXPORT_SYMBOL_GPL(wait_for_stable_page);