Commit e3db7691e9f3dff3289f64e3d98583e28afe03db

Authored by Trond Myklebust
Committed by Linus Torvalds
1 parent 07031e14c1

[PATCH] NFS: Fix race in nfs_release_page()

NFS: Fix race in nfs_release_page()

    invalidate_inode_pages2() may find that the dirty bit has been set on a
    page because the page may still be mapped after it was locked. Only
    after the call to unmap_mapping_range() are we sure that the page can
    no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however, leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    The fix is to add a new address_space operation, launder_page(), which
    will attempt to write out a dirty page without releasing the page lock.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

    Also, the bare SetPageDirty() can skew all sorts of accounting, leading
    to other nasties.

[akpm@osdl.org: cleanup]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
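
For context, the per-page sequence that invalidate_inode_pages2_range() performs
with this patch applied looks roughly like the sketch below. This is a simplified
illustration, not the exact kernel code: the surrounding pagevec loop and some of
the error handling are omitted, and the helper name invalidate_one_page() is
invented here.

/*
 * Simplified sketch of the per-page invalidation step after this patch.
 * invalidate_one_page() is an invented name; the real loop lives in
 * invalidate_inode_pages2_range().
 */
static int invalidate_one_page(struct address_space *mapping,
                               struct page *page, pgoff_t index)
{
        int ret = 0;

        lock_page(page);
        if (page_mapped(page))
                /* Tear the page out of all page tables first... */
                unmap_mapping_range(mapping,
                                    (loff_t)index << PAGE_CACHE_SHIFT,
                                    PAGE_CACHE_SIZE, 0);
        /* ...then, still holding the page lock, write it back if dirty... */
        ret = do_launder_page(mapping, page);
        /* ...and only then try to detach it from the page cache. */
        if (ret == 0 && !invalidate_complete_page2(mapping, page))
                ret = -EIO;
        unlock_page(page);
        return ret;
}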

Showing 4 changed files with 28 additions and 9 deletions

Documentation/filesystems/Locking
... ... @@ -171,6 +171,7 @@
171 171 int (*releasepage) (struct page *, int);
172 172 int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
173 173 loff_t offset, unsigned long nr_segs);
  174 + int (*launder_page) (struct page *);
174 175  
175 176 locking rules:
176 177 All except set_page_dirty may block
... ... @@ -188,6 +189,7 @@
188 189 invalidatepage: no yes
189 190 releasepage: no yes
190 191 direct_IO: no
  192 +launder_page: no yes
191 193  
192 194 ->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
193 195 may be called from the request handler (/dev/loop).
... ... @@ -280,6 +282,12 @@
280 282 buffers from the page in preparation for freeing it. It returns zero to
281 283 indicate that the buffers are (or may be) freeable. If ->releasepage is zero,
282 284 the kernel assumes that the fs has no private interest in the buffers.
  285 +
  286 + ->launder_page() may be called prior to releasing a page if
  287 +it is still found to be dirty. It returns zero if the page was successfully
  288 +cleaned, or an error value if not. Note that in order to prevent the page
  289 +getting mapped back in and redirtied, it needs to be kept locked
  290 +across the entire operation.
283 291  
284 292 Note: currently almost all instances of address_space methods are
285 293 using BKL for internal serialization and that's one of the worst sources
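
As a rough illustration of the rule documented above, a filesystem's
->launder_page() might look something like the following. This is a hypothetical
sketch, not code from this patch: the filesystem and its myfs_write_page_sync()
helper are invented, and the only points shown are that the write is synchronous
and that the page is never unlocked.

/*
 * Hypothetical ->launder_page() implementation (not part of this patch).
 * myfs_write_page_sync() is an assumed helper that writes the locked page
 * and waits for the I/O to complete without dropping the page lock.
 */
static int myfs_launder_page(struct page *page)
{
        if (!clear_page_dirty_for_io(page))
                return 0;                      /* page was already clean */
        return myfs_write_page_sync(page);     /* page stays locked throughout */
}

The NFS implementation in this patch takes the same shape, except that
nfs_wb_page() takes care of both steps internally.
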
fs/nfs/file.c
... ... @@ -315,16 +315,15 @@
315 315  
316 316 static int nfs_release_page(struct page *page, gfp_t gfp)
317 317 {
318   - /*
319   - * Avoid deadlock on nfs_wait_on_request().
320   - */
321   - if (!(gfp & __GFP_FS))
322   - return 0;
323   - /* Hack... Force nfs_wb_page() to write out the page */
324   - SetPageDirty(page);
325   - return !nfs_wb_page(page->mapping->host, page);
  318 + /* If PagePrivate() is set, then the page is not freeable */
  319 + return 0;
326 320 }
327 321  
  322 +static int nfs_launder_page(struct page *page)
  323 +{
  324 + return nfs_wb_page(page->mapping->host, page);
  325 +}
  326 +
328 327 const struct address_space_operations nfs_file_aops = {
329 328 .readpage = nfs_readpage,
330 329 .readpages = nfs_readpages,
... ... @@ -338,6 +337,7 @@
338 337 #ifdef CONFIG_NFS_DIRECTIO
339 338 .direct_IO = nfs_direct_IO,
340 339 #endif
  340 + .launder_page = nfs_launder_page,
341 341 };
342 342  
343 343 static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
include/linux/fs.h
... ... @@ -426,6 +426,7 @@
426 426 /* migrate the contents of a page to the specified target */
427 427 int (*migratepage) (struct address_space *,
428 428 struct page *, struct page *);
  429 + int (*launder_page) (struct page *);
429 430 };
430 431  
431 432 struct backing_dev_info;
mm/truncate.c
... ... @@ -341,6 +341,15 @@
341 341 return 0;
342 342 }
343 343  
  344 +static int do_launder_page(struct address_space *mapping, struct page *page)
  345 +{
  346 + if (!PageDirty(page))
  347 + return 0;
  348 + if (page->mapping != mapping || mapping->a_ops->launder_page == NULL)
  349 + return 0;
  350 + return mapping->a_ops->launder_page(page);
  351 +}
  352 +
344 353 /**
345 354 * invalidate_inode_pages2_range - remove range of pages from an address_space
346 355 * @mapping: the address_space
... ... @@ -405,7 +414,8 @@
405 414 PAGE_CACHE_SIZE, 0);
406 415 }
407 416 }
408   - if (!invalidate_complete_page2(mapping, page))
  417 + ret = do_launder_page(mapping, page);
  418 + if (ret == 0 && !invalidate_complete_page2(mapping, page))
409 419 ret = -EIO;
410 420 unlock_page(page);
411 421 }
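
For completeness, the path being hardened here is typically entered through
invalidate_inode_pages2() on an NFS inode's mapping, roughly as in the fragment
below. The function name nfs_example_invalidate() is invented for illustration;
the real callers live in the NFS inode-revalidation code.

/*
 * Hypothetical caller fragment: drop the cached pages of an NFS inode.
 * With this patch, any dirty pages are first written back through
 * ->launder_page() instead of being forced out via ->releasepage().
 */
static int nfs_example_invalidate(struct inode *inode)
{
        return invalidate_inode_pages2(inode->i_mapping);
}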