Commit 479db0bf408e65baa14d2a9821abfcbc0804b847

Authored by Nick Piggin
Committed by Linus Torvalds
1 parent 2d70b68d42

mm: dirty page tracking race fix

There is a race with dirty page accounting where a page may not properly
be accounted for.

clear_page_dirty_for_io() calls page_mkclean, then TestClearPageDirty.

page_mkclean walks the rmaps for that page, and for each one it cleans and
write protects the pte if it was dirty.  It uses page_check_address to
find the pte.  That function has a shortcut to avoid the ptl if the pte is
not present.  Unfortunately, other code holding the page table lock can
switch the pte to not-present and then back to present.  page_mkclean must
not treat that transient state as a signal to ignore the pte, because the
pte may be dirty.

For example, powerpc64's set_pte_at will clear a previously present pte
before setting it to the desired value.  There may also be other code in
core mm or in arch that does similar things.

The consequence of the bug is loss of data integrity with msync, and
loss of dirty page accounting accuracy.  XIP's __xip_unmap could also
easily be unreliable (depending on the exact XIP locking scheme), which can
lead to data corruption.

Fix this by adding an option to page_check_address that always takes the
ptl before checking the pte.

It's possible to retain the lockless shortcut for page_referenced and
try_to_unmap.
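
The interleaving described above can be replayed deterministically in a
few lines of userspace C.  This is only a sketch under stated assumptions:
struct toy_pte, toy_mkclean_during_update and the TOY_* results are
hypothetical names, not kernel code, and the updater's lock is modelled by
the order of the statements rather than by a real spinlock.

```c
#include <stdbool.h>

/* Toy stand-in for a page table entry; NOT the kernel's pte_t. */
struct toy_pte {
	bool present;
	bool dirty;
	bool writable;
};

enum toy_result { TOY_SKIPPED, TOY_CLEANED };

/*
 * Replay one fixed interleaving: an updater is halfway through a
 * clear-then-set sequence (performed under the lock) when the walker
 * arrives.  'sync' plays the role of the new page_check_address argument.
 */
static enum toy_result toy_mkclean_during_update(struct toy_pte *pte, bool sync)
{
	struct toy_pte final = *pte;	/* value the updater will install */

	/* Updater, step 1: holding the lock, clear the entry. */
	pte->present = false;

	/* Walker: the lockless shortcut peeks at the transient state. */
	if (!sync && !pte->present) {
		*pte = final;	/* updater, step 2: too late for the walker */
		return TOY_SKIPPED;
	}

	/* Walker waits for the lock, so updater step 2 completes first. */
	*pte = final;

	/* Walker now holds the lock and sees the real entry. */
	if (!pte->present || !pte->dirty)
		return TOY_SKIPPED;
	pte->dirty = false;
	pte->writable = false;
	return TOY_CLEANED;
}
```

Starting from a present, dirty, writable entry, calling the function with
sync false leaves the entry dirty and writable (the walker skipped it),
while sync true cleans and write protects it.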

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 3 changed files with 11 additions and 7 deletions

include/linux/rmap.h
@@ -102,7 +102,7 @@
  * Called from mm/filemap_xip.c to unmap empty zero page
  */
 pte_t *page_check_address(struct page *, struct mm_struct *,
-				unsigned long, spinlock_t **);
+				unsigned long, spinlock_t **, int);
 
 /*
  * Used by swapoff to help locate where page is expected in vma.
@@ -185,7 +185,7 @@
 		address = vma->vm_start +
 			((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		BUG_ON(address < vma->vm_start || address >= vma->vm_end);
-		pte = page_check_address(page, mm, address, &ptl);
+		pte = page_check_address(page, mm, address, &ptl, 1);
 		if (pte) {
 			/* Nuke the page table entry. */
 			flush_cache_page(vma, address, pte_pfn(*pte));
@@ -224,10 +224,14 @@
 /*
  * Check that @page is mapped at @address into @mm.
  *
+ * If @sync is false, page_check_address may perform a racy check to avoid
+ * the page table lock when the pte is not present (helpful when reclaiming
+ * highly shared pages).
+ *
  * On success returns with pte mapped and locked.
  */
 pte_t *page_check_address(struct page *page, struct mm_struct *mm,
-			  unsigned long address, spinlock_t **ptlp)
+			  unsigned long address, spinlock_t **ptlp, int sync)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -249,7 +253,7 @@
 
 	pte = pte_offset_map(pmd, address);
 	/* Make a quick check before getting the lock */
-	if (!pte_present(*pte)) {
+	if (!sync && !pte_present(*pte)) {
 		pte_unmap(pte);
 		return NULL;
 	}
@@ -281,7 +285,7 @@
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
 
@@ -450,7 +454,7 @@
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &ptl, 1);
 	if (!pte)
 		goto out;
 
@@ -704,7 +708,7 @@
 	if (address == -EFAULT)
 		goto out;
 
-	pte = page_check_address(page, mm, address, &ptl);
+	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
 
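
The shape of the new sync argument (an optional lockless peek in front of
a locked check that returns with the lock held) can also be tried out in
userspace.  The following is a minimal sketch, not the kernel
implementation: struct toy_slot, toy_lock, toy_unlock and toy_check_slot
are hypothetical names, and a C11 atomic_flag spinlock stands in for the
page table lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot guarded by a tiny spinlock; stands in for pte + ptl. */
struct toy_slot {
	atomic_flag lock;
	atomic_bool present;
	int value;
};

static void toy_lock(struct toy_slot *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void toy_unlock(struct toy_slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Returns the slot with its lock held if the slot is present, otherwise
 * NULL with the lock not held.  When sync is false, a lockless peek may
 * return NULL without touching the lock at all; that peek can miss a
 * slot that is only transiently not-present.
 */
static struct toy_slot *toy_check_slot(struct toy_slot *s, bool sync)
{
	if (!sync && !atomic_load_explicit(&s->present, memory_order_relaxed))
		return NULL;

	toy_lock(s);
	if (atomic_load_explicit(&s->present, memory_order_relaxed))
		return s;	/* caller must call toy_unlock() */
	toy_unlock(s);
	return NULL;
}
```

A caller that must never miss a slot passes sync as true and pays for the
lock every time; a caller that can tolerate an occasional miss passes
false.  In both cases a non-NULL return has to be paired with
toy_unlock().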