Commit 033193275b3ffcfe7f3fde7b569f3d207f6cd6a0
Author: Dave Hansen
Committer: Linus Torvalds
Parent: 278df9f451

pagewalk: only split huge pages when necessary

Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will
unconditionally split any transparent huge pages it runs into.  In
practice, that means that anyone doing a

	cat /proc/$pid/smaps

will unconditionally break down every huge page in the process and depend
on khugepaged to re-collapse it later.  This is fairly suboptimal.

This patch changes that behavior.  It teaches each ->pmd_entry handler
(there are five) that they must break down the THPs themselves.  Also, the
_generic_ code will never break down a THP unless a ->pte_entry handler is
actually set.

This means that the ->pmd_entry handlers can now choose to deal with THPs
without breaking them down.
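The resulting control flow can be illustrated with a minimal userspace
sketch (hypothetical names and types, not the kernel code; the real logic
lives in walk_pmd_range()): the pmd is handed to ->pmd_entry as-is, and a
split happens only when a ->pte_entry handler actually needs the
individual ptes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mm_walk, reduced to
 * the two callbacks that matter for this change. */
struct mm_walk_sim {
	int (*pmd_entry)(bool *huge);	/* sees the pmd, huge or not */
	int (*pte_entry)(void);		/* needs individual ptes     */
	int splits;			/* counts split_huge_page_pmd() calls */
};

/* Mirrors the new walk_pmd_range() behavior: ->pmd_entry may handle a
 * trans-huge pmd itself; we only break the THP down when a ->pte_entry
 * handler is actually set. */
static int walk_one_pmd(struct mm_walk_sim *walk, bool huge)
{
	int err = 0;

	if (walk->pmd_entry) {
		err = walk->pmd_entry(&huge);
		if (err)
			return err;
	}
	if (!walk->pte_entry)
		return 0;		/* THP left intact */

	if (huge) {
		walk->splits++;		/* split_huge_page_pmd() */
		huge = false;
	}
	return walk->pte_entry();	/* walk_pte_range() */
}

/* Trivial handlers for demonstration. */
static int noop_pmd(bool *huge) { (void)huge; return 0; }
static int noop_pte(void) { return 0; }
```

A pmd-only walker (like the patched smaps code) now traverses a huge pmd
without splitting it, while a walker that registers ->pte_entry still
forces the split before descending.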

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Eric B Munson <emunson@mgebm.net>
Tested-by: Eric B Munson <emunson@mgebm.net>
Cc: Michael J Wolf <mjwolf@us.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 4 changed files with 32 additions and 6 deletions

--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -343,6 +343,8 @@
 	struct page *page;
 	int mapcount;
 
+	split_huge_page_pmd(walk->mm, pmd);
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
@@ -467,6 +469,8 @@
 	spinlock_t *ptl;
 	struct page *page;
 
+	split_huge_page_pmd(walk->mm, pmd);
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
@@ -622,6 +626,8 @@
 	struct pagemapread *pm = walk->private;
 	pte_t *pte;
 	int err = 0;
+
+	split_huge_page_pmd(walk->mm, pmd);
 
 	/* find the first VMA at or above 'addr' */
 	vma = find_vma(walk->mm, addr);
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -914,6 +914,9 @@
  * @pgd_entry: if set, called for each non-empty PGD (top-level) entry
  * @pud_entry: if set, called for each non-empty PUD (2nd-level) entry
  * @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
+ *	       this handler is required to be able to handle
+ *	       pmd_trans_huge() pmds.  They may simply choose to
+ *	       split_huge_page() instead of handling it explicitly.
  * @pte_entry: if set, called for each non-empty PTE (4th-level) entry
  * @pte_hole: if set, called for each hole at all levels
  * @hugetlb_entry: if set, called for each hugetlb entry
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4763,7 +4763,8 @@
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	VM_BUG_ON(pmd_trans_huge(*pmd));
+	split_huge_page_pmd(walk->mm, pmd);
+
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		if (is_target_pte_for_mc(vma, addr, *pte, NULL))
 
@@ -4925,8 +4926,8 @@
 	pte_t *pte;
 	spinlock_t *ptl;
 
+	split_huge_page_pmd(walk->mm, pmd);
 retry:
-	VM_BUG_ON(pmd_trans_huge(*pmd));
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	for (; addr != end; addr += PAGE_SIZE) {
 		pte_t ptent = *(pte++);
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -33,19 +33,35 @@
 
 	pmd = pmd_offset(pud, addr);
 	do {
+again:
 		next = pmd_addr_end(addr, end);
-		split_huge_page_pmd(walk->mm, pmd);
-		if (pmd_none_or_clear_bad(pmd)) {
+		if (pmd_none(*pmd)) {
 			if (walk->pte_hole)
 				err = walk->pte_hole(addr, next, walk);
 			if (err)
 				break;
 			continue;
 		}
+		/*
+		 * This implies that each ->pmd_entry() handler
+		 * needs to know about pmd_trans_huge() pmds
+		 */
 		if (walk->pmd_entry)
 			err = walk->pmd_entry(pmd, addr, next, walk);
-		if (!err && walk->pte_entry)
-			err = walk_pte_range(pmd, addr, next, walk);
+		if (err)
+			break;
+
+		/*
+		 * Check this here so we only break down trans_huge
+		 * pages when we _need_ to
+		 */
+		if (!walk->pte_entry)
+			continue;
+
+		split_huge_page_pmd(walk->mm, pmd);
+		if (pmd_none_or_clear_bad(pmd))
+			goto again;
+		err = walk_pte_range(pmd, addr, next, walk);
 		if (err)
 			break;
 	} while (pmd++, addr = next, addr != end);