Commit 7512102cf64d36e3c7444480273623c7aab3563f

Authored by Hugh Dickins
Committed by Linus Torvalds
Parent: 9f78ff005a

memcg: fix GPF when cgroup removal races with last exit

When moving tasks from an old memcg (with move_charge_at_immigrate set on
the new memcg), followed by removal of the old memcg, we hit a General
Protection Fault in mem_cgroup_lru_del_list() (called from release_pages,
called from free_pages_and_swap_cache, from tlb_flush_mmu, from
tlb_finish_mmu, from exit_mmap, from mmput, from exit_mm, from do_exit).

Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
still trying to update its stats and to take the page off the lru before
freeing it.

A task, or a charge, or a page on lru: each secures a memcg against
removal.  In this case, the last task has been moved out of the old memcg,
and it is exiting: anonymous pages are uncharged one by one from the
memcg, as they are zapped from its pagetables, so the charge gets down to
0; but the pages themselves are queued in an mmu_gather for freeing.

Most of those pages will be on lru (and force_empty is careful to
lru_add_drain_all, to add pages from pagevec to lru first), but not
necessarily all: perhaps some have been isolated for page reclaim, perhaps
some isolated for other reasons.  So, force_empty may find no task, no
charge and no page on lru, and let the removal proceed.

There would still be no problem if these pages were immediately freed; but
typically (and the put_page_testzero protocol demands it) they have to be
added back to lru before they are found freeable, then removed from lru
and freed.  We don't see the issue when adding, because the
mem_cgroup_iter() loops keep their own reference to the memcg being
scanned; but when it comes to mem_cgroup_lru_del_list(), nothing holds the
memcg, which may already have been freed.
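
As a reading aid (not part of the patch), the race described above and in
the oops call chain lines up roughly like this:

	/*
	 * exiting task                            racing rmdir of old memcg
	 * ------------                            -------------------------
	 * exit_mm -> mmput -> exit_mmap:
	 *   anon pages uncharged one by one,
	 *   charge falls to 0; the pages are
	 *   queued in an mmu_gather, not yet freed
	 *                                          force_empty finds no task,
	 *                                          no charge, no page on lru;
	 *                                          removal proceeds and the
	 *                                          struct mem_cgroup is freed
	 *                                          (poisoned by SLAB_DEBUG)
	 * tlb_finish_mmu -> free_pages_and_swap_cache
	 *   -> release_pages -> mem_cgroup_lru_del_list:
	 *      pc->mem_cgroup still points at the
	 *      freed memcg => General Protection Fault
	 */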

I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
of pc->mem_cgroup to the stable root_mem_cgroup when neither was set.
38c5d72f3ebe ("memcg: simplify LRU handling by new rule") mercifully
removed those convolutions, but left this General Protection Fault.

But it's surprisingly easy to restore the old behaviour: just check
PageCgroupUsed in mem_cgroup_lru_add_list() (which decides which lruvec the
page is added to), and reset pc to root_mem_cgroup if the page is
uncharged.  A risky change?  Not really: it just goes back to how this
worked before; testing, and an audit of the uses of pc->mem_cgroup, show no
problem.
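
For reference, the heart of the fix is the check added near the top of
mem_cgroup_lru_add_list(), reproduced here from the mm/memcontrol.c hunk
below:

	pc = lookup_page_cgroup(page);
	memcg = pc->mem_cgroup;

	/*
	 * Surreptitiously switch any uncharged page to root:
	 * an uncharged page off lru does nothing to secure
	 * its former mem_cgroup from sudden removal.
	 *
	 * Our caller holds lru_lock, and PageCgroupUsed is updated
	 * under page_cgroup lock: between them, they make all uses
	 * of pc->mem_cgroup safe.
	 */
	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup)
		pc->mem_cgroup = memcg = root_mem_cgroup;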

And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
longer has any purpose, and we can safely revert 4e5f01c2b9b9 ("memcg:
clear pc->mem_cgroup if necessary").

Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
is not strictly necessary: the lru_lock there, with an RCU grace period
before memcg structures are freed, makes mem_cgroup_get_reclaim_stat_from_page()
safe without it; but it seems cleaner to rely on one dependency fewer.
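
Concretely, in the pagevec lru-add path in mm/swap.c (see the hunk below),
the order simply becomes:

	SetPageLRU(page);
	if (active)
		SetPageActive(page);
	add_page_to_lru_list(zone, page, lru);
	update_page_reclaim_stat(zone, page, file, active);	/* moved after add */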

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 6 changed files with 18 additions and 48 deletions

include/linux/memcontrol.h
... ... @@ -129,7 +129,6 @@
129 129 extern void mem_cgroup_replace_page_cache(struct page *oldpage,
130 130 struct page *newpage);
131 131  
132   -extern void mem_cgroup_reset_owner(struct page *page);
133 132 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
134 133 extern int do_swap_account;
135 134 #endif
... ... @@ -390,10 +389,6 @@
390 389 }
391 390 static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
392 391 struct page *newpage)
393   -{
394   -}
395   -
396   -static inline void mem_cgroup_reset_owner(struct page *page)
397 392 {
398 393 }
399 394 #endif /* CONFIG_CGROUP_MEM_CONT */

mm/ksm.c
... ... @@ -28,7 +28,6 @@
28 28 #include <linux/kthread.h>
29 29 #include <linux/wait.h>
30 30 #include <linux/slab.h>
31   -#include <linux/memcontrol.h>
32 31 #include <linux/rbtree.h>
33 32 #include <linux/memory.h>
34 33 #include <linux/mmu_notifier.h>
... ... @@ -1572,16 +1571,6 @@
1572 1571  
1573 1572 new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
1574 1573 if (new_page) {
1575   - /*
1576   - * The memcg-specific accounting when moving
1577   - * pages around the LRU lists relies on the
1578   - * page's owner (memcg) to be valid. Usually,
1579   - * pages are assigned to a new owner before
1580   - * being put on the LRU list, but since this
1581   - * is not the case here, the stale owner from
1582   - * a previous allocation cycle must be reset.
1583   - */
1584   - mem_cgroup_reset_owner(new_page);
1585 1574 copy_user_highpage(new_page, page, address, vma);
1586 1575  
1587 1576 SetPageDirty(new_page);

mm/memcontrol.c
... ... @@ -1042,6 +1042,19 @@
1042 1042  
1043 1043 pc = lookup_page_cgroup(page);
1044 1044 memcg = pc->mem_cgroup;
  1045 +
  1046 + /*
  1047 + * Surreptitiously switch any uncharged page to root:
  1048 + * an uncharged page off lru does nothing to secure
  1049 + * its former mem_cgroup from sudden removal.
  1050 + *
  1051 + * Our caller holds lru_lock, and PageCgroupUsed is updated
  1052 + * under page_cgroup lock: between them, they make all uses
  1053 + * of pc->mem_cgroup safe.
  1054 + */
  1055 + if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup)
  1056 + pc->mem_cgroup = memcg = root_mem_cgroup;
  1057 +
1045 1058 mz = page_cgroup_zoneinfo(memcg, page);
1046 1059 /* compound_order() is stabilized through lru_lock */
1047 1060 MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
... ... @@ -3027,23 +3040,6 @@
3027 3040 memcg_oom_recover(batch->memcg);
3028 3041 /* forget this pointer (for sanity check) */
3029 3042 batch->memcg = NULL;
3030   -}
3031   -
3032   -/*
3033   - * A function for resetting pc->mem_cgroup for newly allocated pages.
3034   - * This function should be called if the newpage will be added to LRU
3035   - * before start accounting.
3036   - */
3037   -void mem_cgroup_reset_owner(struct page *newpage)
3038   -{
3039   - struct page_cgroup *pc;
3040   -
3041   - if (mem_cgroup_disabled())
3042   - return;
3043   -
3044   - pc = lookup_page_cgroup(newpage);
3045   - VM_BUG_ON(PageCgroupUsed(pc));
3046   - pc->mem_cgroup = root_mem_cgroup;
3047 3043 }
3048 3044  
3049 3045 #ifdef CONFIG_SWAP

mm/migrate.c
... ... @@ -839,8 +839,6 @@
839 839 if (!newpage)
840 840 return -ENOMEM;
841 841  
842   - mem_cgroup_reset_owner(newpage);
843   -
844 842 if (page_count(page) == 1) {
845 843 /* page was freed from under us. So we are done. */
846 844 goto out;

mm/swap.c
... ... @@ -652,7 +652,7 @@
652 652 void lru_add_page_tail(struct zone* zone,
653 653 struct page *page, struct page *page_tail)
654 654 {
655   - int active;
  655 + int uninitialized_var(active);
656 656 enum lru_list lru;
657 657 const int file = 0;
658 658  
... ... @@ -672,7 +672,6 @@
672 672 active = 0;
673 673 lru = LRU_INACTIVE_ANON;
674 674 }
675   - update_page_reclaim_stat(zone, page_tail, file, active);
676 675 } else {
677 676 SetPageUnevictable(page_tail);
678 677 lru = LRU_UNEVICTABLE;
... ... @@ -693,6 +692,9 @@
693 692 list_head = page_tail->lru.prev;
694 693 list_move_tail(&page_tail->lru, list_head);
695 694 }
  695 +
  696 + if (!PageUnevictable(page))
  697 + update_page_reclaim_stat(zone, page_tail, file, active);
696 698 }
697 699 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
698 700  
699 701  
... ... @@ -710,8 +712,8 @@
710 712 SetPageLRU(page);
711 713 if (active)
712 714 SetPageActive(page);
713   - update_page_reclaim_stat(zone, page, file, active);
714 715 add_page_to_lru_list(zone, page, lru);
  716 + update_page_reclaim_stat(zone, page, file, active);
715 717 }
716 718  
717 719 /*

mm/swap_state.c
... ... @@ -300,16 +300,6 @@
300 300 new_page = alloc_page_vma(gfp_mask, vma, addr);
301 301 if (!new_page)
302 302 break; /* Out of memory */
303   - /*
304   - * The memcg-specific accounting when moving
305   - * pages around the LRU lists relies on the
306   - * page's owner (memcg) to be valid. Usually,
307   - * pages are assigned to a new owner before
308   - * being put on the LRU list, but since this
309   - * is not the case here, the stale owner from
310   - * a previous allocation cycle must be reset.
311   - */
312   - mem_cgroup_reset_owner(new_page);
313 303 }
314 304  
315 305 /*