mm, thp: fix infinite loop on memcg OOM

Masayoshi Mizuma reported a bug where an application hangs under the memcg limit. It happens on a write-protection fault to the huge zero page.

If we successfully allocate a huge page to replace the zero page but hit the memcg limit, we need to split the zero page with split_huge_page_pmd() and fall back to small pages.

The other part of the problem is that VM_FAULT_OOM has a special meaning in the do_huge_pmd_wp_page() context. __handle_mm_fault() expects the page to be split if it sees VM_FAULT_OOM, and it will retry page fault handling. This causes an infinite loop if the page was not split.

do_huge_pmd_wp_zero_page_fallback() can return VM_FAULT_OOM if it failed to allocate one small page, so falling back to small pages will not help.

The solution for this part is to replace VM_FAULT_OOM with VM_FAULT_FALLBACK if fallback is required.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Kirill A. Shutemov · Linus Torvalds
1 parent 01412886b7
Showing 2 changed files with 9 additions and 14 deletions
mm/huge_memory.c
mm/memory.c
@@ -1166,8 +1166,10 @@
 		} else {
 			ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
 					pmd, orig_pmd, page, haddr);
-			if (ret & VM_FAULT_OOM)
+			if (ret & VM_FAULT_OOM) {
 				split_huge_page(page);
+				ret |= VM_FAULT_FALLBACK;
+			}
 			put_page(page);
 		}
 		count_vm_event(THP_FAULT_FALLBACK);
  
@@ -1179,9 +1181,10 @@
 		if (page) {
 			split_huge_page(page);
 			put_page(page);
-		}
+		} else
+			split_huge_page_pmd(vma, address, pmd);
+		ret |= VM_FAULT_FALLBACK;
 		count_vm_event(THP_FAULT_FALLBACK);
-		ret |= VM_FAULT_OOM;
 		goto out;
 	}
  
@@ -3704,7 +3704,6 @@
 	if (unlikely(is_vm_hugetlb_page(vma)))
 		return hugetlb_fault(mm, vma, address, flags);
  
-retry:
 	pgd = pgd_offset(mm, address);
 	pud = pud_alloc(mm, pgd, address);
 	if (!pud)
  
  
@@ -3742,20 +3741,13 @@
 			if (dirty && !pmd_write(orig_pmd)) {
 				ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
 							  orig_pmd);
-				/*
-				 * If COW results in an oom, the huge pmd will
-				 * have been split, so retry the fault on the
-				 * pte for a smaller charge.
-				 */
-				if (unlikely(ret & VM_FAULT_OOM))
-					goto retry;
-				return ret;
+				if (!(ret & VM_FAULT_FALLBACK))
+					return ret;
 			} else {
 				huge_pmd_set_accessed(mm, vma, address, pmd,
 						      orig_pmd, dirty);
+				return 0;
 			}
-
-			return 0;
 		}
 	}
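
The control flow this change produces can be sketched as a small userspace model. This is only an illustration, not kernel code: the `model_*` names, the `struct model_pmd` type, and the flag values are hypothetical stand-ins, and the model assumes the huge-page charge always fails so that the fallback path is exercised. It shows why returning a dedicated fallback flag after splitting, instead of retrying on an OOM flag, always terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; they are not the
 * kernel's VM_FAULT_* definitions. */
enum {
    MODEL_FAULT_OOM      = 1 << 0,
    MODEL_FAULT_FALLBACK = 1 << 1,
};

/* Hypothetical stand-in for the state behind one pmd entry. */
struct model_pmd {
    bool huge;         /* true while the pmd still maps a huge page */
    int  small_budget; /* number of small-page allocations that can succeed */
};

/* Models the huge write-protect path when the huge charge fails:
 * split the mapping first, then ask the caller to fall back. */
static int model_huge_wp(struct model_pmd *pmd)
{
    pmd->huge = false;            /* models split_huge_page_pmd() */
    return MODEL_FAULT_FALLBACK;  /* not OOM: small pages may still fit */
}

/* Models pte-level fault handling with a limited allocation budget. */
static int model_pte_fault(struct model_pmd *pmd)
{
    if (pmd->small_budget <= 0)
        return MODEL_FAULT_OOM;   /* a genuine OOM, reported once */
    pmd->small_budget--;
    return 0;
}

/* Models the fixed fault handler: no retry label, a single pass. */
int model_handle_fault(struct model_pmd *pmd)
{
    if (pmd->huge) {
        int ret = model_huge_wp(pmd);
        if (!(ret & MODEL_FAULT_FALLBACK))
            return ret;
        /* Fallback is only valid once the huge mapping is gone. */
        assert(!pmd->huge);
    }
    return model_pte_fault(pmd);
}
```

Calling `model_handle_fault()` on an entry with `huge = true` and `small_budget = 0` returns `MODEL_FAULT_OOM` after one pass. In this model, a retry loop keyed on the OOM flag would have nothing to distinguish "split, try smaller" from "out of memory even for one small page".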