Commit ac39cf8cb86c45eeac6a592ce0d58f9021a97235

Authored by akpm@linux-foundation.org
Committed by Linus Torvalds
1 parent 315c1998e1

memcg: fix mis-accounting of file mapped racy with migration

FILE_MAPPED per memcg of a migrated file cache page is not properly
updated, because our hook in page_add_file_rmap() can't know which
memcg FILE_MAPPED should be accounted to.
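
For reference, the hook in question looks roughly like this in this
era's mm/rmap.c (a simplified sketch, not the exact source):

    void page_add_file_rmap(struct page *page)
    {
        /*
         * _mapcount starts at -1, so atomic_inc_and_test() is true
         * exactly on the 0->1 transition, i.e. the page's first
         * mapping.
         */
        if (atomic_inc_and_test(&page->_mapcount)) {
            __inc_zone_page_state(page, NR_FILE_MAPPED);
            mem_cgroup_update_file_mapped(page, 1);
        }
    }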

Basically, this patch fixes that bug, but it also includes some larger
changes that clean up related problems.

Currently, when migrating a mapped file page, events happen in the
following sequence:

 1. allocate a new page.
 2. get the memcg of the old page.
 3. charge against the new page before migration. But at this point,
    the new page's page_cgroup is not touched and the charge is not
    committed. (IOW, the PCG_USED bit is not set.)
 4. page migration replaces the old page with the new page in the
    radix-tree.
 5. page migration remaps the new page if the old page was mapped.
 6. Here, the new page is unlocked.
 7. memcg commits the charge for the new page, marking the new page's
    page_cgroup as PCG_USED.

Because the "commit" happens after the page is remapped, we cannot count
FILE_MAPPED at step 5: we must not trust page_cgroup->mem_cgroup while
the PCG_USED bit is unset.
(Note: memcg's LRU removal code does trust it, but only with the help of
 LRU-isolation logic. When page_cgroup->mem_cgroup is overwritten, the
 page_cgroup is either not on an LRU or its mem_cgroup is NULL.)

So we lose the FILE_MAPPED accounting at step 5, because FILE_MAPPED is
updated only when the mapcount changes from 0 to 1. We have to catch
that transition; the sketch below shows where it is missed.
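
The memcg side of the hook shows where the event is dropped; again a
simplified sketch of this era's mem_cgroup_update_file_mapped()
(mm/memcontrol.c), details may differ slightly:

    void mem_cgroup_update_file_mapped(struct page *page, int val)
    {
        struct mem_cgroup *mem;
        struct page_cgroup *pc;

        pc = lookup_page_cgroup(page);
        if (unlikely(!pc))
            return;

        lock_page_cgroup(pc);
        mem = pc->mem_cgroup;
        /*
         * Before this patch, the new page is not yet PCG_USED when
         * it is remapped at step 5, so the 0->1 event is dropped
         * right here.
         */
        if (!mem || !PageCgroupUsed(pc))
            goto done;
        if (val > 0) {
            __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
            SetPageCgroupFileMapped(pc);
        } else {
            __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
            ClearPageCgroupFileMapped(pc);
        }
    done:
        unlock_page_cgroup(pc);
    }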

BTW, historically, the above implementation comes from handling
migration failure of anonymous pages. Because we charge both the old
page and the new page with mapcount == 0, we can't catch cases such as:
  - the page is really freed before remap.
  - migration fails, but the page is freed before remap.
or ..... other corner cases.

The new migration sequence with memcg is:

 1. allocate a new page.
 2. mark PageCgroupMigration on the old page.
 3. charge against the new page, onto the old page's memcg. (Here, the
    new page's pc is marked as PageCgroupUsed.)
 4. page migration replaces the radix-tree entry, page tables, etc.
 5. At remapping, the new page's page_cgroup is already marked as
    "USED", so we can catch the 0->1 event and FILE_MAPPED is properly
    updated.

    And because the page is already "USED" when it is unlocked, a later
    SWAPOUT event, or the page being freed by unmap(), is caught as
    well.

 6. Clear PageCgroupMigration on the old page.

So, FILE_MAPPED will be correctly updated.
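
From the caller's side, the new ordering can be sketched as follows.
This is not the actual mm/migrate.c code: unmap_and_move_sketch() and
replace_and_remap() are made-up stand-ins for the real unmap/move/remap
steps, and locking, retry and error paths are omitted:

    static int unmap_and_move_sketch(struct page *page, struct page *newpage)
    {
        struct mem_cgroup *mem = NULL;
        int rc;

        /* steps 1-3: flag the old page, charge and commit the new page */
        if (mem_cgroup_prepare_migration(page, newpage, &mem) == -ENOMEM)
            return -ENOMEM;

        /*
         * steps 4-5: the new page's page_cgroup is already "USED",
         * so the FILE_MAPPED 0->1 event at remap, and swapout/unmap
         * after unlock_page(), are accounted to the right memcg.
         */
        rc = replace_and_remap(page, newpage);

        /* step 6: clear PCG_MIGRATION; uncharge whichever page lost */
        mem_cgroup_end_migration(mem, page, newpage);
        return rc;
    }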

Then, what is the MIGRATION flag for?
  Without it, on migration failure we might have to charge the old page
  again, because it may have been fully unmapped in the meantime.
  "Charging" means we may have to dive into memory reclaim or something
  similarly complicated, so it's better to avoid charging it again.
  Before this patch, __commit_charge() handled both the old and the new
  page and fixed everything up, but that technique had race conditions
  around FILE_MAPPED, SWAPOUT, etc.
  Now the kernel uses the MIGRATION flag and does not uncharge the old
  page until the end of migration.
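
Concretely, the flag works through the guard added to
__mem_cgroup_uncharge_common() (see the mm/memcontrol.c hunk below):
even when an anon page's mapcount drops to zero in the middle of
migration, the charge is kept:

    switch (ctype) {
    case MEM_CGROUP_CHARGE_TYPE_MAPPED:
    case MEM_CGROUP_CHARGE_TYPE_DROP:
        /* See mem_cgroup_prepare_migration() */
        if (page_mapped(page) || PageCgroupMigration(pc))
            goto unlock_out;    /* keep the charge */
        break;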

I hope this change makes memcg's page migration much simpler. This
page migration code has caused several troubles; it is worth adding a
flag to simplify it.

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 4 changed files with 107 additions and 41 deletions

include/linux/memcontrol.h
... ... @@ -90,7 +90,8 @@
90 90 extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *mem);
91 91  
92 92 extern int
93   -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr);
  93 +mem_cgroup_prepare_migration(struct page *page,
  94 + struct page *newpage, struct mem_cgroup **ptr);
94 95 extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
95 96 struct page *oldpage, struct page *newpage);
96 97  
... ... @@ -227,7 +228,8 @@
227 228 }
228 229  
229 230 static inline int
230   -mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
  231 +mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
  232 + struct mem_cgroup **ptr)
231 233 {
232 234 return 0;
233 235 }
include/linux/page_cgroup.h
... ... @@ -40,6 +40,7 @@
40 40 PCG_USED, /* this object is in use. */
41 41 PCG_ACCT_LRU, /* page has been accounted for */
42 42 PCG_FILE_MAPPED, /* page is accounted as "mapped" */
  43 + PCG_MIGRATION, /* under page migration */
43 44 };
44 45  
45 46 #define TESTPCGFLAG(uname, lname) \
... ... @@ -78,6 +79,10 @@
78 79 SETPCGFLAG(FileMapped, FILE_MAPPED)
79 80 CLEARPCGFLAG(FileMapped, FILE_MAPPED)
80 81 TESTPCGFLAG(FileMapped, FILE_MAPPED)
  82 +
  83 +SETPCGFLAG(Migration, MIGRATION)
  84 +CLEARPCGFLAG(Migration, MIGRATION)
  85 +TESTPCGFLAG(Migration, MIGRATION)
81 86  
82 87 static inline int page_cgroup_nid(struct page_cgroup *pc)
83 88 {
mm/memcontrol.c
... ... @@ -2258,7 +2258,8 @@
2258 2258 switch (ctype) {
2259 2259 case MEM_CGROUP_CHARGE_TYPE_MAPPED:
2260 2260 case MEM_CGROUP_CHARGE_TYPE_DROP:
2261   - if (page_mapped(page))
  2261 + /* See mem_cgroup_prepare_migration() */
  2262 + if (page_mapped(page) || PageCgroupMigration(pc))
2262 2263 goto unlock_out;
2263 2264 break;
2264 2265 case MEM_CGROUP_CHARGE_TYPE_SWAPOUT:
2265 2266  
... ... @@ -2481,10 +2482,12 @@
2481 2482 * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
2482 2483 * page belongs to.
2483 2484 */
2484   -int mem_cgroup_prepare_migration(struct page *page, struct mem_cgroup **ptr)
  2485 +int mem_cgroup_prepare_migration(struct page *page,
  2486 + struct page *newpage, struct mem_cgroup **ptr)
2485 2487 {
2486 2488 struct page_cgroup *pc;
2487 2489 struct mem_cgroup *mem = NULL;
  2490 + enum charge_type ctype;
2488 2491 int ret = 0;
2489 2492  
2490 2493 if (mem_cgroup_disabled())
... ... @@ -2495,69 +2498,125 @@
2495 2498 if (PageCgroupUsed(pc)) {
2496 2499 mem = pc->mem_cgroup;
2497 2500 css_get(&mem->css);
  2501 + /*
  2502 + * At migrating an anonymous page, its mapcount goes down
  2503 + * to 0 and uncharge() will be called. But, even if it's fully
  2504 + * unmapped, migration may fail and this page has to be
  2505 + * charged again. We set MIGRATION flag here and delay uncharge
  2506 + * until end_migration() is called
  2507 + *
  2508 + * Corner Case Thinking
  2509 + * A)
  2510 + * When the old page was mapped as Anon and it's unmap-and-freed
  2511 + * while migration was ongoing.
  2512 + * If unmap finds the old page, uncharge() of it will be delayed
  2513 + * until end_migration(). If unmap finds a new page, it's
  2514 + * uncharged when its mapcount goes 1->0. If unmap code
  2515 + * finds swap_migration_entry, the new page will not be mapped
  2516 + * and end_migration() will find it(mapcount==0).
  2517 + *
  2518 + * B)
  2519 + * When the old page was mapped but migration fails, the kernel
  2520 + * remaps it. A charge for it is kept by MIGRATION flag even
  2521 + * if mapcount goes down to 0. We can do remap successfully
  2522 + * without charging it again.
  2523 + *
  2524 + * C)
  2525 + * The "old" page is under lock_page() until the end of
  2526 + * migration, so, the old page itself will not be swapped-out.
  2527 + * If the new page is swapped out before end_migration, our
  2528 + * hook to usual swap-out path will catch the event.
  2529 + */
  2530 + if (PageAnon(page))
  2531 + SetPageCgroupMigration(pc);
2498 2532 }
2499 2533 unlock_page_cgroup(pc);
  2534 + /*
  2535 + * If the page is not charged at this point,
  2536 + * we return here.
  2537 + */
  2538 + if (!mem)
  2539 + return 0;
2500 2540  
2501 2541 *ptr = mem;
2502   - if (mem) {
2503   - ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
2504   - css_put(&mem->css);
  2542 + ret = __mem_cgroup_try_charge(NULL, GFP_KERNEL, ptr, false);
  2543 + css_put(&mem->css);/* drop extra refcnt */
  2544 + if (ret || *ptr == NULL) {
  2545 + if (PageAnon(page)) {
  2546 + lock_page_cgroup(pc);
  2547 + ClearPageCgroupMigration(pc);
  2548 + unlock_page_cgroup(pc);
  2549 + /*
  2550 + * The old page may be fully unmapped while we kept it.
  2551 + */
  2552 + mem_cgroup_uncharge_page(page);
  2553 + }
  2554 + return -ENOMEM;
2505 2555 }
  2556 + /*
  2557 + * We charge new page before it's used/mapped. So, even if unlock_page()
  2558 + * is called before end_migration, we can catch all events on this new
  2559 + * page. In the case new page is migrated but not remapped, new page's
  2560 + * mapcount will be finally 0 and we call uncharge in end_migration().
  2561 + */
  2562 + pc = lookup_page_cgroup(newpage);
  2563 + if (PageAnon(page))
  2564 + ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
  2565 + else if (page_is_file_cache(page))
  2566 + ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
  2567 + else
  2568 + ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
  2569 + __mem_cgroup_commit_charge(mem, pc, ctype);
2506 2570 return ret;
2507 2571 }
2508 2572  
2509 2573 /* remove redundant charge if migration failed*/
2510 2574 void mem_cgroup_end_migration(struct mem_cgroup *mem,
2511   - struct page *oldpage, struct page *newpage)
  2575 + struct page *oldpage, struct page *newpage)
2512 2576 {
2513   - struct page *target, *unused;
  2577 + struct page *used, *unused;
2514 2578 struct page_cgroup *pc;
2515   - enum charge_type ctype;
2516 2579  
2517 2580 if (!mem)
2518 2581 return;
  2582 + /* blocks rmdir() */
2519 2583 cgroup_exclude_rmdir(&mem->css);
2520 2584 /* at migration success, oldpage->mapping is NULL. */
2521 2585 if (oldpage->mapping) {
2522   - target = oldpage;
2523   - unused = NULL;
  2586 + used = oldpage;
  2587 + unused = newpage;
2524 2588 } else {
2525   - target = newpage;
  2589 + used = newpage;
2526 2590 unused = oldpage;
2527 2591 }
2528   -
2529   - if (PageAnon(target))
2530   - ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
2531   - else if (page_is_file_cache(target))
2532   - ctype = MEM_CGROUP_CHARGE_TYPE_CACHE;
2533   - else
2534   - ctype = MEM_CGROUP_CHARGE_TYPE_SHMEM;
2535   -
2536   - /* unused page is not on radix-tree now. */
2537   - if (unused)
2538   - __mem_cgroup_uncharge_common(unused, ctype);
2539   -
2540   - pc = lookup_page_cgroup(target);
2541 2592 /*
2542   - * __mem_cgroup_commit_charge() check PCG_USED bit of page_cgroup.
2543   - * So, double-counting is effectively avoided.
  2593 + * We disallowed uncharge of pages under migration because mapcount
  2594 + * of the page goes down to zero, temporarily.
  2595 + * Clear the flag and check whether the page should be charged.
2544 2596 */
2545   - __mem_cgroup_commit_charge(mem, pc, ctype);
  2597 + pc = lookup_page_cgroup(oldpage);
  2598 + lock_page_cgroup(pc);
  2599 + ClearPageCgroupMigration(pc);
  2600 + unlock_page_cgroup(pc);
2546 2601  
  2602 + if (unused != oldpage)
  2603 + pc = lookup_page_cgroup(unused);
  2604 + __mem_cgroup_uncharge_common(unused, MEM_CGROUP_CHARGE_TYPE_FORCE);
  2605 +
  2606 + pc = lookup_page_cgroup(used);
2547 2607 /*
2548   - * Both of oldpage and newpage are still under lock_page().
2549   - * Then, we don't have to care about race in radix-tree.
2550   - * But we have to be careful that this page is unmapped or not.
2551   - *
2552   - * There is a case for !page_mapped(). At the start of
2553   - * migration, oldpage was mapped. But now, it's zapped.
2554   - * But we know *target* page is not freed/reused under us.
2555   - * mem_cgroup_uncharge_page() does all necessary checks.
  2608 + * If a page is a file cache, radix-tree replacement is very atomic
  2609 + * and we can skip this check. When it was an Anon page, its mapcount
  2610 + * goes down to 0. But because we added the MIGRATION flag, it's not
  2611 + * uncharged yet. There are several cases but the page->mapcount check
  2612 + * and USED bit check in mem_cgroup_uncharge_page() will do enough
  2613 + * check. (see prepare_charge() also)
2556 2614 */
2557   - if (ctype == MEM_CGROUP_CHARGE_TYPE_MAPPED)
2558   - mem_cgroup_uncharge_page(target);
  2615 + if (PageAnon(used))
  2616 + mem_cgroup_uncharge_page(used);
2559 2617 /*
2560   - * At migration, we may charge account against cgroup which has no tasks
  2618 + * At migration, we may charge account against cgroup which has no
  2619 + * tasks.
2561 2620 * So, rmdir()->pre_destroy() can be called while we do this charge.
2562 2621 * In that case, we need to call pre_destroy() again. check it here.
2563 2622 */
mm/migrate.c
... ... @@ -590,7 +590,7 @@
590 590 }
591 591  
592 592 /* charge against new page */
593   - charge = mem_cgroup_prepare_migration(page, &mem);
  593 + charge = mem_cgroup_prepare_migration(page, newpage, &mem);
594 594 if (charge == -ENOMEM) {
595 595 rc = -ENOMEM;
596 596 goto unlock;