Commit c1a71504e9715812a2d15e7c03b5aa147ae70ded

Authored by Li Zefan
Committed by Tejun Heo
1 parent 266ccd505e

cgroup: don't recycle cgroup id until all csses' have been destroyed

Hugh reported this bug:

> CONFIG_MEMCG_SWAP is broken in 3.13-rc.  Try something like this:
>
> mkdir -p /tmp/tmpfs /tmp/memcg
> mount -t tmpfs -o size=1G tmpfs /tmp/tmpfs
> mount -t cgroup -o memory memcg /tmp/memcg
> mkdir /tmp/memcg/old
> echo 512M >/tmp/memcg/old/memory.limit_in_bytes
> echo $$ >/tmp/memcg/old/tasks
> cp /dev/zero /tmp/tmpfs/zero 2>/dev/null
> echo $$ >/tmp/memcg/tasks
> rmdir /tmp/memcg/old
> sleep 1	# let rmdir work complete
> mkdir /tmp/memcg/new
> umount /tmp/tmpfs
> dmesg | grep WARNING
> rmdir /tmp/memcg/new
> umount /tmp/memcg
>
> Shows lots of WARNING: CPU: 1 PID: 1006 at kernel/res_counter.c:91
>                            res_counter_uncharge_locked+0x1f/0x2f()
>
> Breakage comes from 34c00c319ce7 ("memcg: convert to use cgroup id").
>
> The lifetime of a cgroup id is different from the lifetime of the
> css id it replaced: memsw's css_get()s do nothing to hold on to the
> old cgroup id, it soon gets recycled to a new cgroup, which then
> mysteriously inherits the old's swap, without any charge for it.

Instead of removing cgroup id right after all the csses have been
offlined, we should do that after csses have been destroyed.

To make sure an invalid css pointer won't be returned after the css
is destroyed, make sure css_from_id() returns NULL in this case.

tj: Updated comment to note planned changes for cgrp->id.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>

Showing 1 changed file with 11 additions and 8 deletions Side-by-side Diff

... ... @@ -890,6 +890,16 @@
890 890 struct cgroup *cgrp = dentry->d_fsdata;
891 891  
892 892 BUG_ON(!(cgroup_is_dead(cgrp)));
  893 +
  894 + /*
  895 + * XXX: cgrp->id is only used to look up css's. As cgroup
  896 + * and css's lifetimes will be decoupled, it should be made
  897 + * per-subsystem and moved to css->id so that lookups are
  898 + * successful until the target css is released.
  899 + */
  900 + idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
  901 + cgrp->id = -1;
  902 +
893 903 call_rcu(&cgrp->rcu_head, cgroup_free_rcu);
894 904 } else {
895 905 struct cfent *cfe = __d_cfe(dentry);
... ... @@ -4268,6 +4278,7 @@
4268 4278 struct cgroup_subsys_state *css =
4269 4279 container_of(ref, struct cgroup_subsys_state, refcnt);
4270 4280  
  4281 + rcu_assign_pointer(css->cgroup->subsys[css->ss->subsys_id], NULL);
4271 4282 call_rcu(&css->rcu_head, css_free_rcu_fn);
4272 4283 }
4273 4284  
... ... @@ -4732,14 +4743,6 @@
4732 4743  
4733 4744 /* delete this cgroup from parent->children */
4734 4745 list_del_rcu(&cgrp->sibling);
4735   -
4736   - /*
4737   - * We should remove the cgroup object from idr before its grace
4738   - * period starts, so we won't be looking up a cgroup while the
4739   - * cgroup is being freed.
4740   - */
4741   - idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
4742   - cgrp->id = -1;
4743 4746  
4744 4747 dput(d);
4745 4748