Eric Lee / smarc-fsl-linux-kernel

03 Dec, 2015

1 commit

1f7dd3e5a cgroup: fix handling of multi-destination migration from subtree_control enabling

Consider the following v2 hierarchy.

P0 (+memory) --- P1 (-memory) --- A
                              \- B

P0 has memory enabled in its subtree_control while P1 doesn't. If
both A and B contain processes, they would belong to the memory css of
P1. Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter. IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses. pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
Modules linked in:
CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
...
ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
Call Trace:
dump_stack+0x4e/0x82
warn_slowpath_common+0x82/0xc0
warn_slowpath_null+0x1a/0x20
pids_cancel.constprop.6+0x31/0x40
pids_can_attach+0x6d/0xf0
cgroup_taskset_migrate+0x6c/0x330
cgroup_migrate+0xf5/0x190
cgroup_attach_task+0x176/0x200
__cgroup_procs_write+0x2ad/0x460
cgroup_procs_write+0x14/0x20
cgroup_file_write+0x35/0x1c0
kernfs_fop_write+0x141/0x190
__vfs_write+0x28/0xe0
vfs_write+0xac/0x1a0
SyS_write+0x49/0xb0
entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing the @css parameter from the three
migration methods, ->can_attach(), ->cancel_attach() and ->attach(), and
updating the cgroup_taskset iteration helpers to also return the
destination css in addition to the task being migrated. All controllers
are updated accordingly.

* Controllers which don't care whether there are one or multiple
target csses can be converted trivially. cpu, io, freezer, perf,
netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's a single source
and destination and thus already doesn't support the v2 hierarchy. The
only change made by this patchset is how that single destination css
is obtained.

* memory migration path already doesn't do anything on v2. How the
single destination css is obtained is updated and the prep stage of
mem_cgroup_can_attach() is reordered to accommodate the change.

* pids is the only controller which was affected by this bug. It now
correctly handles multi-destination migrations and no longer causes
counter underflow from incorrect accounting.
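
The accounting problem described above can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: `PidsCss`, `build`, `migrate_single_target` and `migrate_per_task` are hypothetical names, and the model assumes that each task holds exactly one charge and that charges propagate to all ancestors.

```python
class PidsCss:
    """Toy stand-in for a pids css: a hierarchical task counter."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.counter = 0

    def charge(self, n=1):
        # Charges are applied to this css and to every ancestor.
        node = self
        while node is not None:
            node.counter += n
            node = node.parent

    def uncharge(self, n=1):
        self.charge(-n)


def build():
    """P0 - P1 - {A, B}; two tasks are charged to P1 and headed for A and B."""
    p0 = PidsCss("P0")
    p1 = PidsCss("P1", p0)
    a = PidsCss("A", p1)
    b = PidsCss("B", p1)
    tasks = [{"css": p1, "dst": a}, {"css": p1, "dst": b}]
    for task in tasks:
        task["css"].charge()
    return p1, a, b, tasks


def migrate_single_target(tasks, target):
    """Old scheme: the method is handed one css and charges it for every task."""
    for task in tasks:
        target.charge()
        task["css"].uncharge()
        task["css"] = task["dst"]  # the core still moves the task to its real css


def migrate_per_task(tasks):
    """New scheme: each task is paired with its own destination css."""
    for task in tasks:
        task["dst"].charge()
        task["css"].uncharge()
        task["css"] = task["dst"]
```

In this model, calling `migrate_single_target(tasks, p1)` leaves A and B with no charges, so the first task that later uncharges A pushes its counter below zero. `migrate_per_task(tasks)` leaves one charge on each of A and B.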

Signed-off-by: Tejun Heo
Reported-and-tested-by: Daniel Wagner
Cc: Aleksa Sarai

Tejun Heo
2015-12-03 23:18:21 +0800

16 Nov, 2015

1 commit

34c06254f cgroup: fix cftype->file_offset handling

6f60eade2433 ("cgroup: generalize obtaining the handles of and
notifying cgroup files") introduced cftype->file_offset so that the
handles for per-css file instances can be recorded. These handles
then can be used, for example, to generate file modified
notifications.

Unfortunately, it made the wrong assumption that files are created
once for a given css and removed on its destruction. Due to the
dependencies among subsystems, a css may be hidden from userland and
then later shown again. This is implemented by removing and
re-creating the affected files, so the associated kernfs_node for a
given cgroup file may change over time. This incorrect assumption led
to the corruption of css->files lists.

Reimplement cftype->file_offset handling so that cgroup_file->kn is
protected by a lock and updated as files are created and destroyed.
This also makes keeping them on per-cgroup list unnecessary.
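
The idea of a handle that is updated under a lock as files come and go can be sketched as follows. This is an illustrative Python model under stated assumptions, not the kernel implementation: `CgroupFile`, `file_created`, `file_removed` and `notify` are hypothetical names, and a plain list stands in for the notification mechanism.

```python
import threading


class CgroupFile:
    """Toy stand-in for a per-css file handle that may change over time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.kn = None  # handle of the currently visible file, if any

    def file_created(self, kn):
        # Called each time the file is (re)created, e.g. when a css is shown.
        with self._lock:
            self.kn = kn

    def file_removed(self):
        # Called when the file is removed, e.g. when a css is hidden.
        with self._lock:
            self.kn = None

    def notify(self, log):
        """Record a notification on the current handle; no-op while hidden."""
        with self._lock:
            if self.kn is None:
                return False
            log.append(self.kn)
            return True
```

Because the handle is looked up under the lock at notification time, a file that was removed and re-created is notified through its new handle, and a hidden file is simply skipped.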

Signed-off-by: Tejun Heo
Reported-by: James Sedgwick
Fixes: 6f60eade2433 ("cgroup: generalize obtaining the handles of and notifying cgroup files")
Acked-by: Johannes Weiner
Acked-by: Zefan Li

Tejun Heo
2015-11-16 23:58:26 +0800

16 Oct, 2015

4 commits

2e91fa7f6 cgroup: keep zombies associated with their original cgroups

cgroup_exit() is called when a task exits. It disassociates the
exiting task from its cgroups and half-attaches it to the root cgroup.
This is unnecessary and undesirable.

No controller actually needs an exiting task to be disassociated from
non-root cgroups. Both cpu and perf_event controllers update the
association to the root cgroup from their exit callbacks just to stay
consistent with the cgroup core behavior.

Also, this disassociation makes it difficult to track resources held
by zombies or determine where the zombies came from. Currently, pids
controller is completely broken as it uncharges on exit and zombies
always escape the resource restriction. With cgroup association being
reset on exit, fixing it is pretty painful.

There's no reason to reset cgroup membership on exit. The zombie can
be removed from its css_set so that it doesn't show up on
"cgroup.procs" and thus can't be migrated or interfere with cgroup
removal. It can still pin and point to the css_set so that its cgroup
membership is maintained. This patch makes cgroup core keep zombies
associated with their cgroups at the time of exit.

* Previous patches decoupled populated_cnt tracking from css_set
lifetime, so a dying task can be simply unlinked from its css_set
while pinning and pointing to the css_set. This keeps css_set
association from task side alive while hiding it from "cgroup.procs"
and populated_cnt tracking. The css_set reference is dropped when
the task_struct is freed.

* ->exit() callback no longer needs the css arguments as the
associated css never changes once PF_EXITING is set. Removed.

* cpu and perf_event controllers no longer need ->exit() callbacks.
There's no reason to explicitly switch away on exit. The final
schedule out is enough. The callbacks are removed.

* On traditional hierarchies, nothing changes. "/proc/PID/cgroup"
still reports "/" for all zombies. On the default hierarchy,
"/proc/PID/cgroup" keeps reporting the cgroup that the task belonged
to at the time of exit. If the cgroup gets removed before the task
is reaped, " (deleted)" is appended.

v2: Build breakage due to missing dummy cgroup_free() when
!CONFIG_CGROUP fixed.

Signed-off-by: Tejun Heo
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo

Tejun Heo
2015-10-16 04:41:53 +0800

f0d9a5f17 cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock

css_set_rwsem is the inner lock protecting css_sets and is accessed
from hot paths such as fork and exit. Internally, it has no reason to
be a rwsem or even mutex. There are no internal blocking operations
while holding it. This was rwsem because css task iteration used to
expose it to external iterator users. As the previous patch updated
css task iteration such that the locking is not leaked to its users,
there's no reason to keep it a rwsem.

This patch converts css_set_rwsem to a spinlock and renames it to
css_set_lock. It uses bh-safe operations as a planned usage needs to
access it from RCU callback context.

Signed-off-by: Tejun Heo

Tejun Heo
2015-10-16 04:41:53 +0800

ed27b9f7a cgroup: don't hold css_set_rwsem across css task iteration

css_sets are synchronized through css_set_rwsem but the locking scheme
is kinda bizarre. The hot paths - fork and exit - have to write lock
the rwsem making the rw part pointless; furthermore, many readers
already hold cgroup_mutex.

One of the readers is css task iteration. It read locks the rwsem
over the entire duration of iteration. This leads to silly locking
behavior. When cpuset tries to migrate processes of a cgroup to a
different NUMA node, css_set_rwsem is held across the entire migration
attempt which can take a long time locking out forking, exiting and
other cgroup operations.

This patch updates css task iteration so that it locks css_set_rwsem
only while the iterator is being advanced. css task iteration
involves two levels - css_set and task iteration. As css_sets in use
are practically immutable, simply pinning the current one is enough
for resuming iteration afterwards. Task iteration is tricky as tasks
may leave their css_set while iteration is in progress. This is
solved by keeping track of active iterators and advancing them if
their next task leaves its css_set.

v2: put_task_struct() in css_task_iter_next() moved outside
css_set_rwsem. A later patch will add cgroup operations to
task_struct free path which may grab the same lock and this avoids
deadlock possibilities.

css_set_move_task() updated to use list_for_each_entry_safe() when
walking task_iters and advancing them. This is necessary as
advancing an iter may remove it from the list.
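
The technique of tracking active iterators and advancing them when their next task leaves can be modelled in a few lines. The following Python sketch is an illustration only: `CssSet`, `TaskIter` and `remove_task` are hypothetical names, tasks are assumed to be unique values, and locking is omitted (in the scheme described above the lock would be held only while an iterator is being advanced).

```python
class CssSet:
    """Toy css_set: an ordered task list plus the iterators walking it."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.iters = []  # active iterators positioned on this set

    def remove_task(self, task):
        # Walk a copy of the list because advancing an iterator may
        # unregister it, which mirrors the need for a "safe" list walk.
        for it in list(self.iters):
            if it.next_task == task:
                it.advance_past(task)
        self.tasks.remove(task)


class TaskIter:
    """Iterator that stays valid when tasks leave between next() calls."""

    def __init__(self, cset):
        self.cset = cset
        self.next_task = cset.tasks[0] if cset.tasks else None
        if self.next_task is not None:
            cset.iters.append(self)

    def advance_past(self, task):
        idx = self.cset.tasks.index(task) + 1
        if idx < len(self.cset.tasks):
            self.next_task = self.cset.tasks[idx]
        else:
            self.next_task = None
            self.cset.iters.remove(self)  # iteration finished, unregister

    def next(self):
        task = self.next_task
        if task is not None:
            self.advance_past(task)
        return task
```

For example, an iterator over tasks `a`, `b`, `c` that has returned `a` will return `c` next if `b` is removed in between, rather than following a stale pointer.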

Signed-off-by: Tejun Heo

Tejun Heo
2015-10-16 04:41:52 +0800

27bd4dbb8 cgroup: replace cgroup_has_tasks() with cgroup_is_populated()

Currently, cgroup_has_tasks() tests whether the target cgroup has any
css_set linked to it. This works because a css_set's refcnt converges
with the number of tasks linked to it and thus there's no css_set
linked to a cgroup if it doesn't have any live tasks.

To help track the resource usage of zombie tasks, putting the ref of
css_set will be separated from disassociating the task from the
css_set, which means that a cgroup may have css_sets linked to it even
when it doesn't have any live tasks.

This patch replaces cgroup_has_tasks() with cgroup_is_populated()
which tests cgroup->nr_populated instead which locally counts the
number of populated css_sets. Unlike cgroup_has_tasks(),
cgroup_is_populated() is recursive - if any of the descendants is
populated, the cgroup is populated too. While this changes the
meaning of the test, all the existing users are okay with the change.

While at it, replace the open-coded ->populated_cnt test in
cgroup_events_show() with cgroup_is_populated().
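
The recursive meaning of "populated" can be sketched with a counter that propagates state changes upwards. This is a simplified Python model, not the kernel code: the class `Cgroup` and the method `update_populated` are hypothetical, and the model assumes that a cgroup's count goes up by one for each populated css_set linked to it and for each populated child.

```python
class Cgroup:
    """Toy cgroup that counts populated css_sets and populated children."""

    def __init__(self, parent=None):
        self.parent = parent
        self.populated_cnt = 0

    def update_populated(self, populated):
        """Report that one css_set linked here became populated or empty."""
        node = self
        while node is not None:
            if populated:
                node.populated_cnt += 1
                changed = node.populated_cnt == 1
            else:
                node.populated_cnt -= 1
                changed = node.populated_cnt == 0
            if not changed:
                break  # this node's state did not flip, ancestors unaffected
            node = node.parent

    def is_populated(self):
        # True if this cgroup or any descendant has a populated css_set.
        return self.populated_cnt > 0
```

A parent whose only tasks live in a descendant is therefore reported as populated, which is the change in meaning the commit message refers to.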

Signed-off-by: Tejun Heo
Cc: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko

Tejun Heo
2015-10-16 04:41:50 +0800

23 Sep, 2015

1 commit

4530eddb5 cgroup, memcg, cpuset: implement cgroup_taskset_for_each_leader()

It wasn't explicitly documented but, when a process is being migrated,
cpuset and memcg depend on cgroup_taskset_first() returning the
threadgroup leader; however, this approach is somewhat hacky and
would no longer work for the planned multi-process migration.

This patch introduces explicit cgroup_taskset_for_each_leader() which
iterates over only the threadgroup leaders and replaces
cgroup_taskset_first() usages for accessing the leader with it.

This prepares both memcg and cpuset for multi-process migration. This
patch also updates the documentation for cgroup_taskset_for_each() to
clarify the iteration rules and removes comments mentioning task
ordering in tasksets.

v2: A previous patch which added threadgroup leader test was dropped.
Patch updated accordingly.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li
Acked-by: Michal Hocko
Cc: Johannes Weiner

Tejun Heo
2015-09-23 00:46:53 +0800

19 Sep, 2015

1 commit

6f60eade2 cgroup: generalize obtaining the handles of and notifying cgroup files

cgroup core handles creations and removals of cgroup interface files
as described by cftypes. There are cases where the handle for a given
file instance is necessary, for example, to generate a file modified
event. Currently, this is handled by explicitly matching the callback
method pointer and storing the file handle manually in
cgroup_add_file(). While this simple approach works for cgroup core
files, it can't for controller interface files.

This patch generalizes cgroup interface file handle handling. struct
cgroup_file is defined and each cftype can optionally tell cgroup core
to store the file handle by setting ->file_offset. A file handle
remains accessible as long as the containing css is accessible.

Both "cgroup.procs" and "cgroup.events" are converted to use the new
generic mechanism instead of hooking directly into cgroup_add_file().
Also, cgroup_file_notify() which takes a struct cgroup_file and
generates a file modified event on it is added and replaces explicit
kernfs_notify() invocations.

This generalizes cgroup file handle handling and allows controllers to
generate file modified notifications.

Signed-off-by: Tejun Heo
Cc: Li Zefan
Cc: Johannes Weiner

Tejun Heo
2015-09-19 05:54:23 +0800

18 Sep, 2015

2 commits

9e10a130d cgroup: replace cgroup_on_dfl() tests in controllers with cgroup_subsys_on_dfl()

cgroup_on_dfl() tests whether the cgroup's root is the default
hierarchy; however, an individual controller is only interested in
whether the controller is attached to the default hierarchy and never
tests a cgroup which doesn't belong to the hierarchy that the
controller is attached to.

This patch replaces cgroup_on_dfl() tests in controllers with faster
static_key based cgroup_subsys_on_dfl(). This leaves cgroup core as
the only user of cgroup_on_dfl() and the function is moved from the
header file to cgroup.c.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li
Cc: Vivek Goyal
Cc: Jens Axboe
Cc: Johannes Weiner
Cc: Michal Hocko

Tejun Heo
2015-09-18 23:56:28 +0800

49d1dc4b8 cgroup: implement static_key based cgroup_subsys_enabled() and cgroup_subsys_on_dfl()

Whether a subsys is enabled and attached to the default hierarchy
seldom changes and may be tested in the hot paths. This patch
implements static_key based cgroup_subsys_enabled() and
cgroup_subsys_on_dfl() tests.

The following patches will update the users and remove duplicate
mechanisms.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2015-09-18 23:56:28 +0800

26 Aug, 2015

1 commit

20f1f4b5f Merge branch 'for-4.3-unified-base' into for-4.3

Tejun Heo
2015-08-26 02:19:29 +0800

05 Aug, 2015

1 commit

6abc8ca19 cgroup: define controller file conventions

Traditionally, each cgroup controller implemented whatever interface
it wanted, leading to interfaces which are widely inconsistent.
Examining the requirements of the controllers readily shows that there
are only a few control schemes shared among all.

Two major controllers already had to implement a new interface for the
unified hierarchy due to significant structural changes. Let's take
the chance to establish common conventions throughout all controllers.

This patch defines CGROUP_WEIGHT_MIN/DFL/MAX to be used on all weight
based control knobs and documents the conventions that controllers
should follow on the unified hierarchy. Except for io.weight knob,
all existing unified hierarchy knobs are already compliant. A
follow-up patch will update io.weight.

v2: Added descriptions of min, low and high knobs.

Signed-off-by: Tejun Heo
Acked-by: Johannes Weiner
Cc: Li Zefan
Cc: Peter Zijlstra

Tejun Heo
2015-08-05 03:20:55 +0800

15 Jul, 2015

1 commit

7e47682ea cgroup: allow a cgroup subsystem to reject a fork

Add a new cgroup subsystem callback, can_fork, which lets a cgroup
policy decide whether a fork is accepted or rejected. In addition, add
a cancel_fork callback so that if an error occurs later in the forking
process, any state modified by can_fork can be reverted.

Allow for a private opaque pointer to be passed from cgroup_can_fork to
cgroup_post_fork, allowing for the fork state to be stored by each
subsystem separately.

Also add a tagging system for cgroup_subsys.h to allow for CGROUP_
enumerations to be defined and used. In addition, explicitly add a
CGROUP_CANFORK_COUNT macro to make arrays easier to define.

This is in preparation for implementing the pids cgroup subsystem.
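
The accept-or-revert pattern described above can be sketched in user space as follows. This is a hedged Python illustration, not the kernel interface: `TaskLimit` is a hypothetical subsystem invented for the example, the Python function `cgroup_can_fork` only borrows the name mentioned above, and the choice of `EAGAIN` as the error code is an assumption.

```python
import errno


class TaskLimit:
    """Hypothetical subsystem that refuses forks beyond a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def can_fork(self):
        if self.count >= self.limit:
            return -errno.EAGAIN
        self.count += 1  # reserve a slot for the new child
        return 0

    def cancel_fork(self):
        self.count -= 1  # undo the reservation made in can_fork()


def cgroup_can_fork(subsystems):
    """Ask every subsystem in order; on the first refusal, undo earlier ones."""
    accepted = []
    for ss in subsystems:
        err = ss.can_fork()
        if err:
            for prev in reversed(accepted):
                prev.cancel_fork()
            return err
        accepted.append(ss)
    return 0
```

Only subsystems whose can_fork already succeeded are rolled back, so a refusing subsystem never sees a cancel for state it did not modify.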

Signed-off-by: Aleksa Sarai
Signed-off-by: Tejun Heo

Aleksa Sarai
2015-07-15 05:29:23 +0800

27 Jun, 2015

1 commit

bbe179f88 Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

- threadgroup_lock got reorganized so that its users can pick the
actual locking mechanism to use. Its only user - cgroups - is
updated to use a percpu_rwsem instead of per-process rwsem.

This makes things a bit lighter on hot paths and allows cgroups to
perform and fail multi-task (a process) migrations atomically.
Multi-task migrations are used in several places including the
unified hierarchy.

- Delegation rule and documentation added to unified hierarchy. This
will likely be the last interface update from the cgroup core side
for unified hierarchy before lifting the devel mask.

- Some groundwork for the pids controller which is scheduled to be
merged in the coming devel cycle.

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: add delegation section to unified hierarchy documentation
cgroup: require write perm on common ancestor when moving processes on the default hierarchy
cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
kernfs: make kernfs_get_inode() public
MAINTAINERS: add a cgroup core co-maintainer
cgroup: fix uninitialised iterator in for_each_subsys_which
cgroup: replace explicit ss_mask checking with for_each_subsys_which
cgroup: use bitmask to filter for_each_subsys
cgroup: add seq_file forward declaration for struct cftype
cgroup: simplify threadgroup locking
sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
sched, cgroup: reorganize threadgroup locking
cgroup: switch to unsigned long for bitmasks
cgroup: reorganize include/linux/cgroup.h
cgroup: separate out include/linux/cgroup-defs.h
cgroup: fix some comment typos

Linus Torvalds
2015-06-27 10:50:04 +0800

02 Jun, 2015

1 commit

ec438699a cgroup, block: implement task_get_css() and use it in bio_associate_current()

bio_associate_current() currently open codes task_css() and
css_tryget_online() to find and pin $current's blkcg css. Abstract it
into task_get_css() which is implemented from cgroup side. As a task
is always associated with an online css for every subsystem except
while the css_set update is propagating, task_get_css() retries till
css_tryget_online() succeeds.

This is a cleanup and shouldn't lead to noticeable behavior changes.

Signed-off-by: Tejun Heo
Cc: Li Zefan
Cc: Jens Axboe
Cc: Vivek Goyal
Signed-off-by: Jens Axboe

Tejun Heo
2015-06-02 22:33:34 +0800

19 May, 2015

2 commits

c326aa2bb cgroup: reorganize include/linux/cgroup.h

From c4d440938b5e2015c70594fe6666a099c844f929 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Wed, 13 May 2015 16:21:40 -0400

Over time, cgroup.h grew organically and doesn't have much logical
structure at this point. Separation of cgroup-defs.h in the previous
patch gives us a good chance for reorganizing cgroup.h as changes to
the header are likely to cause conflicts anyway.

This patch reorganizes cgroup.h so that it has consistent logical
grouping.

This is pure reorganization.

v2: Relocating #ifdef CONFIG_CGROUPS caused build failure when cgroup
is disabled. Dropped.

Signed-off-by: Tejun Heo

Tejun Heo
2015-05-19 03:52:20 +0800

b4a04ab7a cgroup: separate out include/linux/cgroup-defs.h

From 2d728f74bfc071df06773e2fd7577dd5dab6425d Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Wed, 13 May 2015 15:37:01 -0400

This patch separates out cgroup-defs.h from cgroup.h which has grown a
lot of dependencies. cgroup-defs.h currently only contains constant
and type definitions and can be used to break circular include
dependency. While moving, definitions are reordered so that
cgroup-defs.h has consistent logical structure.

This patch is pure reorganization.

Signed-off-by: Tejun Heo

Tejun Heo
2015-05-19 03:52:16 +0800

07 Jan, 2015

1 commit

f3ba53802 cgroup: add dummy css_put() for !CONFIG_CGROUPS

This will later be depended upon by the scheduled cgroup writeback
support.

Signed-off-by: Tejun Heo

Tejun Heo
2015-01-07 01:02:46 +0800

12 Dec, 2014

1 commit

2756d373a Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup update from Tejun Heo:
"cpuset got simplified a bit. cgroup core got a fix on unified
hierarchy and grew some effective css related interfaces which will be
used for blkio support for writeback IO traffic which is currently
being worked on"

* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: implement cgroup_get_e_css()
cgroup: add cgroup_subsys->css_e_css_changed()
cgroup: add cgroup_subsys->css_released()
cgroup: fix the async css offline wait logic in cgroup_subtree_control_write()
cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write()
cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask()
cpuset: lock vs unlock typo
cpuset: simplify cpuset_node_allowed API
cpuset: convert callback_mutex to a spinlock

Linus Torvalds
2014-12-12 10:57:19 +0800

11 Dec, 2014

2 commits

b6da0076b Merge branch 'akpm' (patchbomb from Andrew)

Merge first patchbomb from Andrew Morton:
- a few minor cifs fixes
- dma-debug updates
- ocfs2
- slab
- about half of MM
- procfs
- kernel/exit.c
- panic.c tweaks
- printk updates
- lib/ updates
- checkpatch updates
- fs/binfmt updates
- the drivers/rtc tree
- nilfs
- kmod fixes
- more kernel/exit.c
- various other misc tweaks and fixes

* emailed patches from Andrew Morton: (190 commits)
exit: pidns: fix/update the comments in zap_pid_ns_processes()
exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
exit: exit_notify: re-use "dead" list to autoreap current
exit: reparent: call forget_original_parent() under tasklist_lock
exit: reparent: avoid find_new_reaper() if no children
exit: reparent: introduce find_alive_thread()
exit: reparent: introduce find_child_reaper()
exit: reparent: document the ->has_child_subreaper checks
exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
exit: proc: don't try to flush /proc/tgid/task/tgid
exit: release_task: fix the comment about group leader accounting
exit: wait: drop tasklist_lock before psig->c* accounting
exit: wait: don't use zombie->real_parent
exit: wait: cleanup the ptrace_reparented() checks
usermodehelper: kill the kmod_thread_locker logic
usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
...

Linus Torvalds
2014-12-11 10:34:42 +0800

e8ea14cc6 mm: memcontrol: take a css reference for each charged page

Charges currently pin the css indirectly by playing tricks during
css_offline(): user pages stall the offlining process until all of them
have been reparented, whereas kmemcg acquires a keep-alive reference if
outstanding kernel pages are detected at that point.

In preparation for removing all this complexity, make the pinning explicit
and acquire a css reference for every charged page.

Signed-off-by: Johannes Weiner
Reviewed-by: Vladimir Davydov
Acked-by: Michal Hocko
Cc: David Rientjes
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2014-12-11 09:41:05 +0800

20 Nov, 2014

1 commit

b583043e9 kill f_dentry uses

Signed-off-by: Al Viro

Al Viro
2014-11-20 02:01:25 +0800

18 Nov, 2014

3 commits

eeecbd197 cgroup: implement cgroup_get_e_css()

Implement cgroup_get_e_css() which finds and gets the effective css
for the specified cgroup and subsystem combination. This function
always returns a valid pinned css. This will be used by cgroup
writeback support.

While at it, add comment to cgroup_e_css() to explain why that
function is different from cgroup_get_e_css() and has to test
cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss).
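
One way to picture "finds and gets the effective css" is a walk towards the root that pins the first usable css it meets. The Python sketch below is a model under stated assumptions, not the kernel function: `Css`, `CgroupNode` and `get_e_css` are hypothetical names, a css counts as usable when it is online, and the root css is assumed to always be available as a fallback, which is how the model guarantees a valid pinned result.

```python
class Css:
    """Toy css with an online flag and a reference count."""

    def __init__(self, online=True):
        self.online = online
        self.refs = 0

    def tryget_online(self):
        # Take a reference only if the css is still online.
        if not self.online:
            return False
        self.refs += 1
        return True


class CgroupNode:
    """Toy cgroup that may or may not have its own css for one subsystem."""

    def __init__(self, parent=None, css=None):
        self.parent = parent
        self.css = css


def get_e_css(cgrp, root_css):
    """Return a pinned css: the nearest usable one, else the root's."""
    node = cgrp
    while node is not None:
        if node.css is not None and node.css.tryget_online():
            return node.css
        node = node.parent
    root_css.refs += 1  # fallback that is assumed to always succeed
    return root_css
```

The caller always receives a css with an extra reference and is responsible for dropping it when done.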

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:52 +0800

56c807ba4 cgroup: add cgroup_subsys->css_e_css_changed()

Add a new cgroup_subsys operation ->css_e_css_changed(). This is
invoked if any of the effective csses seen from the css's cgroup may
have changed. This will be used to implement cgroup writeback
support.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800

7d172cc89 cgroup: add cgroup_subsys->css_released()

Add a new cgroup subsys callback css_released(). This is called when
the reference count of the css (cgroup_subsys_state) reaches zero,
before the RCU-delayed free is scheduled.

Signed-off-by: Tejun Heo
Acked-by: Zefan Li

Tejun Heo
2014-11-18 15:49:51 +0800

19 Sep, 2014

4 commits

a25eb52e8 cgroup: remove CGRP_RELEASABLE flag

We call put_css_set() after setting CGRP_RELEASABLE flag in
cgroup_task_migrate(), but in other places we call it without setting
the flag. I don't see the necessity of this flag.

Moreover once the flag is set, it will never be cleared, unless writing
to the notify_on_release control file, so it can be quite confusing
if we look at the output of debug.releasable.

# mount -t cgroup -o debug xxx /cgroup
# mkdir /cgroup/child
# cat /cgroup/child/debug.releasable
0
# echo $$ > /cgroup/child/tasks
# cat /cgroup/child/debug.releasable
0
# echo $$ > /cgroup/tasks && echo $$ > /cgroup/child/tasks
# cat /cgroup/child/debug.releasable
1

Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 21:29:32 +0800

f29374b14 cgroup: remove redundant check in cgroup_ino()

After we implemented default unified hierarchy, cgrp->kn can never
be NULL.

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 21:16:23 +0800

006f4ac49 cgroup: simplify proc_cgroup_show()

Use the ONE macro instead of REG, and we can simplify proc_cgroup_show().

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 01:27:23 +0800

971ff4935 cgroup: use a per-cgroup work for release agent

Instead of using a global work to schedule release agent on removable
cgroups, we change to use a per-cgroup work to do this, which makes
the code much simpler.

v2: use a dedicated work instead of reusing css->destroy_work. (Tejun)

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Zefan Li
2014-09-19 01:14:22 +0800

18 Sep, 2014

1 commit

6213daab2 cgroup: remove some useless forward declarations

Signed-off-by: Zefan Li
Signed-off-by: Tejun Heo

Li Zefan
2014-09-18 05:34:15 +0800

15 Jul, 2014

4 commits

05ebb6e60 cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core

cgroup now distinguishes cftypes for the default and legacy
hierarchies more explicitly by using separate arrays and
CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE should be and are used only
inside cgroup core proper. Let's make it clear that the flags are
internal by prefixing them with double underscores.

CFTYPE_INSANE is renamed to __CFTYPE_NOT_ON_DFL for consistency. The
two flags are also collected and assigned bits >= 16 so that they
aren't mixed with the published flags.

v2: Convert the extra ones in cgroup_exit_cftypes() which are added by
revision to the previous patch.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-15 23:05:10 +0800

a8ddc8215 cgroup: distinguish the default and legacy hierarchies when handling cftypes

Until now, cftype arrays carried files for both the default and legacy
hierarchies and the files which needed to be used on only one of them
were flagged with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE. This
gets confusing very quickly and we may end up exposing interface files
to the default hierarchy without thinking it through.

This patch makes cgroup core provide separate sets of interfaces for
cftype handling so that the cftypes for the default and legacy
hierarchies are clearly distinguished. The previous two patches
renamed the existing ones so that they clearly indicate that they're
for the legacy hierarchies. This patch adds the interface for the
default hierarchy and applies each set selectively depending on the
hierarchy type.

* cftypes added through cgroup_subsys->dfl_cftypes and
cgroup_add_dfl_cftypes() only show up on the default hierarchy.

* cftypes added through cgroup_subsys->legacy_cftypes and
cgroup_add_legacy_cftypes() only show up on the legacy hierarchies.

* cgroup_subsys->dfl_cftypes and ->legacy_cftypes can point to the
same array for the cases where the interface files are identical on
both types of hierarchies.

* This makes all the existing subsystem interface files legacy-only by
default and all subsystems will have no interface file created when
enabled on the default hierarchy. Each subsystem should explicitly
review and compose the interface for the default hierarchy.

* A boot param "cgroup__DEVEL__legacy_files_on_dfl" is added which
makes subsystems which haven't decided the interface files for the
default hierarchy present the legacy files on the default
hierarchy so that their behavior on the default hierarchy can be
tested. As the awkward name suggests, this is for development only.

* memcg's CFTYPE_INSANE on "use_hierarchy" is noop now as the whole
array isn't used on the default hierarchy. The flag is removed.
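
The selection rules in the list above can be summarised in a short sketch. This is a Python model for illustration only: `visible_files` is a hypothetical helper, a subsystem is represented as a plain dict, and the `legacy_files_on_dfl` argument stands in for the development boot parameter.

```python
def visible_files(subsys, on_default_hierarchy, legacy_files_on_dfl=False):
    """Return the interface file names a subsystem exposes on one hierarchy.

    subsys is a dict with optional "dfl_cftypes" and "legacy_cftypes" lists.
    """
    legacy = subsys.get("legacy_cftypes") or []
    dfl = subsys.get("dfl_cftypes")
    if not on_default_hierarchy:
        return list(legacy)
    if dfl is not None:
        return list(dfl)
    # No default-hierarchy interface has been composed for this subsystem:
    # expose nothing unless the development switch asks for legacy files.
    return list(legacy) if legacy_files_on_dfl else []
```

A subsystem whose interface is identical on both hierarchy types can point both keys at the same list, in which case the function returns the same names either way.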

v2: Updated documentation for cgroup__DEVEL__legacy_files_on_dfl.

v3: Clear CFTYPE_ONLY_ON_DFL and CFTYPE_INSANE when cfts are removed
as suggested by Li.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vivek Goyal
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Aristeu Rozanski
Cc: Aneesh Kumar K.V

Tejun Heo
2014-07-15 23:05:10 +0800

2cf669a58 cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()

Currently, cftypes added by cgroup_add_cftypes() are used for both the
unified default hierarchy and legacy ones and subsystems can mark each
file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to
appear only on one of them. This is quite hairy and error-prone.
Also, we may end up exposing interface files to the default hierarchy
without thinking it through.

cgroup_subsys will grow two separate cftype addition functions and
apply each only on the hierarchies of the matching type. This will
allow organizing cftypes in a much clearer way and encourage subsystems
to scrutinize the interface which is being exposed in the new default
hierarchy.

In preparation, this patch adds cgroup_add_legacy_cftypes() which
currently is a simple wrapper around cgroup_add_cftypes() and replaces
all cgroup_add_cftypes() usages with it.

While at it, this patch drops a completely spurious return from
__hugetlb_cgroup_file_init().

This patch doesn't introduce any functional differences.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Aneesh Kumar K.V

Tejun Heo
2014-07-15 23:05:09 +0800

5577964e6 cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes

Currently, cgroup_subsys->base_cftypes is used for both the unified
default hierarchy and legacy ones and subsystems can mark each file
with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear
only on one of them. This is quite hairy and error-prone. Also, we
may end up exposing interface files to the default hierarchy without
thinking it through.

cgroup_subsys will grow two separate cftype arrays and apply each only
on the hierarchies of the matching type. This will allow organizing
cftypes in a much clearer way and encourage subsystems to scrutinize
the interface which is being exposed in the new default hierarchy.

In preparation, this patch renames cgroup_subsys->base_cftypes to
cgroup_subsys->legacy_cftypes. This patch is pure rename.

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vivek Goyal
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Aristeu Rozanski
Cc: Aneesh Kumar K.V

Tejun Heo
2014-07-15 23:05:09 +0800

09 Jul, 2014

6 commits

aa6ec29be cgroup: remove sane_behavior support on non-default hierarchies

sane_behavior has been used as a development vehicle for the default
unified hierarchy. Now that the default hierarchy is in place, the
flag became redundant and confusing as its usage is allowed on all
hierarchies. There are gonna be either the default hierarchy or
legacy ones. Let's make that clear by removing sane_behavior support
on non-default hierarchies.

This patch replaces cgroup_sane_behavior() with cgroup_on_dfl(). The
comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to on top of
cgroup_on_dfl() with sane_behavior specific part dropped.
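As a rough illustration of what such a test amounts to, the sketch below models cgroup_on_dfl() as a comparison against the single default root. The struct layouts are reduced to the fields the example needs and are assumptions, not the kernel definitions.

```c
#include <stdbool.h>

/* Reduced stand-ins for the kernel structures (assumed layouts). */
struct cgroup_root {
	unsigned int flags;
};

struct cgroup {
	struct cgroup_root *root;	/* hierarchy this cgroup belongs to */
};

/* There is exactly one default (unified) hierarchy root in this model. */
struct cgroup_root cgrp_dfl_root;

/*
 * A cgroup is on the default hierarchy iff its root is the default root.
 * No per-root flag needs to be consulted, unlike a sane_behavior test.
 */
static inline bool cgroup_on_dfl(const struct cgroup *cgrp)
{
	return cgrp->root == &cgrp_dfl_root;
}
```

Keying the check on root identity rather than on a mount flag is what lets the flag disappear from non-default hierarchies.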

On the default and legacy hierarchies w/o sane_behavior, this
shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Vivek Goyal
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Michal Hocko

Tejun Heo
2014-07-09 22:08:08 +0800
7450e90bb cgroup: remove CGRP_ROOT_OPTION_MASK

cgroup_root->flags only contains CGRP_ROOT_* flags and there's no
reason to mask the flags. Remove CGRP_ROOT_OPTION_MASK.

This doesn't cause any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2014-07-09 22:08:07 +0800
af0ba6789 cgroup: implement cgroup_subsys->depends_on

Currently, the blkio subsystem attributes all of writeback IOs to the
root. One of the issues is that there's no way to tell who originated
a writeback IO from block layer. Those IOs are usually issued
asynchronously from a task which didn't have anything to do with
actually generating the dirty pages. The memory subsystem, when
enabled, already keeps track of the ownership of each dirty page and
it's desirable for blkio to piggyback instead of adding its own
per-page tag.

blkio piggybacking on memory is an implementation detail which
preferably should be handled automatically without requiring explicit
userland action. To achieve that, this patch implements
cgroup_subsys->depends_on which contains the mask of subsystems which
should be enabled together when the subsystem is enabled.

The previous patches already implemented the support for enabled but
invisible subsystems and cgroup_subsys->depends_on can be easily
implemented by updating cgroup_refresh_child_subsys_mask() so that it
calculates cgroup->child_subsys_mask considering
cgroup_subsys->depends_on of the explicitly enabled subsystems.
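The mask calculation can be sketched as a fixed-point loop. The following is a userspace model under stated assumptions: the three-entry subsystem table, the subsystem IDs, the blkio-on-memory dependency entry, the `available` parameter, and the function name calc_child_subsys_mask() are all hypothetical and chosen only to illustrate the idea.

```c
#define NR_SUBSYS 3

/* Hypothetical subsystem IDs for this example. */
enum { MEMORY_ID, BLKIO_ID, PIDS_ID };

struct cgroup_subsys {
	const char *name;
	unsigned int depends_on;	/* mask of subsystems to enable along with this one */
};

/* Illustrative table: blkio piggybacks on memory, as the message motivates. */
static const struct cgroup_subsys subsys[NR_SUBSYS] = {
	[MEMORY_ID] = { "memory", 0 },
	[BLKIO_ID]  = { "blkio",  1u << MEMORY_ID },
	[PIDS_ID]   = { "pids",   0 },
};

/*
 * Expand the explicitly enabled set (subtree_control) with the dependencies
 * of every enabled subsystem, repeating until nothing new is added.
 * `available` limits the result to subsystems that can be enabled here.
 */
unsigned int calc_child_subsys_mask(unsigned int subtree_control,
				    unsigned int available)
{
	unsigned int cur = subtree_control;

	for (;;) {
		unsigned int next = cur;

		for (int i = 0; i < NR_SUBSYS; i++)
			if (cur & (1u << i))
				next |= subsys[i].depends_on;

		next &= available;
		if (next == cur)
			break;
		cur = next;
	}

	return cur;
}
```

The loop repeats because a dependency may itself have dependencies; it stops once a pass adds no new bits.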

Documentation/cgroups/unified-hierarchy.txt is updated to explain that
subsystems may not become immediately available after being unused
from userland and that dependency could be a factor in it. As
subsystems may already keep residual references, this doesn't
significantly change how subsystem rebinding can be used.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Johannes Weiner

Tejun Heo
2014-07-09 06:02:57 +0800
b4536f0ca cgroup: implement cgroup_subsys->css_reset()

cgroup is implementing support for subsystem dependency which would
require a way to enable a subsystem even when it's not directly
configured through "cgroup.subtree_control".

The previous patches added support for explicitly and implicitly
enabled subsystems and showing/hiding their interface files. An
explicitly enabled subsystem may become implicitly enabled if it's
turned off through "cgroup.subtree_control" but there are subsystems
depending on it. In such cases, the subsystem, as it's turned off
when seen from userland, shouldn't enforce any resource control.
Also, the subsystem may be explicitly turned on later again and its
interface files should be as close to the initial state as possible.

This patch adds cgroup_subsys->css_reset() which is invoked when a css
is hidden. The callback should disable resource control and reset the
state to the vanilla state.
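A minimal userspace sketch of how a controller might implement such a callback follows. The demo controller, its `limit` field, the `visible` flag, and the hide_css() helper are hypothetical. Only the css_reset() callback name and its role come from the commit message.

```c
#include <stddef.h>

/* Reduced stand-in for the kernel's css (assumed layout). */
struct cgroup_subsys_state {
	int visible;	/* hypothetical: 1 while interface files are shown */
};

struct cgroup_subsys {
	/* invoked when a css is hidden; may be NULL */
	void (*css_reset)(struct cgroup_subsys_state *css);
};

/* Hypothetical controller state embedding the css as its first member. */
struct demo_css {
	struct cgroup_subsys_state css;
	unsigned long limit;
};

#define DEMO_LIMIT_MAX (~0UL)	/* "no limit" in this model */

/* Reset to the vanilla state: stop enforcing any resource limit. */
static void demo_css_reset(struct cgroup_subsys_state *css)
{
	/* valid because css is the first member of struct demo_css */
	struct demo_css *d = (struct demo_css *)css;

	d->limit = DEMO_LIMIT_MAX;
}

struct cgroup_subsys demo_subsys = { .css_reset = demo_css_reset };

/* Model of the core hiding a css: mark it hidden, then let it reset. */
void hide_css(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
{
	css->visible = 0;
	if (ss->css_reset)
		ss->css_reset(css);
}
```

After the reset, a css that stays alive only because another subsystem depends on it no longer enforces limits that userland can neither see nor change.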

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Johannes Weiner

Tejun Heo
2014-07-09 06:02:57 +0800
f63070d35 cgroup: make interface files visible iff enabled on cgroup->subtree_control

cgroup is implementing support for subsystem dependency which would
require a way to enable a subsystem even when it's not directly
configured through "cgroup.subtree_control".

The preceding patch distinguished cgroup->subtree_control and
->child_subsys_mask, where the former is the subsystems explicitly
configured by the userland and the latter is all enabled subsystems,
which currently equals the former but will include subsystems
implicitly enabled through dependency.

Subsystems which are enabled due to dependency shouldn't be visible to
userland. This patch updates cgroup_subtree_control_write() and
create_css() such that interface files are not created for implicitly
enabled subsystems.

* @visible parameter is added to create_css(). Interface files are
created only when it is true.

* If an already implicitly enabled subsystem is turned on through
"cgroup.subtree_control", the existing css should be used. css
draining is skipped.

* cgroup_subtree_control_write() computes the new target
cgroup->child_subsys_mask and creates/kills or shows/hides csses
accordingly.
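The create/kill versus show/hide decision in the list above can be sketched as a per-subsystem comparison of the old and new masks. This is a userspace model, not the kernel code: the enum, the function name plan_css_action(), and its parameters are hypothetical.

```c
/* Possible outcomes for one subsystem's css in a child cgroup. */
enum css_action {
	CSS_KEEP,		/* nothing changes */
	CSS_CREATE_VISIBLE,	/* newly enabled by userland: create with files */
	CSS_CREATE_HIDDEN,	/* newly enabled via dependency: create without files */
	CSS_KILL,		/* no longer enabled at all */
	CSS_SHOW,		/* already enabled implicitly, now explicit: reuse css */
	CSS_HIDE,		/* turned off by userland but still depended upon */
};

/*
 * bit:           mask bit of the subsystem under consideration
 * old_sc/new_sc: subtree_control before/after the write (explicit set)
 * old_ss/new_ss: child_subsys_mask before/after the write (all enabled)
 */
enum css_action plan_css_action(unsigned int bit,
				unsigned int old_sc, unsigned int new_sc,
				unsigned int old_ss, unsigned int new_ss)
{
	int was_enabled = !!(old_ss & bit);
	int now_enabled = !!(new_ss & bit);
	int was_visible = !!(old_sc & bit);
	int now_visible = !!(new_sc & bit);

	if (!was_enabled && now_enabled)
		return now_visible ? CSS_CREATE_VISIBLE : CSS_CREATE_HIDDEN;
	if (was_enabled && !now_enabled)
		return CSS_KILL;
	if (was_enabled && now_enabled) {
		if (!was_visible && now_visible)
			return CSS_SHOW;
		if (was_visible && !now_visible)
			return CSS_HIDE;
	}
	return CSS_KEEP;
}
```

While the two masks are kept identical, only the CSS_CREATE_VISIBLE, CSS_KILL, and CSS_KEEP outcomes are reachable, which is consistent with the patch introducing no behavior changes.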

As the two subsystem masks are still kept identical, this patch
doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Johannes Weiner

Tejun Heo
2014-07-09 06:02:57 +0800
667c24917 cgroup: introduce cgroup->subtree_control

cgroup is implementing support for subsystem dependency which would
require a way to enable a subsystem even when it's not directly
configured through "cgroup.subtree_control".

Previously, cgroup->child_subsys_mask directly reflected
"cgroup.subtree_control" and the enabled subsystems in the child
cgroups. This patch adds cgroup->subtree_control which
"cgroup.subtree_control" operates on. cgroup->child_subsys_mask is
now calculated from cgroup->subtree_control by
cgroup_refresh_child_subsys_mask(), which sets it identical to
cgroup->subtree_control for now.

This will allow using cgroup->child_subsys_mask for all the enabled
subsystems including the implicit ones and ->subtree_control for
tracking the explicitly requested ones. This patch keeps the two
masks identical and doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Johannes Weiner

Tejun Heo
2014-07-09 06:02:56 +0800