Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

19 Mar, 2014

1 commit

4d3bb511b cgroup: drop const from @buffer of cftype->write_string() ... Browse Code »

cftype->write_string() just passes on the writeable buffer from kernfs
and there's no reason to add const restriction on the buffer. The
only thing const achieves is unnecessarily complicating parsing of the
buffer. Drop const from @buffer.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Daniel Borkmann
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki

Tejun Heo
2014-03-19 22:23:54 +0800

25 Feb, 2014

1 commit

a60bed296 cgroup_freezer: document freezer_fork() subtleties ... Browse Code »

cgroup_subsys->fork() callback is special in that it's called outside
the usual cgroup locking and may race with on-going migration.
freezer_fork() currently doesn't consider such race condition;
however, it is still correct thanks to the fact that freeze_task() may
be called spuriously.

This is quite subtle. Let's explain what's going on and add test to
detect racing and losing to task migration and skip freeze_task() in
such cases for documentation.

This doesn't make any behavior difference meaningful to userland.

Signed-off-by: Tejun Heo
Cc: Li Zefan
Cc: "Rafael J. Wysocki"

Tejun Heo
2014-02-25 23:04:40 +0800

13 Feb, 2014

1 commit

924f0d9a2 cgroup: drop @skip_css from cgroup_taskset_for_each() ... Browse Code »

If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching
css. The intention of the interface is to make it easy to skip css's
(cgroup_subsys_states) which already match the migration target;
however, this is entirely unnecessary as migration taskset doesn't
include tasks which are already in the target cgroup. Drop @skip_css
from cgroup_taskset_for_each().

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Daniel Borkmann

Tejun Heo
2014-02-13 19:58:41 +0800

08 Feb, 2014

1 commit

073219e99 cgroup: clean up cgroup_subsys names and initialization ... Browse Code »

cgroup_subsys is a bit messier than it needs to be.

* The name of a subsys can be different from its internal identifier
defined in cgroup_subsys.h. Most subsystems use the matching name
but three - cpu, memory and perf_event - use different ones.

* cgroup_subsys_id enums are postfixed with _subsys_id and each
cgroup_subsys is postfixed with _subsys. cgroup.h is widely
included throughout various subsystems, it doesn't and shouldn't
have claim on such generic names which don't have any qualifier
indicating that they belong to cgroup.

* cgroup_subsys->subsys_id should always equal the matching
cgroup_subsys_id enum; however, we require each controller to
initialize it and then BUG if they don't match, which is a bit
silly.

This patch cleans up cgroup_subsys names and initialization by doing
the followings.

* cgroup_subsys_id enums are now postfixed with _cgrp_id, and each
cgroup_subsys with _cgrp_subsys.

* With the above, renaming subsys identifiers to match the userland
visible names doesn't cause any naming conflicts. All non-matching
identifiers are renamed to match the official names.

cpu_cgroup -> cpu
mem_cgroup -> memory
perf -> perf_event

* controllers no longer need to initialize ->subsys_id and ->name.
They're generated in cgroup core and set automatically during boot.

* Redundant cgroup_subsys declarations removed.

* While updating BUG_ON()s in cgroup_init_early(), convert them to
WARN()s. BUGging that early during boot is stupid - the kernel
can't print anything, even through serial console and the trap
handler doesn't even link stack frame properly for back-tracing.

This patch doesn't introduce any behavior changes.

v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs
classid handling into core").

Signed-off-by: Tejun Heo
Acked-by: Neil Horman
Acked-by: "David S. Miller"
Acked-by: "Rafael J. Wysocki"
Acked-by: Michal Hocko
Acked-by: Peter Zijlstra
Acked-by: Aristeu Rozanski
Acked-by: Ingo Molnar
Acked-by: Li Zefan
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Cc: Serge E. Hallyn
Cc: Vivek Goyal
Cc: Thomas Graf

Tejun Heo
2014-02-08 23:36:58 +0800

06 Dec, 2013

1 commit

2da8ca822 cgroup: replace cftype->read_seq_string() with cftype->seq_show() ... Browse Code »

In preparation of conversion to kernfs, cgroup file handling is
updated so that it can be easily mapped to kernfs. This patch
replaces cftype->read_seq_string() with cftype->seq_show() which is
not limited to single_open() operation and will map directcly to
kernfs seq_file interface.

The conversions are mechanical. As ->seq_show() doesn't have @css and
@cft, the functions which make use of them are converted to use
seq_css() and seq_cft() respectively. In several occassions, e.f. if
it has seq_string in its name, the function name is updated to fit the
new method better.

This patch does not introduce any behavior changes.

Signed-off-by: Tejun Heo
Acked-by: Aristeu Rozanski
Acked-by: Vivek Goyal
Acked-by: Michal Hocko
Acked-by: Daniel Wagner
Acked-by: Li Zefan
Cc: Jens Axboe
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Cc: Neil Horman

Tejun Heo
2013-12-06 01:28:04 +0800

09 Aug, 2013

11 commits

bd8815a6d cgroup: make css_for_each_descendant() and friends include the origin css in the iteration ... Browse Code »

Previously, all css descendant iterators didn't include the origin
(root of subtree) css in the iteration. The reasons were maintaining
consistency with css_for_each_child() and that at the time of
introduction more use cases needed skipping the origin anyway;
however, given that css_is_descendant() considers self to be a
descendant, omitting the origin css has become more confusing and
looking at the accumulated use cases rather clearly indicates that
including origin would result in simpler code overall.

While this is a change which can easily lead to subtle bugs, cgroup
API including the iterators has recently gone through major
restructuring and no out-of-tree changes will be applicable without
adjustments making this a relatively acceptable opportunity for this
type of change.

The conversions are mostly straight-forward. If the iteration block
had explicit origin handling before or after, it's moved inside the
iteration. If not, if (pos == origin) continue; is added. Some
conversions add extra reference get/put around origin handling by
consolidating origin handling and the rest. While the extra ref
operations aren't strictly necessary, this shouldn't cause any
noticeable difference.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Vivek Goyal
Acked-by: Aristeu Rozanski
Acked-by: Michal Hocko
Cc: Jens Axboe
Cc: Matt Helsley
Cc: Johannes Weiner
Cc: Balbir Singh

Tejun Heo
2013-08-09 08:11:27 +0800
d99c8727e cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroup ... Browse Code »

cgroup is in the process of converting to css (cgroup_subsys_state)
from cgroup as the principal subsystem interface handle. This is
mostly to prepare for the unified hierarchy support where css's will
be created and destroyed dynamically but also helps cleaning up
subsystem implementations as css is usually what they are interested
in anyway.

cgroup_taskset which is used by the subsystem attach methods is the
last cgroup subsystem API which isn't using css as the handle. Update
cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and
cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp.

The conversions are pretty mechanical. One exception is
cpuset::cgroup_cs(), which lost its last user and got removed.

This patch shouldn't introduce any functional changes.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Daniel Wagner
Cc: Ingo Molnar
Cc: Matt Helsley
Cc: Steven Rostedt

Tejun Heo
2013-08-09 08:11:27 +0800
72ec70299 cgroup: make task iterators deal with cgroup_subsys_state instead of cgroup ... Browse Code »

cgroup is in the process of converting to css (cgroup_subsys_state)
from cgroup as the principal subsystem interface handle. This is
mostly to prepare for the unified hierarchy support where css's will
be created and destroyed dynamically but also helps cleaning up
subsystem implementations as css is usually what they are interested
in anyway.

This patch converts task iterators to deal with css instead of cgroup.
Note that under unified hierarchy, different sets of tasks will be
considered belonging to a given cgroup depending on the subsystem in
question and making the iterators deal with css instead cgroup
provides them with enough information about the iteration.

While at it, fix several function comment formats in cpuset.c.

This patch doesn't introduce any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: Matt Helsley

Tejun Heo
2013-08-09 08:11:26 +0800
c59cd3d84 cgroup: make cgroup_task_iter remember the cgroup being iterated ... Browse Code »

Currently all cgroup_task_iter functions require @cgrp to be passed
in, which is superflous and increases chance of usage error. Make
cgroup_task_iter remember the cgroup being iterated and drop @cgrp
argument from next and end functions.

This patch doesn't introduce any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Cc: Matt Helsley
Cc: Johannes Weiner
Cc: Balbir Singh

Tejun Heo
2013-08-09 08:11:26 +0800
0942eeeef cgroup: rename cgroup_iter to cgroup_task_iter ... Browse Code »

cgroup now has multiple iterators and it's quite confusing to have
something which walks over tasks of a single cgroup named cgroup_iter.
Let's rename it to cgroup_task_iter.

While at it, reformat / update comments and replace the overview
comment above the interface function decls with proper function
comments. Such overview can be useful but function comments should be
more than enough here.

This is pure rename and doesn't introduce any functional changes.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Cc: Matt Helsley
Cc: Johannes Weiner
Cc: Balbir Singh

Tejun Heo
2013-08-09 08:11:26 +0800
492eb21b9 cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup ... Browse Code »

cgroup is currently in the process of transitioning to using css
(cgroup_subsys_state) as the primary handle instead of cgroup in
subsystem API. For hierarchy iterators, this is beneficial because

* In most cases, css is the only thing subsystems care about anyway.

* On the planned unified hierarchy, iterations for different
subsystems will need to skip over different subtrees of the
hierarchy depending on which subsystems are enabled on each cgroup.
Passing around css makes it unnecessary to explicitly specify the
subsystem in question as css is intersection between cgroup and
subsystem

* For the planned unified hierarchy, css's would need to be created
and destroyed dynamically independent from cgroup hierarchy. Having
cgroup core manage css iteration makes enforcing deref rules a lot
easier.

Most subsystem conversions are straight-forward. Noteworthy changes
are

* blkio: cgroup_to_blkcg() is no longer used. Removed.

* freezer: cgroup_freezer() is no longer used. Removed.

* devices: cgroup_to_devcgroup() is no longer used. Removed.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Acked-by: Vivek Goyal
Acked-by: Aristeu Rozanski
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: Matt Helsley
Cc: Jens Axboe

Tejun Heo
2013-08-09 08:11:25 +0800
182446d08 cgroup: pass around cgroup_subsys_state instead of cgroup in file methods ... Browse Code »

cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup.
Please see the previous commit which converts the subsystem methods
for rationale.

This patch converts all cftype file operations to take @css instead of
@cgroup. cftypes for the cgroup core files don't have their subsytem
pointer set. These will automatically use the dummy_css added by the
previous patch and can be converted the same way.

Most subsystem conversions are straight forwards but there are some
interesting ones.

* freezer: update_if_frozen() is also converted to take @css instead
of @cgroup for consistency. This will make the code look simpler
too once iterators are converted to use css.

* memory/vmpressure: mem_cgroup_from_css() needs to be exported to
vmpressure while mem_cgroup_from_cont() can be made static.
Updated accordingly.

* cpu: cgroup_tg() doesn't have any user left. Removed.

* cpuacct: cgroup_ca() doesn't have any user left. Removed.

* hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left.
Removed.

* net_cls: cgrp_cls_state() doesn't have any user left. Removed.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Acked-by: Vivek Goyal
Acked-by: Aristeu Rozanski
Acked-by: Daniel Wagner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: Matt Helsley
Cc: Jens Axboe
Cc: Steven Rostedt

Tejun Heo
2013-08-09 08:11:24 +0800
eb95419b0 cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods ... Browse Code »

cgroup is currently in the process of transitioning to using struct
cgroup_subsys_state * as the primary handle instead of struct cgroup *
in subsystem implementations for the following reasons.

* With unified hierarchy, subsystems will be dynamically bound and
unbound from cgroups and thus css's (cgroup_subsys_state) may be
created and destroyed dynamically over the lifetime of a cgroup,
which is different from the current state where all css's are
allocated and destroyed together with the associated cgroup. This
in turn means that cgroup_css() should be synchronized and may
return NULL, making it more cumbersome to use.

* Differing levels of per-subsystem granularity in the unified
hierarchy means that the task and descendant iterators should behave
differently depending on the specific subsystem the iteration is
being performed for.

* In majority of the cases, subsystems only care about its part in the
cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods
often obtain the matching css pointer from the cgroup and don't
bother with the cgroup pointer itself. Passing around css fits
much better.

This patch converts all cgroup_subsys methods to take @css instead of
@cgroup. The conversions are mostly straight-forward. A few
noteworthy changes are

* ->css_alloc() now takes css of the parent cgroup rather than the
pointer to the new cgroup as the css for the new cgroup doesn't
exist yet. Knowing the parent css is enough for all the existing
subsystems.

* In kernel/cgroup.c::offline_css(), unnecessary open coded css
dereference is replaced with local variable access.

This patch shouldn't cause any behavior differences.

v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced
with local variable @css as suggested by Li Zefan.

Rebased on top of new for-3.12 which includes for-3.11-fixes so
that ->css_free() invocation added by da0a12caff ("cgroup: fix a
leak when percpu_ref_init() fails") is converted too. Suggested
by Li Zefan.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Acked-by: Michal Hocko
Acked-by: Vivek Goyal
Acked-by: Aristeu Rozanski
Acked-by: Daniel Wagner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: Matt Helsley
Cc: Jens Axboe
Cc: Steven Rostedt

Tejun Heo
2013-08-09 08:11:23 +0800
638769869 cgroup: add css_parent() ... Browse Code »

Currently, controllers have to explicitly follow the cgroup hierarchy
to find the parent of a given css. cgroup is moving towards using
cgroup_subsys_state as the main controller interface construct, so
let's provide a way to climb the hierarchy using just csses.

This patch implements css_parent() which, given a css, returns its
parent. The function is guarnateed to valid non-NULL parent css as
long as the target css is not at the top of the hierarchy.

freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
are converted to use css_parent() instead of accessing cgroup->parent
directly.

* __parent_ca() is dropped from cpuacct and its usage is replaced with
parent_ca(). The only difference between the two was NULL test on
cgroup->parent which is now embedded in css_parent() making the
distinction moot. Note that eventually a css->parent field will be
added to css and the NULL check in css_parent() will go away.

This patch shouldn't cause any behavior differences.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2013-08-09 08:11:23 +0800
a7c6d554a cgroup: add/update accessors which obtain subsys specific data from css ... Browse Code »

css (cgroup_subsys_state) is usually embedded in a subsys specific
data structure. Subsystems either use container_of() directly to cast
from css to such data structure or has an accessor function wrapping
such cast. As cgroup as whole is moving towards using css as the main
interface handle, add and update such accessors to ease dealing with
css's.

All accessors explicitly handle NULL input and return NULL in those
cases. While this looks like an extra branch in the code, as all
controllers specific data structures have css as the first field, the
casting doesn't involve any offsetting and the compiler can trivially
optimize out the branch.

* blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
accessor. Added.

* memory, hugetlb and devices already had one but didn't explicitly
handle NULL input. Updated.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2013-08-09 08:11:23 +0800
8af01f56a cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ ... Browse Code »

The names of the two struct cgroup_subsys_state accessors -
cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
The former clashes with the type name and the latter doesn't even
indicate it's somehow related to cgroup.

We're about to revamp large portion of cgroup API, so, let's rename
them so that they're less awkward. Most per-controller usages of the
accessors are localized in accessor wrappers and given the amount of
scheduled changes, this isn't gonna add any noticeable headache.

Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
to task_css(). This patch is pure rename.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2013-08-09 08:11:22 +0800

20 Nov, 2012

2 commits

92fb97487 cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() ... Browse Code »

Rename cgroup_subsys css lifetime related callbacks to better describe
what their roles are. Also, update documentation.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan

Tejun Heo
2012-11-20 00:13:38 +0800
b1929db42 cgroup: allow ->post_create() to fail ... Browse Code »

There could be cases where controllers want to do initialization
operations which may fail from ->post_create(). This patch makes
->post_create() return -errno to indicate failure and online_css()
relay such failures.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Cc: Glauber Costa

Tejun Heo
2012-11-20 00:13:38 +0800

10 Nov, 2012

6 commits

ef9fe980c cgroup_freezer: implement proper hierarchy support ... Browse Code »

Up until now, cgroup_freezer didn't implement hierarchy properly.
cgroups could be arranged in hierarchy but it didn't make any
difference in how each cgroup_freezer behaved. They all operated
separately.

This patch implements proper hierarchy support. If a cgroup is
frozen, all its descendants are frozen. A cgroup is thawed iff it and
all its ancestors are THAWED. freezer.self_freezing shows the current
freezing state for the cgroup itself. freezer.parent_freezing shows
whether the cgroup is freezing because any of its ancestors is
freezing.

freezer_post_create() locks the parent and new cgroup and inherits the
parent's state and freezer_change_state() applies new state top-down
using cgroup_for_each_descendant_pre() which guarantees that no child
can escape its parent's state. update_if_frozen() uses
cgroup_for_each_descendant_post() to propagate frozen states
bottom-up.

Synchronization could be coarser and easier by using a single mutex to
protect all hierarchy operations. Finer grained approach was used
because it wasn't too difficult for cgroup_freezer and I think it's
beneficial to have an example implementation and cgroup_freezer is
rather simple and can serve a good one.

As this makes cgroup_freezer properly hierarchical,
freezer_subsys.broken_hierarchy marking is removed.

Note that this patch changes userland visible behavior - freezing a
cgroup now freezes all its descendants too. This behavior change is
intended and has been warned via .broken_hierarchy.

v2: Michal spotted a bug in freezer_change_state() - descendants were
inheriting from the wrong ancestor. Fixed.

v3: Documentation/cgroups/freezer-subsystem.txt updated.

Signed-off-by: Tejun Heo
Reviewed-by: Michal Hocko

Tejun Heo
2012-11-10 02:52:30 +0800
5300a9b34 cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state ... Browse Code »

A cgroup is online and visible to iteration between ->post_create()
and ->pre_destroy(). This patch introduces CGROUP_FREEZER_ONLINE and
toggles it from the newly added freezer_post_create() and
freezer_pre_destroy() while holding freezer->lock such that a
cgroup_freezer can be reilably distinguished to be online. This will
be used by full hierarchy support.

ONLINE test is added to freezer_apply_state() but it currently doesn't
make any difference as freezer_write() can only be called for an
online cgroup.

Adjusting system_freezing_cnt on destruction is moved from
freezer_destroy() to the new freezer_pre_destroy() for consistency.

This patch doesn't introduce any noticeable behavior change.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Michal Hocko

Tejun Heo
2012-11-10 01:12:30 +0800
a22521806 cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT] ... Browse Code »

Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
the two flags. This is to prepare for full hierarchy support.

freezer_apply_date() is updated such that it can handle setting and
clearing of both flags. The two flags are also exposed to userland
via read-only files self_freezing and parent_freezing.

Other than the added cgroupfs files, this patch doesn't introduce any
behavior change.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Michal Hocko

Tejun Heo
2012-11-10 01:12:30 +0800
d6a2fe134 cgroup_freezer: make freezer->state mask of flags ... Browse Code »

freezer->state was an enum value - one of THAWED, FREEZING and FROZEN.
As the scheduled full hierarchy support requires more than one
freezing condition, switch it to mask of flags. If FREEZING is not
set, it's thawed. FREEZING is set if freezing or frozen. If frozen,
both FREEZING and FROZEN are set. Now that tasks can be attached to
an already frozen cgroup, this also makes freezing condition checks
more natural.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Michal Hocko

Tejun Heo
2012-11-10 01:12:30 +0800
04a4ec325 cgroup_freezer: prepare freezer_change_state() for full hierarchy support ... Browse Code »

* Make freezer_change_state() take bool @freeze instead of enum
freezer_state.

* Separate out freezer_apply_state() out of freezer_change_state().
This makes freezer_change_state() a rather silly thin wrapper. It
will be filled with hierarchy handling later on.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Michal Hocko

Tejun Heo
2012-11-10 01:12:30 +0800
bcd66c894 cgroup_freezer: trivial cleanups ... Browse Code »

* Clean-up indentation and line-breaks. Drop the invalid comment
about freezer->lock.

* Make all internal functions take @freezer instead of both @cgroup
and @freezer.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Michal Hocko

Tejun Heo
2012-11-10 01:12:29 +0800

27 Oct, 2012

1 commit

5d8f72b55 freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() ... Browse Code »

try_to_freeze_tasks() and cgroup_freezer rely on scheduler locks
to ensure that a task doing STOPPED/TRACED -> RUNNING transition
can't escape freezing. This mostly works, but ptrace_stop() does
not necessarily call schedule(), it can change task->state back to
RUNNING and check freezing() without any lock/barrier in between.

We could add the necessary barrier, but this patch changes
ptrace_stop() and do_signal_stop() to use freezable_schedule().
This fixes the race, freezer_count() and freezer_should_skip()
carefully avoid the race.

And this simplifies the code, try_to_freeze_tasks/update_if_frozen
no longer need to use task_is_stopped_or_traced() checks with the
non trivial assumptions. We can rely on the mechanism which was
specially designed to mark the sleeping task as "frozen enough".

v2: As Tejun pointed out, we can also change get_signal_to_deliver()
and move try_to_freeze() up before 'relock' label.

Signed-off-by: Oleg Nesterov
Signed-off-by: Tejun Heo

Oleg Nesterov
2012-10-27 05:27:49 +0800

21 Oct, 2012

3 commits

ead5c4737 cgroup_freezer: don't use cgroup_lock_live_group() ... Browse Code »

freezer_read/write() used cgroup_lock_live_group() to synchronize
against task migration into and out of the target cgroup.
cgroup_lock_live_group() grabs the internal cgroup lock and using it
from outside cgroup core leads to complex and fragile locking
dependency issues which are difficult to resolve.

Now that freezer_can_attach() is replaced with freezer_attach() and
update_if_frozen() updated, nothing requires excluding migration
against freezer state reads and changes.

This patch removes cgroup_lock_live_group() and the matching
cgroup_unlock() usages. The prone-to-bitrot, already outdated and
unnecessary global lock hierarchy documentation is replaced with
documentation in local scope.

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki
Cc: Li Zefan

Tejun Heo
2012-10-21 07:33:12 +0800
b4d18311d cgroup_freezer: prepare update_if_frozen() for locking change ... Browse Code »

Locking will change such that migration can happen while
freezer_read/write() is in progress. This means that
update_if_frozen() can no longer assume that all tasks in the cgroup
coform to the current freezer state - newly migrated tasks which
haven't finished freezer_attach() yet might be in any state.

This patch updates update_if_frozen() such that it no longer verifies
task states against freezer state. It now simply decides whether
FREEZING stage is complete.

This removal of verification makes it meaningless to call from
freezer_change_state(). Drop it and move the fast exit test from
freezer_read() - the only left caller - to update_if_frozen().

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki
Cc: Li Zefan

Tejun Heo
2012-10-21 07:33:08 +0800
8755ade68 cgroup_freezer: allow moving tasks in and out of a frozen cgroup ... Browse Code »

cgroup_freezer is one of the few users of cgroup_subsys->can_attach()
and uses it to prevent tasks from being migrated into or out of a
frozen cgroup. This makes cgroup_freezer cumbersome to use especially
when co-mounted with other controllers.

->can_attach() is problematic in general as it can make co-mounting
multiple cgroups difficult - migrating tasks may fail for reasons
completely irrelevant for other controllers. freezer_can_attach() in
particular is more problematic because it messes with cgroup internal
locking to ensure that the state verification performed at
freezer_can_attach() stays valid until migration is complete.

This patch replaces freezer_can_attach() with freezer_attach() so that
tasks are always allowed to migrate - they are nudged into the
conforming state from freezer_attach(). This means that there can be
tasks which are being migrated which don't conform to the current
cgroup_freezer state until freezer_attach() is complete. Under the
current locking scheme, the only such place is freezer_fork() which is
updated to handle such window.

While this patch doesn't remove the use of internal cgroup locking
from freezer_read/write() paths, it removes the requirement to keep
the freezer state constant while migrating and enables such change.

Note that this creates a userland visible behavior change - FROZEN
cgroup can no longer be used to lock migrations in and out of the
cgroup. This behavior change is intended. I don't think the feature
is necessary - userland should coordinate accesses to cgroup fs anyway
- and even if the feature is needed cgroup_freezer is the completely
wrong place to implement it.

Signed-off-by: Tejun Heo
LKML-Reference:
Cc: Matt Helsley
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki
Cc: Li Zefan

Tejun Heo
2012-10-21 07:28:56 +0800

17 Oct, 2012

3 commits

3c426d5e1 cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks ... Browse Code »

cgroup_freezer doesn't transition from FREEZING to FROZEN if the
cgroup contains PF_NOFREEZE tasks or tasks sleeping with
PF_FREEZER_SKIP set.

Only kernel tasks can be non-freezable (PF_NOFREEZE) and there's
nothing cgroup_freezer or userland can do about or to it. It's
pointless to stall the transition for PF_NOFREEZE tasks.

PF_FREEZER_SKIP indicates that the task can be skipped when
determining whether frozen state is reached. A task with
PF_FREEZER_SKIP is guaranteed to perform try_to_freeze() after it
wakes up and can be considered frozen much like stopped or traced
tasks. Note that a vfork parent uses PF_FREEZER_SKIP while waiting
for the child.

This updates update_if_frozen() such that it only considers freezable
tasks and treats %true freezer_should_skip() tasks as frozen.

This allows cgroups w/ kthreads and vfork parents successfully reach
FROZEN state.

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki

Tejun Heo
2012-10-17 06:03:14 +0800
51f246ed9 cgroup_freezer: make it official that writes to freezer.state don't fail ... Browse Code »

try_to_freeze_cgroup() has condition checks which are intended to fail
the write operation to freezer.state if there are tasks which can't be
frozen. The condition checks have been broken for quite some time
now. freeze_task() returns %false if the target task can't be frozen,
so num_cant_freeze_now is never incremented.

In addition, strangely, cgroup freezing proceeds even after the write
is failed, which is rather broken.

This patch rips out the non-working code intended to fail the write to
freezer.state when the cgroup contains non-freezable tasks and makes
it official that writes to freezer.state succeed whether there are
non-freezable tasks in the cgroup or not.

This leaves is_task_frozen_enough() with only one user -
upste_if_frozen(). Collapse it into the caller. Note that this
removes an extra call to freezing().

This doesn't cause any userland behavior changes.

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki

Tejun Heo
2012-10-17 06:03:14 +0800
5edee61ed cgroup: cgroup_subsys->fork() should be called after the task is added to css_set ... Browse Code »

cgroup core has a bug which violates a basic rule about event
notifications - when a new entity needs to be added, you add that to
the notification list first and then make the new entity conform to
the current state. If done in the reverse order, an event happening
inbetween will be lost.

cgroup_subsys->fork() is invoked way before the new task is added to
the css_set. Currently, cgroup_freezer is the only user of ->fork()
and uses it to make new tasks conform to the current state of the
freezer. If FROZEN state is requested while fork is in progress
between cgroup_fork_callbacks() and cgroup_post_fork(), the child
could escape freezing - the cgroup isn't frozen when ->fork() is
called and the freezer couldn't see the new task on the css_set.

This patch moves cgroup_subsys->fork() invocation to
cgroup_post_fork() after the new task is added to the css_set.
cgroup_fork_callbacks() is removed.

Because now a task may be migrated during cgroup_subsys->fork(),
freezer_fork() is updated so that it adheres to the usual RCU locking
and the rather pointless comment on why locking can be different there
is removed (if it doesn't make anything simpler, why even bother?).

Signed-off-by: Tejun Heo
Cc: Oleg Nesterov
Cc: Rafael J. Wysocki
Cc: stable@vger.kernel.org

Tejun Heo
2012-10-17 06:03:14 +0800

15 Sep, 2012

1 commit

8c7f6edbd cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them ... Browse Code »

Currently, cgroup hierarchy support is a mess. cpu related subsystems
behave correctly - configuration, accounting and control on a parent
properly cover its children. blkio and freezer completely ignore
hierarchy and treat all cgroups as if they're directly under the root
cgroup. Others show yet different behaviors.

These differing interpretations of cgroup hierarchy make using cgroup
confusing and it impossible to co-mount controllers into the same
hierarchy and obtain sane behavior.

Eventually, we want full hierarchy support from all subsystems and
probably a unified hierarchy. Users using separate hierarchies
expecting completely different behaviors depending on the mounted
subsystem is deterimental to making any progress on this front.

This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
for controllers which are lacking in hierarchy support. The goal of
this patch is two-fold.

* Move users away from using hierarchy on currently non-hierarchical
subsystems, so that implementing proper hierarchy support on those
doesn't surprise them.

* Keep track of which controllers are broken how and nudge the
subsystems to implement proper hierarchy support.

For now, start with a single warning message. We can whine louder
later on.

v2: Fixed a typo spotted by Michal. Warning message updated.

v3: Updated memcg part so that it doesn't generate warning in the
cases where .use_hierarchy=false doesn't make the behavior
different from root.use_hierarchy=true. Fixed a typo spotted by
Glauber.

v4: Check ->broken_hierarchy after cgroup creation is complete so that
->create() can affect the result per Michal. Dropped unnecessary
memcg root handling per Michal.

Signed-off-by: Tejun Heo
Acked-by: Michal Hocko
Acked-by: Li Zefan
Acked-by: Serge E. Hallyn
Cc: Glauber Costa
Cc: Peter Zijlstra
Cc: Paul Turner
Cc: Johannes Weiner
Cc: Thomas Graf
Cc: Vivek Goyal
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Neil Horman
Cc: Aneesh Kumar K.V

Tejun Heo
2012-09-15 03:01:16 +0800

02 Apr, 2012

1 commit

4baf6e332 cgroup: convert all non-memcg controllers to the new cftype interface ... Browse Code »

Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
net_cls and device controllers to use the new cftype based interface.
Termination entry is added to cftype arrays and populate callbacks are
replaced with cgroup_subsys->base_cftypes initializations.

This is functionally identical transformation. There shouldn't be any
visible behavior change.

memcg is rather special and will be converted separately.

Signed-off-by: Tejun Heo
Acked-by: Li Zefan
Cc: Paul Menage
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: "David S. Miller"
Cc: Vivek Goyal

Tejun Heo
2012-04-02 03:09:55 +0800

03 Feb, 2012

1 commit

761b3ef50 cgroup: remove cgroup_subsys argument from callbacks ... Browse Code »

The argument is not used at all, and it's not necessary, because
a specific callback handler of course knows which subsys it
belongs to.

Now only ->pupulate() takes this argument, because the handlers of
this callback always call cgroup_add_file()/cgroup_add_files().

So we reduce a few lines of code, though the shrinking of object size
is minimal.

16 files changed, 113 insertions(+), 162 deletions(-)

text data bss dec hex filename
5486240 656987 7039960 13183187 c928d3 vmlinux.o.orig
5486170 656987 7039960 13183117 c9288d vmlinux.o

Signed-off-by: Li Zefan
Signed-off-by: Tejun Heo

Li Zefan
2012-02-03 01:20:22 +0800

10 Jan, 2012

1 commit

db0c2bf69 Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

* 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cgroup: fix to allow mounting a hierarchy by name
cgroup: move assignement out of condition in cgroup_attach_proc()
cgroup: Remove task_lock() from cgroup_post_fork()
cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
cgroup: only need to check oldcgrp==newgrp once
cgroup: remove redundant get/put of task struct
cgroup: remove redundant get/put of old css_set from migrate
cgroup: Remove unnecessary task_lock before fetching css_set on migration
cgroup: Drop task_lock(parent) on cgroup_fork()
cgroups: remove redundant get/put of css_set from css_set_check_fetched()
resource cgroups: remove bogus cast
cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
cgroup, cpuset: don't use ss->pre_attach()
cgroup: don't use subsys->can_attach_task() or ->attach_task()
cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
cgroup: improve old cgroup handling in cgroup_attach_proc()
cgroup: always lock threadgroup during migration
threadgroup: extend threadgroup_lock() to cover exit and exec
threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
...

Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups:
fix a css_set not found bug in cgroup_attach_proc" that already
mentioned that the bug is fixed (differently) in Tejun's cgroup
patchset. This one, in other words.

Linus Torvalds
2012-01-10 04:59:24 +0800

22 Dec, 2011

1 commit

b00f4dc5f Merge branch 'master' into pm-sleep ... Browse Code »

* master: (848 commits)
SELinux: Fix RCU deref check warning in sel_netport_insert()
binary_sysctl(): fix memory leak
mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
ipmi_watchdog: restore settings when BMC reset
oom: fix integer overflow of points in oom_badness
memcg: keep root group unchanged if creation fails
nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
nilfs2: unbreak compat ioctl
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
evm: prevent racing during tfm allocation
evm: key must be set once during initialization
mmc: vub300: fix type of firmware_rom_wait_states module parameter
Revert "mmc: enable runtime PM by default"
mmc: sdhci: remove "state" argument from sdhci_suspend_host
x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
IB/qib: Correct sense on freectxts increment and decrement
RDMA/cma: Verify private data length
cgroups: fix a css_set not found bug in cgroup_attach_proc
oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
...

Conflicts:
kernel/cgroup_freezer.c

Rafael J. Wysocki
2011-12-22 04:59:45 +0800

13 Dec, 2011

2 commits

bb9d97b6d cgroup: don't use subsys->can_attach_task() or ->attach_task() ... Browse Code »

Now that subsys->can_attach() and attach() take @tset instead of
@task, they can handle per-task operations. Convert
->can_attach_task() and ->attach_task() users to use ->can_attach()
and attach() instead. Most converions are straight-forward.
Noteworthy changes are,

* In cgroup_freezer, remove unnecessary NULL assignments to unused
methods. It's useless and very prone to get out of sync, which
already happened.

* In cpuset, PF_THREAD_BOUND test is checked for each task. This
doesn't make any practical difference but is conceptually cleaner.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Frederic Weisbecker
Acked-by: Li Zefan
Cc: Paul Menage
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: James Morris
Cc: Ingo Molnar
Cc: Peter Zijlstra

Tejun Heo
2011-12-13 10:12:21 +0800
2f7ee5691 cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() ... Browse Code »

Currently, there's no way to pass multiple tasks to cgroup_subsys
methods necessitating the need for separate per-process and per-task
methods. This patch introduces cgroup_taskset which can be used to
pass multiple tasks and their associated cgroups to cgroup_subsys
methods.

Three methods - can_attach(), cancel_attach() and attach() - are
converted to use cgroup_taskset. This unifies passed parameters so
that all methods have access to all information. Conversions in this
patchset are identical and don't introduce any behavior change.

-v2: documentation updated as per Paul Menage's suggestion.

Signed-off-by: Tejun Heo
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Frederic Weisbecker
Acked-by: Paul Menage
Acked-by: Li Zefan
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki
Cc: James Morris

Tejun Heo
2011-12-13 10:12:21 +0800

25 Nov, 2011

1 commit

884a45d96 cgroup_freezer: fix freezing groups with stopped tasks ... Browse Code »

2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state
transitions) removed is_task_frozen_enough and replaced it with a simple
frozen call. This, however, breaks freezing for a group with stopped tasks
because those cannot be frozen and so the group remains in CGROUP_FREEZING
state (update_if_frozen doesn't count stopped tasks) and never reaches
CGROUP_FROZEN.

Let's add is_task_frozen_enough back and use it at the original locations
(update_if_frozen and try_to_freeze_cgroup). Semantically we consider
stopped tasks as frozen enough so we should consider both cases when
testing frozen tasks.

Testcase:
mkdir /dev/freezer
mount -t cgroup -o freezer none /dev/freezer
mkdir /dev/freezer/foo
sleep 1h &
pid=$!
kill -STOP $pid
echo $pid > /dev/freezer/foo/tasks
echo FROZEN > /dev/freezer/foo/freezer.state
while true
do
cat /dev/freezer/foo/freezer.state
[ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
sleep 1
done
echo OK

Signed-off-by: Michal Hocko
Acked-by: Li Zefan
Cc: Tomasz Buchert
Cc: Paul Menage
Cc: Andrew Morton
Cc: stable@kernel.org
Signed-off-by: Tejun Heo

Michal Hocko
2011-11-25 03:58:22 +0800

22 Nov, 2011

1 commit

839e3407d freezer: remove unused @sig_only from freeze_task() ... Browse Code »

After "freezer: make freezing() test freeze conditions in effect
instead of TIF_FREEZE", freezing() returns authoritative answer on
whether the current task should freeze or not and freeze_task()
doesn't need or use @sig_only. Remove it.

While at it, rewrite function comment for freeze_task() and rename
@sig_only to @user_only in try_to_freeze_tasks().

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo
Acked-by: Oleg Nesterov

Tejun Heo
2011-11-22 04:32:26 +0800