Eric Lee / linux-smarc-t335x-v3.2

26 Jul, 2008

5 commits

e885dcde7 cgroup_clone: use pid of newly created task for new cgroup

cgroup_clone creates a new cgroup with the pid of the task. This works
correctly for unshare, but for clone cgroup_clone is called from
copy_namespaces inside copy_process, which happens before the new pid is
created. As a result, the new cgroup was created with current's pid.
This patch:

1. Moves the call inside copy_process to after the new pid
is created
2. Passes the struct pid into ns_cgroup_clone (as it is not
yet attached to the task)
3. Passes a name from ns_cgroup_clone() into cgroup_clone()
so as to keep cgroup_clone() itself simpler
4. Uses pid_vnr() to get the process id value, so that the
pid used to name the new cgroup is always the pid as it
would be known to the task which did the cloning or
unsharing. I think that is the most intuitive thing to
do. This way, if task t1 does clone(CLONE_NEWPID) to get
t2, which in turn does clone(CLONE_NEWPID) to get t3, then
the cgroup for t3 will be named for the pid by which t2
knows t3.
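
The naming rule in item 4 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_pid`, `pid_nr_at_level()`, `name_for_clone()` and the fixed nesting depth are hypothetical names invented here, and the code does not reproduce the kernel's struct pid or pid_vnr().

```c
#include <stdio.h>

#define MAX_LEVEL 4  /* hypothetical nesting limit for this model */

/* Model: one numeric id per pid namespace level the task is visible in. */
struct model_pid {
    int level;               /* deepest namespace level the task lives in */
    int numbers[MAX_LEVEL];  /* numbers[i] = id as seen from level i */
};

/* Id of `pid` as seen from namespace level `ns_level`, or 0 if invisible. */
static int pid_nr_at_level(const struct model_pid *pid, int ns_level)
{
    if (ns_level < 0 || ns_level > pid->level)
        return 0;
    return pid->numbers[ns_level];
}

/* Name the new cgroup after the id that the cloning task would see. */
static int name_for_clone(char *buf, size_t len,
                          const struct model_pid *child, int cloner_level)
{
    return snprintf(buf, len, "%d", pid_nr_at_level(child, cloner_level));
}
```

With t1 at level 0, t2 at level 1 and t3 at level 2, calling `name_for_clone()` with `cloner_level` set to 1 yields the id by which t2 knows t3.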

(Thanks to Dan Smith for finding the main bug)

Changelog:
June 11: Incorporate Paul Menage's feedback: don't pass
NULL to ns_cgroup_clone from unshare, and reduce
patch size by using 'nodename' in cgroup_clone.
June 10: Original version

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge Hallyn
Acked-by: Paul Menage
Tested-by: Dan Smith
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2008-07-26 01:53:37 +0800
84eea8428 cgroups: misc cleanups to write_string patchset

This patch contains cleanups suggested by reviewers for the recent
write_string() patchset:

- pair cgroup_lock_live_group() with cgroup_unlock() in cgroup.c for
clarity, rather than directly unlocking cgroup_mutex.

- make the return type of cgroup_lock_live_group() a bool

- use a #define'd constant for the local buffer size in read/write functions

Signed-off-by: Paul Menage
Cc: Paul Jackson
Cc: Pavel Emelyanov
Cc: Balbir Singh
Acked-by: Serge Hallyn
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-07-26 01:53:35 +0800
e788e066c cgroup files: move the release_agent file to use typed handlers

Adds cgroup_release_agent_write() and cgroup_release_agent_show()
methods to handle writing/reading the path to a cgroup hierarchy's
release agent. As a result, cgroup_common_file_read() is now unnecessary.

As part of the change, a previously-tolerated race in
cgroup_release_agent() is avoided by copying the current
release_agent_path prior to calling call_usermode_helper().

Signed-off-by: Paul Menage
Cc: Paul Jackson
Cc: Pavel Emelyanov
Cc: Balbir Singh
Acked-by: Serge Hallyn
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-07-26 01:53:35 +0800
db3b14978 cgroup files: add write_string cgroup control file method

This patch adds a write_string() method for cgroups control files. The
semantics are that a buffer is copied from userspace to kernelspace
and the handler function invoked on that buffer. The buffer is
guaranteed to be nul-terminated, and no longer than max_write_len
(defaulting to 64 bytes if unspecified). Later patches will convert
existing raw file write handlers in control group subsystems to use
this method.
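
The semantics described above can be sketched in userspace C. This is a minimal model, not the kernel code: `dispatch_write_string()`, `write_string_fn` and `copy_handler()` are hypothetical names, and the 256-byte local buffer is an assumption of the sketch. Only the 64-byte default comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEFAULT_MAX_WRITE_LEN 64  /* default named in the commit message */

/* Hypothetical handler type: receives a nul-terminated private copy. */
typedef int (*write_string_fn)(const char *buf, void *ctx);

/* Example handler: ctx must point to at least max_len + 1 bytes. */
static int copy_handler(const char *buf, void *ctx)
{
    strcpy((char *)ctx, buf);
    return 0;
}

/*
 * Copy nbytes from src into a local buffer, terminate it and call the
 * handler. Returns nbytes on success or a negative errno value.
 */
static long dispatch_write_string(const char *src, size_t nbytes,
                                  size_t max_len, write_string_fn handler,
                                  void *ctx)
{
    char local[256];  /* assumed upper bound for this model */
    int ret;

    if (max_len == 0)
        max_len = DEFAULT_MAX_WRITE_LEN;
    if (max_len >= sizeof(local))
        return -EINVAL;   /* the model only supports small limits */
    if (nbytes > max_len)
        return -E2BIG;    /* reject oversized writes */

    memcpy(local, src, nbytes);
    local[nbytes] = '\0'; /* handler always sees a terminated string */

    ret = handler(local, ctx);
    return ret ? ret : (long)nbytes;
}
```

The handler never touches the caller's buffer directly, which is the point of the method: length checking and termination happen once, in the framework.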

Signed-off-by: Paul Menage
Cc: Paul Jackson
Cc: Pavel Emelyanov
Acked-by: Balbir Singh
Acked-by: Serge Hallyn
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-07-26 01:53:35 +0800
ce16b49d3 cgroup files: clean up whitespace in struct cftype

This patch removes some extraneous spaces from method declarations in
struct cftype, to fit in with conventional kernel style.

Signed-off-by: Paul Menage
Cc: Paul Jackson
Cc: Pavel Emelyanov
Cc: Balbir Singh
Cc: Serge Hallyn
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-07-26 01:53:35 +0800

29 Apr, 2008

9 commits

cf475ad28 cgroups: add an owner to the mm_struct

Remove the mem_cgroup member from mm_struct and add an owner instead.

This approach was suggested by Paul Menage. The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined. It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.

I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.

This patch was tested on a powerpc box; it was compiled with the
MM_OWNER config option both turned on and off.

After the thread group leader exits, it's moved to init_css_set by
cgroup_exit(), thus all future charges from running threads would be
redirected to the init_css_set's subsystem.

Signed-off-by: Balbir Singh
Cc: Pavel Emelianov
Cc: Hugh Dickins
Cc: Sudhir Kumar
Cc: YAMAMOTO Takashi
Cc: Hirokazu Takahashi
Cc: David Rientjes
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Pekka Enberg
Reviewed-by: Paul Menage
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2008-04-29 23:06:10 +0800
29486df32 cgroups: introduce cft->read_seq()

Introduce a read_seq() helper in cftype, which uses seq_file to print out
lists. Use it in the devices cgroup. Also split devices.allow into two
files, so now devices.deny and devices.allow are the ones to use to manipulate
the whitelist, while devices.list outputs the cgroup's current whitelist.

Signed-off-by: Serge E. Hallyn
Acked-by: Paul Menage
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2008-04-29 23:06:10 +0800
28fd5dfc1 cgroups: remove the css_set linked-list

Now we can run through the hash table instead of running through the
linked-list.

Signed-off-by: Li Zefan
Reviewed-by: Paul Menage
Cc: Balbir Singh
Cc: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-04-29 23:06:10 +0800
472b1053f cgroups: use a hash table for css_set finding

When we attach a process to a different cgroup, the css_set linked-list will
be run through to find a suitable existing css_set to use. This patch
implements a hash table for better performance.
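
A rough userspace sketch of the idea follows. It assumes a css_set can be identified by its array of per-subsystem state pointers; the hash function, table size and all `model_` names are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NSUBSYS 3                 /* hypothetical subsystem count */
#define HASH_BITS 7
#define HASH_SIZE (1u << HASH_BITS)

struct model_css_set {
    void *subsys[MODEL_NSUBSYS];        /* one state pointer per subsystem */
    struct model_css_set *next;         /* chain within a bucket */
};

static struct model_css_set *table[HASH_SIZE];

/* Fold the pointer values into a bucket index. */
static unsigned int hash_subsys(void *const subsys[MODEL_NSUBSYS])
{
    uintptr_t sum = 0;

    for (int i = 0; i < MODEL_NSUBSYS; i++)
        sum += (uintptr_t)subsys[i];
    sum ^= sum >> HASH_BITS;
    return (unsigned int)(sum & (HASH_SIZE - 1));
}

static void insert_css_set(struct model_css_set *cg)
{
    unsigned int h = hash_subsys(cg->subsys);

    cg->next = table[h];
    table[h] = cg;
}

/* Only the matching bucket is scanned, not every css_set in the system. */
static struct model_css_set *find_css_set(void *const subsys[MODEL_NSUBSYS])
{
    struct model_css_set *cg;

    for (cg = table[hash_subsys(subsys)]; cg; cg = cg->next)
        if (memcmp(cg->subsys, subsys, sizeof(cg->subsys)) == 0)
            return cg;
    return NULL;
}
```

The lookup cost then depends on bucket occupancy rather than on the total number of css_sets.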

The following benchmarks have been tested:

For N in 1, 5, 10, 50, 100, 500, 1000, create N cgroups with one sleeping
task in each, and then move an additional task through each cgroup in
turn.

Here is a test result:

   N    Loop   orig - Time(s)   hash - Time(s)
----------------------------------------------
   1   10000    1.201231728      1.196311177
   5    2000    1.065743872      1.040566424
  10    1000    0.991054735      0.986876440
  50     200    0.976554203      0.969608733
 100     100    0.998504680      0.969218270
 500      20    1.157347764      0.962602963
1000      10    1.619521852      1.085140172

Signed-off-by: Li Zefan
Reviewed-by: Paul Menage
Cc: Balbir Singh
Cc: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-04-29 23:06:09 +0800
d447ea2f3 cgroups: add the trigger callback to struct cftype

The trigger callback can be used to receive a kick from user space. The
string written is ignored.

The cftype->private is used for multiplexing events.
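
How one handler might serve several control files can be sketched as below. This is an illustrative model only: `model_cftype`, the event ids and `file_write()` are hypothetical, and the real callback signature is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical event ids stored in the per-file private field. */
enum { EVENT_RESET = 1, EVENT_FLUSH = 2 };

struct model_cftype {
    const char *name;
    unsigned int private_id;   /* tells the handler which file fired */
    int (*trigger)(void *group, unsigned int event);
};

struct model_group {
    int resets;
    int flushes;
};

/* One handler serves several files and switches on the event id. */
static int group_trigger(void *group, unsigned int event)
{
    struct model_group *g = group;

    switch (event) {
    case EVENT_RESET:
        g->resets++;
        return 0;
    case EVENT_FLUSH:
        g->flushes++;
        return 0;
    default:
        return -1;
    }
}

/* A write of any content just fires the trigger; the data is ignored. */
static int file_write(const struct model_cftype *cft, void *group,
                      const char *data, size_t len)
{
    (void)data;
    (void)len;
    return cft->trigger ? cft->trigger(group, cft->private_id) : -1;
}
```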

Signed-off-by: Pavel Emelyanov
Acked-by: Paul Menage
Acked-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-04-29 23:06:09 +0800
e73d2c61d CGroups _s64 files: add cgroups read_s64/write_s64 file methods

These patches add cgroups read_s64 and write_s64 control file methods (the
signed equivalent of read_u64/write_u64) and use them to implement the
cpu.rt_runtime_us control file in the CFS cgroup subsystem.

This patch:

These are the signed equivalents of the read_u64/write_u64 methods
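
What a signed 64-bit file method has to do with the text it receives can be sketched in userspace. `parse_s64()` and `format_s64()` are hypothetical helpers for illustration, and accepting decimal input with an optional trailing newline is an assumption of this sketch.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a signed 64-bit decimal value, allowing one trailing newline.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
static int parse_s64(const char *buf, int64_t *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(buf, &end, 10);
    if (end == buf)
        return -EINVAL;          /* no digits at all */
    if (errno == ERANGE)
        return -ERANGE;          /* value does not fit */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;          /* trailing garbage */
    *out = (int64_t)v;
    return 0;
}

/* Format a signed value the way a read method would report it. */
static int format_s64(char *buf, size_t len, int64_t v)
{
    return snprintf(buf, len, "%" PRId64 "\n", v);
}
```

Unlike the unsigned variants, these helpers accept and report negative values.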

Signed-off-by: Paul Menage
Acked-by: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-04-29 23:06:09 +0800
3116f0e3d CGroup API files: move "releasable" to cgroup_debug subsystem

The "releasable" control file provided by the cgroup framework exports the
state of a per-cgroup flag that's related to the notify-on-release feature.
This isn't really generally useful, unless you're trying to debug this
particular feature of cgroups.

This patch moves the "releasable" file to the cgroup_debug subsystem.

Signed-off-by: Paul Menage
Cc: "Li Zefan"
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Cc: "YAMAMOTO Takashi"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-04-29 23:06:09 +0800
917965696 CGroup API files: add cgroup map data type

Adds a new type of supported control file representation, a map from strings
to u64 values.

Each map entry is printed as a line in a similar format to /proc/vmstat, i.e.
"$key $value\n"

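The line format above can be produced with a callback-style interface such as the following. This is a userspace sketch: `model_map_cb`, `fill_line()` and `example_read_map()` are hypothetical names, and the keys and values are made up for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback object: fill() is called once per map entry. */
struct model_map_cb {
    int (*fill)(struct model_map_cb *cb, const char *key, uint64_t value);
    char *buf;     /* output buffer */
    size_t len;    /* total buffer size */
    size_t used;   /* bytes written so far */
};

/* Append "key value\n"; returns 0, or -1 if the line does not fit. */
static int fill_line(struct model_map_cb *cb, const char *key, uint64_t value)
{
    size_t room = cb->len - cb->used;
    int n = snprintf(cb->buf + cb->used, room, "%s %" PRIu64 "\n", key, value);

    if (n < 0 || (size_t)n >= room)
        return -1;
    cb->used += (size_t)n;
    return 0;
}

/* A subsystem-side reader reports its entries through the callback. */
static int example_read_map(struct model_map_cb *cb)
{
    if (cb->fill(cb, "cache", 4096))
        return -1;
    return cb->fill(cb, "rss", 8192);
}
```

The subsystem only supplies key/value pairs; the formatting lives in one place.
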
Signed-off-by: Paul Menage
Cc: "Li Zefan"
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Cc: "YAMAMOTO Takashi"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-04-29 23:06:08 +0800
f4c753b7e CGroup API files: rename read/write_uint methods to read/write_u64

Several people have justifiably complained that the "_uint" suffix is
inappropriate for functions that handle u64 values, so this patch just renames
all these functions and their users to have the suffix _u64.

[peterz@infradead.org: build fix]
Signed-off-by: Paul Menage
Cc: "Li Zefan"
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Cc: "YAMAMOTO Takashi"
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-04-29 23:06:07 +0800

05 Apr, 2008

1 commit

8bab8dded cgroups: add cgroup support for enabling controllers at boot time

The effects of cgroup_disable=foo are:

- foo isn't auto-mounted if you mount all cgroups in a single hierarchy
- foo isn't visible as an individually mountable subsystem

As a result there will only ever be one call to foo->create(), at init time;
all processes will stay in this group, and the group will never be mounted on
a visible hierarchy. Any additional effects (e.g. not allocating metadata)
are up to the foo subsystem.

This doesn't handle early_init subsystems (their "disabled" bit isn't set),
but it could easily be extended to do so if any of the early_init systems
wanted it. I think it would just involve some nastier parameter processing
since it would occur before the command-line argument parser had been run.
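
The effect of the boot option can be modelled in a few lines of userspace C. This is a sketch under assumptions: the subsystem table, `parse_cgroup_disable()` and `mountable_mask()` are hypothetical, and accepting a comma-separated list of names is an assumption of the sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_subsys {
    const char *name;
    bool disabled;
};

/* Hypothetical set of registered subsystems. */
static struct model_subsys subsystems[] = {
    { "cpu", false }, { "memory", false }, { "devices", false },
};
#define MODEL_COUNT (sizeof(subsystems) / sizeof(subsystems[0]))

/*
 * Parse a value such as "memory,cpu" and mark the named subsystems as
 * disabled. Returns the number of names that matched.
 */
static int parse_cgroup_disable(const char *arg)
{
    int hits = 0;

    while (*arg) {
        size_t n = strcspn(arg, ",");   /* length of the current token */

        for (size_t i = 0; i < MODEL_COUNT; i++) {
            if (n && strlen(subsystems[i].name) == n &&
                strncmp(arg, subsystems[i].name, n) == 0) {
                subsystems[i].disabled = true;
                hits++;
            }
        }
        arg += n;
        if (*arg == ',')
            arg++;
    }
    return hits;
}

/* Disabled subsystems are left out of the "mount everything" set. */
static unsigned long mountable_mask(void)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < MODEL_COUNT; i++)
        if (!subsystems[i].disabled)
            mask |= 1UL << i;
    return mask;
}
```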

Hugh said:

Ballpark figures, I'm trying to get this question out rather than
processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
to the affected paths, booting with cgroup_disable=memory cuts that back to
1% overhead (due to slightly bigger struct page).

I'm no expert on distros, they may have no interest whatever in
CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
without it, or apply the cgroup_disable=memory patches.

Unix bench's execl test result on x86_64 was

== just after boot without mounting any cgroup fs.==
mem_cgroup=off : Execl Throughput 43.0 3150.1 732.6
mem_cgroup=on  : Execl Throughput 43.0 2932.6 682.0
==

[lizf@cn.fujitsu.com: fix boot option parsing]
Signed-off-by: Balbir Singh
Cc: Paul Menage
Cc: Balbir Singh
Cc: Pavel Emelyanov
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Cc: Sudhir Kumar
Cc: YAMAMOTO Takashi
Cc: David Rientjes
Signed-off-by: Li Zefan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-04-05 05:46:26 +0800

24 Feb, 2008

2 commits

ffd2d8833 cgroup: clean up cgroup.h

- replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for
cgroup.c in commit bd89aabc6761de1c35b154fe6f914a445d301510)
- remove a duplicate declaration of cgroup_path()

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-02-24 09:13:24 +0800
a043e3b2c cgroup: fix comments

fix:
- comments about need_forkexit_callback
- comments about release agent
- typo and comment style, etc.

Signed-off-by: Li Zefan
Acked-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-02-24 09:13:24 +0800

08 Feb, 2008

3 commits

956db3ca0 hotplug cpu: move tasks in empty cpusets to parent

This patch corrects a situation that occurs when one disables all the cpus in
a cpuset.

Currently, the disabled (cpu-less) cpuset inherits the cpus of its parent,
which is incorrect because it may then overlap its cpu-exclusive sibling.

Tasks of an empty cpuset should be moved to the cpuset which is the parent of
their current cpuset. Or if the parent cpuset has no cpus, to its parent,
etc.

And the empty cpuset should be released (if it is flagged notify_on_release).
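
The "move to the nearest ancestor that still has cpus" rule can be sketched as follows. This is a simplified userspace model: `model_cpuset` and both functions are hypothetical, tasks are reduced to a counter, and release notification is left out.

```c
#include <stddef.h>

struct model_cpuset {
    struct model_cpuset *parent;  /* NULL for the root */
    unsigned long cpus;           /* bitmask of allowed cpus */
    int ntasks;                   /* tasks are modelled as a simple count */
};

/* Walk upward until a cpuset with at least one cpu is found. */
static struct model_cpuset *nearest_nonempty_ancestor(struct model_cpuset *cs)
{
    struct model_cpuset *p = cs->parent;

    while (p && p->cpus == 0)
        p = p->parent;
    return p;
}

/* Move every task out of an empty cpuset; returns the destination or NULL. */
static struct model_cpuset *evacuate_if_empty(struct model_cpuset *cs)
{
    struct model_cpuset *dst;

    if (cs->cpus != 0)
        return NULL;              /* still has cpus: nothing to do */
    dst = nearest_nonempty_ancestor(cs);
    if (!dst)
        return NULL;              /* no usable ancestor in this model */
    dst->ntasks += cs->ntasks;
    cs->ntasks = 0;
    return dst;
}
```

The empty cpuset keeps its (empty) cpu mask instead of inheriting the parent's, so it can no longer overlap a cpu-exclusive sibling.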

Depends on the cgroup_scan_tasks() function (proposed by David Rientjes) to
iterate through all tasks in the cpu-less cpuset. We are deliberately
avoiding a walk of the tasklist.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Cliff Wickman
Cc: Paul Menage
Cc: Paul Jackson
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cliff Wickman
2008-02-08 00:42:22 +0800
31a7df01f cgroups: mechanism to process each task in a cgroup

Provide cgroup_scan_tasks(), which iterates through every task in a cgroup,
calling a test function and a process function for each. The process
function is called without holding css_set_lock.
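
The two-phase pattern (test under the lock, process after dropping it) can be sketched like this. It is a userspace model only: the `model_` types, the fixed-size batch and the boolean standing in for the lock are all assumptions of the sketch, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 16  /* hypothetical cap for the model */

struct model_task {
    int pid;
    bool wanted;
    int processed;
};

struct model_scanner {
    bool (*test_task)(const struct model_task *t);  /* may be NULL */
    void (*process_task)(struct model_task *t);
};

static bool list_locked;  /* stands in for the lock guarding the task list */

/* Example callbacks used to exercise the scanner. */
static bool is_wanted(const struct model_task *t)
{
    return t->wanted;
}

static void mark_processed(struct model_task *t)
{
    t->processed = list_locked ? -1 : 1;  /* -1 would mean "lock was held" */
}

static int scan_tasks(struct model_task *tasks, size_t ntasks,
                      const struct model_scanner *scan)
{
    struct model_task *batch[MAX_TASKS];
    size_t n = 0;

    /* Phase 1: under the lock, only test and remember candidates. */
    list_locked = true;
    for (size_t i = 0; i < ntasks && n < MAX_TASKS; i++)
        if (!scan->test_task || scan->test_task(&tasks[i]))
            batch[n++] = &tasks[i];
    list_locked = false;

    /* Phase 2: lock dropped, so process_task may block or take locks. */
    for (size_t i = 0; i < n; i++)
        scan->process_task(batch[i]);
    return (int)n;
}
```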

The idea is David Rientjes', predicting that such a function will make it much
easier in the future to extend things that require access to each task in a
cgroup without holding the lock.

[akpm@linux-foundation.org: cleanup]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Cliff Wickman
Cc: Paul Menage
Cc: Paul Jackson
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cliff Wickman
2008-02-08 00:42:22 +0800
4fca88c87 memory cgroup enhancements: add pre_destroy() handler

Add a handler "pre_destroy" to cgroup_subsys. It is called before
cgroup_rmdir() checks every subsystem's refcnt.

I think this is useful for a subsys that holds some extra refs even if there
are no tasks in the cgroup. By adding pre_destroy(), the kernel keeps the rule
"destroy() against subsystem is called only when refcnt=0." and allows css
refs to be used by objects other than tasks.
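
The ordering can be modelled in userspace as below. This is a sketch only: `model_css`, `model_subsys_ops` and `model_rmdir()` are hypothetical, and the "extra references" stand for whatever non-task objects a subsystem might pin.

```c
#include <errno.h>
#include <stddef.h>

struct model_css {
    int refcnt;      /* total references on the subsystem state */
    int extra_refs;  /* references held by non-task objects */
};

struct model_subsys_ops {
    void (*pre_destroy)(struct model_css *css);  /* may be NULL */
    void (*destroy)(struct model_css *css);
};

/* Example pre_destroy: release references not owned by tasks. */
static void drop_extra_refs(struct model_css *css)
{
    css->refcnt -= css->extra_refs;
    css->extra_refs = 0;
}

static int destroyed;  /* counts destroy() calls for the example */

static void count_destroy(struct model_css *css)
{
    (void)css;
    destroyed++;
}

static int model_rmdir(struct model_css *css,
                       const struct model_subsys_ops *ops)
{
    if (ops->pre_destroy)
        ops->pre_destroy(css);  /* give the subsystem a chance first */
    if (css->refcnt != 0)
        return -EBUSY;          /* rule kept: destroy only at refcnt == 0 */
    ops->destroy(css);
    return 0;
}
```

Without the hook, a group with lingering non-task references could never pass the refcnt check.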

Signed-off-by: KAMEZAWA Hiroyuki
Cc: "Eric W. Biederman"
Cc: Balbir Singh
Cc: David Rientjes
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Nick Piggin
Cc: Paul Menage
Cc: Pavel Emelianov
Cc: Peter Zijlstra
Cc: Vaidyanathan Srinivasan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-02-08 00:42:20 +0800

20 Oct, 2007

9 commits

846c7bb05 Add cgroupstats

This patch is inspired by the discussion at
http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263. The
patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
ported)

This patch implements per cgroup statistics infrastructure and re-uses
code from the taskstats interface. A new set of cgroup operations are
registered with commands and attributes. It should be very easy to
*extend* per cgroup statistics, by adding members to the cgroupstats
structure.

The current model for cgroupstats is a pull model; a push model (to post
statistics on interesting events) should be very easy to add. Currently
user space requests statistics by passing the cgroup file
descriptor. Statistics about the state of all the tasks in the cgroup
are returned to user space.

TODO's/NOTE:

This patch provides an infrastructure for implementing cgroup statistics.
Based on the needs of each controller, we can incrementally add more statistics,
event based support for notification of statistics, accumulation of taskstats
into cgroup statistics in the future.

Sample output

# ./cgroupstats -C /cgroup/a
sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0

# ./cgroupstats -C /cgroup/
sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0

If the approach looks good, I'll enhance and post the user space utility for
the same

Feedback, comments, test results are always welcome!

[akpm@linux-foundation.org: build fix]
Signed-off-by: Balbir Singh
Cc: Paul Menage
Cc: Jay Lan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2007-10-20 02:53:36 +0800
81a6a5cdd Task Control Groups: automatic userspace notification of idle cgroups

Add the following files to the cgroup filesystem:

notify_on_release - configures/reports whether the cgroup subsystem should
attempt to run a release script when this cgroup becomes unused

release_agent - configures/reports the release agent to be used for this
hierarchy (top level in each hierarchy only)

releasable - reports whether this cgroup would have been auto-released if
notify_on_release was true and a release agent was configured (mainly useful
for debugging)

To avoid locking issues, invoking the userspace release agent is done via a
workqueue task; cgroups that need to have their release agents invoked by
the workqueue task are linked on to a list.

[pj@sgi.com: Need to include kmod.h]
Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800
817929ec2 Task Control Groups: shared cgroup subsystem group arrays

Replace the struct css_set embedded in task_struct with a pointer; all tasks
that have the same set of memberships across all hierarchies will share a
css_set object, and will be linked via their css_sets field to the "tasks"
list_head in the css_set.

Assuming that many tasks share the same cgroup assignments, this reduces
overall space usage and keeps the size of the task_struct down (three pointers
added to task_struct compared to a non-cgroups kernel, no matter how many
subsystems are registered).
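
The sharing scheme can be sketched as a find-or-create lookup with a reference count. This is a userspace model under assumptions: the `model_` types and `get_css_set()` are hypothetical, a plain linked list is used for the lookup, and freeing sets when the count drops is omitted.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NSUBSYS 2  /* hypothetical subsystem count */

struct model_css_set {
    int refcount;                    /* number of tasks sharing this set */
    void *subsys[MODEL_NSUBSYS];     /* one state pointer per subsystem */
    struct model_css_set *next;
};

/* The task holds a pointer instead of embedding the whole array. */
struct model_task {
    struct model_css_set *cgroups;
};

static struct model_css_set *all_sets;

/* Reuse a set with identical memberships, else allocate a new one. */
static struct model_css_set *get_css_set(void *const subsys[MODEL_NSUBSYS])
{
    struct model_css_set *s;

    for (s = all_sets; s; s = s->next) {
        if (memcmp(s->subsys, subsys, sizeof(s->subsys)) == 0) {
            s->refcount++;
            return s;
        }
    }
    s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    memcpy(s->subsys, subsys, sizeof(s->subsys));
    s->refcount = 1;
    s->next = all_sets;
    all_sets = s;
    return s;
}
```

The per-task cost stays constant however many subsystems exist, because only the shared object grows with the subsystem count.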

[akpm@linux-foundation.org: fix a printk]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800
a424316ca Task Control Groups: add procfs interface

Add:

/proc/cgroups - general system info

/proc/*/cgroup - per-task cgroup membership info

[a.p.zijlstra@chello.nl: cgroups: bdi init hooks]
Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800
697f41610 Task Control Groups: add cgroup_clone() interface

Add support for cgroup_clone(), a way to create new cgroups intended to
be used for systems such as namespace unsharing. A new subsystem callback,
post_clone(), is added to allow subsystems to automatically configure cloned
cgroups.

Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800
b4f48b636 Task Control Groups: add fork()/exit() hooks

This adds the necessary hooks to the fork() and exit() paths to ensure
that new children inherit their parent's cgroup assignments, and that
exiting processes release reference counts on their cgroups.

Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800
355e0c48b Add cgroup write_uint() helper method

Add write_uint() helper method for cgroup subsystems

This helper is analogous to the read_uint() helper method for
reporting u64 values to userspace. It's designed to reduce the amount
of boilerplate required for creating new cgroup subsystems.

Signed-off-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800
bbcb81d09 Task Control Groups: add tasks file interface

Add the per-directory "tasks" file for cgroupfs mounts; this allows the
user to determine which tasks are members of a cgroup by reading a
cgroup's "tasks", and to move a task into a cgroup by writing its pid to
its "tasks".
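
From user space, the interface described above might be driven as follows. This is a hedged sketch: `attach_pid()` and `count_members()` are hypothetical helper names, the cgroup directory is whatever path the hierarchy happens to be mounted at, and reading one pid per line is an assumption about the file's format.

```c
#include <stdio.h>
#include <sys/types.h>

/* Move `pid` into the cgroup at cgroup_dir by writing to its tasks file. */
static int attach_pid(const char *cgroup_dir, pid_t pid)
{
    char path[4096];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/tasks", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%ld\n", (long)pid) > 0;
    if (fclose(f) != 0)
        ok = 0;  /* buffered write errors surface when the file is closed */
    return ok ? 0 : -1;
}

/* Count the members of a cgroup by reading its tasks file. */
static int count_members(const char *cgroup_dir)
{
    char path[4096];
    FILE *f;
    long pid;
    int n, count = 0;

    n = snprintf(path, sizeof(path), "%s/tasks", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fscanf(f, "%ld", &pid) == 1)
        count++;
    fclose(f);
    return count;
}
```

Both helpers return -1 when the tasks file cannot be opened, for example when the directory is not a mounted cgroup.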

Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800
ddbcc7e8e Task Control Groups: basic task cgroup framework

Generic Process Control Groups
------------------------------

There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others. These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.

This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.

The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test e.g. the ResGroups CPU controller in
conjunction with the BeanCounters memory controller, or use either of
them as the resource-control portion of a virtual server system.

- the additional kernel footprint of any of the competing resource
management systems is substantially reduced, since it doesn't need
to provide process grouping/containment, hence improving their
chances of getting into the kernel

This patch:

Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.

Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2007-10-20 02:53:36 +0800