23 Nov, 2011
1 commit
-
This patch adds the infrastructure code to create the network priority
cgroup. The cgroup, in addition to the standard processes file, creates two
control files:

1) prioidx - This is a read-only file that exports the index of this cgroup.
This is a value that is both arbitrary and unique to a cgroup in this subsystem,
and is used to index the per-device priority map.

2) priomap - This is a writeable file. On read it reports a table of 2-tuples
of name and priority, where name is the name of a network interface and
priority indicates the priority assigned to frames egressing on the named
interface and originating from a pid in this cgroup.

This cgroup allows for skb priority to be set prior to a root qdisc getting
selected. This is beneficial for DCB enabled systems, in that it allows for any
application to use DCB configured priorities without application modification.

Signed-off-by: Neil Horman
Signed-off-by: John Fastabend
CC: Robert Love
CC: "David S. Miller"
Signed-off-by: David S. Miller
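
The relationship between prioidx and the per-device priority map described
above can be modelled in userspace roughly as follows. This is a minimal
sketch, not the kernel code: the names netdev_model, priomap_write and
priomap_lookup, the fixed table sizes, and the "name priority" text form
accepted on write are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CGROUPS 8   /* illustrative bound, not a kernel limit */
#define MAX_DEVS    4   /* illustrative bound, not a kernel limit */

/* Hypothetical model of one network device and its per-cgroup priority map. */
struct netdev_model {
    char name[16];
    unsigned int priomap[MAX_CGROUPS];  /* indexed by the cgroup's prioidx */
};

static struct netdev_model devs[MAX_DEVS] = {
    { .name = "eth0" }, { .name = "eth1" },
};

/* Find a modelled device by interface name, or NULL if it does not exist. */
static struct netdev_model *find_dev(const char *name)
{
    for (int i = 0; i < MAX_DEVS; i++)
        if (devs[i].name[0] && strcmp(devs[i].name, name) == 0)
            return &devs[i];
    return NULL;
}

/* Model of a priomap write: parse "name priority" and store the priority
 * in the slot of the cgroup identified by prioidx. Returns 0 or -1. */
static int priomap_write(unsigned int prioidx, const char *line)
{
    char name[16];
    unsigned int prio;
    struct netdev_model *dev;

    assert(line != NULL);
    if (prioidx >= MAX_CGROUPS)
        return -1;
    if (sscanf(line, "%15s %u", name, &prio) != 2)
        return -1;
    dev = find_dev(name);
    if (!dev)
        return -1;
    dev->priomap[prioidx] = prio;
    return 0;
}

/* Priority a frame would get when it leaves on 'name' and originates from
 * a task in the cgroup with the given prioidx; 0 when nothing is set. */
static unsigned int priomap_lookup(unsigned int prioidx, const char *name)
{
    struct netdev_model *dev = find_dev(name);

    if (!dev || prioidx >= MAX_CGROUPS)
        return 0;
    return dev->priomap[prioidx];
}
```

In this model the lookup is a single array index per device, which is why an
arbitrary but unique small integer per cgroup is sufficient as a key.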
27 May, 2011
1 commit
-
The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

* cgroup creation is out-of-control
* cgroup name can conflict when pids are looping
* it is not possible to have a single process handling a lot of
  namespaces without falling into an exponential creation time
* we may want to create a namespace without creating a cgroup

The ns_cgroup was replaced by a compatibility flag 'clone_children',
where a newly created cgroup will copy the parent cgroup values.
The userspace has to manually create a cgroup and add a task to
the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:
https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
The 'cgroup_clone' function is removed because it is no longer used.
This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal. Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano
Signed-off-by: Serge E. Hallyn
Cc: Eric W. Biederman
Cc: Jamal Hadi Salim
Reviewed-by: Li Zefan
Acked-by: Paul Menage
Acked-by: Matt Helsley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Feb, 2011
1 commit
-
This kernel patch adds the ability to filter monitoring based on
container groups (cgroups). This is for use in per-cpu mode only.

The cgroup to monitor is passed as a file descriptor in the pid
argument to the syscall. The file descriptor must be opened to
the cgroup name in the cgroup filesystem. For instance, if the
cgroup name is foo and cgroupfs is mounted in /cgroup, then the
file descriptor is opened to /cgroup/foo. Cgroup mode is
activated by passing PERF_FLAG_PID_CGROUP in the flags argument
to the syscall.

For instance to measure in cgroup foo on CPU1 assuming
cgroupfs is mounted under /cgroup:

    struct perf_event_attr attr;
    int cgroup_fd, fd;

    cgroup_fd = open("/cgroup/foo", O_RDONLY);
    fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
    close(cgroup_fd);

Signed-off-by: Stephane Eranian
[ added perf_cgroup_{exit,attach} ]
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
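
The fragment in the message above leaves out the includes, the syscall
wrapper and the attr setup. A fuller sketch might look like the following.
The helper name open_cgroup_cycles and the choice of a hardware cycle
counter are assumptions for illustration; the message itself does not say
which event is measured.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: count CPU cycles for tasks of the cgroup found at
 * cgroup_path while they run on 'cpu'. Returns the event fd, or -1. */
static int open_cgroup_cycles(const char *cgroup_path, int cpu)
{
    struct perf_event_attr attr;
    int cgroup_fd, fd;

    assert(cgroup_path != NULL);

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;            /* assumed event type */
    attr.config = PERF_COUNT_HW_CPU_CYCLES;    /* assumed event */

    /* The cgroup directory fd takes the place of the pid argument. */
    cgroup_fd = open(cgroup_path, O_RDONLY);
    if (cgroup_fd < 0)
        return -1;

    /* Cgroup mode is per-cpu only, so a concrete cpu must be given. */
    fd = (int)perf_event_open(&attr, cgroup_fd, cpu, -1, PERF_FLAG_PID_CGROUP);
    close(cgroup_fd);
    return fd;
}
```

A caller would use open_cgroup_cycles("/cgroup/foo", 1) for the case in the
message and read the counter value from the returned fd. Whether the call
succeeds depends on privileges and on perf cgroup support in the kernel.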
04 Dec, 2009
1 commit
-
o This is a basic implementation of the blkio controller cgroup interface. This
  is the common interface visible to user space and should be used by different
  IO control policies as we implement those.

Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe
08 Nov, 2008
1 commit
-
The classifier should cover the most common use case and will work
without any special configuration.

The principle of the classifier is to directly access the
task_struct via get_current(). In order for this to work,
classification requests from softirqs must be ignored. This is
not a problem because the vast majority of packets in softirq
context are not assigned to a task anyway. For this to work, a
mechanism is needed to trace softirq context.

This repost goes back to the method of relying on the number of
nested bh disable calls for the sake of not adding too much
complexity and the option to come up with something more reliable
if actually needed.

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller
20 Oct, 2008
1 commit
-
This patch implements a new freezer subsystem in the control groups
framework. It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup. Reading will return the current state.

* Examples of usage :

   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

   # cat /containers/0/freezer.state
   RUNNING

to freeze all tasks in the container :

   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN

to unfreeze all tasks in the container :

   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING

This is the basic mechanism which should do the right thing for user space
tasks in a simple scenario.

It's important to note that freezing can be incomplete. In that case we
return EBUSY. This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time. After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read. The state will remain
"FREEZING" until one of these things happens:

1) Userspace cancels the freezing operation by writing "RUNNING" to
   the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
   the freezer.state file (writing "FREEZING" is not legal
   and returns EIO)
3) The tasks that blocked the cgroup from entering the "FROZEN"
   state disappear from the cgroup's set of tasks.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater
Signed-off-by: Matt Helsley
Acked-by: Serge E. Hallyn
Tested-by: Matt Helsley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
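
The state transitions described in the message above, including the EBUSY
case and the three ways out of "FREEZING", can be sketched as a small
userspace model. This is not the kernel implementation: the struct and
function names, the 'busy' flag standing in for "doing something that
prevents freezing", and the rejection of every unknown string with EIO are
assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TASKS 8   /* illustrative bound */

enum freezer_state { RUNNING, FREEZING, FROZEN };

struct task_model {
    bool present;   /* still a member of the cgroup */
    bool busy;      /* currently cannot be frozen */
    bool frozen;
};

struct freezer_model {
    enum freezer_state state;
    struct task_model tasks[MAX_TASKS];
};

/* Freeze every member that can be frozen; true when none is left unfrozen. */
static bool try_freeze_all(struct freezer_model *f)
{
    bool all = true;

    for (int i = 0; i < MAX_TASKS; i++) {
        struct task_model *t = &f->tasks[i];

        if (!t->present)
            continue;
        if (!t->busy)
            t->frozen = true;
        if (!t->frozen)
            all = false;
    }
    return all;
}

/* Model of a write to freezer.state; returns 0 or a negative errno. */
static int freezer_write(struct freezer_model *f, const char *buf)
{
    assert(f != NULL && buf != NULL);

    if (strcmp(buf, "RUNNING") == 0) {      /* thaw, or cancel a freeze */
        for (int i = 0; i < MAX_TASKS; i++)
            f->tasks[i].frozen = false;
        f->state = RUNNING;
        return 0;
    }
    if (strcmp(buf, "FROZEN") == 0) {       /* freeze, or retry a freeze */
        if (try_freeze_all(f)) {
            f->state = FROZEN;
            return 0;
        }
        f->state = FREEZING;                /* partially frozen */
        return -EBUSY;
    }
    return -EIO;    /* covers "FREEZING", which userspace may not write */
}

/* Model of a read: a FREEZING cgroup is re-checked and may now be FROZEN,
 * for example because the blocking tasks have left the cgroup. */
static const char *freezer_read(struct freezer_model *f)
{
    static const char *const names[] = { "RUNNING", "FREEZING", "FROZEN" };

    if (f->state == FREEZING && try_freeze_all(f))
        f->state = FROZEN;
    return names[f->state];
}
```

The model is synchronous, so it does not reproduce the window in the usage
example where a successful write is followed by a read of "FREEZING".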
29 Apr, 2008
1 commit
-
Implement a cgroup to track and enforce open and mknod restrictions on device
files. A device cgroup associates a device access whitelist with each cgroup.
A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers. Major
and minor are either an integer or * for all. Access is a composition of r
(read), w (write), and m (mknod).

The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of
the parent. Admins can then remove devices from the whitelist or add new
entries. A child cgroup can never receive a device access which is denied its
parent. However when a device access is removed from a parent it will not
also be removed from the child(ren).

An entry is added using devices.allow, and removed using
devices.deny. For instance

    echo 'c 1:3 mr' > /cgroups/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null. Doing

    echo a > /cgroups/1/devices.deny

will remove the default 'a *:* mrw' entry.

CAP_SYS_ADMIN is needed to change permissions or move another task to a new
cgroup. A cgroup may not be granted more permissions than the cgroup's parent
has. Any task can move itself between cgroups. This won't be sufficient, but
we can decide the best way to adequately restrict movement later.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
Signed-off-by: Serge E. Hallyn
Acked-by: James Morris
Looks-good-to: Pavel Emelyanov
Cc: Daniel Hokka Zakrisson
Cc: Li Zefan
Cc: Paul Menage
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
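
The four-field whitelist entry and the access check described in the message
above can be sketched in userspace as follows. This is an illustrative
model, not the kernel code: the names wl_entry, parse_entry and wl_allows,
the ACC_* bit values, and the rule that a single entry must cover all
requested access bits are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ACC_READ  1
#define ACC_WRITE 2
#define ACC_MKNOD 4
#define ANY       (-1)   /* stands for '*' in the major or minor field */

struct wl_entry {
    char type;           /* 'a' (all), 'c' (char) or 'b' (block) */
    int major, minor;    /* ANY or a device number */
    int access;          /* bitmask of ACC_* */
};

/* Parse one field that is either '*' or a non-negative integer. */
static int parse_num(const char *s, int *out)
{
    if (strcmp(s, "*") == 0) {
        *out = ANY;
        return 0;
    }
    return (sscanf(s, "%d", out) == 1 && *out >= 0) ? 0 : -1;
}

/* Parse an entry such as "c 1:3 mr", or the bare "a". Returns 0 or -1. */
static int parse_entry(const char *s, struct wl_entry *e)
{
    char maj[16], min[16], acc[8];

    assert(s != NULL && e != NULL);
    if (strcmp(s, "a") == 0) {
        *e = (struct wl_entry){ 'a', ANY, ANY,
                                ACC_READ | ACC_WRITE | ACC_MKNOD };
        return 0;
    }
    if (sscanf(s, "%c %15[^:]:%15s %7s", &e->type, maj, min, acc) != 4)
        return -1;
    if (e->type != 'a' && e->type != 'c' && e->type != 'b')
        return -1;
    if (parse_num(maj, &e->major) || parse_num(min, &e->minor))
        return -1;
    e->access = 0;
    for (const char *p = acc; *p; p++) {
        if (*p == 'r')
            e->access |= ACC_READ;
        else if (*p == 'w')
            e->access |= ACC_WRITE;
        else if (*p == 'm')
            e->access |= ACC_MKNOD;
        else
            return -1;
    }
    return 0;
}

/* True if some entry grants every bit of 'want' on the given device. */
static bool wl_allows(const struct wl_entry *wl, int n, char type,
                      int major, int minor, int want)
{
    for (int i = 0; i < n; i++) {
        const struct wl_entry *e = &wl[i];

        if (e->type != 'a' && e->type != type)
            continue;
        if (e->major != ANY && e->major != major)
            continue;
        if (e->minor != ANY && e->minor != minor)
            continue;
        if ((e->access & want) == want)
            return true;
    }
    return false;
}
```

With only the 'c 1:3 mr' entry from the message in the list, a read or
mknod of character device 1:3 passes the check and a write does not.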
05 Mar, 2008
1 commit
-
Rename Memory Controller to Memory Resource Controller. Reflect the same
changes in the CONFIG definition for the Memory Resource Controller. Group
together the config options for Resource Counters and Memory Resource
Controller.

Signed-off-by: Balbir Singh
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Feb, 2008
1 commit
-
Make the rt group scheduler compile time configurable.
Keep it experimental for now.

Signed-off-by: Peter Zijlstra
Signed-off-by: Ingo Molnar
08 Feb, 2008
1 commit
-
Setup the memory cgroup and add basic hooks and controls to integrate
and work with the cgroup.

Signed-off-by: Balbir Singh
Cc: Pavel Emelianov
Cc: Paul Menage
Cc: Peter Zijlstra
Cc: "Eric W. Biederman"
Cc: Nick Piggin
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: David Rientjes
Cc: Vaidyanathan Srinivasan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Dec, 2007
1 commit
-
Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for
us, which provided a cpu accounting resource controller. This feature would be
useful if someone wants to group tasks only for accounting purposes and doesn't
really want to exercise any control over their cpu consumption.

The patch below reintroduces the feature. It is based on Paul Menage's
original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with
these differences:

- Removed load average information. I felt it needs more thought (esp
  to deal with SMP and virtualized platforms) and can be added for
  2.6.25 after more discussions.
- Convert group cpu usage to be nanosecond accurate (as rest of the cfs
  stats are) and invoke cpuacct_charge() from the respective scheduler
  classes
- Make accounting scalable on SMP systems by splitting the usage
  counter to be per-cpu
- Move the code from kernel/cpu_acct.c to kernel/sched.c (since the
  code is not big enough to warrant a new file and also this rightly
  needs to live inside the scheduler. Also things like accessing
  rq->lock while reading cpu usage becomes easier if the code lived in
  kernel/sched.c)

The patch also modifies the cpu controller not to provide the same accounting
information.

Tested-by: Balbir Singh

Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran
some simple tests like cpuspin (spin on the cpu), ran several tasks in
the same group and timed them. Compared their time stamps with
cpuacct.usage.

Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Balbir Singh
Signed-off-by: Ingo Molnar
15 Nov, 2007
1 commit
-
Revert 62d0df64065e7c135d0002f069444fbdfc64768f.
This was originally intended as a simple initial example of how to create a
control groups subsystem; it wasn't intended for mainline, but I didn't make
this clear enough to Andrew.

The CFS cgroup subsystem now has better functionality for the per-cgroup usage
accounting (based directly on CFS stats) than the "usage" status file in this
patch, and the "load" status file is rather simplistic - although having a
per-cgroup load average report would be a useful feature, I don't believe this
patch actually provides it. If it gets into the final 2.6.24 we'd probably
have to support this interface for ever.

Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Oct, 2007
6 commits
-
Enable "cgroup" (formerly containers) based fair group scheduling. This
will let the administrator create arbitrary groups of tasks (using the "cgroup"
pseudo filesystem) and control their cpu bandwidth usage.

[akpm@linux-foundation.org: fix cpp condition]
Signed-off-by: Srivatsa Vaddagiri
Signed-off-by: Dhaval Giani
Cc: Randy Dunlap
Cc: Balbir Singh
Cc: Paul Menage
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When a task enters a new namespace via a clone() or unshare(), a new cgroup
is created and the task moves into it.

This version names cgroups which are automatically created using
cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or
cloned process. (Thanks Pavel for the idea) This is safe because if the
process unshares again, it will create

    /cgroups/(...)/node_<pid>/node_<pid>

The only possibilities (AFAICT) for a -EEXIST on unshare are

    1. pid wraparound
    2. a process fails an unshare, then tries again.

Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the
node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare()
succeed.

Changelog:
    Name cloned cgroups as "node_<pid>".

[clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
Signed-off-by: Serge E. Hallyn
Cc: Paul Menage
Signed-off-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This example subsystem exports debugging information as an aid to diagnosing
refcount leaks, etc, in the cgroup framework.

Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This example demonstrates how to use the generic cgroup subsystem for a
simple resource tracker that counts, for the processes in a cgroup, the
total CPU time used and the %CPU used in the last complete 10 second interval.

Portions contributed by Balbir Singh
Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Remove the filesystem support logic from the cpusets system and make cpusets
a cgroup subsystem.

The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
passed through to the cgroup filesystem with the appropriate options to
emulate the old cpuset filesystem behaviour.

Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Generic Process Control Groups
--------------------------

There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others. These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.

This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.

The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test e.g. the ResGroups CPU controller in
  conjunction with the BeanCounters memory controller, or use either of
  them as the resource-control portion of a virtual server system.

- the additional kernel footprint of any of the competing resource
  management systems is substantially reduced, since it doesn't need
  to provide process grouping/containment, hence improving their
  chances of getting into the kernel

This patch:

Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.

Signed-off-by: Paul Menage
Cc: Serge E. Hallyn
Cc: "Eric W. Biederman"
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Paul Jackson
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: Srivatsa Vaddagiri
Cc: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds