20 Oct, 2007

9 commits

  • This patch is inspired by the discussion at
    http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics
    as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263. The
    patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward
    ported)

    This patch implements per cgroup statistics infrastructure and re-uses
    code from the taskstats interface. A new set of cgroup operations are
    registered with commands and attributes. It should be very easy to
    *extend* per cgroup statistics, by adding members to the cgroupstats
    structure.

    The current model for cgroupstats is a pull, a push model (to post
    statistics on interesting events), should be very easy to add. Currently
    user space requests for statistics by passing the cgroup file
    descriptor. Statistics about the state of all the tasks in the cgroup
    is returned to user space.

    TODO's/NOTE:

    This patch provides an infrastructure for implementing cgroup statistics.
    Based on the needs of each controller, we can incrementally add more statistics,
    event based support for notification of statistics, accumulation of taskstats
    into cgroup statistics in the future.

    Sample output

    # ./cgroupstats -C /cgroup/a
    sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0

    # ./cgroupstats -C /cgroup/
    sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0

    If the approach looks good, I'll enhance and post the user space utility for
    the same

    Feedback, comments, test results are always welcome!

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Add the following files to the cgroup filesystem:

    notify_on_release - configures/reports whether the cgroup subsystem should
    attempt to run a release script when this cgroup becomes unused

    release_agent - configures/reports the release agent to be used for this
    hierarchy (top level in each hierarchy only)

    releasable - reports whether this cgroup would have been auto-released if
    notify_on_release was true and a release agent was configured (mainly useful
    for debugging)

    To avoid locking issues, invoking the userspace release agent is done via a
    workqueue task; cgroups that need to have their release agents invoked by
    the workqueue task are linked on to a list.

    [pj@sgi.com: Need to include kmod.h]
    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Replace the struct css_set embedded in task_struct with a pointer; all tasks
    that have the same set of memberships across all hierarchies will share a
    css_set object, and will be linked via their css_sets field to the "tasks"
    list_head in the css_set.

    Assuming that many tasks share the same cgroup assignments, this reduces
    overall space usage and keeps the size of the task_struct down (three pointers
    added to task_struct compared to a non-cgroups kernel, no matter how many
    subsystems are registered).

    [akpm@linux-foundation.org: fix a printk]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Add:

    /proc/cgroups - general system info

    /proc/*/cgroup - per-task cgroup membership info

    [a.p.zijlstra@chello.nl: cgroups: bdi init hooks]
    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Add support for cgroup_clone(), a way to create new cgroups intended to
    be used for systems such as namespace unsharing. A new subsystem callback,
    post_clone(), is added to allow subsystems to automatically configure cloned
    cgroups.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This adds the necessary hooks to the fork() and exit() paths to ensure
    that new children inherit their parent's cgroup assignments, and that
    exiting processes release reference counts on their cgroups.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Add write_uint() helper method for cgroup subsystems

    This helper is analagous to the read_uint() helper method for
    reporting u64 values to userspace. It's designed to reduce the amount
    of boilerplate requierd for creating new cgroup subsystems.

    Signed-off-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Add the per-directory "tasks" file for cgroupfs mounts; this allows the
    user to determine which tasks are members of a cgroup by reading a
    cgroup's "tasks", and to move a task into a cgroup by writing its pid to
    its "tasks".

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Generic Process Control Groups
    --------------------------

    There have recently been various proposals floating around for
    resource management/accounting and other task grouping subsystems in
    the kernel, including ResGroups, User BeanCounters, NSProxy
    cgroups, and others. These all need the basic abstraction of being
    able to group together multiple processes in an aggregate, in order to
    track/limit the resources permitted to those processes, or control
    other behaviour of the processes, and all implement this grouping in
    different ways.

    This patchset provides a framework for tracking and grouping processes
    into arbitrary "cgroups" and assigning arbitrary state to those
    groupings, in order to control the behaviour of the cgroup as an
    aggregate.

    The intention is that the various resource management and
    virtualization/cgroup efforts can also become task cgroup
    clients, with the result that:

    - the userspace APIs are (somewhat) normalised

    - it's easier to test e.g. the ResGroups CPU controller in
    conjunction with the BeanCounters memory controller, or use either of
    them as the resource-control portion of a virtual server system.

    - the additional kernel footprint of any of the competing resource
    management systems is substantially reduced, since it doesn't need
    to provide process grouping/containment, hence improving their
    chances of getting into the kernel

    This patch:

    Add the main task cgroups framework - the cgroup filesystem, and the
    basic structures for tracking membership and associating subsystem state
    objects to tasks.

    Signed-off-by: Paul Menage
    Cc: Serge E. Hallyn
    Cc: "Eric W. Biederman"
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Srivatsa Vaddagiri
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage