16 Jun, 2011

2 commits


27 May, 2011

3 commits

  • The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
    leads to some problems:

    * cgroup creation is out-of-control
    * cgroup name can conflict when pids are looping
    * it is not possible to have a single process handling a lot of
    namespaces without falling into an exponential creation time
    * we may want to create a namespace without creating a cgroup

    The ns_cgroup was replaced by a compatibility flag 'clone_children',
    where a newly created cgroup will copy the parent cgroup values.
    The userspace has to manually create a cgroup and add a task to
    the 'tasks' file.
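
The manual step described above can be sketched from userspace as follows. This is a minimal illustration, not part of the patch: the helper name `create_cgroup_and_attach` and its parameters are hypothetical, and the parent directory is a parameter so the sketch can be tried against an ordinary directory as well as a mounted hierarchy.

```python
import os


def create_cgroup_and_attach(parent_dir, name, pid):
    """Create a child cgroup under parent_dir and move one task into it.

    On a mounted cgroup hierarchy, mkdir creates the cgroup and writing a
    pid to its 'tasks' file moves that task. The helper itself is a
    hypothetical illustration, not an existing API.
    """
    path = os.path.join(parent_dir, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "tasks"), "w") as tasks:
        tasks.write(f"{pid}\n")
    return path
```

With ns_cgroup gone, a container tool would have to perform a step like this itself after creating a namespace.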

    This patch removes the ns_cgroup as suggested in the following thread:

    https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

    The 'cgroup_clone' function is removed because it is no longer used.

    This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
    ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
    printk warning users that the feature is planned for removal. Since that
    time we have heard from XXX users who were affected by this.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Jamal Hadi Salim
    Reviewed-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     
  • Make procs file writable to move all threads by tgid at once.

    Add functionality that enables users to move all threads in a threadgroup
    at once to a cgroup by writing the tgid to the 'cgroup.procs' file. This
    current implementation makes use of a per-threadgroup rwsem that's taken
    for reading in the fork() path to prevent newly forking threads within the
    threadgroup from "escaping" while the move is in progress.
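
From userspace, the new interface amounts to a single write. The sketch below assumes a mounted hierarchy whose cgroup directory is passed in; the helper name `move_threadgroup` is hypothetical.

```python
import os


def move_threadgroup(cgroup_dir, tgid):
    """Move every thread of thread group `tgid` into the cgroup at cgroup_dir.

    Writing a tgid to 'cgroup.procs' moves the whole group at once, whereas
    the 'tasks' file moves a single thread. Hypothetical helper.
    """
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as procs:
        procs.write(f"{tgid}\n")
```

For a multithreaded program moving itself, `os.getpid()` returns the tgid, so `move_threadgroup(cgroup_dir, os.getpid())` would be the typical call.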

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

    Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
    for the cgroup subsystem interface. Unlike can_attach and attach, these
    are for per-thread operations, to be called potentially many times when
    attaching an entire threadgroup.

    Also, the old "bool threadgroup" interface is removed, as replaced by
    this. All subsystems are modified for the new interface - of note is
    cpuset, which requires from/to nodemasks for attach to be globally scoped
    (though per-cpuset would work too) to persist from its pre_attach to
    attach_task and attach.
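
One way to picture how the per-thread callbacks sit next to the per-group ones is the toy model below. It is not kernel code: the class, the driver function, and in particular the exact call order are assumptions made for illustration.

```python
class LoggingSubsystem:
    """Hypothetical subsystem that records which callbacks ran, in order."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def can_attach(self, cgroup, leader):
        self.calls.append("can_attach")
        return 0

    def can_attach_task(self, cgroup, task):
        self.calls.append("can_attach_task")
        return 0

    def pre_attach(self, cgroup):
        self.calls.append("pre_attach")

    def attach_task(self, cgroup, task):
        self.calls.append("attach_task")

    def attach(self, cgroup, leader):
        self.calls.append("attach")


def attach_threadgroup(subsystems, cgroup, threads):
    """Attach all threads; the ordering here is an assumed illustration."""
    leader = threads[0]
    for ss in subsystems:
        err = ss.can_attach(cgroup, leader)       # once per group
        if err:
            return err
        for task in threads:
            err = ss.can_attach_task(cgroup, task)  # once per thread
            if err:
                return err
    for ss in subsystems:
        ss.pre_attach(cgroup)                     # once, before any move
    for task in threads:
        for ss in subsystems:
            ss.attach_task(cgroup, task)          # once per thread
    for ss in subsystems:
        ss.attach(cgroup, leader)                 # once, after all moves
    return 0
```

The point of the split is visible in the log: the per-thread callbacks run once for each thread, while can_attach, pre_attach, and attach run once per operation.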

    This is a pre-patch for cgroup-procs-writable.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

05 Apr, 2011

1 commit


17 Mar, 2011

1 commit

  • The cgroup documentation does not specify how a process can be removed
    from a particular group. This patch adds a note at the end of the
    simple example about how this is done. Also, some cgroups (like
    cpusets) require user input before a new group can be used. This is
    noted in the patch as well.

    Signed-off-by: Eric B Munson
    Acked-by: Paul Menage
    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Eric B Munson
     

14 Jan, 2011

1 commit


28 Oct, 2010

1 commit

  • The ns_cgroup is a control group interacting with the namespaces. When a
    new namespace is created, a corresponding cgroup is automatically created
    too. The cgroup name is the pid of the process that called 'unshare' or
    of the child created by 'clone'.

    This cgroup is tied to the namespace because it prevents a process from
    escaping the control group and uses the post_clone callback, so the child
    cgroup inherits the values of the parent cgroup.

    Unfortunately, the more we use this cgroup, the more problems we face
    with it:

    (1) when a process unshares, the cgroup name may conflict with a
    previous cgroup with the same pid, so unshare or clone returns -EEXIST

    (2) the cgroup creation is out of control because an application may
    create several namespaces, for which the system will automatically
    create several cgroups behind its back and leave them on the
    cgroupfs (e.g. a vrf based on the network namespace).

    (3) the mix of (1) and (2) forces an administrator to regularly check
    and clean up these cgroups.

    This patchset removes the ns_cgroup by adding a new flag to the cgroup and
    the cgroupfs mount option. It enables copying the parent cgroup's values
    when a child cgroup is created. We can then safely remove the ns_cgroup, as
    this flag provides compatibility. We now have to manually create a cgroup
    and add the task to it, which is consistent with the cgroup framework.

    This patch:

    Sent as an answer to a previous thread around the ns_cgroup.

    https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

    It adds a control file 'clone_children' for a cgroup. This control file
    is a boolean specifying if the child cgroup should be a clone of the
    parent cgroup or not. The default value is 'false'.

    This flag makes the child cgroup call the post_clone callback of every
    subsystem that provides one.

    At present, cpuset is the only subsystem that implements the
    post_clone callback.

    The option can be set at mount time by specifying the 'clone_children'
    mount option.
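
The effect of the flag can be pictured with a small toy model. It is a sketch only: the `Cgroup` class is hypothetical, the cpuset-style keys are illustrative values, and copying a dictionary stands in for whatever a subsystem's post_clone callback actually does.

```python
class Cgroup:
    """Toy model of a cgroup holding per-subsystem values (not kernel code)."""

    def __init__(self, values=None, clone_children=False):
        self.values = dict(values or {})
        self.clone_children = clone_children  # boolean, defaults to False
        self.children = {}

    def mkdir(self, name):
        """Create a child; copy the parent's values only if the flag is set."""
        child = Cgroup()
        if self.clone_children:
            # Stands in for the subsystems' post_clone callbacks.
            child.values = dict(self.values)
        self.children[name] = child
        return child
```

With the flag left at its default, a new child starts empty and must be configured by hand; with it set, the child starts as a copy of its parent.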

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Acked-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: Jamal Hadi Salim
    Cc: Matt Helsley
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     

28 May, 2010

1 commit


21 May, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
    vlynq: make whole Kconfig-menu dependant on architecture
    add descriptive comment for TIF_MEMDIE task flag declaration.
    EEPROM: max6875: Header file cleanup
    EEPROM: 93cx6: Header file cleanup
    EEPROM: Header file cleanup
    agp: use NULL instead of 0 when pointer is needed
    rtc-v3020: make bitfield unsigned
    PCI: make bitfield unsigned
    jbd2: use NULL instead of 0 when pointer is needed
    cciss: fix shadows sparse warning
    doc: inode uses a mutex instead of a semaphore.
    uml: i386: Avoid redefinition of NR_syscalls
    fix "seperate" typos in comments
    cocbalt_lcdfb: correct sections
    doc: Change urls for sparse
    Powerpc: wii: Fix typo in comment
    i2o: cleanup some exit paths
    Documentation/: it's -> its where appropriate
    UML: Fix compiler warning due to missing task_struct declaration
    UML: add kernel.h include to signal.c
    ...

    Linus Torvalds
     

25 Apr, 2010

1 commit


16 Mar, 2010

1 commit


13 Mar, 2010

5 commits

  • This patchset introduces eventfd-based API for notifications in cgroups
    and implements memory notifications on top of it.

    It uses statistics in memory controller to track memory usage.

    Output of time(1) on building kernel on tmpfs:

    Root cgroup before changes:
    make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total
    Non-root cgroup before changes:
    make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total
    Root cgroup after changes (0 thresholds):
    make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total
    Non-root cgroup after changes (0 thresholds):
    make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total
    Root cgroup after changes (1 thresholds, never crossed):
    make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total
    Non-root cgroup after changes (1 thresholds, never crossed):
    make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total

    This patch:

    Introduce the write-only file "cgroup.event_control" in every cgroup.

    To register a new notification handler you need to:
    - create an eventfd;
    - open a control file to be monitored. Callbacks register_event() and
    unregister_event() must be defined for the control file;
    - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
    Interpretation of args is defined by the control file implementation.

    The eventfd will be woken up by the control file implementation or when the
    cgroup is removed.

    To unregister a notification handler, just close the eventfd.

    If you need notification functionality for a control file you have to
    implement callbacks register_event() and unregister_event() in the
    struct cftype.
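
The registration steps can be sketched from userspace as follows. The sketch assumes the line written to cgroup.event_control has the form "<event_fd> <control_fd> <args>"; the helper name `register_event` and its parameters are hypothetical, and the caller supplies the eventfd.

```python
import os


def register_event(cgroup_dir, control_name, event_fd, args):
    """Ask the cgroup to signal event_fd for events on one control file.

    Returns the line written to cgroup.event_control. Hypothetical helper;
    the meaning of `args` is defined by the control file being monitored.
    """
    control_fd = os.open(os.path.join(cgroup_dir, control_name), os.O_RDONLY)
    try:
        line = f"{event_fd} {control_fd} {args}"
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(line)
    finally:
        os.close(control_fd)
    return line
```

On Python 3.10 or later the eventfd itself can be created with `os.eventfd(0)`; reading 8 bytes from it then blocks until a notification arrives, and closing it unregisters the handler.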

    [kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: KAMEZAWA Hiroyuki
    Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: Dan Malek
    Cc: Vladislav Buzov
    Cc: Daisuke Nishimura
    Cc: Alexander Shishkin
    Cc: Davide Libenzi
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Add a forgotten item into CONTENTS.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Provides support for unloading modular subsystems.

    This patch adds a new function cgroup_unload_subsys which is to be used
    for removing a loaded subsystem during module deletion. Reference
    counting of the subsystems' modules is moved from once (at load time) to
    once per attached hierarchy (in parse_cgroupfs_options and
    rebind_subsystems) (i.e., 0 or 1).

    Signed-off-by: Ben Blum
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: "David S. Miller"
    Cc: KAMEZAWA Hiroyuki
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Add interface between cgroups subsystem management and module loading

    This patch implements rudimentary module-loading support for cgroups -
    namely, a cgroup_load_subsys (similar to cgroup_init_subsys) for use as a
    module initcall, and a struct module pointer in struct cgroup_subsys.

    Several functions that might be wanted by modules have had EXPORT_SYMBOL
    added to them, but it's unclear exactly which functions want it and which
    won't.

    Signed-off-by: Ben Blum
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: "David S. Miller"
    Cc: KAMEZAWA Hiroyuki
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Add cancel_attach() operation to struct cgroup_subsys. cancel_attach()
    can be used when the can_attach() operation prepares something for the
    subsys, and that preparation must be rolled back if attaching the task
    fails after can_attach() has succeeded.
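
The rollback pattern can be modelled in a few lines. This is a toy sketch rather than the kernel implementation: `PreparingSubsystem`, `attach_with_rollback`, and the `do_attach` callable are all hypothetical names.

```python
class PreparingSubsystem:
    """Hypothetical subsystem whose can_attach() prepares some state."""

    def __init__(self, name, allow=True):
        self.name = name
        self.allow = allow
        self.prepared = False

    def can_attach(self, task):
        if self.allow:
            self.prepared = True   # e.g. reserve resources for the task
        return self.allow

    def cancel_attach(self, task):
        self.prepared = False      # undo whatever can_attach() set up


def attach_with_rollback(subsystems, task, do_attach):
    """Return True if the task was attached; otherwise undo preparation."""
    succeeded = []
    for ss in subsystems:
        if not ss.can_attach(task):
            break
        succeeded.append(ss)
    else:
        # Every can_attach() succeeded; the attach itself may still fail.
        if do_attach(task):
            return True
    for ss in succeeded:
        ss.cancel_attach(task)
    return False
```

Only subsystems whose can_attach() succeeded are rolled back, so a subsystem never sees cancel_attach() for preparation it did not make.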

    Signed-off-by: Daisuke Nishimura
    Acked-by: Li Zefan
    Reviewed-by: Paul Menage
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     

08 Oct, 2009

1 commit

  • Update documentation of cgroups tasks and procs files

    Document the cgroup.procs file.

    Clarify the semantics of the cgroup.procs and tasks files. Although the
    current cgroup.procs interface returns a sorted and uniqified list of
    pids, potential future performance enhancements could result in those
    properties being removed - explicitly document this aspect of the API.

    There are no existing users of cgroup.procs, so compatibility isn't an
    issue. There are users of the "tasks" file, but none that would appear to
    break in the event of the sorted property being broken. The standard
    "libcpuset" explicitly sorts the results of reading from the tasks file,
    and "libcg" and other users don't appear to care about ordering.
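
A reader that does not want to depend on the kernel's ordering can normalise the list itself, as libcpuset is described as doing. A minimal sketch, with a hypothetical helper name:

```python
def read_pids(path):
    """Return the pids listed in a tasks or cgroup.procs file.

    The result is sorted and de-duplicated here, so the caller does not
    rely on the file itself being sorted or unique.
    """
    with open(path) as f:
        return sorted({int(line) for line in f if line.strip()})
```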

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

24 Sep, 2009

2 commits

  • Alter the ss->can_attach and ss->attach functions to be able to deal with
    a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
    pre-patch to cgroup-procs-writable.patch.)

    Currently, the new mode of the attach function can only tell the subsystem
    about the old cgroup of the threadgroup leader. No subsystem currently
    needs that information for each thread that's being moved, but if one were
    to be added (for example, one that counts tasks within a group) this part
    would need to be reworked a bit to tell the subsystem the right
    information.

    [hidave.darkstar@gmail.com: fix build]
    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Acked-by: Li Zefan
    Reviewed-by: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • To simplify referring to cgroup hierarchies in mount statements, and to
    allow disambiguation in the presence of empty hierarchies and
    multiply-bindable subsystems, this patch adds support for naming a new
    cgroup hierarchy via the "name=" mount option.

    A pre-existing hierarchy may be specified by either name or by subsystems;
    a hierarchy's name cannot be changed by a remount operation.

    Example usage:

    # To create a hierarchy called "foo" containing the "cpu" subsystem
    mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1

    # To mount the "foo" hierarchy on a second location
    mount -t cgroup -oname=foo cgroup /mnt/cgroup2

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

04 Apr, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
    trivial: Update my email address
    trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
    trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
    trivial: Fix misspelling of "Celsius".
    trivial: remove unused variable 'path' in alloc_file()
    trivial: fix a pdlfush -> pdflush typo in comment
    trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
    trivial: wusb: Storage class should be before const qualifier
    trivial: drivers/char/bsr.c: Storage class should be before const qualifier
    trivial: h8300: Storage class should be before const qualifier
    trivial: fix where cgroup documentation is not correctly referred to
    trivial: Give the right path in Documentation example
    trivial: MTD: remove EOL from MODULE_DESCRIPTION
    trivial: Fix typo in bio_split()'s documentation
    trivial: PWM: fix of #endif comment
    trivial: fix typos/grammar errors in Kconfig texts
    trivial: Fix misspelling of firmware
    trivial: cgroups: documentation typo and spelling corrections
    trivial: Update contact info for Jochen Hein
    trivial: fix typo "resgister" -> "register"
    ...

    Linus Torvalds
     

03 Apr, 2009

2 commits

  • This won't remove cpuacct from the mounted hierarchy:
    # mount -t cgroup -o cpu,cpuacct xxx /mnt
    # mount -o remount,cpu /mnt

    This is because, for this usage, mount(8) will append the new options to
    the original options.

    And this will give the expected result:
    # mount [-t cgroup] -o remount,cpu xxx /mnt

    Also document how to specify or change release_agent.

    Signed-off-by: Li Zefan
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • In the following situation, with the memory subsystem,

    /groupA use_hierarchy==1
    /01 some tasks
    /02 some tasks
    /03 some tasks
    /04 empty

    When tasks under 01/02/03 hit the limit on /groupA, hierarchical reclaim
    is triggered and the kernel walks the tree under groupA. In this case,
    rmdir /groupA/04 frequently fails with -EBUSY because of a temporary
    refcnt held by the kernel.

    In general, a cgroup can be rmdir'd if there are no child groups and
    no tasks. Frequent failures of rmdir() are not useful to users.
    (And the reason for -EBUSY is unknown to users in most cases.)

    This patch tries to modify the above behavior by
    - retrying if css_refcnt is held by someone.
    - adding a "return value" to pre_destroy(), allowing a subsystem to
    say "we're really busy!"

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

30 Mar, 2009

1 commit


19 Feb, 2009

1 commit


16 Jan, 2009

1 commit

  • Move Documentation/cpusets.txt and Documentation/controllers/* to
    Documentation/cgroups/

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

09 Jan, 2009

2 commits

  • These patches introduce new locking/refcount support for cgroups to
    reduce the need for subsystems to call cgroup_lock(). This will
    ultimately allow the atomicity of cgroup_rmdir() (which was removed
    recently) to be restored.

    These three patches give:

    1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
    use to prevent changes to its own cgroup tree

    2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
    memory controller

    3/3 - introduce a css_tryget() function similar to the one recently
    proposed by Kamezawa, but avoiding spurious refcount failures in
    the event of a race between a css_tryget() and an unsuccessful
    cgroup_rmdir()

    Future patches will likely involve:

    - using hierarchy mutex in place of cgroup_lock() in more subsystems
    where appropriate

    - restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()

    This patch:

    Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
    the hierarchy observed by that subsystem. It is taken by the cgroup
    subsystem (in addition to cgroup_mutex) for the following operations:

    - linking a cgroup into that subsystem's cgroup tree
    - unlinking a cgroup from that subsystem's cgroup tree
    - moving the subsystem to/from a hierarchy (including across the
    bind() callback)

    Thus if the subsystem holds its own hierarchy_mutex, it can safely
    traverse its own hierarchy.
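
The locking rule can be illustrated with a toy userspace model. It is only an analogy: `LockedHierarchy` is a hypothetical class, and a `threading.Lock` stands in for the kernel mutex.

```python
import threading


class LockedHierarchy:
    """Toy model of one subsystem's view of the cgroup tree."""

    def __init__(self):
        self.hierarchy_mutex = threading.Lock()
        self.children = {"/": []}

    def link(self, parent, name):
        # Performed by the cgroup core, which takes the subsystem's mutex.
        with self.hierarchy_mutex:
            path = parent.rstrip("/") + "/" + name
            self.children[parent].append(path)
            self.children[path] = []
            return path

    def unlink(self, parent, path):
        with self.hierarchy_mutex:
            self.children[parent].remove(path)
            del self.children[path]

    def walk(self):
        # Performed by the subsystem: holding the mutex keeps the tree stable.
        with self.hierarchy_mutex:
            order, stack = [], ["/"]
            while stack:
                node = stack.pop()
                order.append(node)
                stack.extend(reversed(self.children[node]))
            return order
```

Because every change to the tree takes the same mutex, a traversal that holds it never observes a half-linked or half-removed cgroup.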

    Signed-off-by: Paul Menage
    Tested-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • - remove 'releasable' since it has been moved to the debug subsys.
    - update lock requirements of subsys callbacks.

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

20 Oct, 2008

1 commit

  • Describe why we need the freezer subsystem and how to use it in a
    documentation file. Since the cgroups.txt file is focused on the
    subsystem-agnostic portions of cgroups, make a directory and move the old
    cgroups.txt file at the same time.

    Signed-off-by: Matt Helsley
    Cc: Paul Menage
    Cc: containers@lists.linux-foundation.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley