25 Mar, 2010

1 commit

  • commit e6a1105b ("cgroups: subsystem module loading interface") and commit
    c50cc752 ("sched, cgroups: Fix module export") result in module.h being
    included twice.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

13 Mar, 2010

10 commits

  • Events should be removed after rmdir of the cgroup directory, but before
    destroying subsystem state objects. Take a reference to the cgroup
    directory dentry to do that.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Acked-by: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: Dan Malek
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Notify userspace about cgroup removal only after rmdir of the cgroup
    directory, to avoid a race between userspace and kernelspace.

    eventfds are used to notify about two types of events:
    - control file-specific, like crossing a memory threshold;
    - cgroup removal.

    To understand what really happened, userspace can check whether the
    cgroup still exists. To avoid a race between userspace and kernelspace,
    we have to notify userspace about cgroup removal only after rmdir of the
    cgroup directory.

    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Acked-by: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: Dan Malek
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • This patchset introduces an eventfd-based API for notifications in cgroups
    and implements memory notifications on top of it.

    It uses statistics in the memory controller to track memory usage.

    Output of time(1) on building kernel on tmpfs:

    Root cgroup before changes:
    make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total
    Non-root cgroup before changes:
    make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total
    Root cgroup after changes (0 thresholds):
    make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total
    Non-root cgroup after changes (0 thresholds):
    make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total
    Root cgroup after changes (1 thresholds, never crossed):
    make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total
    Non-root cgroup after changes (1 thresholds, never crossed):
    make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total

    This patch:

    Introduce the write-only file "cgroup.event_control" in every cgroup.

    To register a new notification handler you need to:
    - create an eventfd;
    - open a control file to be monitored. Callbacks register_event() and
    unregister_event() must be defined for the control file;
    - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
    Interpretation of args is defined by the control file implementation.

    The eventfd will be woken up by the control file implementation or when
    the cgroup is removed.

    To unregister a notification handler, just close the eventfd.

    If you need notification functionality for a control file you have to
    implement callbacks register_event() and unregister_event() in the
    struct cftype.
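
    The registration steps above can be sketched from the userspace side
    roughly as follows. This is only an illustration: the helper names
    format_event_control() and register_cgroup_event() are hypothetical, and
    the line format "<event_fd> <control_fd> <args>" is assumed from the
    kernel's cgroup documentation for cgroup.event_control. The control fd is
    handed back to the caller instead of being closed, since whether it may be
    closed right after registration is not stated here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Build the registration line "<event_fd> <control_fd> <args>".
 * Returns its length, or -1 if it does not fit in buf. */
int format_event_control(char *buf, size_t size, int efd, int cfd,
                         const char *args)
{
    int n = snprintf(buf, size, "%d %d %s", efd, cfd, args);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: register an eventfd for control_file in cgroup_dir.
 * On success returns the eventfd and stores the still-open control fd in
 * *cfd_out; on failure returns -1 with nothing left open. */
int register_cgroup_event(const char *cgroup_dir, const char *control_file,
                          const char *args, int *cfd_out)
{
    char path[4096], line[256];
    int efd, cfd = -1, ecfd = -1, len;

    efd = eventfd(0, 0);                 /* step 1: create an eventfd */
    if (efd < 0)
        return -1;

    snprintf(path, sizeof(path), "%s/%s", cgroup_dir, control_file);
    cfd = open(path, O_RDONLY);          /* step 2: open the control file */
    if (cfd < 0)
        goto fail;

    snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
    ecfd = open(path, O_WRONLY);         /* the file is write-only */
    if (ecfd < 0)
        goto fail;

    len = format_event_control(line, sizeof(line), efd, cfd, args);
    if (len < 0 || write(ecfd, line, (size_t)len) != len)  /* step 3 */
        goto fail;

    close(ecfd);
    *cfd_out = cfd;
    return efd;

fail:
    if (ecfd >= 0)
        close(ecfd);
    if (cfd >= 0)
        close(cfd);
    close(efd);
    return -1;
}
```

    A caller would then block in read(2) on the returned eventfd until the
    control file implementation signals it or the cgroup is removed, and close
    the eventfd to unregister.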

    [kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: KAMEZAWA Hiroyuki
    Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: Dan Malek
    Cc: Vladislav Buzov
    Cc: Daisuke Nishimura
    Cc: Alexander Shishkin
    Cc: Davide Libenzi
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Don't call get_pid_ns() before we locate/alloc the ns.

    Signed-off-by: Li Zefan
    Cc: Serge Hallyn
    Acked-by: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Modify the Block I/O cgroup subsystem to be able to be built as a module.
    As the CFQ disk scheduler optionally depends on blk-cgroup, config options
    in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
    enhanced to support the new module dependency.

    Signed-off-by: Ben Blum
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: "David S. Miller"
    Cc: KAMEZAWA Hiroyuki
    Cc: Lai Jiangshan
    Cc: Vivek Goyal
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Provides support for unloading modular subsystems.

    This patch adds a new function cgroup_unload_subsys which is to be used
    for removing a loaded subsystem during module deletion. Reference
    counting of the subsystems' modules is moved from once (at load time) to
    once per attached hierarchy (in parse_cgroupfs_options and
    rebind_subsystems) (i.e., 0 or 1).

    Signed-off-by: Ben Blum
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: "David S. Miller"
    Cc: KAMEZAWA Hiroyuki
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Add interface between cgroups subsystem management and module loading

    This patch implements rudimentary module-loading support for cgroups -
    namely, a cgroup_load_subsys (similar to cgroup_init_subsys) for use as a
    module initcall, and a struct module pointer in struct cgroup_subsys.

    Several functions that might be wanted by modules have had EXPORT_SYMBOL
    added to them, but it's unclear exactly which functions want it and which
    don't.

    Signed-off-by: Ben Blum
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: "David S. Miller"
    Cc: KAMEZAWA Hiroyuki
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • This patch series provides the ability for cgroup subsystems to be
    compiled as modules both within and outside the kernel tree. This is
    mainly useful for classifiers and subsystems that hook into components
    that are already modules. cls_cgroup and blkio-cgroup serve as the
    example use cases for this feature.

    It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
    which modular subsystems can use to register and depart during runtime.
    The net_cls classifier subsystem serves as the example for a subsystem
    which can be converted into a module using these changes.

    Patch #1 sets up the subsys[] array so its contents can be dynamic as
    modules appear and (eventually) disappear. Iterations over the array are
    modified to handle when subsystems are absent, and the dynamic section of
    the array is protected by cgroup_mutex.

    Patch #2 implements an interface for modules to load subsystems, called
    cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
    pointer in struct cgroup_subsys.

    Patch #3 adds a mechanism for unloading modular subsystems, which includes
    a more advanced rework of the rudimentary reference counting introduced in
    patch 2.

    Patch #4 modifies the net_cls subsystem, which already had some module
    declarations, to be configurable as a module, which also serves as a
    simple proof-of-concept.

    Part of implementing patches 2 and 4 involved updating css pointers in
    each css_set when the module appears or leaves. In doing this, it was
    discovered that css_sets always remain linked to the dummy cgroup,
    regardless of whether or not any subsystems are actually bound to it
    (i.e., not mounted on an actual hierarchy). The subsystem loading and
    unloading code therefore should keep in mind the special cases where the
    added subsystem is the only one in the dummy cgroup (and therefore all
    css_sets need to be linked back into it) and where the removed subsys was
    the only one in the dummy cgroup (and therefore all css_sets should be
    unlinked from it) - however, as all css_sets always stay attached to the
    dummy cgroup anyway, these cases are ignored. Any fix that addresses this
    issue should also make sure these cases are addressed in the subsystem
    loading and unloading code.

    This patch:

    Make subsys[] able to be dynamically populated to support modular
    subsystems

    This patch reworks the way the subsys[] array is used so that subsystems
    can register themselves after boot time, and enables the internals of
    cgroups to be able to handle when subsystems are not present or may
    appear/disappear.
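
    As a rough illustration of a subsys[] array with a fixed built-in part and
    a dynamic part, here is a minimal userspace sketch. The slot counts, the
    struct layout and the names load_subsys(), unload_subsys() and
    count_present() are assumptions made for the example; the real code also
    protects the dynamic section with cgroup_mutex, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BUILTIN_COUNT 2   /* hypothetical: slots reserved for built-ins */
#define SUBSYS_MAX    4   /* hypothetical: total capacity of the array */

struct subsys {
    const char *name;
};

/* NULL means "no subsystem in this slot right now". */
static struct subsys *subsys[SUBSYS_MAX];

/* Put ss into the first free dynamic slot.
 * Returns the slot index, or -EBUSY when all dynamic slots are taken. */
int load_subsys(struct subsys *ss)
{
    for (int i = BUILTIN_COUNT; i < SUBSYS_MAX; i++) {
        if (subsys[i] == NULL) {
            subsys[i] = ss;
            return i;
        }
    }
    return -EBUSY;
}

/* Clear a dynamic slot; built-in slots may never be unloaded. */
void unload_subsys(int id)
{
    assert(id >= BUILTIN_COUNT && id < SUBSYS_MAX);
    subsys[id] = NULL;
}

/* Every iteration over the array has to tolerate absent subsystems. */
int count_present(void)
{
    int n = 0;

    for (int i = 0; i < SUBSYS_MAX; i++) {
        if (subsys[i] == NULL)
            continue;   /* slot empty: module not loaded */
        n++;
    }
    return n;
}
```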

    Signed-off-by: Ben Blum
    Acked-by: Li Zefan
    Cc: Paul Menage
    Cc: "David S. Miller"
    Cc: KAMEZAWA Hiroyuki
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Currently css_get() and css_put() increment/decrement css->refcnt one by
    one.

    This patch adds a new function __css_get(), which takes "count" as an
    argument and increments css->refcnt by "count". It also adds a new
    argument ("count") to __css_put() and changes that function to decrement
    css->refcnt by "count".

    These coalesced versions of __css_get()/__css_put() will be used later to
    improve the performance of memcg's move-charge feature, replacing
    repeated calls to css_get()/css_put().

    No change is needed for current users of css_get()/css_put().
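
    The idea of coalesced reference counting can be sketched in userspace with
    C11 atomics. The struct and the names css_get_many() and css_put_many()
    are hypothetical stand-ins, not the kernel functions; the point is only
    that one atomic operation replaces "count" separate ones.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an object with a reference count. */
struct css {
    atomic_int refcnt;
};

/* Take "count" references with a single atomic add. */
void css_get_many(struct css *css, int count)
{
    assert(count > 0);
    atomic_fetch_add(&css->refcnt, count);
}

/* Drop "count" references with a single atomic subtract.
 * Returns 1 if this dropped the last reference, 0 otherwise. */
int css_put_many(struct css *css, int count)
{
    int old = atomic_fetch_sub(&css->refcnt, count);

    assert(old >= count);   /* never drop more than is held */
    return old == count;
}
```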

    Signed-off-by: Daisuke Nishimura
    Acked-by: Paul Menage
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • Add a cancel_attach() operation to struct cgroup_subsys. cancel_attach()
    is for the case where can_attach() prepares something for the subsys and
    attaching the task then fails after can_attach() has succeeded: it rolls
    back whatever can_attach() prepared.
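
    The prepare/rollback pattern can be sketched as below. This is a
    simplified userspace model, not the kernel code: struct subsys_ops,
    attach_task() and the small example callbacks are all hypothetical, and
    the callbacks take a plain state pointer instead of cgroup arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-subsystem operations. */
struct subsys_ops {
    int  (*can_attach)(void *state);     /* 0 on success, negative to refuse */
    void (*cancel_attach)(void *state);  /* undo can_attach(); may be NULL */
    void (*attach)(void *state);         /* commit; may be NULL */
    void *state;
};

/* Ask every subsystem, move the task, then commit. If anything fails,
 * call cancel_attach() on each subsystem whose can_attach() succeeded. */
int attach_task(struct subsys_ops *ss, size_t n, int (*move_task)(void))
{
    size_t prepared = 0;
    int ret = 0;

    for (; prepared < n; prepared++) {
        if (!ss[prepared].can_attach)
            continue;
        ret = ss[prepared].can_attach(ss[prepared].state);
        if (ret)
            goto rollback;   /* subsystems before this one are prepared */
    }

    ret = move_task();
    if (ret)
        goto rollback;       /* every subsystem is prepared at this point */

    for (size_t i = 0; i < n; i++)
        if (ss[i].attach)
            ss[i].attach(ss[i].state);
    return 0;

rollback:
    while (prepared-- > 0)
        if (ss[prepared].cancel_attach)
            ss[prepared].cancel_attach(ss[prepared].state);
    return ret;
}

/* Example callbacks: one subsystem reserves a unit, another refuses. */
int  reserve(void *state) { ++*(int *)state; return 0; }
void release(void *state) { --*(int *)state; }
int  refuse(void *state)  { (void)state; return -1; }
int  move_ok(void)        { return 0; }
int  move_fail(void)      { return -1; }
```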

    Signed-off-by: Daisuke Nishimura
    Acked-by: Li Zefan
    Reviewed-by: Paul Menage
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     

25 Feb, 2010

2 commits

  • I have exported it in d11c563 - but cgroups.c did not have module.h included ...

    Cc: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Update the rcu_dereference() usages to take advantage of the new
    lockdep-based checking.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    [ -v2: fix allmodconfig missing symbol export build failure on x86 ]
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

03 Feb, 2010

1 commit

  • In cgroup_create(), if alloc_css_id() returns failure, the errno is not
    propagated to userspace, so mkdir will fail silently.

    To trigger this bug, we mount blkio (or the memory subsystem) and create
    more than 65534 cgroups. (The number of cgroups is limited to 65535 if a
    subsystem has use_id == 1.)

    # mount -t cgroup -o blkio xxx /mnt
    # for ((i = 0; i < 65534; i++)); do mkdir /mnt/$i; done
    # mkdir /mnt/65534
    (should return ENOSPC)
    #
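
    The class of bug described above, a failure that is detected but whose
    error code never reaches the caller, can be sketched like this. The
    id_pool type and the functions alloc_id(), create_buggy() and
    create_fixed() are hypothetical and only model the shape of the problem;
    they are not the kernel's cgroup_create() or alloc_css_id().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical ID pool that hands out ids next..max. */
struct id_pool {
    int next;
    int max;
};

/* Returns a fresh id, or -ENOSPC once the pool is exhausted. */
int alloc_id(struct id_pool *pool)
{
    if (pool->next > pool->max)
        return -ENOSPC;
    return pool->next++;
}

/* Buggy shape: the failure is noticed, but err is never set,
 * so the caller is told that creation succeeded. */
int create_buggy(struct id_pool *pool)
{
    int err = 0;

    if (alloc_id(pool) < 0)
        goto out;
    /* ... remaining setup would go here ... */
out:
    return err;
}

/* Fixed shape: keep the helper's return value and propagate it. */
int create_fixed(struct id_pool *pool)
{
    int err = alloc_id(pool);

    if (err < 0)
        return err;
    /* ... remaining setup would go here ... */
    return 0;
}
```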

    Signed-off-by: Li Zefan
    Acked-by: Serge Hallyn
    Acked-by: Paul Menage
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

12 Jan, 2010

1 commit

  • The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
    here in cgroup_diput():

    /*
    * if we're getting rid of the cgroup, refcount should ensure
    * that there are no pidlists left.
    */
    BUG_ON(!list_empty(&cgrp->pidlists));

    The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
    when pidlist_array_load() calls cgroup_pidlist_find():

    (1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
    pre-existing cgroup_pidlist, and increments its use_count.
    (2) if no matching cgroup_pidlist is found, then a new one is allocated, it
    down_write's its mutex, and the use_count is set to 0.
    (3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(),
    which increments its use_count -- regardless whether new or pre-existing --
    and up_write's the mutex.

    So if a matching list is ever encountered by cgroup_pidlist_find() during
    the life of a cgroup directory, it results in an inflated use_count value,
    preventing it from ever getting released by cgroup_release_pid_array().
    Then if the directory is subsequently removed, cgroup_diput() hits the
    BUG_ON() when it finds that the directory's cgroup is still populated with
    a pidlist.

    The patch simply removes the use_count increment when a matching pidlist
    is found by cgroup_pidlist_find(), because it gets bumped by the calling
    pidlist_array_load() function while still protected by the list's mutex.
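
    A minimal model of the corrected find/load split is sketched below: the
    lookup never touches use_count, and the loader is the single place that
    takes a reference. The struct and the function names pidlist_find(),
    pidlist_load() and pidlist_release() are simplified, hypothetical
    versions, and the per-list mutex is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a cached pid list. */
struct pidlist {
    int key;                 /* stands in for (file type, namespace) */
    int use_count;
    struct pidlist *next;
};

/* Find a matching list or allocate a new one with use_count == 0.
 * After the fix, this function does not increment use_count. */
struct pidlist *pidlist_find(struct pidlist **head, int key)
{
    struct pidlist *l;

    for (l = *head; l; l = l->next)
        if (l->key == key)
            return l;

    l = calloc(1, sizeof(*l));
    if (!l)
        return NULL;
    l->key = key;
    l->next = *head;
    *head = l;
    return l;
}

/* The only place a reference is taken, for new and existing lists alike. */
struct pidlist *pidlist_load(struct pidlist **head, int key)
{
    struct pidlist *l = pidlist_find(head, key);

    if (l)
        l->use_count++;
    return l;
}

/* Drop one reference; unlink and free on the last one. Returns 1 if freed. */
int pidlist_release(struct pidlist **head, struct pidlist *l)
{
    if (--l->use_count > 0)
        return 0;
    for (struct pidlist **pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == l) {
            *pp = l->next;
            break;
        }
    }
    free(l);
    return 1;
}
```

    With an extra increment in pidlist_find() for the "found" case, the second
    opener would leave use_count one too high and the list would never be
    freed, which is the condition the BUG_ON() catches.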

    Signed-off-by: Dave Anderson
    Reviewed-by: Li Zefan
    Acked-by: Ben Blum
    Cc: Paul Menage
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Anderson
     

29 Oct, 2009

1 commit

  • cgroup_write_X64() and cgroup_write_string() ignore the return value of
    strstrip(), which leads to slightly inconsistent behavior.

    example:
    =========================
    # cd /mnt/cgroup/hoge
    # cat memory.swappiness
    60
    # echo "59 " > memory.swappiness
    # cat memory.swappiness
    59
    # echo " 58" > memory.swappiness
    bash: echo: write error: Invalid argument

    This patch fixes it.
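
    The behavior in the example can be reproduced with a small userspace
    model. strip() below is a hypothetical reimplementation of the relevant
    strstrip() semantics (trailing whitespace is cut in place, and a pointer
    past the leading whitespace is returned). parse_u64() is also
    hypothetical; it rejects a leading non-digit explicitly to mimic a parser
    that does not skip whitespace, since strtoull() would skip it on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Trim trailing whitespace in place; return a pointer to the first
 * non-whitespace character. The buffer start itself is unchanged. */
char *strip(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Parse an unsigned number from buf. With use_return == 0 the result of
 * strip() is ignored (the buggy variant); with 1 it is used (the fix). */
int parse_u64(char *buf, int use_return, unsigned long long *out)
{
    char *stripped = strip(buf);
    const char *p = use_return ? stripped : buf;
    char *end;

    if (!isdigit((unsigned char)*p))
        return -EINVAL;   /* leading space still present, or empty input */
    *out = strtoull(p, &end, 0);
    if (*end != '\0')
        return -EINVAL;
    return 0;
}
```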

    Cc: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

02 Oct, 2009

2 commits


24 Sep, 2009

11 commits

  • Alter the ss->can_attach and ss->attach functions to be able to deal with
    a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
    pre-patch to cgroup-procs-writable.patch.)

    Currently, the new mode of the attach function can only tell the
    subsystem about the old cgroup of the threadgroup leader. No subsystem
    currently needs that information for each thread that's being moved, but
    if one were to be added (for example, one that counts tasks within a
    group), this part would need to be reworked to tell the subsystem the
    right information.

    [hidave.darkstar@gmail.com: fix build]
    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Acked-by: Li Zefan
    Reviewed-by: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Changes css_set freeing mechanism to be under RCU

    This is a prepatch for making the procs file writable. In order to free the
    old css_sets for each task to be moved as they're being moved, the freeing
    mechanism must be RCU-protected, or else we would have to have a call to
    synchronize_rcu() for each task before freeing its old css_set.

    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Cc: "Paul E. McKenney"
    Acked-by: Li Zefan
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Separates all pidlist allocation requests to a separate function that
    judges based on the requested size whether or not the array needs to be
    vmalloced or can be gotten via kmalloc, and similar for kfree/vfree.
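
    A size-based allocation helper of this kind might look like the sketch
    below. It is a userspace model: small_alloc()/large_alloc() are stand-ins
    for kmalloc/vmalloc (both simply call malloc here), and the page size and
    two-page cut-over are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

#define PAGE_SZ 4096u                       /* assumed page size */
#define PIDLIST_LARGE_BYTES (2 * PAGE_SZ)   /* assumed cut-over point */

enum backend { SMALL_ALLOC, LARGE_ALLOC };

/* Stand-ins: kmalloc/kfree and vmalloc/vfree in the real code. */
void *small_alloc(size_t bytes) { return malloc(bytes); }
void *large_alloc(size_t bytes) { return malloc(bytes); }
void small_free(void *p) { free(p); }
void large_free(void *p) { free(p); }

/* Decide which allocator an array of "count" pids should come from. */
enum backend pidlist_backend(size_t count)
{
    return count * sizeof(pid_t) > PIDLIST_LARGE_BYTES ? LARGE_ALLOC
                                                       : SMALL_ALLOC;
}

/* Single entry point for all pid array allocations. */
void *pidlist_allocate(size_t count)
{
    size_t bytes = count * sizeof(pid_t);

    return pidlist_backend(count) == LARGE_ALLOC ? large_alloc(bytes)
                                                 : small_alloc(bytes);
}

/* Free with the allocator that matches the one used to allocate. */
void pidlist_dispose(void *p, size_t count)
{
    if (pidlist_backend(count) == LARGE_ALLOC)
        large_free(p);
    else
        small_free(p);
}
```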

    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Acked-by: Li Zefan
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Previously there was the problem in which two processes from different pid
    namespaces reading the tasks or procs file could result in one process
    seeing results from the other's namespace. Rather than one pidlist for
    each file in a cgroup, we now keep a list of pidlists keyed by namespace
    and file type (tasks versus procs) in which entries are placed on demand.
    Each pidlist has its own lock, and because the pidlists themselves are
    passed around in the seq_file's private pointer, we don't have to touch
    the cgroup or its master list except when creating and destroying entries.

    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Cc: Li Zefan
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • struct cgroup used to have a bunch of fields for keeping track of the
    pidlist for the tasks file. Those are now separated into a new struct
    cgroup_pidlist, of which there are two, one for procs and one for tasks.
    The way the seq_file operations are set up is changed so that just the
    pidlist struct gets passed around as the private data.

    Interface example: Suppose a multithreaded process has pid 1000 and other
    threads with ids 1001, 1002, 1003:
    $ cat tasks
    1000
    1001
    1002
    1003
    $ cat cgroup.procs
    1000
    $
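
    One way to get from a per-thread listing to the cgroup.procs view shown
    above is to collect the tgid of every thread, sort, and drop duplicates.
    The sketch below illustrates that step in userspace; pidlist_unique() and
    cmp_pid() are hypothetical names for the example and may differ from how
    the kernel does it.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* qsort comparator for pid_t values. */
int cmp_pid(const void *a, const void *b)
{
    pid_t x = *(const pid_t *)a;
    pid_t y = *(const pid_t *)b;

    return (x > y) - (x < y);
}

/* Sort the list and remove duplicates in place.
 * Returns the number of unique entries left at the front. */
size_t pidlist_unique(pid_t *list, size_t n)
{
    size_t dst = 1;

    if (n < 2)
        return n;
    qsort(list, n, sizeof(*list), cmp_pid);
    for (size_t src = 1; src < n; src++)
        if (list[src] != list[dst - 1])
            list[dst++] = list[src];
    return dst;
}
```

    For the four threads in the example, the tgid list is 1000 four times,
    which collapses to the single entry 1000.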

    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Acked-by: Li Zefan
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • The following series adds a "cgroup.procs" file to each cgroup that
    reports unique tgids rather than pids, and allows all threads in a
    threadgroup to be atomically moved to a new cgroup.

    The subsystem "attach" interface is modified to support attaching whole
    threadgroups at a time, which could introduce potential problems if any
    subsystem were to need to access the old cgroup of every thread being
    moved. The attach interface may need to be revised if this becomes the
    case.

    Also added is functionality for read/write locking all CLONE_THREAD
    fork()ing within a threadgroup, by means of an rwsem that lives in the
    sighand_struct, for per-threadgroup-ness and also for sharing a cacheline
    with the sighand's atomic count. This scheme should introduce no extra
    overhead in the fork path when there's no contention.

    The final patch reveals potential for a race when forking before a
    subsystem's attach function is called - one potential solution in case any
    subsystem has this problem is to hang on to the group's fork mutex through
    the attach() calls, though no subsystem yet demonstrates need for an
    extended critical section.

    This patch:

    Revert

    commit 096b7fe012d66ed55e98bc8022405ede0cc80e96
    Author: Li Zefan
    AuthorDate: Wed Jul 29 15:04:04 2009 -0700
    Commit: Linus Torvalds
    CommitDate: Wed Jul 29 19:10:35 2009 -0700

    cgroups: fix pid namespace bug

    This is in preparation for some clashing cgroups changes that subsume the
    original commit's functionality.

    The original commit fixed a pid namespace bug which Ben Blum fixed
    independently (in the same way, but with different code) as part of a
    series of patches. I played around with trying to reconcile Ben's patch
    series with Li's patch, but concluded that it was simpler to just revert
    Li's, given that Ben's patch series contained essentially the same fix.

    Signed-off-by: Paul Menage
    Cc: Li Zefan
    Cc: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This patch removes the restriction that a cgroup hierarchy must have at
    least one bound subsystem. The mount option "none" is treated as an
    explicit request for no bound subsystems.

    A hierarchy with no subsystems can be useful for plain task tracking, and
    is also a step towards the support for multiply-bindable subsystems.

    As part of this change, the hierarchy id is no longer calculated from the
    bitmask of subsystems in the hierarchy (since this is not guaranteed to be
    unique) but is allocated via an ida. Reference counts on cgroups from
    css_set objects are now taken explicitly one per hierarchy, rather than
    one per subsystem.

    Example usage:

    mount -t cgroup -o none,name=foo cgroup /mnt/cgroup

    Based on the "no-op"/"none" subsystem concept proposed by
    kamezawa.hiroyu@jp.fujitsu.com

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Currently the cgroups code makes the assumption that the subsystem
    pointers in a struct css_set uniquely identify the hierarchy->cgroup
    mappings associated with the css_set; and there's no way to directly
    identify the associated set of cgroups other than by indirecting through
    the appropriate subsystem state pointers.

    This patch removes the need for that assumption by adding a back-pointer
    from struct cg_cgroup_link object to its associated cgroup; this allows
    the set of cgroups to be determined by traversing the cg_links list in
    the struct css_set.

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • While it's architecturally clean to have the cgroup debug subsystem be
    completely independent of the cgroups framework, it limits its usefulness
    for debugging the contents of internal data structures. Move the debug
    subsystem code into the scope of all the cgroups data structures to make
    more detailed debugging possible.

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • To simplify referring to cgroup hierarchies in mount statements, and to
    allow disambiguation in the presence of empty hierarchies and
    multiply-bindable subsystems, this patch adds support for naming a new
    cgroup hierarchy via the "name=" mount option.

    A pre-existing hierarchy may be specified by either name or by subsystems;
    a hierarchy's name cannot be changed by a remount operation.

    Example usage:

    # To create a hierarchy called "foo" containing the "cpu" subsystem
    mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1

    # To mount the "foo" hierarchy on a second location
    mount -t cgroup -oname=foo cgroup /mnt/cgroup2

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Make the last unlock sequence consistent with the previous unlock sequence.

    Acked-by: Balbir Singh
    Acked-by: Paul Menage
    Signed-off-by: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaotian Feng
     

23 Sep, 2009

1 commit

  • Make all seq_operations structs const, to help mitigate against
    revectoring user-triggerable function pointers.

    This is derived from the grsecurity patch, although generated from scratch
    because it's simpler than extracting the changes from there.

    Signed-off-by: James Morris
    Acked-by: Serge Hallyn
    Acked-by: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Morris
     

22 Sep, 2009

2 commits


11 Sep, 2009

1 commit


30 Jul, 2009

2 commits

  • After commit ec64f51545fffbc4cb968f0cea56341a4b07e85a ("cgroup: fix
    frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg)
    no longer returns -EBUSY because of temporary ref counts. That commit
    expects all refs remaining after pre_destroy() to be temporary, but they
    are not, so rmdir can wait forever. This patch tries to fix that with the
    following changes:

    - set CGRP_WAIT_ON_RMDIR flag before pre_destroy().
    - clear CGRP_WAIT_ON_RMDIR flag when the subsys finds racy case.
    if there are sleeping ones, wakes them up.
    - rmdir() sleeps only when CGRP_WAIT_ON_RMDIR flag is set.

    Tested-by: Daisuke Nishimura
    Reported-by: Daisuke Nishimura
    Reviewed-by: Paul Menage
    Acked-by: Balbir Singh
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The bug was introduced by commit cc31edceee04a7b87f2be48f9489ebb72d264844
    ("cgroups: convert tasks file to use a seq_file with shared pid array").

    We cache a pid array for all threads that are opening the same "tasks"
    file, but the pids in the array are always from the namespace of the
    last process that opened the file, so all other threads will read pids
    from that namespace instead of their own namespaces.

    To fix it, we maintain a list of pid arrays, which is keyed by pid_ns.
    The list will be of length 1 most of the time.

    Reported-by: Paul Menage
    Idea-by: Paul Menage
    Signed-off-by: Li Zefan
    Reviewed-by: Serge Hallyn
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

19 Jun, 2009

1 commit

  • The 'noprefix' option was introduced for backwards-compatibility of
    cpuset, but actually it can be used when mounting other subsystems.

    This results in possibility of name collision, and now the collision can
    really happen, because we have 'stat' file in both memory and cpuacct
    subsystem:

    # mount -t cgroup -o noprefix,memory,cpuacct xxx /mnt

    Cgroup will happily mount the 2 subsystems, but only 'stat' file of memory
    subsys can be seen.

    We don't want users to use noprefix, and we also want to avoid name
    collisions, so we now allow noprefix only when mounting just the cpuset
    subsystem.
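
    The rule "noprefix only with cpuset alone" boils down to a bitmask test.
    The sketch below is illustrative: the subsystem ids and the function name
    check_noprefix() are assumptions for the example. The unsigned long
    constant in the shift is what keeps the mask correct for ids of 32 and
    above on 64-bit builds, which is the concern behind the shift fix noted
    below.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subsystem ids, used as bit positions in the mount mask. */
enum { CPUSET_ID = 0, MEMORY_ID = 1, CPUACCT_ID = 2 };

/* Reject noprefix when any subsystem other than cpuset is requested. */
int check_noprefix(unsigned long subsys_bits, int noprefix, int cpuset_id)
{
    unsigned long others = ~(1UL << cpuset_id);   /* every bit except cpuset */

    if (noprefix && (subsys_bits & others))
        return -EINVAL;
    return 0;
}
```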

    [akpm@linux-foundation.org: fix shift for cpuset_subsys_id >= 32]
    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Acked-by: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

12 Jun, 2009

1 commit


09 May, 2009

1 commit


03 Apr, 2009

2 commits

  • This patch tries to fix OOM Killer problems caused by hierarchy.
    Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to
    kill a task in memcg.

    But when hierarchy is used, this is broken and the correct task cannot
    be killed. For example, in the following cgroup hierarchy

    /groupA/ hierarchy=1, limit=1G,
        01 nolimit
        02 nolimit

    the memory usage of all tasks under /groupA, /groupA/01, and /groupA/02
    is limited to groupA's 1 Gbyte, but the OOM killer only kills tasks in
    groupA itself.

    This patch makes the bad process be selected from all tasks under the
    hierarchy. BTW, currently, oom_jiffies is updated only against groupA
    in the above case; the oom_jiffies of the whole tree should be updated.

    To see how oom_jiffies is used, please check mem_cgroup_oom_called()
    callers.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: const fix]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Remount can fail in either of two cases:
    - wrong mount options are specified, or option 'noprefix' is changed.
    - a to-be-added subsys is already mounted/active.

    When using remount to change 'release_agent', in the former failure case
    remount returns an errno with release_agent unchanged, but in the latter
    case remount returns EBUSY with release_agent changed, which I think is
    unexpected:

    # mount -t cgroup -o cpu xxx /cgrp1
    # mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2
    # cat /cgrp2/release_agent
    agent1
    # mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2
    mount: /cgrp2 not mounted already, or bad option
    # cat /cgrp2/release_agent
    agent1
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan