24 Sep, 2009

4 commits

  • Soft limits are a new feature for the memory resource controller.
    Something similar has existed in the group scheduler in the form of
    shares, although the CPU controller's interpretation of shares is very
    different.

    Soft limits are most useful in environments where the administrator
    wants to overcommit the system, so that the limits become active only
    under memory contention. The current soft limits implementation
    provides a soft_limit_in_bytes interface for the memory controller,
    but not for the memory+swap controller. The implementation maintains
    an RB-tree of groups that exceed their soft limit and starts reclaiming
    from the group that exceeds its limit by the largest amount.
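
    The selection rule above can be sketched as follows. This is a
    userspace model under stated assumptions, not the kernel code: the
    struct and function names are hypothetical, and a linear scan stands
    in for the RB-tree ordered by excess that the implementation keeps.

```c
#include <stddef.h>

/* Hypothetical userspace model of a memory cgroup's counters (bytes). */
struct group {
    const char *name;
    unsigned long long usage;
    unsigned long long soft_limit;
};

/* Bytes by which a group exceeds its soft limit (0 if within it). */
static unsigned long long soft_limit_excess(const struct group *g)
{
    return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/*
 * Return the group that exceeds its soft limit by the largest amount,
 * or NULL if no group is over its soft limit. The linear scan is a
 * simplification of the RB-tree kept by the real implementation.
 */
static const struct group *pick_soft_limit_victim(const struct group *groups,
                                                  size_t n)
{
    const struct group *victim = NULL;
    unsigned long long max_excess = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long long excess = soft_limit_excess(&groups[i]);
        if (excess > max_excess) {
            max_excess = excess;
            victim = &groups[i];
        }
    }
    return victim;
}
```

    Groups within their soft limit have zero excess and are never chosen,
    which matches the idea that soft limits only matter under contention.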

    This patch:

    Add documentation for soft limits

    Signed-off-by: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Change the memory cgroup to remove the overhead associated with accounting
    all pages in the root cgroup. As a side-effect, we can no longer set a
    memory hard limit in the root cgroup.

    A new flag to track whether a page has been accounted has been added as
    well. Flags are now set atomically for page_cgroup; pcg_default_flags
    is obsolete and has been removed.

    [akpm@linux-foundation.org: fix a few documentation glitches]
    Signed-off-by: Balbir Singh
    Signed-off-by: Daisuke Nishimura
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Li Zefan
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Alter the ss->can_attach and ss->attach functions to be able to deal with
    a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a
    pre-patch to cgroup-procs-writable.patch.)

    Currently, the new mode of the attach function can only tell the
    subsystem about the old cgroup of the threadgroup leader. No subsystem
    currently needs that information for each thread being moved, but if
    one were added (for example, one that counts tasks within a group),
    this part would need to be reworked to give the subsystem the right
    information.

    [hidave.darkstar@gmail.com: fix build]
    Signed-off-by: Ben Blum
    Signed-off-by: Paul Menage
    Acked-by: Li Zefan
    Reviewed-by: Matt Helsley
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • To simplify referring to cgroup hierarchies in mount statements, and to
    allow disambiguation in the presence of empty hierarchies and
    multiply-bindable subsystems, this patch adds support for naming a new
    cgroup hierarchy via the "name=" mount option.

    A pre-existing hierarchy may be specified either by name or by
    subsystems; a hierarchy's name cannot be changed by a remount operation.

    Example usage:

    # To create a hierarchy called "foo" containing the "cpu" subsystem
    mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1

    # To mount the "foo" hierarchy on a second location
    mount -t cgroup -oname=foo cgroup /mnt/cgroup2

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

01 Jul, 2009

1 commit

  • By writing a task's pid to the tasks file, a process adds that task to
    that cgroup/cpuset. But to add a CPU or memory node to a cpuset, the
    complete new list must be written to cpuset.cpus or cpuset.mems, where
    it replaces the old list. Make this clearer in the documentation.
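
    The replace-rather-than-append semantics can be modelled as below: the
    written list (in the "0-3,8" list format) is parsed into a fresh mask
    that overwrites the old one, so adding CPU 4 to a cpuset holding 0-3
    means writing "0-4". The function names are hypothetical and the
    sketch is limited to 64 CPUs or nodes.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical parser: turn a list such as "0-3,8" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or IDs above 63.
 */
static int parse_list(const char *s, uint64_t *out)
{
    uint64_t mask = 0;

    while (*s) {
        char *end;
        unsigned long lo = strtoul(s, &end, 10), hi;

        if (end == s)
            return -1;
        hi = lo;
        if (*end == '-') {                 /* range "lo-hi" */
            s = end + 1;
            hi = strtoul(s, &end, 10);
            if (end == s)
                return -1;
        }
        if (lo > hi || hi > 63)
            return -1;
        for (unsigned long i = lo; i <= hi; i++)
            mask |= UINT64_C(1) << i;
        s = end;
        if (*s == ',')
            s++;
        else if (*s != '\0')
            return -1;
    }
    *out = mask;
    return 0;
}

/* Hypothetical write handler: the new list replaces the old set. */
static int write_list(uint64_t *current, const char *s)
{
    uint64_t mask;

    if (parse_list(s, &mask) != 0)
        return -1;                         /* old set left untouched */
    *current = mask;                       /* replace, not *current |= mask */
    return 0;
}
```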

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
     

19 Jun, 2009

2 commits

  • There is currently no interface to reset mem.limit or memsw.limit.

    This patch allows mem.limit or memsw.limit to be reset (to unlimited)
    by writing -1 to them.
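
    A minimal sketch of the write-side parsing this implies, assuming a
    plain decimal byte count (the real files also accept suffixes such as
    K, M and G, which are omitted here). HYPOTHETICAL_LIMIT_UNLIMITED and
    parse_limit() are illustrative names, not kernel identifiers.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical sentinel meaning "no limit". */
#define HYPOTHETICAL_LIMIT_UNLIMITED ULLONG_MAX

/*
 * Parse a value written to a limit file. "-1" resets the limit to
 * unlimited; any other negative value, an empty string, or trailing
 * characters are rejected with -EINVAL. Returns 0 on success.
 */
static int parse_limit(const char *buf, unsigned long long *limit)
{
    char *end;
    unsigned long long val;

    if (buf[0] == '-') {
        if (buf[1] == '1' && buf[2] == '\0') {
            *limit = HYPOTHETICAL_LIMIT_UNLIMITED;
            return 0;
        }
        return -EINVAL;
    }
    if (buf[0] < '0' || buf[0] > '9')   /* rejects "", '+', spaces */
        return -EINVAL;
    errno = 0;
    val = strtoull(buf, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -EINVAL;
    *limit = val;
    return 0;
}
```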

    Signed-off-by: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • A user can set memcg.limit_in_bytes == memcg.memsw.limit_in_bytes when
    the user just wants to limit the total size of applications and is not
    much interested in memory usage itself. In this case, swap-out will be
    done only by the global LRU.

    However, under the current implementation, memory.limit_in_bytes is
    checked first and try_to_free_page() may swap out. That swap-out is
    useless for memsw.limit_in_bytes, and the thread may hit the limit
    again.

    This patch tries to fix the behavior in the memory.limit ==
    memsw.limit case, and the documentation is updated to explain the
    behavior of this special case.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

14 Apr, 2009

2 commits

  • The description of the various statistics in memory.stat is
    inaccurate and at times confusing.

    Correct this, along with a few other minor cleanups.

    Signed-off-by: Bharata B Rao
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bharata B Rao
     
  • After the introduction of resource counter hierarchies
    (28dbc4b6a01fb579a9441c7b81e3d3413dc452df), the prototypes of
    res_counter_init() and res_counter_charge() have changed.

    Keep the documentation consistent with the actual function prototypes.
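
    As a rough illustration of what a hierarchical charge involves, the
    userspace model below charges a counter and each of its ancestors; if
    any level would exceed its limit, the partial charges are undone and
    the failing level is reported. The model_* names are hypothetical, and
    this is not the kernel's res_counter implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a hierarchical resource counter. */
struct model_counter {
    unsigned long long usage;
    unsigned long long limit;
    struct model_counter *parent;   /* NULL at the root */
};

static void model_counter_init(struct model_counter *c,
                               struct model_counter *parent,
                               unsigned long long limit)
{
    c->usage = 0;
    c->limit = limit;
    c->parent = parent;
}

/*
 * Charge 'val' to 'c' and every ancestor. If some level would exceed its
 * limit, undo the partial charges, store that level in *fail_at and
 * return -ENOMEM; otherwise return 0. Relies on usage <= limit.
 */
static int model_counter_charge(struct model_counter *c,
                                unsigned long long val,
                                struct model_counter **fail_at)
{
    struct model_counter *iter, *undo;

    *fail_at = NULL;
    for (iter = c; iter; iter = iter->parent) {
        if (val > iter->limit - iter->usage) {   /* overflow-safe check */
            *fail_at = iter;
            for (undo = c; undo != iter; undo = undo->parent)
                undo->usage -= val;
            return -ENOMEM;
        }
        iter->usage += val;
    }
    return 0;
}
```

    Reporting the failing level lets a caller reclaim from the right place
    in the tree instead of always starting at the charged group.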

    Signed-off-by: Andrea Righi
    Cc: Paul Menage
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Righi
     

08 Apr, 2009

1 commit


04 Apr, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
    trivial: Update my email address
    trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
    trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
    trivial: Fix misspelling of "Celsius".
    trivial: remove unused variable 'path' in alloc_file()
    trivial: fix a pdlfush -> pdflush typo in comment
    trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
    trivial: wusb: Storage class should be before const qualifier
    trivial: drivers/char/bsr.c: Storage class should be before const qualifier
    trivial: h8300: Storage class should be before const qualifier
    trivial: fix where cgroup documentation is not correctly referred to
    trivial: Give the right path in Documentation example
    trivial: MTD: remove EOL from MODULE_DESCRIPTION
    trivial: Fix typo in bio_split()'s documentation
    trivial: PWM: fix of #endif comment
    trivial: fix typos/grammar errors in Kconfig texts
    trivial: Fix misspelling of firmware
    trivial: cgroups: documentation typo and spelling corrections
    trivial: Update contact info for Jochen Hein
    trivial: fix typo "resgister" -> "register"
    ...

    Linus Torvalds
     

03 Apr, 2009

3 commits

  • This patch tries to fix OOM killer problems caused by hierarchy.
    Currently, memcg has its own OOM-kill function (in oom_kill.c) that
    tries to kill a task in the memcg.

    But when hierarchy is used, this is broken and the correct task cannot
    be killed. For example, in the following cgroup tree:

    /groupA/    hierarchy=1, limit=1G
        01      nolimit
        02      nolimit

    the memory usage of all tasks under /groupA, /groupA/01 and /groupA/02
    is limited to groupA's 1 GB, but the OOM killer only kills tasks in
    /groupA itself.

    This patch makes the bad process be selected from all tasks under the
    hierarchy. Also, oom_jiffies is currently updated only for groupA in
    the above case; the oom_jiffies of the whole tree should be updated.

    To see how oom_jiffies is used, please check mem_cgroup_oom_called()
    callers.
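
    The change in selection scope can be sketched with a simple tree
    model: the candidate is chosen from the whole subtree rather than from
    the group's own tasks only. The structs, the precomputed badness
    score, and select_bad_task() are hypothetical simplifications of what
    the OOM killer actually evaluates.

```c
#include <stddef.h>

/* Hypothetical model: a task with a precomputed badness score. */
struct task {
    int pid;
    unsigned long badness;
};

/* Hypothetical model of a cgroup: its own tasks plus child groups. */
struct cgroup_node {
    const struct task *tasks;
    size_t ntasks;
    const struct cgroup_node **children;
    size_t nchildren;
};

/*
 * Return the task with the highest badness anywhere in the subtree
 * rooted at 'cg' (the group itself and all descendants), or NULL if
 * the subtree has no tasks. Scanning only cg->tasks would reproduce
 * the old behavior, which ignored tasks in child groups.
 */
static const struct task *select_bad_task(const struct cgroup_node *cg)
{
    const struct task *worst = NULL;

    for (size_t i = 0; i < cg->ntasks; i++)
        if (!worst || cg->tasks[i].badness > worst->badness)
            worst = &cg->tasks[i];

    for (size_t i = 0; i < cg->nchildren; i++) {
        const struct task *t = select_bad_task(cg->children[i]);
        if (t && (!worst || t->badness > worst->badness))
            worst = t;
    }
    return worst;
}
```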

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: const fix]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This won't remove cpuacct from the mounted hierarchy:
    # mount -t cgroup -o cpu,cpuacct xxx /mnt
    # mount -o remount,cpu /mnt

    This is because, for this usage, mount(8) appends the new options to
    the original options.

    This, however, works as intended:
    # mount [-t cgroup] -o remount,cpu xxx /mnt

    Also document how to specify or change release_agent.

    Signed-off-by: Li Zefan
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Consider the following situation with the memory subsystem:

    /groupA  use_hierarchy==1
        /01  some tasks
        /02  some tasks
        /03  some tasks
        /04  empty

    When tasks under 01/02/03 hit the limit on /groupA, hierarchical
    reclaim is triggered and the kernel walks the tree under groupA. In
    this case, rmdir /groupA/04 frequently fails with -EBUSY because of
    temporary refcounts held by the kernel.

    In general, a cgroup can be rmdir'd if it has no child groups and no
    tasks. Frequent rmdir() failures are not useful to users (and in most
    cases the reason for -EBUSY is unknown to them).

    This patch modifies the above behavior by:
    - retrying if someone holds a css_refcnt;
    - adding a return value to pre_destroy(), allowing a subsystem to
      say "we're really busy!"

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

01 Apr, 2009

1 commit

  • Add per-cgroup cpuacct controller statistics like the system and user
    time consumed by the group of tasks.

    Changelog:

    v7
    - Changed the name of the statistic from utime to user and from stime to
    system so that other statistics like irq, softirq and steal times can
    easily be added in the future.

    v6
    - Fixed a bug in the error path of cpuacct_create() (pointed out by Li Zefan).

    v5
    - In cpuacct_stats_show(), use cputime64_to_clock_t() since we are
    operating on a 64bit variable here.

    v4
    - Remove comments in cpuacct_update_stats() which explained why rcu_read_lock()
    was needed (as per Peter Zijlstra's review comments).
    - Don't say that percpu_counter_read() is broken in Documentation/cpuacct.txt
    as per KAMEZAWA Hiroyuki's review comments.

    v3
    - Fix a small race in the cpuacct hierarchy walk.

    v2
    - stime and utime now exported in clock_t units instead of msecs.
    - Addressed the code review comments from Balbir and Li Zefan.
    - Moved to -tip tree.

    v1
    - Moved the stime/utime accounting to cpuacct controller.

    Earlier versions
    - http://lkml.org/lkml/2009/2/25/129
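
    A hedged userspace sketch of consuming these statistics: it parses
    "user" and "system" lines, as named in the v7 note, and converts the
    values, exported in clock_t units per the v2 note, to seconds. The
    "<name> <value>" line layout is an assumption, and the tick rate is
    passed in; on a live system it would come from sysconf(_SC_CLK_TCK).

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse "user <ticks>" and "system <ticks>" lines (assumed layout of
 * the cpuacct statistics file) and convert both to seconds using the
 * given clock-tick rate. Returns 0 if both fields were found, else -1.
 */
static int parse_cpuacct_stat(const char *text, long ticks_per_sec,
                              double *user_sec, double *system_sec)
{
    char name[16];
    unsigned long long ticks;
    int found = 0, consumed;

    if (ticks_per_sec <= 0)
        return -1;
    /* %n records how many characters were consumed for this line. */
    while (sscanf(text, "%15s %llu%n", name, &ticks, &consumed) == 2) {
        if (strcmp(name, "user") == 0) {
            *user_sec = (double)ticks / ticks_per_sec;
            found |= 1;
        } else if (strcmp(name, "system") == 0) {
            *system_sec = (double)ticks / ticks_per_sec;
            found |= 2;
        }
        text += consumed;
    }
    return found == 3 ? 0 : -1;
}
```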

    Signed-off-by: Bharata B Rao
    Signed-off-by: Balaji Rao
    Cc: Dhaval Giani
    Cc: Paul Menage
    Cc: Andrew Morton
    Cc: KAMEZAWA Hiroyuki
    Reviewed-by: Li Zefan
    Acked-by: Peter Zijlstra
    Acked-by: Balbir Singh
    Tested-by: Balbir Singh
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Bharata B Rao
     

30 Mar, 2009

3 commits


21 Feb, 2009

1 commit

  • I noticed the old commit 8f5aa26c75b7722e80c0c5c5bb833d41865d7019
    ("cpusets: update_cpumask documentation fix") is not a complete fix,
    resulting in inconsistent paragraphs. This patch fixes it and does other
    fixes and updates:

    - s/migrate_all_tasks()/migrate_live_tasks()/
    - describe more cpuset control files
    - s/cpumask_t/struct cpumask/
    - document that cpu hotplug and changes to 'sched_relax_domain_level' may
    cause a domain rebuild
    - document various ways to query and modify cpusets
    - the equivalent of "mount -t cpuset" is "mount -t cgroup -o cpuset,noprefix"

    Signed-off-by: Li Zefan
    Acked-by: Randy Dunlap
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

19 Feb, 2009

1 commit


30 Jan, 2009

1 commit


16 Jan, 2009

1 commit

  • Move Documentation/cpusets.txt and Documentation/controllers/* to
    Documentation/cgroups/

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

09 Jan, 2009

2 commits

  • These patches introduce new locking/refcount support for cgroups to
    reduce the need for subsystems to call cgroup_lock(). This will
    ultimately allow the atomicity of cgroup_rmdir() (which was removed
    recently) to be restored.

    These three patches give:

    1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
    use to prevent changes to its own cgroup tree

    2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
    memory controller

    3/3 - introduce a css_tryget() function similar to the one recently
    proposed by Kamezawa, but avoiding spurious refcount failures in
    the event of a race between a css_tryget() and an unsuccessful
    cgroup_rmdir()

    Future patches will likely involve:

    - using hierarchy_mutex in place of cgroup_lock() in more subsystems
    where appropriate

    - restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()

    This patch:

    Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
    the hierarchy observed by that subsystem. It is taken by the cgroup
    subsystem (in addition to cgroup_mutex) for the following operations:

    - linking a cgroup into that subsystem's cgroup tree
    - unlinking a cgroup from that subsystem's cgroup tree
    - moving the subsystem to/from a hierarchy (including across the
    bind() callback)

    Thus if the subsystem holds its own hierarchy_mutex, it can safely
    traverse its own hierarchy.

    Signed-off-by: Paul Menage
    Tested-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • - remove 'releasable' since it has been moved to the debug subsys.
    - update lock requirements of subsys callbacks.

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

13 Nov, 2008

1 commit

  • With this change, the control file 'freezer.state' doesn't exist in the
    root cgroup, making the root cgroup unfreezable.

    I think it's reasonable to disallow freezing tasks in the root cgroup,
    and then we can avoid fork overhead when the freezer subsystem is
    compiled in but not used.

    Also make writing an invalid value to freezer.state return EINVAL
    rather than EIO. This is more consistent with other cgroup subsystems.
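
    A minimal sketch of the validation described above: only the states a
    user may request are accepted, and anything else yields -EINVAL. It
    assumes the writable states are "FROZEN" and "THAWED" and tolerates
    one trailing newline (as written by echo); the enum and function names
    are hypothetical.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical names for the two states a user may request. */
enum freezer_state { STATE_THAWED, STATE_FROZEN };

/* Compare 'buf' with 'word', allowing one trailing newline. */
static int matches(const char *buf, const char *word)
{
    size_t n = strlen(word);

    return strncmp(buf, word, n) == 0 &&
           (buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0'));
}

/* Return 0 and set *state for a valid request, or -EINVAL otherwise. */
static int parse_freezer_write(const char *buf, enum freezer_state *state)
{
    if (matches(buf, "FROZEN"))
        *state = STATE_FROZEN;
    else if (matches(buf, "THAWED"))
        *state = STATE_THAWED;
    else
        return -EINVAL;   /* previously reported as -EIO */
    return 0;
}
```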

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Cc: Cedric Le Goater
    Cc: Paul Menage
    Cc: Matt Helsley
    Cc: "Serge E. Hallyn"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

20 Oct, 2008

1 commit

  • Describe why we need the freezer subsystem and how to use it in a
    documentation file. Since the cgroups.txt file focuses on the
    subsystem-agnostic portions of cgroups, make a directory and move the
    old cgroups.txt file into it at the same time.

    Signed-off-by: Matt Helsley
    Cc: Paul Menage
    Cc: containers@lists.linux-foundation.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley