24 Mar, 2011

2 commits

  • Since the introduction of transparent huge pages, checking whether memory
    cgroups are below their limits is no longer enough; what matters is the
    actual amount of chargeable space.

    To avoid having more than one limit-checking interface, replace
    memory_cgroup_check_under_limit() and memory_cgroup_check_margin() with a
    single memory_cgroup_margin() that returns the chargeable space and leaves
    the comparison to the call site.

    Soft limits are now checked the other way round, by using the already
    existing function that returns the amount by which soft limits are
    exceeded: res_counter_soft_limit_excess().

    Also remove all the corresponding functions on the res_counter side that
    are no longer used.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
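    A minimal userspace sketch of the margin-based interface described above,
    assuming a simplified counter with usage, limit and soft_limit fields. The
    struct and function names are hypothetical, not the kernel's, and locking
    is omitted. The caller compares the returned margin against its own charge
    size, and soft limits are checked through the excess instead.

```c
#include <stdint.h>

/* Hypothetical, simplified counter; no locking, unlike the kernel. */
struct counter {
    uint64_t usage;
    uint64_t limit;
    uint64_t soft_limit;
};

/* Chargeable space left below the hard limit (0 if at or over it). */
static uint64_t counter_margin(const struct counter *c)
{
    return c->limit > c->usage ? c->limit - c->usage : 0;
}

/* Amount by which usage exceeds the soft limit (0 if within it). */
static uint64_t counter_soft_limit_excess(const struct counter *c)
{
    return c->usage > c->soft_limit ? c->usage - c->soft_limit : 0;
}
```

    A caller wanting to charge nr_bytes would then test
    counter_margin(&c) >= nr_bytes rather than asking "am I under the limit?".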
     
  • Soft limit reclaim continues until usage drops below the current soft
    limit, but the documented semantics are that soft limit reclaim pushes
    usage back only until the soft limit is met again.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
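    To illustrate the corrected stopping condition, here is a hedged sketch in
    which reclaim stops as soon as the soft limit is met (excess is zero, i.e.
    usage <= soft_limit). The counter struct, reclaim_some() and the chunked
    reclaim model are hypothetical stand-ins, not kernel code.

```c
#include <stdint.h>

struct counter {
    uint64_t usage;
    uint64_t soft_limit;
};

static uint64_t soft_limit_excess(const struct counter *c)
{
    return c->usage > c->soft_limit ? c->usage - c->soft_limit : 0;
}

/* Hypothetical reclaim step: frees up to 'chunk' bytes. */
static uint64_t reclaim_some(struct counter *c, uint64_t chunk)
{
    uint64_t freed = c->usage < chunk ? c->usage : chunk;
    c->usage -= freed;
    return freed;
}

/* Reclaim until the soft limit is met again, not until usage is
 * strictly below it. Returns the number of bytes reclaimed. */
static uint64_t soft_limit_reclaim(struct counter *c, uint64_t chunk)
{
    uint64_t total = 0;

    while (soft_limit_excess(c) > 0) {
        uint64_t freed = reclaim_some(c, chunk);
        if (freed == 0)
            break;              /* nothing left to reclaim */
        total += freed;
    }
    return total;
}
```

    With usage at twice the soft limit and one-limit-sized chunks, a single
    step suffices; a "strictly below" condition would reclaim one more chunk.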
     

03 Feb, 2011

1 commit

  • If reclaim after a failed charge was unsuccessful, the limits are
    checked again, in case they have settled in the meantime because of other tasks.

    This is all fine as long as every charge is of size PAGE_SIZE, because in
    that case, being below the limit means having at least PAGE_SIZE bytes
    available.

    But with transparent huge pages, we may end up in an endless loop where
    charging and reclaim fail, but we keep going because the limits are not
    yet exceeded, even though the remaining room is too small for a huge page.

    Fix this up by explicitly checking for enough room, not just whether we
    are within limits.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
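    A small sketch contrasting the two checks, assuming 4 KiB base pages and
    2 MiB huge pages (common on x86 but an assumption here). All names are
    hypothetical. One page short of the limit, the old "under the limit" test
    still says yes, while only the margin test correctly refuses a huge page.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096ULL                   /* assumed base page */
#define SKETCH_HPAGE_SIZE (512 * SKETCH_PAGE_SIZE)  /* assumed 2 MiB THP */

struct counter {
    uint64_t usage;
    uint64_t limit;
};

static uint64_t counter_margin(const struct counter *c)
{
    return c->limit > c->usage ? c->limit - c->usage : 0;
}

/* Old-style test: only correct when every charge is one page. */
static bool under_limit(const struct counter *c)
{
    return c->usage < c->limit;
}

/* Fixed test: is there room for this particular charge? */
static bool room_for(const struct counter *c, uint64_t nr_bytes)
{
    return counter_margin(c) >= nr_bytes;
}
```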
     

02 Oct, 2009

1 commit

  • This patch cleans up and fixes memcg's uncharge soft limit path.

    Problems:
    Currently, res_counter_charge()/uncharge() handle soft limit information at
    charge/uncharge time, and the soft limit check is done when the per-memcg
    event counter goes over a threshold. The per-memcg event counter is
    updated only when memory usage is over the soft limit. With hierarchical
    memcg management, ancestors should also be taken care of.

    Currently, ancestors (the hierarchy) are handled in charge() but not in
    uncharge(). This is not good.

    Problems:
    1. memcg's event counter is incremented only when the soft limit is hit.
    That's bad: it makes the event counter hard to reuse for other purposes.

    2. At uncharge, only the lowest-level res_counter is handled. This is a
    bug: because ancestors' event counters are not incremented, children
    should take care of them.

    3. res_counter_uncharge()'s 3rd argument is NULL in most cases.
    Operations under res_counter->lock should be small; avoiding an "if"
    statement there is better.

    Fixes:
    * Removed the soft_limit_xx pointer and checks in charge and uncharge.
    The do-check-only-when-necessary scheme works well enough without them.

    * Make memcg's event counter increment at every charge/uncharge.
    (The per-cpu area will be accessed soon anyway.)

    * All ancestors are checked at soft-limit-check time. This is necessary
    because an ancestor's event counter may never be modified, so ancestors
    should be checked at the same time.

    Reviewed-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
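    The fixed scheme can be sketched as follows: every charge and uncharge
    walks all ancestors, bumps the event counter of the charged group, and
    every few events checks that group and all its ancestors. The struct, the
    threshold of 4 events and all function names are hypothetical; the real
    code uses per-cpu statistics and different thresholds.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EVENTS_THRESH 4   /* assumed: check every 4 events */

struct group {
    struct group *parent;        /* NULL for the root */
    uint64_t usage;
    uint64_t soft_limit;
    unsigned long events;        /* bumped on every charge and uncharge */
    int over_soft_limit;         /* result of the last soft limit check */
};

/* Check this group and every ancestor in one pass, because an
 * ancestor's own event counter may never be bumped. */
static void soft_limit_check(struct group *g)
{
    for (; g; g = g->parent)
        g->over_soft_limit = g->usage > g->soft_limit;
}

static void event(struct group *g)
{
    if (++g->events % SKETCH_EVENTS_THRESH == 0)
        soft_limit_check(g);
}

static void charge(struct group *g, uint64_t bytes)
{
    for (struct group *p = g; p; p = p->parent)
        p->usage += bytes;
    event(g);
}

/* Uncharge walks the ancestors too, unlike the code being fixed. */
static void uncharge(struct group *g, uint64_t bytes)
{
    for (struct group *p = g; p; p = p->parent)
        p->usage -= bytes;
    event(g);
}
```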
     

24 Sep, 2009

2 commits

  • Organize cgroups over soft limit in a RB-Tree

    Introduce an RB-Tree for storing memory cgroups that are over their soft
    limit. The overall goal is to:

    1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded.
    We are careful about updates; they take place only after a particular
    time interval has passed.
    2. Remove the node from the RB-Tree when the usage goes below the soft
    limit.

    The next set of patches will exploit the RB-Tree to get the group that is
    over its soft limit by the largest amount and reclaim from it, when we
    face memory contention.

    [hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot]
    Signed-off-by: Balbir Singh
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: KOSAKI Motohiro
    Signed-off-by: Hugh Dickins
    Cc: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
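    The RB-Tree is keyed on how far each group is over its soft limit, so the
    victim for reclaim is the group with the largest excess. For brevity, this
    hedged sketch swaps the tree for a plain linear scan over an array; it
    shows the ordering key and the selection rule, not the O(log n) data
    structure. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct group {
    const char *name;
    uint64_t usage;
    uint64_t soft_limit;
};

static uint64_t excess(const struct group *g)
{
    return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/* Stand-in for taking the largest node of the tree: returns the group
 * over its soft limit by the largest amount, or NULL if none is over. */
static const struct group *largest_excess(const struct group *groups,
                                          size_t n)
{
    const struct group *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (excess(&groups[i]) == 0)
            continue;   /* would not be in the tree at all */
        if (!best || excess(&groups[i]) > excess(best))
            best = &groups[i];
    }
    return best;
}
```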
     
  • Add an interface to allow get/set of soft limits. Soft limits for the
    memory plus swap controller (memsw) are currently not supported. Resource
    counters have been enhanced to support soft limits, and a new type,
    RES_SOFT_LIMIT, has been added. Unlike hard limits, soft limits can be
    set directly and do not need any reclaim or checks before being changed to
    a new value.

    Kamezawa-San raised the question of whether soft limits should belong to
    res_counter. Since all resources understand the basic concepts of hard
    and soft limits, it is justified to add soft limits here. Soft limits are
    a generic resource usage feature; even file system quotas support soft
    limits.

    Signed-off-by: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
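    A hedged sketch of the asymmetry described above: a write to the hard
    limit must be validated against current usage, while a soft limit write is
    simply stored. The enum mirrors the RES_SOFT_LIMIT naming from the text,
    but the struct and counter_write() are hypothetical, and real hard-limit
    writes in memcg try reclaim rather than failing immediately.

```c
#include <errno.h>
#include <stdint.h>

enum member { RES_USAGE, RES_LIMIT, RES_SOFT_LIMIT };

struct counter {
    uint64_t usage;
    uint64_t limit;
    uint64_t soft_limit;
};

static int counter_write(struct counter *c, enum member m, uint64_t val)
{
    switch (m) {
    case RES_LIMIT:
        if (c->usage > val)     /* hard limit: current usage must fit */
            return -EBUSY;
        c->limit = val;
        return 0;
    case RES_SOFT_LIMIT:
        c->soft_limit = val;    /* soft limit: no reclaim or check */
        return 0;
    default:
        return -EINVAL;         /* usage is not writable here */
    }
}
```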
     

19 Jun, 2009

1 commit

  • We currently have no interface to reset mem.limit or memsw.limit.

    This patch allows mem.limit or memsw.limit to be reset by writing -1 to
    them.

    Signed-off-by: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
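    One way to implement the "-1 means reset" rule is in the parser that turns
    the written string into a limit. In this sketch, parse_limit() is
    hypothetical and "unlimited" is assumed to be UINT64_MAX; the kernel uses
    its own RESOURCE_MAX constant and parsing helpers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_RESOURCE_MAX UINT64_MAX   /* assumed "no limit" value */

/* Parse a limit string; "-1" resets the limit to unlimited. */
static int parse_limit(const char *buf, uint64_t *out)
{
    char *end;

    if (strcmp(buf, "-1") == 0) {
        *out = SKETCH_RESOURCE_MAX;
        return 0;
    }
    if (buf[0] == '-')                   /* other negatives are rejected */
        return -EINVAL;
    errno = 0;
    unsigned long long v = strtoull(buf, &end, 10);
    if (errno || end == buf || *end != '\0')
        return -EINVAL;
    *out = v;
    return 0;
}
```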
     

16 Jan, 2009

1 commit

  • Move Documentation/cpusets.txt and Documentation/controllers/* to
    Documentation/cgroups/

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

09 Jan, 2009

1 commit

  • Add support for building hierarchies in resource counters. Cgroups allow
    us to build a deep hierarchy, but we currently don't link the resource
    counters belonging to the memory controller control groups in the same
    fashion as the corresponding cgroup entries in the cgroup hierarchy. This
    patch provides the infrastructure for resource counters that have the same
    hierarchy as their cgroup counterparts.

    This set of patches is based on the resource counter hierarchy patches
    posted by Pavel Emelianov.

    NOTE: Building hierarchies is expensive: deeper hierarchies imply charging
    all the way up to the root. It is known that hierarchies are
    expensive, so the user needs to be careful and aware of the trade-offs
    before creating very deep ones.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Balbir Singh
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: David Rientjes
    Cc: Pavel Emelianov
    Cc: Dhaval Giani
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
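    The cost noted above comes from charging every level up to the root. A
    hedged sketch of such a hierarchical charge with rollback, assuming a
    simplified counter with a parent pointer; charge_hierarchy() and its
    limit_fail_at out-parameter are illustrative names, and locking is
    omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct counter {
    struct counter *parent;   /* NULL for the root */
    uint64_t usage;
    uint64_t limit;
    uint64_t failcnt;
};

/* Charge 'val' to c and every ancestor. If any level would exceed its
 * limit, undo the levels already charged and report which one failed. */
static int charge_hierarchy(struct counter *c, uint64_t val,
                            struct counter **limit_fail_at)
{
    struct counter *p, *u;

    *limit_fail_at = NULL;
    for (p = c; p; p = p->parent) {
        if (p->usage + val > p->limit) {
            p->failcnt++;
            *limit_fail_at = p;
            goto undo;
        }
        p->usage += val;
    }
    return 0;

undo:
    for (u = c; u != p; u = u->parent)
        u->usage -= val;
    return -ENOMEM;
}
```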
     

06 Sep, 2008

1 commit

  • I found we can no longer set the limit to 0 with 2.6.27-rcX:
    # mount -t cgroup -omemory xxx /mnt
    # mkdir /mnt/0
    # echo 0 > /mnt/0/memory.limit_in_bytes
    bash: echo: write error: Device or resource busy

    It turned out 'limit' can't be set to 'usage', which is wrong IMO.

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
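    The report above is an off-by-one at the boundary: a new limit equal to
    the current usage (0 for an empty group) should be accepted. A hedged
    sketch of the intended comparison, with a hypothetical set_limit(); one
    way to get the reported behaviour is to reject when usage >= limit
    instead of usage > limit.

```c
#include <errno.h>
#include <stdint.h>

struct counter {
    uint64_t usage;
    uint64_t limit;
};

/* Reject a new limit only if usage is strictly above it. */
static int set_limit(struct counter *c, uint64_t limit)
{
    if (c->usage > limit)
        return -EBUSY;
    c->limit = limit;
    return 0;
}
```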
     

26 Jul, 2008

3 commits

  • Add an interface to set the limit. This is necessary for the memory
    resource controller because it shrinks usage when the limit is set.

    Other controllers may not need this interface to shrink usage because
    shrinking is either unnecessary or impossible for them.

    Acked-by: Balbir Singh
    Acked-by: Pavel Emelyanov
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
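    A hedged sketch of why the memory controller needs its own set-limit path:
    before lowering the limit, it tries to reclaim down to the new value and
    gives up after a bounded number of attempts. resize_limit(), the
    reclaim_fn hook and toy_reclaim() are hypothetical; toy_reclaim() merely
    frees at most 100 bytes per call to make the loop visible.

```c
#include <errno.h>
#include <stdint.h>

struct counter {
    uint64_t usage;
    uint64_t limit;
};

/* Hypothetical reclaim hook: frees memory, returns bytes freed. */
typedef uint64_t (*reclaim_fn)(struct counter *c, uint64_t target);

/* Toy reclaimer for illustration: frees at most 100 bytes per call. */
static uint64_t toy_reclaim(struct counter *c, uint64_t target)
{
    uint64_t freed = target < 100 ? target : 100;

    if (freed > c->usage)
        freed = c->usage;
    c->usage -= freed;
    return freed;
}

/* Shrink usage to fit under the new limit, then install it. */
static int resize_limit(struct counter *c, uint64_t limit,
                        reclaim_fn reclaim, int retries)
{
    while (c->usage > limit) {
        if (retries-- <= 0 || reclaim(c, c->usage - limit) == 0)
            return -EBUSY;
    }
    c->limit = limit;
    return 0;
}
```

    Controllers that cannot shrink usage would skip the loop and simply
    refuse a limit below current usage.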
     
  • Currently res_counter_write() is a raw file handler even though it
    ultimately takes a number, since in some cases it wants to
    pre-process the string when converting it to a number.

    This patch converts res_counter_write() from a raw file handler to a
    write_string() handler; this allows some of the boilerplate
    copying/locking/checking to be removed, and simplifies the cleanup path,
    since these functions are now performed by the cgroups framework.

    [lizf@cn.fujitsu.com: build fix]
    Signed-off-by: Paul Menage
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: Serge Hallyn
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
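    With the framework delivering an already-copied, NUL-terminated string,
    the handler only has to parse and store, optionally through a
    pre-processing callback. In this hedged sketch, counter_write_string(),
    write_strategy_fn and kib_strategy() are hypothetical; kib_strategy()
    just accepts a trailing 'k' to show where such pre-processing plugs in.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct counter {
    uint64_t limit;
};

/* Optional step that turns the user's string into a number. */
typedef int (*write_strategy_fn)(const char *buf, uint64_t *val);

/* Example strategy (hypothetical): accept a trailing 'k' for KiB. */
static int kib_strategy(const char *buf, uint64_t *val)
{
    char *end;
    unsigned long long v = strtoull(buf, &end, 10);

    if (end == buf)
        return -EINVAL;
    if (*end == 'k') {
        v <<= 10;
        end++;
    }
    if (*end != '\0')
        return -EINVAL;
    *val = v;
    return 0;
}

/* write_string()-style handler: no copying or locking boilerplate. */
static int counter_write_string(struct counter *c, const char *buf,
                                write_strategy_fn strategy)
{
    uint64_t val;
    char *end;

    if (strategy) {
        int ret = strategy(buf, &val);
        if (ret)
            return ret;
    } else {
        unsigned long long v = strtoull(buf, &end, 10);
        if (end == buf || *end != '\0')
            return -EINVAL;
        val = v;
    }
    c->limit = val;
    return 0;
}
```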
     
  • Ignoring the return values of the charging routines may result in counter
    underflow in the future, when a value that was never actually charged gets
    uncharged (or in "leaks", when the value is never uncharged).

    This also prevents the charging routines from being used to decrement the
    counter value (i.e. to uncharge it) ;)

    (Current code works OK with res_counter, however :) )

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: Paul Menage
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
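    In the kernel, __must_check expands to the GCC/Clang attribute
    warn_unused_result, which makes the compiler warn whenever a caller
    discards the result. A hedged sketch with a hypothetical counter_charge()
    and a locally defined sketch_must_check macro:

```c
#include <errno.h>
#include <stdint.h>

/* GCC/Clang attribute behind the kernel's __must_check annotation. */
#define sketch_must_check __attribute__((warn_unused_result))

struct counter {
    uint64_t usage;
    uint64_t limit;
};

/* Callers that ignore the result now get a compiler warning. */
static sketch_must_check int counter_charge(struct counter *c, uint64_t val)
{
    if (c->usage + val > c->limit)
        return -ENOMEM;
    c->usage += val;
    return 0;
}
```

    A bare statement such as counter_charge(&c, 10); now triggers an
    "ignoring return value" warning, so a failed charge cannot silently be
    followed by an uncharge of the same amount.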
     

29 Apr, 2008

4 commits

  • This is a very common requirement from people using the resource accounting
    facilities (not only memcgroup but also OpenVZ beancounters). They want to
    put the cgroup in an initial state without re-creating it.

    For example after re-configuring a group people want to observe how this new
    configuration fits the group needs without saving the previous failcnt value.

    Merge the two resets into one mem_cgroup_reset() function to demonstrate
    how multiplexing works.

    Besides, I plan to move the files that correspond to res_counter into
    res_counter.c and somehow "import" them into the controller. I don't know
    how to do it gracefully yet, but merging the resets of max_usage and
    failcnt into one function will be part of it for sure.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pavel Emelyanov
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
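    A hedged sketch of a single, multiplexed reset entry point dispatching on
    which member the written file refers to. The enum names mirror the
    max_usage and failcnt members mentioned above, but the struct and
    counter_reset() are simplified illustrations.

```c
#include <stdint.h>

enum member { RES_USAGE, RES_MAX_USAGE, RES_LIMIT, RES_FAILCNT };

struct counter {
    uint64_t usage;
    uint64_t max_usage;
    uint64_t limit;
    uint64_t failcnt;
};

/* One reset entry point, multiplexed on the member. */
static void counter_reset(struct counter *c, enum member m)
{
    switch (m) {
    case RES_MAX_USAGE:
        c->max_usage = c->usage;   /* restart the watermark from now */
        break;
    case RES_FAILCNT:
        c->failcnt = 0;
        break;
    default:
        break;                     /* other members are not resettable */
    }
}
```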
     
  • The resource counter is supposed to facilitate the resource accounting of
    arbitrary resources (and it already does this for the memory controller).

    However, it is about to be used in other resource controllers (swap, kernel
    memory, networking, etc.), so provide a doc describing how to work with it.
    This will avoid possible future duplication in the individual
    controllers' docs.

    Fixed errors pointed out by Randy.

    [akpm@linux-foundation.org: fix documentation tpyo]
    Signed-off-by: Pavel Emelyanov
    Cc: Randy Dunlap
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This field holds the maximum usage value since the counter was created
    (or since the latest reset).

    To reset this to the usage value simply write anything to the appropriate
    cgroup file.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
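    A hedged sketch of how such a high-water mark can be maintained: charging
    raises max_usage as a side effect, uncharging leaves it alone, and a reset
    sets it back to the current usage. The struct and functions are
    simplified, hypothetical stand-ins without locking.

```c
#include <errno.h>
#include <stdint.h>

struct counter {
    uint64_t usage;
    uint64_t max_usage;
    uint64_t limit;
    uint64_t failcnt;
};

static int counter_charge(struct counter *c, uint64_t val)
{
    if (c->usage + val > c->limit) {
        c->failcnt++;
        return -ENOMEM;
    }
    c->usage += val;
    if (c->usage > c->max_usage)
        c->max_usage = c->usage;   /* track the high-water mark */
    return 0;
}

static void counter_uncharge(struct counter *c, uint64_t val)
{
    c->usage -= val;               /* max_usage is left untouched */
}

/* Writing anything to the max_usage file resets it to current usage. */
static void counter_reset_max(struct counter *c)
{
    c->max_usage = c->usage;
}
```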
     
  • Adds a function for returning the value of a resource counter member, in a
    form suitable for use in a cgroup read_u64 control file method.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
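    A hedged sketch of such an accessor: a member selector is mapped to the
    corresponding field and its value returned as a 64-bit integer, which is
    what a read_u64 control file method hands back to the cgroup core. The
    enum, struct and function names are illustrative, not the kernel's
    exact definitions.

```c
#include <stddef.h>
#include <stdint.h>

enum member { RES_USAGE, RES_MAX_USAGE, RES_LIMIT, RES_FAILCNT };

struct counter {
    uint64_t usage;
    uint64_t max_usage;
    uint64_t limit;
    uint64_t failcnt;
};

/* Map a member selector to the address of that field. */
static uint64_t *counter_member(struct counter *c, enum member m)
{
    switch (m) {
    case RES_USAGE:     return &c->usage;
    case RES_MAX_USAGE: return &c->max_usage;
    case RES_LIMIT:     return &c->limit;
    case RES_FAILCNT:   return &c->failcnt;
    }
    return NULL;
}

/* Value in a form a read_u64-style file method can return directly. */
static uint64_t counter_read_u64(struct counter *c, enum member m)
{
    uint64_t *p = counter_member(c, m);

    return p ? *p : 0;
}
```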
     

08 Feb, 2008

3 commits

  • Change the interface to use bytes instead of pages. Page sizes can vary
    across platforms and configurations. A new strategy routine has been added
    to the resource counters infrastructure to format the data as desired.

    Suggested by David Rientjes, Andrew Morton and Herbert Poetzl

    Tested on a UML setup with the config for memory control enabled.

    [kamezawa.hiroyu@jp.fujitsu.com: possible race fix in res_counter]
    Signed-off-by: Balbir Singh
    Signed-off-by: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
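    The motivation for bytes is that the same page count means different
    amounts of memory on different configurations. A small hedged sketch: a
    byte limit is portable, and converting it to pages (rounding up, here via
    a hypothetical helper) depends on the page size, which is passed in as a
    parameter rather than assumed.

```c
#include <stdint.h>

/* Number of pages needed to hold 'bytes' for a given page size. */
static uint64_t bytes_to_pages_round_up(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Bytes covered by 'pages' pages of the given size. */
static uint64_t pages_to_bytes(uint64_t pages, uint64_t page_size)
{
    return pages * page_size;
}
```

    A 1 MiB limit is 256 pages with 4 KiB pages but only 16 with 64 KiB
    pages, whereas the byte value stays the same.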
     
  • Add the page_cgroup to the per-cgroup LRU. The reclaim algorithm has
    been modified to make isolate_lru_pages() a pluggable component. The
    scan_control data structure now accepts the cgroup on behalf of which
    reclaim is carried out. try_to_free_pages() has been extended to become
    cgroup aware.

    [akpm@linux-foundation.org: fix warning]
    [Lee.Schermerhorn@hp.com: initialize all scan_control's isolate_pages member]
    [bunk@kernel.org: make do_try_to_free_pages() static]
    [hugh@veritas.com: memcgroup: fix try_to_free order]
    [kamezawa.hiroyu@jp.fujitsu.com: this unlock_page_cgroup() is unnecessary]
    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
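    A hedged sketch of the "pluggable component" idea: scan_control carries a
    function pointer for isolating pages plus the target cgroup, so the same
    reclaim loop works on either the global or a per-cgroup LRU. The isolate
    signature, the _sketch types and the toy isolators are hypothetical; only
    the isolate_pages member name comes from the text above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup with its own LRU. */
struct cgroup_sketch {
    unsigned long lru_pages;
};

typedef unsigned long (*isolate_fn)(unsigned long nr,
                                    struct cgroup_sketch *cg);

struct scan_control_sketch {
    struct cgroup_sketch *mem_cgroup;   /* NULL: global reclaim */
    isolate_fn isolate_pages;           /* global or per-cgroup LRU */
};

/* Toy global isolator: pretends the global LRU always has enough. */
static unsigned long isolate_global(unsigned long nr,
                                    struct cgroup_sketch *cg)
{
    (void)cg;
    return nr;
}

/* Toy per-cgroup isolator: limited by the cgroup's own LRU size. */
static unsigned long isolate_cgroup(unsigned long nr,
                                    struct cgroup_sketch *cg)
{
    return nr < cg->lru_pages ? nr : cg->lru_pages;
}

/* The reclaim path only calls through the pointer. */
static unsigned long shrink_some(struct scan_control_sketch *sc,
                                 unsigned long nr)
{
    return sc->isolate_pages(nr, sc->mem_cgroup);
}
```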
     
  • With fixes from David Rientjes

    Introduce generic structures and routines for resource accounting.

    Each resource accounting cgroup is supposed to embed it, together with
    its cgroup_subsystem_state and its resource-specific members.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: David Rientjes
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
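    A minimal, hedged sketch of a generic counter and how a controller might
    embed it next to its subsystem state. Every name here is a hypothetical
    stand-in (the real structures carry a spinlock and more fields), and
    "unlimited" is assumed to be UINT64_MAX. Uncharging more than was charged
    is clamped to zero rather than allowed to underflow.

```c
#include <errno.h>
#include <stdint.h>

/* Generic counter, usable by any resource controller. */
struct res_counter_sketch {
    uint64_t usage;
    uint64_t limit;
    uint64_t failcnt;
};

/* A controller embeds the counter alongside its own state. */
struct css_sketch { int refcnt; };  /* stand-in for cgroup_subsystem_state */
struct mem_cgroup_sketch {
    struct css_sketch css;
    struct res_counter_sketch res;
};

static void counter_init(struct res_counter_sketch *c)
{
    c->usage = 0;
    c->limit = UINT64_MAX;          /* assumed: unlimited by default */
    c->failcnt = 0;
}

static int counter_charge(struct res_counter_sketch *c, uint64_t val)
{
    if (val > c->limit - c->usage) {    /* overflow-safe limit check */
        c->failcnt++;
        return -ENOMEM;
    }
    c->usage += val;
    return 0;
}

static void counter_uncharge(struct res_counter_sketch *c, uint64_t val)
{
    c->usage = val > c->usage ? 0 : c->usage - val;
}
```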