08 Apr, 2014

1 commit

  • The res_counter_{charge,uncharge}_locked() variants are not used in the
    kernel outside of the resource counter code itself, so remove the
    interface.

    Signed-off-by: David Rientjes
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Mel Gorman
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Jianguo Wu
    Cc: Tim Hockin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

13 Sep, 2013

3 commits

  • This function dereferences res far too often, so optimize it (the
    general caching pattern is sketched below).

    Signed-off-by: Sha Zhengju
    Signed-off-by: Qiang Huang
    Acked-by: Michal Hocko
    Cc: Daisuke Nishimura
    Cc: Jeff Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sha Zhengju
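
    The log does not name the function, so the following userspace sketch
    only illustrates the general pattern of caching repeatedly dereferenced
    values in locals; all names here are illustrative:

        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
            uint64_t limit;
            uint64_t failcnt;
        };

        /* Before: res->usage and res->limit are re-read on every use.
         * After: read once into a local and work on that. */
        static int charge(struct res_counter *res, uint64_t val)
        {
            uint64_t usage = res->usage;     /* single dereference */

            if (usage + val > res->limit) {
                res->failcnt++;
                return -1;
            }
            res->usage = usage + val;
            return 0;
        }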
     
  • PAGE_ALIGN rounds up to the next page boundary, so the aligned value
    can overflow, for example when the maximum value is written to
    *.limit_in_bytes:

    $ cat /cgroup/memory/memory.limit_in_bytes
    18446744073709551615

    # echo 18446744073709551615 > /cgroup/memory/memory.limit_in_bytes
    bash: echo: write error: Invalid argument

    Some user programs depend on this behaviour (libcg, for example, reads
    the value in a snapshot and later writes it back to reset the cgroup),
    so the failure causes confusion. Fix it (an overflow guard is sketched
    below).

    Signed-off-by: Sha Zhengju
    Signed-off-by: Qiang Huang
    Acked-by: Michal Hocko
    Cc: Daisuke Nishimura
    Cc: Jeff Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sha Zhengju
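
    A userspace sketch of the kind of overflow guard the fix needs; the
    names and the exact placement in the kernel are assumptions:

        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_SIZE        4096ULL
        #define RES_COUNTER_MAX  UINT64_MAX

        /* Align up to a page boundary, but saturate instead of wrapping
         * when the addition would overflow. */
        static uint64_t page_align_safe(uint64_t val)
        {
            if (val > UINT64_MAX - (PAGE_SIZE - 1))
                return RES_COUNTER_MAX;      /* would wrap: saturate */
            return (val + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
        }

        int main(void)
        {
            /* The maximum value now saturates instead of wrapping to 0. */
            printf("%llu\n", (unsigned long long)page_align_safe(UINT64_MAX));
            return 0;
        }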
     
  • RESOURCE_MAX is far too general a name; change it to RES_COUNTER_MAX.

    Signed-off-by: Sha Zhengju
    Signed-off-by: Qiang Huang
    Acked-by: Michal Hocko
    Cc: Daisuke Nishimura
    Cc: Jeff Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sha Zhengju
     

19 Dec, 2012

1 commit

  • It is useful to know how many charges are still left after a call to
    res_counter_uncharge. While it is possible to issue a res_counter_read
    after uncharge, this can be racy.

    If we need, for instance, to take some action when the counters drop
    down to 0, only one of the callers should see it. These are the same
    semantics as the kernel's atomic variables (sketched below).

    Since the current return type is void, we don't need to worry about
    anything breaking due to this change: nobody relied on the return
    value, and only users appearing from now on will check it.

    Signed-off-by: Glauber Costa
    Reviewed-by: Michal Hocko
    Acked-by: Kamezawa Hiroyuki
    Acked-by: David Rientjes
    Cc: Johannes Weiner
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: JoonSoo Kim
    Cc: Mel Gorman
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
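
    A userspace sketch of the new calling convention (locking omitted; in
    the kernel the read-modify-write happens under the counter's lock):

        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
        };

        /* Return the usage left after the uncharge so that exactly one
         * caller observes the drop to zero, as with atomic_dec_and_test(). */
        static uint64_t res_counter_uncharge(struct res_counter *c,
                                             uint64_t val)
        {
            if (val > c->usage)
                val = c->usage;          /* don't underflow */
            c->usage -= val;
            return c->usage;
        }

        static void put_charge(struct res_counter *c, uint64_t val)
        {
            if (res_counter_uncharge(c, val) == 0) {
                /* only one caller reaches here: a safe place for
                 * drop-to-zero actions */
            }
        }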
     

13 Dec, 2012

1 commit

  • Since commit 628f42355389 ("memcg: limit change shrink usage") both
    res_counter_write() and write_strategy_fn have been unused. This patch
    deletes them both.

    Signed-off-by: Greg Thelen
    Cc: Glauber Costa
    Cc: Tejun Heo
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Frederic Weisbecker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     

30 May, 2012

1 commit

  • When killing a res_counter which is a child of another counter, we need
    to do

    res_counter_uncharge(child, xxx)
    res_counter_charge(parent, xxx)

    This is not atomic and wastes CPU. This patch adds
    res_counter_uncharge_until(), whose uncharge propagates up through the
    ancestors until it reaches the specified res_counter:

    res_counter_uncharge_until(child, parent, xxx)

    Now the operation is atomic and efficient (the walk is sketched below).

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Aneesh Kumar K.V
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Ying Han
    Cc: Glauber Costa
    Reviewed-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frederic Weisbecker
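
    A userspace sketch of the walk (locking omitted; the kernel does the
    whole walk with interrupts disabled so the transfer is atomic):

        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
            struct res_counter *parent;    /* NULL at the root */
        };

        /* Uncharge c and every ancestor below `until`; `until` itself is
         * left charged, which is exactly what moving a charge from child
         * to parent requires. */
        static void res_counter_uncharge_until(struct res_counter *c,
                                               struct res_counter *until,
                                               uint64_t val)
        {
            struct res_counter *p;

            for (p = c; p != until; p = p->parent)
                p->usage -= val;
        }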
     

28 Apr, 2012

2 commits

  • Updating max_usage is something one would expect whenever we reach a
    new maximum usage value, even when we get there by forcing through the
    limit with res_counter_charge_nofail(); the helper sketched at the end
    of the next entry updates the watermark on this path too.

    (Whether we want to account failcnt when we force through the limit
    is another debate.)

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Tejun Heo
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Glauber Costa
    Acked-by: Kirill A. Shutemov
    Cc: Li Zefan

    Frederic Weisbecker
     
  • res_counter_charge() and res_counter_charge_nofail() do almost the
    same thing and duplicate some code. Merge their implementations into a
    single common function (sketched below). res_counter_charge_locked()
    takes one more parameter, but it doesn't seem to be used outside
    res_counter.c yet anyway.

    There is no (intended) change in behaviour.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Tejun Heo
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Glauber Costa
    Acked-by: Kirill A. Shutemov
    Cc: Li Zefan

    Frederic Weisbecker
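
    After the merge, the common helper looks roughly like this userspace
    sketch (locking omitted; treat it as a model, not the exact kernel
    code). Note the unconditional watermark update, which is the point of
    the previous entry:

        #include <stdbool.h>
        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
            uint64_t max_usage;
            uint64_t limit;
            uint64_t failcnt;
        };

        /* force selects the nofail behaviour: report the failure but
         * charge anyway. */
        static int res_counter_charge_common(struct res_counter *c,
                                             uint64_t val, bool force)
        {
            int ret = 0;

            if (c->usage + val > c->limit) {
                c->failcnt++;
                ret = -1;                    /* -ENOMEM in the kernel */
                if (!force)
                    return ret;
            }
            c->usage += val;
            if (c->usage > c->max_usage)
                c->max_usage = c->usage;     /* watermark, even if forced */
            return ret;
        }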
     

23 Jan, 2012

1 commit

  • There is a case in __sk_mem_schedule() where an allocation is beyond
    the maximum, yet we are allowed to proceed. It happens under the
    following condition:

    sk->sk_wmem_queued + size >= sk->sk_sndbuf

    The network code won't revert the allocation in this case, meaning
    that at some point later it'll try to undo it. Since this is never
    communicated to the underlying res_counter code, there is an imbalance
    in the res_counter uncharge operation.

    I see two ways of fixing this:

    1) storing the information about those allocations somewhere
    in memcg, and then deducting from that first, before
    we start draining the res_counter,
    2) providing a slightly different allocation function for
    the res_counter, that matches the original behavior of
    the network code more closely.

    I decided to go for #2 here (sketched below), believing it to be more
    elegant, since #1 would require us to do basically that, but in a more
    obscure way.

    Signed-off-by: Glauber Costa
    Cc: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Cc: Michal Hocko
    CC: Tejun Heo
    CC: Li Zefan
    CC: Laurent Chavey
    Acked-by: Tejun Heo
    Signed-off-by: David S. Miller

    Glauber Costa
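
    A userspace sketch of option #2 (simplified; the real function also
    reports which counter in the hierarchy went over the limit through an
    out-parameter):

        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
            uint64_t limit;
            uint64_t failcnt;
        };

        /* Always charge, so the later uncharge stays balanced with this
         * allocation, but still tell the caller the limit was exceeded. */
        static int res_counter_charge_nofail(struct res_counter *c,
                                             uint64_t val)
        {
            int ret = 0;

            if (c->usage + val > c->limit) {
                c->failcnt++;
                ret = -1;        /* over the limit, charged anyway */
            }
            c->usage += val;
            return ret;
        }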
     

30 Mar, 2010

1 commit

  • Update gfp.h and slab.h includes to prepare for breaking implicit
    slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints
    out an error message indicating which .h file needs to be added to
    the file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some files didn't need the
    inclusion, some needed manual addition, while for others adding it
    to an implementation .h or embedding .c file was more appropriate.
    This step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    For example, lib/decompress_*.c used malloc/free() wrappers around
    slab APIs, requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them, as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored, as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on the arch to make
    things build (like ipr on powerpc/64, which failed due to missing
    writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given that I had only a couple of failures from the build tests in
    step 7, I'm fairly confident about the coverage of this conversion
    patch. If there is a breakage, it's likely to be something in one of
    the arch headers, which should be easily discoverable on most builds
    of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

02 Oct, 2009

1 commit

  • This patch cleans up and fixes memcg's uncharge soft limit path.

    Background:
    res_counter_charge()/uncharge() handle soft limit information at
    charge/uncharge time, and the soft limit check is done when the event
    counter per memcg goes over its limit. But the event counter per memcg
    is updated only when memory usage is over the soft limit, and,
    considering hierarchical memcg management, ancestors should be taken
    care of as well. At present, ancestors (the hierarchy) are handled in
    charge() but not in uncharge(), which is not good.

    Problems:
    1. memcg's event counter is incremented only when the soft limit is
    hit. That's bad: it makes the event counter hard to reuse for other
    purposes.

    2. At uncharge, only the lowest-level res_counter is handled. This is
    a bug: because the ancestors' event counters are not incremented, the
    children have to take care of them.

    3. res_counter_uncharge()'s 3rd argument is NULL in most cases, and
    operations under res_counter->lock should be small. Having no "if"
    statement there is better.

    Fixes (sketched below):
    * Remove the soft_limit_xx pointer and checks in charge and uncharge.
    The check-only-when-necessary scheme works well enough without them.

    * Make memcg's event counter incremented at every charge/uncharge
    (the per-cpu area will be accessed soon anyway).

    * Check all ancestors at soft limit check time. This is necessary
    because an ancestor's event counter may never be modified on its own,
    so they should be checked at the same time.

    Reviewed-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
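
    A userspace sketch of the resulting scheme (names and the event
    threshold are illustrative; the real code lives in the memcg
    charge/uncharge paths and uses per-cpu event counters):

        #include <stddef.h>
        #include <stdint.h>

        #define SOFTLIMIT_EVENTS_TARGET 1024

        struct mem_cgroup {
            uint64_t usage;
            uint64_t soft_limit;
            uint64_t events;                 /* bumped at every (un)charge */
            struct mem_cgroup *parent;
        };

        static void memcg_check_events(struct mem_cgroup *memcg)
        {
            struct mem_cgroup *m;

            if (++memcg->events % SOFTLIMIT_EVENTS_TARGET != 0)
                return;
            /* Check the whole ancestry: an ancestor's own counter may
             * never fire, so the child checks on its behalf. */
            for (m = memcg; m != NULL; m = m->parent) {
                if (m->usage > m->soft_limit) {
                    /* queue m for soft limit reclaim */
                }
            }
        }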
     

24 Sep, 2009

2 commits

  • Organize cgroups over their soft limit in an RB-tree.

    Introduce an RB-tree for storing memory cgroups that are over their
    soft limit. The overall goal is to:

    1. Add a memory cgroup to the RB-tree when the soft limit is exceeded.
    We are careful about updates: they take place only after a particular
    time interval has passed.
    2. Remove the node from the RB-tree when the usage goes below the soft
    limit.

    The next set of patches will exploit the RB-tree to get the group that
    is over its soft limit by the largest amount and reclaim from it when
    we face memory contention (the ordering is sketched below).

    [hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot]
    Signed-off-by: Balbir Singh
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: KOSAKI Motohiro
    Signed-off-by: Hugh Dickins
    Cc: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
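
    A userspace sketch of the ordering; a plain unbalanced BST stands in
    for the kernel's rbtree, and the key is how far usage exceeds the
    soft limit:

        #include <stddef.h>
        #include <stdint.h>

        struct soft_limit_node {
            uint64_t excess;                 /* usage - soft_limit */
            struct soft_limit_node *left, *right;
        };

        /* Insert keyed by excess; the rightmost node is then the group
         * that is over its soft limit by the largest amount, i.e. the
         * first candidate for reclaim under memory contention. */
        static void insert_sorted(struct soft_limit_node **root,
                                  struct soft_limit_node *node)
        {
            while (*root) {
                if (node->excess < (*root)->excess)
                    root = &(*root)->left;
                else
                    root = &(*root)->right;
            }
            node->left = node->right = NULL;
            *root = node;
        }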
     
  • Add an interface to allow get/set of soft limits. Soft limits for the
    memory plus swap controller (memsw) are currently not supported.
    Resource counters have been enhanced to support soft limits, and a new
    type RES_SOFT_LIMIT has been added. Unlike hard limits, soft limits
    can be directly set and do not need any reclaim or checks before
    setting them to a new value (sketched below).

    Kamezawa-San raised a question as to whether the soft limit should
    belong to res_counter. Since all resources understand the basic
    concepts of hard and soft limits, it is justified to add soft limits
    here. Soft limits are a generic resource usage feature; even file
    system quotas support soft limits.

    Signed-off-by: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
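
    A minimal sketch of the distinction in the write path (the function
    name is illustrative):

        #include <stdint.h>

        struct res_counter {
            uint64_t limit;        /* hard limit */
            uint64_t soft_limit;   /* new in this patch */
        };

        /* Hard limit: lowering it may first require reclaiming usage
         * down below the new value (done by the memcg code).
         * Soft limit: just store it; no reclaim, no checks. */
        static void res_counter_set_soft_limit(struct res_counter *c,
                                               uint64_t val)
        {
            c->soft_limit = val;
        }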
     

19 Jun, 2009

1 commit

  • We don't have an interface to reset mem.limit or memsw.limit now.

    This patch allows resetting mem.limit or memsw.limit by writing -1 to
    them (the parse step is sketched below).

    Signed-off-by: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
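
    A userspace sketch of the parse step (modelled on the description
    above; error handling trimmed):

        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        #define RESOURCE_MAX UINT64_MAX   /* later renamed RES_COUNTER_MAX */

        /* "-1" from userspace means "unlimited": map it to the maximum
         * value instead of rejecting it as a negative number. */
        static int parse_limit(const char *buf, uint64_t *val)
        {
            if (strcmp(buf, "-1") == 0) {
                *val = RESOURCE_MAX;
                return 0;
            }
            *val = strtoull(buf, NULL, 10);   /* plain byte count */
            return 0;
        }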
     

09 Jan, 2009

1 commit

  • Add support for building hierarchies in resource counters. Cgroups
    allows us to build a deep hierarchy, but we currently don't link the
    resource counters belonging to the memory controller control groups in
    the same fashion as the corresponding cgroup entries in the cgroup
    hierarchy. This patch provides the infrastructure for resource
    counters that have the same hierarchy as their cgroup counterparts
    (the charge walk is sketched below).

    This set of patches is based on the resource counter hierarchy patches
    posted by Pavel Emelianov.

    NOTE: Building hierarchies is expensive; deeper hierarchies imply
    charging all the way up to the root. The user needs to be careful and
    aware of the trade-offs before creating very deep ones.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Balbir Singh
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: David Rientjes
    Cc: Pavel Emelianov
    Cc: Dhaval Giani
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
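
    A userspace sketch of hierarchical charging with rollback (locking
    omitted; fail_at reports the counter that hit its limit):

        #include <stddef.h>
        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
            uint64_t limit;
            struct res_counter *parent;      /* NULL at the root */
        };

        static int charge_one(struct res_counter *c, uint64_t val)
        {
            if (c->usage + val > c->limit)
                return -1;
            c->usage += val;
            return 0;
        }

        /* Charge every level up to the root; on failure, roll back the
         * levels already charged. Deep trees make this walk expensive. */
        static int res_counter_charge(struct res_counter *c, uint64_t val,
                                      struct res_counter **fail_at)
        {
            struct res_counter *p, *u;

            for (p = c; p != NULL; p = p->parent) {
                if (charge_one(p, val) != 0) {
                    for (u = c; u != p; u = u->parent)
                        u->usage -= val;
                    *fail_at = p;
                    return -1;
                }
            }
            return 0;
        }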
     

26 Jul, 2008

1 commit

  • Currently res_counter_write() is a raw file handler even though it's
    ultimately taking a number, since in some cases it wants to
    pre-process the string when converting it to a number.

    This patch converts res_counter_write() from a raw file handler to a
    write_string() handler; this allows some of the boilerplate
    copying/locking/checking to be removed and simplifies the cleanup
    path, since these operations are now performed by the cgroups
    framework.

    [lizf@cn.fujitsu.com: build fix]
    Signed-off-by: Paul Menage
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: Serge Hallyn
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

29 Apr, 2008

3 commits

  • This field holds the maximal value of usage since the counter's
    creation (or since the latest reset).

    To reset it to the current usage value, simply write anything to the
    appropriate cgroup file (sketched below).

    Signed-off-by: Pavel Emelyanov
    Acked-by: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
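
    A userspace sketch of the reset semantics (the function name is an
    assumption):

        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
            uint64_t max_usage;    /* high-water mark of usage */
        };

        /* Writing anything to the max_usage cgroup file resets the mark
         * to the current usage, so monitoring can start a new window. */
        static void res_counter_reset_max(struct res_counter *c)
        {
            c->max_usage = c->usage;
        }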
     
  • Adds a function for returning the value of a resource counter member,
    in a form suitable for use in a cgroup read_u64 control file method
    (its shape is sketched below).

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
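
    A userspace sketch of the helper's shape (details such as the member
    ids and the default case are assumptions):

        #include <stdint.h>

        struct res_counter {
            uint64_t usage;
            uint64_t max_usage;
            uint64_t limit;
            uint64_t failcnt;
        };

        enum { RES_USAGE, RES_MAX_USAGE, RES_LIMIT, RES_FAILCNT };

        /* Map a member id onto the corresponding field. */
        static uint64_t *res_counter_member(struct res_counter *c, int member)
        {
            switch (member) {
            case RES_USAGE:     return &c->usage;
            case RES_MAX_USAGE: return &c->max_usage;
            case RES_LIMIT:     return &c->limit;
            default:            return &c->failcnt;
            }
        }

        /* Return the member as a u64 for a cgroup read_u64 method. */
        static uint64_t res_counter_read_u64(struct res_counter *c, int member)
        {
            return *res_counter_member(c, member);
        }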
     
  • Following an experimental deletion of an unnecessary #include
    directive from a header file, these files under kernel/ were exposed
    as needing to include one of the headers they had been getting
    implicitly, so explicit includes were added where necessary.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

08 Feb, 2008

2 commits

  • Change the interface to use bytes instead of pages, since page sizes
    can vary across platforms and configurations. A new strategy routine
    has been added to the resource counters infrastructure to format the
    data as desired (sketched below).

    Suggested by David Rientjes, Andrew Morton and Herbert Poetzl.

    Tested on a UML setup with the config for memory control enabled.

    [kamezawa.hiroyu@jp.fujitsu.com: possible race fix in res_counter]
    Signed-off-by: Balbir Singh
    Signed-off-by: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
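
    A userspace sketch of a write strategy that accepts byte counts with
    size suffixes (the signature and suffix set are illustrative):

        #include <stdint.h>
        #include <stdlib.h>

        /* Parse "4096", "4k", "1M", ... into a byte count, so the user
         * interface is independent of the platform's page size. */
        static int memparse_strategy(const char *buf, uint64_t *val)
        {
            char *end;
            uint64_t v = strtoull(buf, &end, 10);

            switch (*end) {
            case 'k': case 'K': v <<= 10; break;
            case 'm': case 'M': v <<= 20; break;
            case 'g': case 'G': v <<= 30; break;
            default: break;
            }
            *val = v;
            return 0;
        }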
     
  • With fixes from David Rientjes.

    Introduce generic structures and routines for resource accounting
    (sketched below).

    Each resource-accounting cgroup is supposed to aggregate it, its
    cgroup_subsys_state and its resource-specific members within.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: David Rientjes
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
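
    A userspace sketch of the original structure and its accounting core
    (the kernel version guards the fields with a spinlock; fields such as
    max_usage, soft_limit and parent arrived in later patches above):

        #include <stdint.h>

        /* The generic counter: every controller that accounts a resource
         * embeds one of these next to its cgroup state. */
        struct res_counter {
            uint64_t usage;     /* current consumption */
            uint64_t limit;     /* maximum allowed */
            uint64_t failcnt;   /* number of failed charges */
        };

        static int res_counter_charge(struct res_counter *c, uint64_t val)
        {
            if (c->usage + val > c->limit) {
                c->failcnt++;
                return -1;       /* -ENOMEM in the kernel */
            }
            c->usage += val;
            return 0;
        }

        static void res_counter_uncharge(struct res_counter *c, uint64_t val)
        {
            if (val > c->usage)  /* defensive: never underflow */
                val = c->usage;
            c->usage -= val;
        }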