13 Sep, 2013

1 commit


05 Jul, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "The usual stuff from trivial tree"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    treewide: relase -> release
    Documentation/cgroups/memory.txt: fix stat file documentation
    sysctl/net.txt: delete reference to obsolete 2.4.x kernel
    spinlock_api_smp.h: fix preprocessor comments
    treewide: Fix typo in printk
    doc: device tree: clarify stuff in usage-model.txt.
    open firmware: "/aliasas" -> "/aliases"
    md: bcache: Fixed a typo with the word 'arithmetic'
    irq/generic-chip: fix a few kernel-doc entries
    frv: Convert use of typedef ctl_table to struct ctl_table
    sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
    doc: clk: Fix incorrect wording
    Documentation/arm/IXP4xx fix a typo
    Documentation/networking/ieee802154 fix a typo
    Documentation/DocBook/media/v4l fix a typo
    Documentation/video4linux/si476x.txt fix a typo
    Documentation/virtual/kvm/api.txt fix a typo
    Documentation/early-userspace/README fix a typo
    Documentation/video4linux/soc-camera.txt fix a typo
    lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
    ...

    Linus Torvalds
     

04 Jul, 2013

1 commit


24 Jun, 2013

1 commit


28 May, 2013

1 commit


08 May, 2013

1 commit

  • This exports the amount of anonymous transparent hugepages for each
    memcg via the new "rss_huge" stat in memory.stat. The units are in
    bytes.

    This is helpful to determine the hugepage utilization for individual
    jobs on the system in comparison to rss and opportunities where
    MADV_HUGEPAGE may be helpful.

    The amount of anonymous transparent hugepages is also included in "rss"
    for backwards compatibility.

    Signed-off-by: David Rientjes
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

01 May, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
    code cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
    mm: Convert print_symbol to %pSR
    gfs2: Convert print_symbol to %pSR
    m32r: Convert print_symbol to %pSR
    iostats.txt: add easy-to-find description for field 6
    x86 cmpxchg.h: fix wrong comment
    treewide: Fix typo in printk and comments
    doc: devicetree: Fix various typos
    docbook: fix 8250 naming in device-drivers
    pata_pdc2027x: Fix compiler warning
    treewide: Fix typo in printks
    mei: Fix comments in drivers/misc/mei
    treewide: Fix typos in kernel messages
    pm44xx: Fix comment for "CONFIG_CPU_IDLE"
    doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
    mmzone: correct "pags" to "pages" in comment.
    kernel-parameters: remove outdated 'noresidual' parameter
    Remove spurious _H suffixes from ifdef comments
    sound: Remove stray pluses from Kconfig file
    radio-shark: Fix printk "CONFIG_LED_CLASS"
    doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
    ...

    Linus Torvalds
     

30 Apr, 2013

1 commit

  • With this patch userland applications that want to maintain the
    interactivity/memory allocation cost can use the pressure level
    notifications. The levels are defined like this:

    The "low" level means that the system is reclaiming memory for new
    allocations. Monitoring this reclaiming activity might be useful for
    maintaining cache level. Upon notification, the program (typically
    "Activity Manager") might analyze vmstat and act in advance (i.e.
    prematurely shutdown unimportant services).

    The "medium" level means that the system is experiencing medium memory
    pressure, the system might be making swap, paging out active file
    caches, etc. Upon this event applications may decide to further analyze
    vmstat/zoneinfo/memcg or internal memory usage statistics and free any
    resources that can be easily reconstructed or re-read from a disk.

    The "critical" level means that the system is actively thrashing, it is
    about to out of memory (OOM) or even the in-kernel OOM killer is on its
    way to trigger. Applications should do whatever they can to help the
    system. It might be too late to consult with vmstat or any other
    statistics, so it's advisable to take an immediate action.

    The events are propagated upward until the event is handled, i.e. the
    events are not pass-through. Here is what this means: for example you
    have three cgroups: A->B->C. Now you set up an event listener on
    cgroups A, B and C, and suppose group C experiences some pressure. In
    this situation, only group C will receive the notification, i.e. groups
    A and B will not receive it. This is done to avoid excessive
    "broadcasting" of messages, which disturbs the system and which is
    especially bad if we are low on memory or thrashing. So, organize the
    cgroups wisely, or propagate the events manually (or, ask us to
    implement the pass-through events, explaining why would you need them.)

    Performance wise, the memory pressure notifications feature itself is
    lightweight and does not require much of bookkeeping, in contrast to the
    rest of memcg features. Unfortunately, as of current memcg
    implementation, pages accounting is an inseparable part and cannot be
    turned off. The good news is that there are some efforts[1] to improve
    the situation; plus, implementing the same, fully API-compatible[2]
    interface for CONFIG_MEMCG=n case (e.g. embedded) is also a viable
    option, so it will not require any changes on the userland side.

    [1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291
    [2] http://lkml.org/lkml/2013/2/21/454

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix CONFIG_CGROPUPS=n warnings]
    Signed-off-by: Anton Vorontsov
    Acked-by: Kirill A. Shutemov
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Tejun Heo
    Cc: David Rientjes
    Cc: Pekka Enberg
    Cc: Mel Gorman
    Cc: Glauber Costa
    Cc: Michal Hocko
    Cc: Luiz Capitulino
    Cc: Greg Thelen
    Cc: Leonid Moiseichuk
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Bartlomiej Zolnierkiewicz
    Cc: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Vorontsov
     

27 Mar, 2013

1 commit


19 Dec, 2012

2 commits

  • Signed-off-by: Glauber Costa
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • Signed-off-by: Glauber Costa
    Acked-by: Kamezawa Hiroyuki
    Acked-by: Michal Hocko
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Frederic Weisbecker
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: JoonSoo Kim
    Cc: Mel Gorman
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Suleiman Souhlal
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     

12 Dec, 2012

1 commit


17 Nov, 2012

1 commit

  • oom_badness() takes a totalpages argument which says how many pages are
    available and it uses it as a base for the score calculation. The value
    is calculated by mem_cgroup_get_limit which considers both limit and
    total_swap_pages (resp. memsw portion of it).

    This is usually correct but since fe35004fbf9e ("mm: avoid swapping out
    with swappiness==0") we do not swap when swappiness is 0 which means
    that we cannot really use up all the totalpages pages. This in turn
    confuses oom score calculation if the memcg limit is much smaller than
    the available swap because the used memory (capped by the limit) is
    negligible comparing to totalpages so the resulting score is too small
    if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
    A wrong process might be selected as result.

    The problem can be worked around by checking mem_cgroup_swappiness==0
    and not considering swap at all in such a case.

    Signed-off-by: Michal Hocko
    Acked-by: David Rientjes
    Acked-by: Johannes Weiner
    Acked-by: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

09 Oct, 2012

1 commit

  • While reading through Documentation/cgroups/memory.txt, I found a number
    of minor wordos and typos. The patch below is a conservative handling of
    some of these: it provides just a number of "obviously correct" fixes to
    the English that improve the readability of the document somewhat.
    Obviously some more significant fixes need to be made to the document, but
    some of those may not be in the "obvious correct" category.

    Signed-off-by: Michael Kerrisk
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk
     

01 Aug, 2012

2 commits

  • Signed-off-by: Wanpeng Li
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • Sanity:

    CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
    CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

    [mhocko@suse.cz: fix missed bits]
    Cc: Glauber Costa
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Tejun Heo
    Cc: Aneesh Kumar K.V
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

30 May, 2012

3 commits

  • Presently, at removal of cgroup, ->pre_destroy() is called and moves
    charges to the parent cgroup. A major reason for returning -EBUSY from
    ->pre_destroy() is that the 'moving' hits the parent's resource
    limitation. It happens only when use_hierarchy=0.

    Considering use_hierarchy=0, all cgroups should be flat. So, no one
    cannot justify moving charges to parent...parent and children are in flat
    configuration, not hierarchical.

    This patch modifes the code to move charges to the root cgroup at
    rmdir/force_empty if use_hierarchy==0. This will much simplify rmdir()
    and reduce error in ->pre_destroy.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Aneesh Kumar K.V
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Frederic Weisbecker
    Cc: Ying Han
    Cc: Glauber Costa
    Reviewed-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch changes memcg's behavior at task_move().

    At task_move(), the kernel scans a task's page table and move the changes
    for mapped pages from source cgroup to target cgroup. There has been a
    bug at handling shared anonymous pages for a long time.

    Before patch:
    - The spec says 'shared anonymous pages are not moved.'
    - The implementation was 'shared anonymoys pages may be moved'.
    If page_mapcount
    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Naoya Horiguchi
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The hierarchical versions of per-memcg counters in memory.stat are all
    calculated the same way and are all named total_.

    Documenting the pattern is easier for maintenance than listing each
    counter twice.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Acked-by: Ying Han
    Randy Dunlap
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

13 Apr, 2012

1 commit

  • In v3.3-rc1, the global LRU was removed in commit 925b7673cce3 ("mm:
    make per-memcg LRU lists exclusive"). The patch fixes up the memcg
    docs.

    I left the swap session to someone who has better understanding of
    'memory+swap'.

    Signed-off-by: Ying Han
    Acked-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     

13 Jan, 2012

2 commits

  • The two memcg stats pgpgin/pgpgout have different meaning than the ones
    in vmstat, which indicates that we picked a bad naming for them.

    It might be late to change the stat name, but better documentation is
    always helpful.

    Signed-off-by: Ying Han
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     
  • It should be memsw.max_usage_in_bytes. This typo has been there for
    a really long time.

    Signed-off-by: Zhu Yanhai
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhu Yanhai
     

23 Dec, 2011

1 commit

  • This reverts commit e5671dfae59b165e2adfd4dfbdeab11ac8db5bda.

    After a follow up discussion with Michal, it was agreed it would
    be better to leave the kmem controller with just the tcp files,
    deferring the behavior of the other general memory.kmem.* files
    for a later time, when more caches are controlled. This is because
    generic kmem files are not used by tcp accounting and it is
    not clear how other slab caches would fit into the scheme.

    We are reverting the original commit so we can track the reference.
    Part of the patch is kept, because it was used by the later tcp
    code. Conflicts are shown in the bottom. init/Kconfig is removed from
    the revert entirely.

    Signed-off-by: Glauber Costa
    Acked-by: Michal Hocko
    CC: Kirill A. Shutemov
    CC: Paul Menage
    CC: Greg Thelen
    CC: Johannes Weiner
    CC: David S. Miller

    Conflicts:

    Documentation/cgroups/memory.txt
    mm/memcontrol.c
    Signed-off-by: David S. Miller

    Glauber Costa
     

13 Dec, 2011

5 commits

  • This patch introduces kmem.tcp.usage_in_bytes file, living in the
    kmem_cgroup filesystem. It is a simple read-only file that displays the
    amount of kernel memory currently consumed by the cgroup.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to
    effectively control the amount of kernel memory pinned by a cgroup.

    This value is ignored in the root cgroup, and in all others,
    caps the value specified by the admin in the net namespaces'
    view of tcp_sysctl_mem.

    If namespaces are being used, the admin is allowed to set a
    value bigger than cgroup's maximum, the same way it is allowed
    to set pretty much unlimited values in a real box.

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch introduces memory pressure controls for the tcp
    protocol. It uses the generic socket memory pressure code
    introduced in earlier patches, and fills in the
    necessary data in cg_proto struct.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Eric W. Biederman
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • The goal of this work is to move the memory pressure tcp
    controls to a cgroup, instead of just relying on global
    conditions.

    To avoid excessive overhead in the network fast paths,
    the code that accounts allocated memory to a cgroup is
    hidden inside a static_branch(). This branch is patched out
    until the first non-root cgroup is created. So when nobody
    is using cgroups, even if it is mounted, no significant performance
    penalty should be seen.

    This patch handles the generic part of the code, and has nothing
    tcp-specific.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    CC: Kirill A. Shutemov
    CC: David S. Miller
    CC: Eric W. Biederman
    CC: Eric Dumazet
    Signed-off-by: David S. Miller

    Glauber Costa
     
  • This patch lays down the foundation for the kernel memory component
    of the Memory Controller.

    As of today, I am only laying down the following files:

    * memory.independent_kmem_limit
    * memory.kmem.limit_in_bytes (currently ignored)
    * memory.kmem.usage_in_bytes (always zero)

    Signed-off-by: Glauber Costa
    CC: Kirill A. Shutemov
    CC: Paul Menage
    CC: Greg Thelen
    CC: Johannes Weiner
    CC: Michal Hocko
    Signed-off-by: David S. Miller

    Glauber Costa
     

03 Nov, 2011

1 commit

  • Reclaim decides to skip scanning an active list when the corresponding
    inactive list is above a certain size in comparison to leave the assumed
    working set alone while there are still enough reclaim candidates around.

    The memcg implementation of comparing those lists instead reports whether
    the whole memcg is low on the requested type of inactive pages,
    considering all nodes and zones.

    This can lead to an oversized active list not being scanned because of the
    state of the other lists in the memcg, as well as an active list being
    scanned while its corresponding inactive list has enough pages.

    Not only is this wrong, it's also a scalability hazard, because the global
    memory state over all nodes and zones has to be gathered for each memcg
    and zone scanned.

    Make these calculations purely based on the size of the two LRU lists
    that are actually affected by the outcome of the decision.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Reviewed-by: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

15 Sep, 2011

1 commit

  • Revert the post-3.0 commit 82f9d486e59f5 ("memcg: add
    memory.vmscan_stat").

    The implementation of per-memcg reclaim statistics violates how memcg
    hierarchies usually behave: hierarchically.

    The reclaim statistics are accounted to child memcgs and the parent
    hitting the limit, but not to hierarchy levels in between. Usually,
    hierarchical statistics are perfectly recursive, with each level
    representing the sum of itself and all its children.

    Since this exports statistics to userspace, this may lead to confusion
    and problems with changing things after the release, so revert it now,
    we can try again later.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Michal Hocko
    Cc: Ying Han
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

27 Jul, 2011

1 commit

  • The commit log of 0ae5e89c60c9 ("memcg: count the soft_limit reclaim
    in...") says it adds scanning stats to memory.stat file. But it doesn't
    because we considered we needed to make a concensus for such new APIs.

    This patch is a trial to add memory.scan_stat. This shows
    - the number of scanned pages(total, anon, file)
    - the number of rotated pages(total, anon, file)
    - the number of freed pages(total, anon, file)
    - the number of elaplsed time (including sleep/pause time)

    for both of direct/soft reclaim.

    The biggest difference with oringinal Ying's one is that this file
    can be reset by some write, as

    # echo 0 ...../memory.scan_stat

    Example of output is here. This is a result after make -j 6 kernel
    under 300M limit.

    [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
    [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
    scanned_pages_by_limit 9471864
    scanned_anon_pages_by_limit 6640629
    scanned_file_pages_by_limit 2831235
    rotated_pages_by_limit 4243974
    rotated_anon_pages_by_limit 3971968
    rotated_file_pages_by_limit 272006
    freed_pages_by_limit 2318492
    freed_anon_pages_by_limit 962052
    freed_file_pages_by_limit 1356440
    elapsed_ns_by_limit 351386416101
    scanned_pages_by_system 0
    scanned_anon_pages_by_system 0
    scanned_file_pages_by_system 0
    rotated_pages_by_system 0
    rotated_anon_pages_by_system 0
    rotated_file_pages_by_system 0
    freed_pages_by_system 0
    freed_anon_pages_by_system 0
    freed_file_pages_by_system 0
    elapsed_ns_by_system 0
    scanned_pages_by_limit_under_hierarchy 9471864
    scanned_anon_pages_by_limit_under_hierarchy 6640629
    scanned_file_pages_by_limit_under_hierarchy 2831235
    rotated_pages_by_limit_under_hierarchy 4243974
    rotated_anon_pages_by_limit_under_hierarchy 3971968
    rotated_file_pages_by_limit_under_hierarchy 272006
    freed_pages_by_limit_under_hierarchy 2318492
    freed_anon_pages_by_limit_under_hierarchy 962052
    freed_file_pages_by_limit_under_hierarchy 1356440
    elapsed_ns_by_limit_under_hierarchy 351386416101
    scanned_pages_by_system_under_hierarchy 0
    scanned_anon_pages_by_system_under_hierarchy 0
    scanned_file_pages_by_system_under_hierarchy 0
    rotated_pages_by_system_under_hierarchy 0
    rotated_anon_pages_by_system_under_hierarchy 0
    rotated_file_pages_by_system_under_hierarchy 0
    freed_pages_by_system_under_hierarchy 0
    freed_anon_pages_by_system_under_hierarchy 0
    freed_file_pages_by_system_under_hierarchy 0
    elapsed_ns_by_system_under_hierarchy 0

    total_xxxx is for hierarchy management.

    This will be useful for further memcg developments and need to be
    developped before we do some complicated rework on LRU/softlimit
    management.

    This patch adds a new struct memcg_scanrecord into scan_control struct.
    sc->nr_scanned at el is not designed for exporting information. For
    example, nr_scanned is reset frequentrly and incremented +2 at scanning
    mapped pages.

    To avoid complexity, I added a new param in scan_control which is for
    exporting scanning score.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Michal Hocko
    Cc: Ying Han
    Cc: Andrew Bresticker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

16 Jun, 2011

3 commits


29 Apr, 2011

1 commit

  • Since 569b846d ("memcg: coalesce uncharge during unmap/truncate"), we do
    batched (delayed) uncharge at truncation/unmap. And since cdec2e42(memcg:
    coalesce charging via percpu storage), we have percpu cache for
    res_counter.

    These changes improved performance of memory cgroup very much, but made
    res_counter->usage usually have a bigger value than the actual value of
    memory usage. So, *.usage_in_bytes, which show res_counter->usage, are
    not desirable for precise values of memory(and swap) usage anymore.

    Instead of removing these files completely(because we cannot know
    res_counter->usage without them), this patch updates the meaning of those
    files.

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     

17 Feb, 2011

1 commit


28 May, 2010

4 commits

  • Some information are old, and I think current document doesn't work as "a
    guide for users". We need summary of all of our controls, at least.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Randy Dunlap
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch adds support for moving charge of file pages, which include
    normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting
    bit 1 of /memory.move_charge_at_immigrate.

    Unlike the case of anonymous pages, file pages(and swaps) in the range
    mmapped by the task will be moved even if the task hasn't done page fault,
    i.e. they might not be the task's "RSS", but other task's "RSS" that maps
    the same file. And mapcount of the page is ignored(the page can be moved
    even if page_mapcount(page) > 1). So, conditions that the page/swap
    should be met to be moved is that it must be in the range mmapped by the
    target task and it must be charged to the old cgroup.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • This adds a feature to disable oom-killer for memcg, if disabled, of
    course, tasks under memcg will stop.

    But now, we have oom-notifier for memcg. And the world around memcg is
    not under out-of-memory. memcg's out-of-memory just shows memcg hits
    limit. Then, administrator or management daemon can recover the situation
    by

    - kill some process
    - enlarge limit, add more swap.
    - migrate some tasks
    - remove file cache on tmps (difficult ?)

    Unlike oom-killer, you can take enough information before killing tasks.
    (by gcore, or, ps etc.)

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Considering containers or other resource management softwares in userland,
    event notification of OOM in memcg should be implemented. Now, memcg has
    "threshold" notifier which uses eventfd, we can make use of it for oom
    notification.

    This patch adds oom notification eventfd callback for memcg. The usage is
    very similar to threshold notifier, but control file is memory.oom_control
    and no arguments other than eventfd is required.

    % cgroup_event_notifier /cgroup/A/memory.oom_control dummy
    (About cgroup_event_notifier, see Documentation/cgroup/)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: David Rientjes
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki