Doug / smarc-fsl-linux-kernel | Embedian Git Server

13 Sep, 2013

1 commit

9cb2dc1c9 memcg: document cgroup dirty/writeback memory statistics ... Browse Code »

Signed-off-by: Sha Zhengju
Cc: Fengguang Wu
Cc: Greg Thelen
Cc: KAMEZAWA Hiroyuki
Cc: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sha Zhengju
2013-09-13 06:38:02 +0800

05 Jul, 2013

1 commit

80cc38b16 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull trivial tree updates from Jiri Kosina:
"The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
treewide: relase -> release
Documentation/cgroups/memory.txt: fix stat file documentation
sysctl/net.txt: delete reference to obsolete 2.4.x kernel
spinlock_api_smp.h: fix preprocessor comments
treewide: Fix typo in printk
doc: device tree: clarify stuff in usage-model.txt.
open firmware: "/aliasas" -> "/aliases"
md: bcache: Fixed a typo with the word 'arithmetic'
irq/generic-chip: fix a few kernel-doc entries
frv: Convert use of typedef ctl_table to struct ctl_table
sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
doc: clk: Fix incorrect wording
Documentation/arm/IXP4xx fix a typo
Documentation/networking/ieee802154 fix a typo
Documentation/DocBook/media/v4l fix a typo
Documentation/video4linux/si476x.txt fix a typo
Documentation/virtual/kvm/api.txt fix a typo
Documentation/early-userspace/README fix a typo
Documentation/video4linux/soc-camera.txt fix a typo
lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
...

Linus Torvalds
2013-07-05 02:40:58 +0800

04 Jul, 2013

1 commit

f968ef1c5 memcg: update TODO list in Documentation ... Browse Code »

hugetlb cgroup has already been implemented.

Signed-off-by: Li Zefan
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Rob Landley
Cc: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2013-07-04 07:07:31 +0800

24 Jun, 2013

1 commit

a15e41909 Documentation/cgroups/memory.txt: fix stat file documentation ... Browse Code »

Documentation for inactive_anon / active_anon was mixed up. Fix that.

Signed-off-by: Aaro Koskinen
Acked-by: Rob Landley
Signed-off-by: Jiri Kosina

Aaro Koskinen
2013-06-24 17:22:44 +0800

28 May, 2013

1 commit

f884ab15a doc: fix misspellings with 'codespell' tool ... Browse Code »

Signed-off-by: Anatol Pomozov
Signed-off-by: Jiri Kosina

Anatol Pomozov
2013-05-28 18:02:12 +0800

08 May, 2013

1 commit

b070e65c0 mm, memcg: add rss_huge stat to memory.stat ... Browse Code »

This exports the amount of anonymous transparent hugepages for each
memcg via the new "rss_huge" stat in memory.stat. The units are in
bytes.

This is helpful to determine the hugepage utilization for individual
jobs on the system in comparison to rss and opportunities where
MADV_HUGEPAGE may be helpful.

The amount of anonymous transparent hugepages is also included in "rss"
for backwards compatibility.

Signed-off-by: David Rientjes
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2013-05-08 09:38:26 +0800

01 May, 2013

1 commit

5d434fcb2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull trivial tree updates from Jiri Kosina:
"Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
mm: Convert print_symbol to %pSR
gfs2: Convert print_symbol to %pSR
m32r: Convert print_symbol to %pSR
iostats.txt: add easy-to-find description for field 6
x86 cmpxchg.h: fix wrong comment
treewide: Fix typo in printk and comments
doc: devicetree: Fix various typos
docbook: fix 8250 naming in device-drivers
pata_pdc2027x: Fix compiler warning
treewide: Fix typo in printks
mei: Fix comments in drivers/misc/mei
treewide: Fix typos in kernel messages
pm44xx: Fix comment for "CONFIG_CPU_IDLE"
doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
mmzone: correct "pags" to "pages" in comment.
kernel-parameters: remove outdated 'noresidual' parameter
Remove spurious _H suffixes from ifdef comments
sound: Remove stray pluses from Kconfig file
radio-shark: Fix printk "CONFIG_LED_CLASS"
doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
...

Linus Torvalds
2013-05-01 00:36:50 +0800

30 Apr, 2013

1 commit

70ddf637e memcg: add memory.pressure_level events ... Browse Code »

With this patch userland applications that want to maintain the
interactivity/memory allocation cost can use the pressure level
notifications. The levels are defined like this:

The "low" level means that the system is reclaiming memory for new
allocations. Monitoring this reclaiming activity might be useful for
maintaining cache level. Upon notification, the program (typically
"Activity Manager") might analyze vmstat and act in advance (i.e.
prematurely shutdown unimportant services).

The "medium" level means that the system is experiencing medium memory
pressure, the system might be making swap, paging out active file
caches, etc. Upon this event applications may decide to further analyze
vmstat/zoneinfo/memcg or internal memory usage statistics and free any
resources that can be easily reconstructed or re-read from a disk.

The "critical" level means that the system is actively thrashing, it is
about to out of memory (OOM) or even the in-kernel OOM killer is on its
way to trigger. Applications should do whatever they can to help the
system. It might be too late to consult with vmstat or any other
statistics, so it's advisable to take an immediate action.

The events are propagated upward until the event is handled, i.e. the
events are not pass-through. Here is what this means: for example you
have three cgroups: A->B->C. Now you set up an event listener on
cgroups A, B and C, and suppose group C experiences some pressure. In
this situation, only group C will receive the notification, i.e. groups
A and B will not receive it. This is done to avoid excessive
"broadcasting" of messages, which disturbs the system and which is
especially bad if we are low on memory or thrashing. So, organize the
cgroups wisely, or propagate the events manually (or, ask us to
implement the pass-through events, explaining why would you need them.)

Performance wise, the memory pressure notifications feature itself is
lightweight and does not require much of bookkeeping, in contrast to the
rest of memcg features. Unfortunately, as of current memcg
implementation, pages accounting is an inseparable part and cannot be
turned off. The good news is that there are some efforts[1] to improve
the situation; plus, implementing the same, fully API-compatible[2]
interface for CONFIG_MEMCG=n case (e.g. embedded) is also a viable
option, so it will not require any changes on the userland side.

[1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291
[2] http://lkml.org/lkml/2013/2/21/454

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_CGROPUPS=n warnings]
Signed-off-by: Anton Vorontsov
Acked-by: Kirill A. Shutemov
Acked-by: KAMEZAWA Hiroyuki
Cc: Tejun Heo
Cc: David Rientjes
Cc: Pekka Enberg
Cc: Mel Gorman
Cc: Glauber Costa
Cc: Michal Hocko
Cc: Luiz Capitulino
Cc: Greg Thelen
Cc: Leonid Moiseichuk
Cc: KOSAKI Motohiro
Cc: Minchan Kim
Cc: Bartlomiej Zolnierkiewicz
Cc: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Anton Vorontsov
2013-04-30 06:54:38 +0800

27 Mar, 2013

1 commit

df7c6b992 doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" ... Browse Code »

Signed-off-by: Paul Bolle
Acked-by: Rob Landley
Signed-off-by: Jiri Kosina

Paul Bolle
2013-03-27 21:14:02 +0800

19 Dec, 2012

2 commits

92e793495 kmem: add slab-specific documentation about the kmem controller ... Browse Code »

Signed-off-by: Glauber Costa
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: JoonSoo Kim
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Pekka Enberg
Cc: Rik van Riel
Cc: Suleiman Souhlal
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-12-19 07:02:15 +0800
d5bdae7d5 memcg: add documentation about the kmem controller ... Browse Code »

Signed-off-by: Glauber Costa
Acked-by: Kamezawa Hiroyuki
Acked-by: Michal Hocko
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: JoonSoo Kim
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Rik van Riel
Cc: Suleiman Souhlal
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-12-19 07:02:13 +0800

12 Dec, 2012

1 commit

348b46553 Documentation/cgroups/memory.txt: s/mem_cgroup_charge/mem_cgroup_change_common/ ... Browse Code »

mem_cgroup_charge_common() is invoked as the entry point for cgroup limits
charge rather than mem_cgroup_charge(), as the later has been removed for
years. Update the cgroup/memory.txt to reflect this change.

Signed-off-by: Jie Liu
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Liu
2012-12-12 09:22:24 +0800

17 Nov, 2012

1 commit

9a5a8f19b memcg: oom: fix totalpages calculation for memory.swappiness==0 ... Browse Code »

oom_badness() takes a totalpages argument which says how many pages are
available and it uses it as a base for the score calculation. The value
is calculated by mem_cgroup_get_limit which considers both limit and
total_swap_pages (resp. memsw portion of it).

This is usually correct but since fe35004fbf9e ("mm: avoid swapping out
with swappiness==0") we do not swap when swappiness is 0 which means
that we cannot really use up all the totalpages pages. This in turn
confuses oom score calculation if the memcg limit is much smaller than
the available swap because the used memory (capped by the limit) is
negligible comparing to totalpages so the resulting score is too small
if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
A wrong process might be selected as result.

The problem can be worked around by checking mem_cgroup_swappiness==0
and not considering swap at all in such a case.

Signed-off-by: Michal Hocko
Acked-by: David Rientjes
Acked-by: Johannes Weiner
Acked-by: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2012-11-17 06:33:04 +0800

09 Oct, 2012

1 commit

1939c557b memcg: trivial fixes for Documentation/cgroups/memory.txt ... Browse Code »

While reading through Documentation/cgroups/memory.txt, I found a number
of minor wordos and typos. The patch below is a conservative handling of
some of these: it provides just a number of "obviously correct" fixes to
the English that improve the readability of the document somewhat.
Obviously some more significant fixes need to be made to the document, but
some of those may not be in the "obvious correct" category.

Signed-off-by: Michael Kerrisk
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk
2012-10-09 15:22:54 +0800

01 Aug, 2012

2 commits

05a73ed29 mm/memcg: complete documentation for tcp memcg files ... Browse Code »

Signed-off-by: Wanpeng Li
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wanpeng Li
2012-08-01 09:42:43 +0800
c255a4580 memcg: rename config variables ... Browse Code »

Sanity:

CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Cc: Tejun Heo
Cc: Aneesh Kumar K.V
Cc: David Rientjes
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2012-08-01 09:42:43 +0800

30 May, 2012

3 commits

cc926f784 memcg: move charges to root cgroup if use_hierarchy=0 ... Browse Code »

Presently, at removal of cgroup, ->pre_destroy() is called and moves
charges to the parent cgroup. A major reason for returning -EBUSY from
->pre_destroy() is that the 'moving' hits the parent's resource
limitation. It happens only when use_hierarchy=0.

Considering use_hierarchy=0, all cgroups should be flat. So, no one
cannot justify moving charges to parent...parent and children are in flat
configuration, not hierarchical.

This patch modifes the code to move charges to the root cgroup at
rmdir/force_empty if use_hierarchy==0. This will much simplify rmdir()
and reduce error in ->pre_destroy.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Aneesh Kumar K.V
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Frederic Weisbecker
Cc: Ying Han
Cc: Glauber Costa
Reviewed-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2012-05-30 07:22:27 +0800
4b91355e9 memcg: fix/change behavior of shared anon at moving task ... Browse Code »

This patch changes memcg's behavior at task_move().

At task_move(), the kernel scans a task's page table and move the changes
for mapped pages from source cgroup to target cgroup. There has been a
bug at handling shared anonymous pages for a long time.

Before patch:
- The spec says 'shared anonymous pages are not moved.'
- The implementation was 'shared anonymoys pages may be moved'.
If page_mapcount
Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Naoya Horiguchi
Cc: Glauber Costa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2012-05-30 07:22:24 +0800
eb6332a54 Documentation: memcg: future proof hierarchical statistics documentation ... Browse Code »

The hierarchical versions of per-memcg counters in memory.stat are all
calculated the same way and are all named total_.

Documenting the pattern is easier for maintenance than listing each
counter twice.

Signed-off-by: Johannes Weiner
Acked-by: Michal Hocko
Acked-by: KOSAKI Motohiro
Acked-by: Ying Han
Randy Dunlap
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2012-05-30 07:22:20 +0800

13 Apr, 2012

1 commit

6252efcc3 memcg: fix up documentation on global LRU ... Browse Code »

In v3.3-rc1, the global LRU was removed in commit 925b7673cce3 ("mm:
make per-memcg LRU lists exclusive"). The patch fixes up the memcg
docs.

I left the swap session to someone who has better understanding of
'memory+swap'.

Signed-off-by: Ying Han
Acked-by: Michal Hocko
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ying Han
2012-04-13 04:12:11 +0800

13 Jan, 2012

2 commits

0527b6903 memcg: fix pgpgin/pgpgout documentation ... Browse Code »

The two memcg stats pgpgin/pgpgout have different meaning than the ones
in vmstat, which indicates that we picked a bad naming for them.

It might be late to change the stat name, but better documentation is
always helpful.

Signed-off-by: Ying Han
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ying Han
2012-01-13 12:13:06 +0800
d66c1ce7d Documentation/cgroups/memory.txt: fix typo ... Browse Code »

It should be memsw.max_usage_in_bytes. This typo has been there for
a really long time.

Signed-off-by: Zhu Yanhai
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhu Yanhai
2012-01-13 12:13:06 +0800

23 Dec, 2011

1 commit

65c64ce8e Partial revert "Basic kernel memory functionality for the Memory Controller" ... Browse Code »

This reverts commit e5671dfae59b165e2adfd4dfbdeab11ac8db5bda.

After a follow up discussion with Michal, it was agreed it would
be better to leave the kmem controller with just the tcp files,
deferring the behavior of the other general memory.kmem.* files
for a later time, when more caches are controlled. This is because
generic kmem files are not used by tcp accounting and it is
not clear how other slab caches would fit into the scheme.

We are reverting the original commit so we can track the reference.
Part of the patch is kept, because it was used by the later tcp
code. Conflicts are shown in the bottom. init/Kconfig is removed from
the revert entirely.

Signed-off-by: Glauber Costa
Acked-by: Michal Hocko
CC: Kirill A. Shutemov
CC: Paul Menage
CC: Greg Thelen
CC: Johannes Weiner
CC: David S. Miller

Conflicts:

Documentation/cgroups/memory.txt
mm/memcontrol.c
Signed-off-by: David S. Miller

Glauber Costa
2011-12-23 11:37:18 +0800

13 Dec, 2011

5 commits

5a6dd3437 Display current tcp memory allocation in kmem cgroup ... Browse Code »

This patch introduces kmem.tcp.usage_in_bytes file, living in the
kmem_cgroup filesystem. It is a simple read-only file that displays the
amount of kernel memory currently consumed by the cgroup.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:11 +0800
3aaabe234 tcp buffer limitation: per-cgroup limit ... Browse Code »

This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to
effectively control the amount of kernel memory pinned by a cgroup.

This value is ignored in the root cgroup, and in all others,
caps the value specified by the admin in the net namespaces'
view of tcp_sysctl_mem.

If namespaces are being used, the admin is allowed to set a
value bigger than cgroup's maximum, the same way it is allowed
to set pretty much unlimited values in a real box.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:11 +0800
d1a4c0b37 tcp memory pressure controls ... Browse Code »

This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800
e1aab161e socket: initial cgroup code. ... Browse Code »

The goal of this work is to move the memory pressure tcp
controls to a cgroup, instead of just relying on global
conditions.

To avoid excessive overhead in the network fast paths,
the code that accounts allocated memory to a cgroup is
hidden inside a static_branch(). This branch is patched out
until the first non-root cgroup is created. So when nobody
is using cgroups, even if it is mounted, no significant performance
penalty should be seen.

This patch handles the generic part of the code, and has nothing
tcp-specific.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Kirill A. Shutemov
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800
e5671dfae Basic kernel memory functionality for the Memory Controller ... Browse Code »

This patch lays down the foundation for the kernel memory component
of the Memory Controller.

As of today, I am only laying down the following files:

* memory.independent_kmem_limit
* memory.kmem.limit_in_bytes (currently ignored)
* memory.kmem.usage_in_bytes (always zero)

Signed-off-by: Glauber Costa
CC: Kirill A. Shutemov
CC: Paul Menage
CC: Greg Thelen
CC: Johannes Weiner
CC: Michal Hocko
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:03:55 +0800

03 Nov, 2011

1 commit

9b272977e memcg: skip scanning active lists based on individual size ... Browse Code »

Reclaim decides to skip scanning an active list when the corresponding
inactive list is above a certain size in comparison to leave the assumed
working set alone while there are still enough reclaim candidates around.

The memcg implementation of comparing those lists instead reports whether
the whole memcg is low on the requested type of inactive pages,
considering all nodes and zones.

This can lead to an oversized active list not being scanned because of the
state of the other lists in the memcg, as well as an active list being
scanned while its corresponding inactive list has enough pages.

Not only is this wrong, it's also a scalability hazard, because the global
memory state over all nodes and zones has to be gathered for each memcg
and zone scanned.

Make these calculations purely based on the size of the two LRU lists
that are actually affected by the outcome of the decision.

Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
Cc: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Reviewed-by: Minchan Kim
Reviewed-by: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-11-03 07:07:00 +0800

15 Sep, 2011

1 commit

185efc0f9 memcg: Revert "memcg: add memory.vmscan_stat" ... Browse Code »

Revert the post-3.0 commit 82f9d486e59f5 ("memcg: add
memory.vmscan_stat").

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between. Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now,
we can try again later.

Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Michal Hocko
Cc: Ying Han
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2011-09-15 09:09:38 +0800

27 Jul, 2011

1 commit

82f9d486e memcg: add memory.vmscan_stat ... Browse Code »

The commit log of 0ae5e89c60c9 ("memcg: count the soft_limit reclaim
in...") says it adds scanning stats to memory.stat file. But it doesn't
because we considered we needed to make a concensus for such new APIs.

This patch is a trial to add memory.scan_stat. This shows
- the number of scanned pages(total, anon, file)
- the number of rotated pages(total, anon, file)
- the number of freed pages(total, anon, file)
- the number of elaplsed time (including sleep/pause time)

for both of direct/soft reclaim.

The biggest difference with oringinal Ying's one is that this file
can be reset by some write, as

# echo 0 ...../memory.scan_stat

Example of output is here. This is a result after make -j 6 kernel
under 300M limit.

[kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
[kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
scanned_pages_by_limit 9471864
scanned_anon_pages_by_limit 6640629
scanned_file_pages_by_limit 2831235
rotated_pages_by_limit 4243974
rotated_anon_pages_by_limit 3971968
rotated_file_pages_by_limit 272006
freed_pages_by_limit 2318492
freed_anon_pages_by_limit 962052
freed_file_pages_by_limit 1356440
elapsed_ns_by_limit 351386416101
scanned_pages_by_system 0
scanned_anon_pages_by_system 0
scanned_file_pages_by_system 0
rotated_pages_by_system 0
rotated_anon_pages_by_system 0
rotated_file_pages_by_system 0
freed_pages_by_system 0
freed_anon_pages_by_system 0
freed_file_pages_by_system 0
elapsed_ns_by_system 0
scanned_pages_by_limit_under_hierarchy 9471864
scanned_anon_pages_by_limit_under_hierarchy 6640629
scanned_file_pages_by_limit_under_hierarchy 2831235
rotated_pages_by_limit_under_hierarchy 4243974
rotated_anon_pages_by_limit_under_hierarchy 3971968
rotated_file_pages_by_limit_under_hierarchy 272006
freed_pages_by_limit_under_hierarchy 2318492
freed_anon_pages_by_limit_under_hierarchy 962052
freed_file_pages_by_limit_under_hierarchy 1356440
elapsed_ns_by_limit_under_hierarchy 351386416101
scanned_pages_by_system_under_hierarchy 0
scanned_anon_pages_by_system_under_hierarchy 0
scanned_file_pages_by_system_under_hierarchy 0
rotated_pages_by_system_under_hierarchy 0
rotated_anon_pages_by_system_under_hierarchy 0
rotated_file_pages_by_system_under_hierarchy 0
freed_pages_by_system_under_hierarchy 0
freed_anon_pages_by_system_under_hierarchy 0
freed_file_pages_by_system_under_hierarchy 0
elapsed_ns_by_system_under_hierarchy 0

total_xxxx is for hierarchy management.

This will be useful for further memcg developments and need to be
developped before we do some complicated rework on LRU/softlimit
management.

This patch adds a new struct memcg_scanrecord into scan_control struct.
sc->nr_scanned at el is not designed for exporting information. For
example, nr_scanned is reset frequentrly and incremented +2 at scanning
mapped pages.

To avoid complexity, I added a new param in scan_control which is for
exporting scanning score.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Michal Hocko
Cc: Ying Han
Cc: Andrew Bresticker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2011-07-27 07:49:42 +0800

16 Jun, 2011

3 commits

67de0162f Documentation: fix cgroup typos and formatting ... Browse Code »

Fix format and spelling.

Signed-off-by: Jörg Sommer
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Jörg Sommer
2011-06-16 12:52:50 +0800
f6e07d380 Documentation: update cgroupfs mount point ... Browse Code »

According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup. Hence, this should be used in the documentation.

Signed-off-by: Jörg Sommer
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Jörg Sommer
2011-06-16 12:52:50 +0800
50c35e5ba memcg: add documentation for the memory.numastat API ... Browse Code »

[akpm@linux-foundation.org: rework text, fit it into 80-cols]
Signed-off-by: Ying Han
Reviewed-by: KOSAKI Motohiro
Acked-by: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ying Han
2011-06-16 11:03:59 +0800

29 Apr, 2011

1 commit

a111c966a memcg: update documentation to describe usage_in_bytes ... Browse Code »

Since 569b846d ("memcg: coalesce uncharge during unmap/truncate"), we do
batched (delayed) uncharge at truncation/unmap. And since cdec2e42(memcg:
coalesce charging via percpu storage), we have percpu cache for
res_counter.

These changes improved performance of memory cgroup very much, but made
res_counter->usage usually have a bigger value than the actual value of
memory usage. So, *.usage_in_bytes, which show res_counter->usage, are
not desirable for precise values of memory(and swap) usage anymore.

Instead of removing these files completely(because we cannot know
res_counter->usage without them), this patch updates the meaning of those
files.

Signed-off-by: Daisuke Nishimura
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Michal Hocko
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke Nishimura
2011-04-29 02:28:21 +0800

17 Feb, 2011

1 commit

689bca3bb memcg: clarify use_hierarchy documentation ... Browse Code »

The memcg code does not allow changing memory.use_hierarchy if the
parent cgroup has enabled use_hierarchy. Update documentation to match
the code.

Signed-off-by: Greg Thelen
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Jiri Kosina

Greg Thelen
2011-02-17 16:17:21 +0800

28 May, 2010

4 commits

dc10e281f memcg: update documentation ... Browse Code »

Some information are old, and I think current document doesn't work as "a
guide for users". We need summary of all of our controls, at least.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Randy Dunlap
Cc: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-28 00:12:43 +0800
87946a722 memcg: move charge of file pages ... Browse Code »

This patch adds support for moving charge of file pages, which include
normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting
bit 1 of /memory.move_charge_at_immigrate.

Unlike the case of anonymous pages, file pages(and swaps) in the range
mmapped by the task will be moved even if the task hasn't done page fault,
i.e. they might not be the task's "RSS", but other task's "RSS" that maps
the same file. And mapcount of the page is ignored(the page can be moved
even if page_mapcount(page) > 1). So, conditions that the page/swap
should be met to be moved is that it must be in the range mmapped by the
target task and it must be charged to the old cgroup.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Daisuke Nishimura
Acked-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke Nishimura
2010-05-28 00:12:43 +0800
3c11ecf44 memcg: oom kill disable and oom status ... Browse Code »

This adds a feature to disable oom-killer for memcg, if disabled, of
course, tasks under memcg will stop.

But now, we have oom-notifier for memcg. And the world around memcg is
not under out-of-memory. memcg's out-of-memory just shows memcg hits
limit. Then, administrator or management daemon can recover the situation
by

- kill some process
- enlarge limit, add more swap.
- migrate some tasks
- remove file cache on tmps (difficult ?)

Unlike oom-killer, you can take enough information before killing tasks.
(by gcore, or, ps etc.)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-28 00:12:43 +0800
9490ff275 memcg: oom notifier ... Browse Code »

Considering containers or other resource management softwares in userland,
event notification of OOM in memcg should be implemented. Now, memcg has
"threshold" notifier which uses eventfd, we can make use of it for oom
notification.

This patch adds oom notification eventfd callback for memcg. The usage is
very similar to threshold notifier, but control file is memory.oom_control
and no arguments other than eventfd is required.

% cgroup_event_notifier /cgroup/A/memory.oom_control dummy
(About cgroup_event_notifier, see Documentation/cgroup/)

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: David Rientjes
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-05-28 00:12:43 +0800