13 Jan, 2012

5 commits

  • No need for two CONFIG_MEMORY_HOTPLUG blocks.

    Signed-off-by: Bob Liu
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • There are multiple places which need to get the swap_cgroup address, so
    add a helper function:

    static struct swap_cgroup *swap_cgroup_getsc(swp_entry_t ent,
    struct swap_cgroup_ctrl **ctrl);

    to simplify the code.
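
    A minimal sketch of what such a helper could look like, assuming the flat
    map of swap_cgroup pages described in the entries below (the body is an
    illustration, not the exact patch):

        static struct swap_cgroup *swap_cgroup_getsc(swp_entry_t ent,
                                                     struct swap_cgroup_ctrl **ctrl)
        {
                struct swap_cgroup_ctrl *c = &swap_cgroup_ctrl[swp_type(ent)];
                pgoff_t offset = swp_offset(ent);
                struct page *mappage;

                if (ctrl)
                        *ctrl = c;

                /* each page of the per-device map holds SC_PER_PAGE records */
                mappage = c->map[offset / SC_PER_PAGE];
                return (struct swap_cgroup *)page_address(mappage)
                        + offset % SC_PER_PAGE;
        }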

    Signed-off-by: Bob Liu
    Acked-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • lookup_page_cgroup() is usually used only against pages that are used in
    userspace.

    The exception is the CONFIG_DEBUG_VM-only memcg check from the page
    allocator: it can run on pages without page_cgroup descriptors allocated
    when the pages are fed into the page allocator for the first time during
    boot or memory hotplug.

    Include the array check only when CONFIG_DEBUG_VM is set and save the
    unnecessary check in production kernels.
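
    A sketch of the idea for the SPARSEMEM variant (the details are assumed
    from the description above, not quoted from the patch):

        struct page_cgroup *lookup_page_cgroup(struct page *page)
        {
                unsigned long pfn = page_to_pfn(page);
                struct mem_section *section = __pfn_to_section(pfn);
        #ifdef CONFIG_DEBUG_VM
                /*
                 * The sanity checks the page allocator does on freshly freed
                 * pages can run before the page_cgroup array for this section
                 * is allocated (first free during boot or memory hotplug).
                 */
                if (!section->page_cgroup)
                        return NULL;
        #endif
                return section->page_cgroup + pfn;
        }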

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • To find the page corresponding to a certain page_cgroup, pc->flags
    encoded the node or section ID identifying the base array against
    which the pc pointer could be compared.

    Now that the per-memory cgroup LRU lists link page descriptors directly,
    there is no longer any code that knows the struct page_cgroup of a PFN
    but not the struct page.

    [hughd@google.com: remove unused node/section info from pc->flags fix]
    Signed-off-by: Johannes Weiner
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Reviewed-by: Kirill A. Shutemov
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Ying Han
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Christoph Hellwig
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Now that all code that operated on global per-zone LRU lists is
    converted to operate on per-memory cgroup LRU lists instead, there is no
    reason to keep the double-LRU scheme around any longer.

    The pc->lru member is removed and page->lru is linked directly to the
    per-memory cgroup LRU lists, which removes two pointers from a
    descriptor that exists for every page frame in the system.
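
    The remaining descriptor is tiny; a sketch of the layout after this change
    (assumed, for illustration only):

        struct page_cgroup {
                unsigned long flags;
                struct mem_cgroup *mem_cgroup;
                /* struct list_head lru; -- removed: page->lru is now linked
                 * directly into the per-memory cgroup LRU lists */
        };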

    Signed-off-by: Johannes Weiner
    Signed-off-by: Hugh Dickins
    Signed-off-by: Ying Han
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Reviewed-by: Kirill A. Shutemov
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

03 Nov, 2011

2 commits

  • warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static?
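
    The fix for such a sparse warning is to give the file-local definition
    internal linkage; a one-line sketch (the array's exact declaration is
    assumed):

        static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];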

    Signed-off-by: H Hartley Sweeten
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • When the cgroup base was allocated with kmalloc, it was necessary to
    annotate the variable with kmemleak_not_leak(). But it has recently
    been changed to be allocated with alloc_page() (which skips kmemleak
    checks), and that causes a warning on boot up.

    I was triggering this output:

    allocated 8388608 bytes of page_cgroup
    please try 'cgroup_disable=memory' option if you don't want memory cgroups
    kmemleak: Trying to color unknown object at 0xf5840000 as Grey
    Pid: 0, comm: swapper Not tainted 3.0.0-test #12
    Call Trace:
    [] ? printk+0x1d/0x1f
    [] paint_ptr+0x4f/0x78
    [] kmemleak_not_leak+0x58/0x7d
    [] ? __rcu_read_unlock+0x9/0x7d
    [] kmemleak_init+0x19d/0x1e9
    [] start_kernel+0x346/0x3ec
    [] ? loglevel+0x18/0x18
    [] i386_start_kernel+0xaa/0xb0

    After a bit of debugging I tracked the object 0xf5840000 (and others) down
    to the cgroup code. The change from allocating base with kmalloc to
    alloc_page() has the base not calling kmemleak_alloc() which adds the
    pointer to the object_tree_root, but kmemleak_not_leak() adds it to the
    crt_early_log[] table. On kmemleak_init(), the entry is found in the
    early_log[] but not the object_tree_root, and this error message is
    displayed.

    If alloc_page() fails, it falls back to vmalloc(), which still goes
    through kmemleak_alloc(), so the kmemleak_not_leak() call is still needed
    for that path. The solution is to call kmemleak_alloc() directly when
    alloc_page() succeeds.
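
    A sketch of that solution in the allocation helper (the shape follows the
    surrounding entries; the exact code is assumed):

        static void *__meminit alloc_page_cgroup(size_t size, int nid)
        {
                void *addr;

                addr = alloc_pages_exact_nid(nid, size, GFP_KERNEL | __GFP_NOWARN);
                if (addr) {
                        /* page-allocator memory is invisible to kmemleak, so
                         * register it explicitly instead of relying on
                         * kmemleak_not_leak() */
                        kmemleak_alloc(addr, size, 1, GFP_KERNEL);
                        return addr;
                }

                /* the vmalloc() fallback already goes through kmemleak_alloc() */
                if (node_state(nid, N_HIGH_MEMORY))
                        addr = vmalloc_node(size, nid);
                else
                        addr = vmalloc(size);

                return addr;
        }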

    Reviewed-by: Michal Hocko
    Signed-off-by: Steven Rostedt
    Acked-by: Catalin Marinas
    Signed-off-by: Jonathan Nieder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

15 Sep, 2011

1 commit


26 Jul, 2011

2 commits


16 Jun, 2011

1 commit

  • Commit 21a3c9646873 ("memcg: allocate memory cgroup structures in local
    nodes") makes page_cgroup allocation NUMA aware, but that caused a
    problem: https://bugzilla.kernel.org/show_bug.cgi?id=36192.

    The problem was getting a NID from an invalid struct page that was not
    initialized because it was out of node, i.e. outside [node_start_pfn,
    node_end_pfn).

    Now, with sparsemem, page_cgroup_init scans pfn from 0 to max_pfn. But
    this may scan a pfn which is not on any node and can access memmap which
    is not initialized.

    This makes page_cgroup_init() for SPARSEMEM node aware and removes the
    code that gets the nid from page->flags. (Then we always use a valid NID.)
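
    A sketch of the resulting per-node scan (the loop structure is assumed
    from the description; the helper signature is illustrative):

        void __init page_cgroup_init(void)
        {
                unsigned long pfn;
                int nid;

                if (mem_cgroup_disabled())
                        return;

                for_each_node_state(nid, N_HIGH_MEMORY) {
                        unsigned long start = node_start_pfn(nid);
                        unsigned long end = node_end_pfn(nid);

                        /* only pfns inside [node_start_pfn, node_end_pfn) are
                         * scanned, so an uninitialized memmap is never read */
                        for (pfn = start; pfn < end;
                             pfn = ALIGN(pfn + 1, PAGES_PER_SECTION)) {
                                if (!pfn_present(pfn))
                                        continue;
                                init_section_page_cgroup(pfn, nid);
                        }
                }
        }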

    [akpm@linux-foundation.org: try to fix up comments]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

27 May, 2011

3 commits

  • Move the page-freeing code out of swap_cgroup_mutex in the hope that it
    reduces some of the theoretical contention between concurrent swapons
    and/or swapoffs.

    This is just a cleanup, no functional changes.
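
    A sketch of the reordering in swap_cgroup_swapoff(): detach the map under
    the mutex, free it afterwards (the exact shape is assumed):

        void swap_cgroup_swapoff(int type)
        {
                struct page **map;
                unsigned long i, length;
                struct swap_cgroup_ctrl *ctrl;

                mutex_lock(&swap_cgroup_mutex);
                ctrl = &swap_cgroup_ctrl[type];
                map = ctrl->map;
                length = ctrl->length;
                ctrl->map = NULL;
                ctrl->length = 0;
                mutex_unlock(&swap_cgroup_mutex);

                /* the page freeing now happens outside the mutex */
                if (map) {
                        for (i = 0; i < length; i++) {
                                if (map[i])
                                        __free_page(map[i]);
                        }
                        vfree(map);
                }
        }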

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • It allocated one more page than necessary if @max_pages was a multiple of
    SC_PER_PAGE.
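
    In other words, the number of map pages is a round-up division; a sketch
    of the corrected computation (assumed):

        /* before: length = max_pages / SC_PER_PAGE + 1;  -- one page too many
         * whenever max_pages is an exact multiple of SC_PER_PAGE */
        length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);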

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Commit ca371c0d7e23 ("memcg: fix page_cgroup fatal error in FLATMEM")
    removes the call to alloc_bootmem() from the function so that it can be
    marked __meminit, reducing memory usage when MEMORY_HOTPLUG=n.

    Also, as the new helper function alloc_page_cgroup() is called only from
    that function, it should be marked __meminit too.

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Michal Hocko
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     

12 May, 2011

1 commit

  • Commit dde79e005a769 ("page_cgroup: reduce allocation overhead for
    page_cgroup array for CONFIG_SPARSEMEM") introduced a regression: the
    memory cgroup data structures all end up on node 0 because the first
    attempt at allocating them does not pass in a node hint. Since the
    initialization runs on CPU #0, it would all end up on node 0. This is a
    problem on large memory systems, where node 0 would lose a lot of
    memory.

    Change the alloc_pages_exact() to alloc_pages_exact_nid(). This will
    still fall back to other nodes if not enough memory is available.

    [ RED-PEN: right now it would fall back first before trying
    vmalloc_node. Probably not the best strategy ... But I left it like
    that for now. ]
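
    The change itself is essentially a one-liner in the allocation path
    (sketch; the surrounding function is assumed):

        /* before: no node hint, so everything landed on the boot CPU's node
         *      addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
         * after: */
        addr = alloc_pages_exact_nid(nid, size, GFP_KERNEL | __GFP_NOWARN);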

    Signed-off-by: Andi Kleen
    Reported-by: Doug Nelson
    Cc: David Rientjes
    Reviewed-by: Michal Hocko
    Cc: Dave Hansen
    Acked-by: Balbir Singh
    Acked-by: Johannes Weiner
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

31 Mar, 2011

1 commit


24 Mar, 2011

3 commits

  • KAMEZAWA Hiroyuki noted that free_pages_cgroup doesn't have to check for
    PageReserved because we never store the array on reserved pages (neither
    alloc_pages_exact nor vmalloc use those pages).

    So we can replace the check by a BUG_ON.
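
    A sketch of what the freeing path then asserts (context assumed):

        BUG_ON(PageReserved(page)); /* the array never sits on reserved pages */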

    Signed-off-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently we are allocating a single page_cgroup array per memory section
    (stored in mem_section->base) when CONFIG_SPARSEMEM is selected. This is
    a correct but memory-inefficient solution because the allocated memory
    (unless we fall back to vmalloc) is not kmalloc friendly:

    - 32b - 16384 entries (20B per entry) fit into 327680B, so the 524288B
      slab cache is used
    - 32b with PAE - 131072 entries with 2621440B fit into 4194304B
    - 64b - 32768 entries (40B per entry) fit into the 2097152B cache

    This is ~37% wasted space per memory section, and it adds up over the
    whole of memory. On an x86_64 machine it is something like 6MB per 1GB of
    RAM.

    We can reduce the internal fragmentation by using alloc_pages_exact(),
    which allocates PAGE_SIZE-aligned blocks, so the per-section array only
    takes the space it actually needs.
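
    A sketch of the allocation with alloc_pages_exact(), keeping vmalloc() as
    the fallback (details assumed):

        table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;

        base = alloc_pages_exact(table_size, GFP_KERNEL | __GFP_NOWARN);
        if (!base)
                base = vmalloc(table_size);
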
    Cc: Dave Hansen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • In struct page_cgroup, we have a full word for flags but only a few are
    reserved. Use the remaining upper bits to encode, depending on
    configuration, the node or the section, to enable page_cgroup-to-page
    lookups without a direct pointer.
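
    An illustrative sketch only -- the constants and helper names below are
    assumptions, not the kernel's; the idea is to keep the array (node or
    section) id in the otherwise unused upper bits of pc->flags:

        #define ARRAYID_BITS    8       /* assumed width for the array id */
        #define ARRAYID_SHIFT   (BITS_PER_LONG - ARRAYID_BITS)
        #define ARRAYID_MASK    ((1UL << ARRAYID_BITS) - 1)

        static inline void page_cgroup_set_array_id(struct page_cgroup *pc,
                                                    unsigned long id)
        {
                pc->flags &= ~(ARRAYID_MASK << ARRAYID_SHIFT);  /* clear old id */
                pc->flags |= (id & ARRAYID_MASK) << ARRAYID_SHIFT;
        }

        static inline unsigned long page_cgroup_array_id(struct page_cgroup *pc)
        {
                return (pc->flags >> ARRAYID_SHIFT) & ARRAYID_MASK;
        }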

    This saves a full word for every page in a system with memory cgroups
    enabled.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

23 Mar, 2011

1 commit

  • While looking at some other notifier callbacks I noticed this code could
    use a simple cleanup.

    The callers of notifier_from_errno() no longer need the if (ret)/else
    conditional; that same conditional is now done inside
    notifier_from_errno().
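
    So the hotplug callback can return unconditionally; a sketch of the
    pattern (assumed):

        /* before:
         *      if (ret)
         *              ret = notifier_from_errno(ret);
         *      else
         *              ret = NOTIFY_OK;
         *      return ret;
         * after: */
        return notifier_from_errno(ret);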

    Signed-off-by: Prarit Bhargava
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
     

19 Jul, 2010

1 commit

  • The pointer to the page_cgroup table allocated in
    init_section_page_cgroup() is stored in section->page_cgroup as (base -
    pfn). Since this value does not point to the beginning or inside the
    allocated memory block, kmemleak reports a false positive.

    This was reported in bugzilla.kernel.org as #16297.

    Signed-off-by: Catalin Marinas
    Reported-by: Adrien Dessemond
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Pekka Enberg
    Cc: Andrew Morton

    Catalin Marinas
     

18 Mar, 2010

1 commit

  • swap_cgroup uses 2-byte data and uses cmpxchg in a new operation. 2-byte
    cmpxchg/xchg is not available on some architectures. This patch replaces
    cmpxchg/xchg with operations under a lock.
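
    A sketch of the lock-based update, reusing the lookup helper sketched in
    the 2012 entry above and assuming swap_cgroup_ctrl carries a spinlock
    (illustrative, not the exact patch):

        unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id)
        {
                struct swap_cgroup_ctrl *ctrl;
                struct swap_cgroup *sc;
                unsigned short old;
                unsigned long flags;

                sc = swap_cgroup_getsc(ent, &ctrl);

                /* a plain spinlock instead of a 16-bit cmpxchg/xchg, which
                 * not every architecture provides */
                spin_lock_irqsave(&ctrl->lock, flags);
                old = sc->id;
                sc->id = id;
                spin_unlock_irqrestore(&ctrl->lock, flags);

                return old;
        }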

    Signed-off-by: KAMEZAWA Hiroyuki
    Reported-by: Sachin Sant
    Acked-by: Balbir Singh
    Acked-by: Daisuke Nishimura
    Cc: Li Zefan
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

13 Mar, 2010

1 commit

  • This patch is another core part of this move-charge-at-task-migration
    feature. It enables moving charges of anonymous swaps.

    To move the charge of swap, we need to exchange swap_cgroup's record.

    In current implementation, swap_cgroup's record is protected by:

    - page lock: if the entry is on swap cache.
    - swap_lock: if the entry is not on swap cache.

    This works well in usual swap-in/out activity.

    But this behavior makes the swap-charge-moving feature check many
    conditions to exchange swap_cgroup's record safely.

    So I changed the modification of swap_cgroup's record (swap_cgroup_record())
    to use xchg, and defined a new function to cmpxchg swap_cgroup's record.

    This patch also enables moving the charge of non-pte_present but not yet
    uncharged swap caches, which can exist on the swap-out path, by getting
    the target pages via find_get_page() as do_mincore() does.

    [kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
    [akpm@linux-foundation.org: fix typos]
    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Daisuke Nishimura
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     

22 Sep, 2009

1 commit

  • To initialize a hot-added node, some pages are allocated. At that time the
    node has no memory, which makes the allocation always fail. In such a
    case, let's allocate pages from other nodes.
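
    A sketch of the fallback (assumed shape):

        /* a hot-added node may have no memory yet, so only ask for node-local
         * memory when the node actually has some */
        if (node_state(nid, N_HIGH_MEMORY))
                base = kmalloc_node(table_size, GFP_KERNEL | __GFP_NOWARN, nid);
        else
                base = kmalloc(table_size, GFP_KERNEL | __GFP_NOWARN);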

    Signed-off-by: Shaohua Li
    Signed-off-by: Yakui Zhao
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

19 Jun, 2009

3 commits

  • We don't need to check do_swap_account in functions that will never get
    called when do_swap_account == 0.

    Signed-off-by: Li Zefan
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Add file RSS tracking per memory cgroup

    We currently don't track file RSS; the RSS we report is actually anon RSS.
    All the file-mapped pages come in through the page cache and get
    accounted there. This patch adds support for accounting file RSS pages.
    It should

    1. Help improve the metrics reported by the memory resource controller
    2. Will form the basis for a future shared memory accounting heuristic
    that has been proposed by Kamezawa.

    Unfortunately, we cannot rename the existing "rss" keyword used in
    memory.stat to "anon_rss". We do, however, add "mapped_file" data and hope
    to educate the end user through documentation.

    [hugh.dickins@tiscali.co.uk: fix mem_cgroup_update_mapped_file_stat oops]
    Signed-off-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Dhaval Giani
    Cc: Daisuke Nishimura
    Cc: YAMAMOTO Takashi
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Fix some cgroup messages to read better.
    Update MAINTAINERS to include mm/*cgroup* files.

    Signed-off-by: Randy Dunlap
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

12 Jun, 2009

2 commits

  • Now, SLAB is configured at a very early stage and can be used in init
    routines.

    But replacing alloc_bootmem() in FLAT/DISCONTIGMEM's page_cgroup()
    initialization breaks the allocation now. (It works well in the SPARSEMEM
    case... SPARSEMEM supports MEMORY_HOTPLUG and the size of page_cgroup is
    reasonable (< 1 << MAX_ORDER).)

    This patch revives FLATMEM+memory cgroup by using alloc_bootmem.

    In the future, we should either stop supporting FLATMEM (if it has no
    users) or rewrite the flatmem code completely, but that would add more
    messy code and overhead.

    Reported-by: Li Zefan
    Tested-by: Li Zefan
    Tested-by: Ingo Molnar
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Pekka Enberg

    KAMEZAWA Hiroyuki
     
  • The bootmem allocator is no longer available for page_cgroup_init() because we
    set up the kernel slab allocator much earlier now.

    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Signed-off-by: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Yinghai Lu
     

03 Apr, 2009

2 commits

  • It's been pointed out that swap_cgroup's message at swapon() is nonsense,
    because:

    * It can be calculated very easily if all necessary information is
    written in Kconfig.

    * It's not necessary to annoy people at every swapon().

    From another point of view, memory usage per swp_entry is now reduced to
    2 bytes from 8 bytes (64-bit), and I think it's reasonably small.

    Reported-by: Hugh Dickins
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Try to use the CSS ID for records in swap_cgroup. With this, on a 64-bit
    machine, the size of swap_cgroup goes down to 2 bytes from 8 bytes.

    This means, when 2GB of swap is equipped, (assume the page size is 4096bytes)

    From size of swap_cgroup = 2G/4k * 8 = 4Mbytes.
    To size of swap_cgroup = 2G/4k * 2 = 1Mbytes.

    Reduction is large. Of course, there are trade-offs. This CSS ID will
    add overhead to swap-in/swap-out/swap-free.

    But in general,
    - swap is a resource which the user tends to avoid using.
    - If swap is never used, swap_cgroup area is not used.
    - Reading traditional manuals, size of swap should be proportional to
    size of memory. Memory size of machine is increasing now.

    I think reducing size of swap_cgroup makes sense.

    Note:
    - ID->CSS lookup routine has no locks, it's under RCU-Read-Side.
    - memcg can be obsolete at rmdir() but not freed while refcnt from
    swap_cgroup is available.
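
    The record then shrinks to a single short; a sketch of the resulting
    structure (assumed):

        struct swap_cgroup {
                unsigned short id;      /* css id of the owner, 0 = unused */
        };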

    Changelog v4->v5:
    - reworked on to memcg-charge-swapcache-to-proper-memcg.patch
    Changelog ->v4:
    - fixed not configured case.
    - deleted unnecessary comments.
    - fixed NULL pointer bug.
    - fixed message in dmesg.

    [nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Paul Menage
    Cc: Hugh Dickins
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

12 Feb, 2009

1 commit

  • page_cgroup's page allocation at init/memory hotplug uses kmalloc() and
    vmalloc(). If kmalloc() fails, vmalloc() is used.

    This is because vmalloc() is a very limited resource on 32-bit systems.
    We want to use kmalloc() first.

    But in this kind of call, __GFP_NOWARN should be specified.
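
    A sketch of the intended pattern (assumed):

        /* try kmalloc() first but stay quiet on failure; vmalloc() is the
         * fallback on 32-bit systems with scarce vmalloc space */
        base = kmalloc(table_size, GFP_KERNEL | __GFP_NOWARN);
        if (!base)
                base = vmalloc(table_size);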

    Reported-by: Heiko Carstens
    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

09 Jan, 2009

4 commits

  • We check whether mem_cgroup is disabled or not by checking
    mem_cgroup_subsys.disabled. I think it has more references than expected
    now.

    replacing
    if (mem_cgroup_subsys.disabled)
    with
    if (mem_cgroup_disabled())

    gives us a cleaner look, I think.
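
    A sketch of such a wrapper (assumed; the real helper may differ):

        static inline bool mem_cgroup_disabled(void)
        {
                return mem_cgroup_subsys.disabled;
        }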

    [kamezawa.hiroyu@jp.fujitsu.com: fix typo]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hirokazu Takahashi
     
  • A big patch for changing memcg's LRU semantics.

    Now,
    - page_cgroup is linked to mem_cgroup's own LRU (per zone).

    - The LRU of page_cgroup is not synchronized with the global LRU.

    - page and page_cgroup are one-to-one and statically allocated.

    - To find which LRU a page_cgroup is on, you have to check pc->mem_cgroup,
      as in: lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

    - SwapCache is handled.

    And when we handle the LRU list of page_cgroup, we do the following:

        pc = lookup_page_cgroup(page);
        lock_page_cgroup(pc); .....................(1)
        mz = page_cgroup_zoneinfo(pc);
        spin_lock(&mz->lru_lock);
        .....add to LRU
        spin_unlock(&mz->lru_lock);
        unlock_page_cgroup(pc);

    But (1) is a spinlock and we have to be afraid of deadlock with
    zone->lru_lock. So trylock() is used at (1) now. Without (1), we can't
    trust that "mz" is correct.

    This is a trial to remove this dirty nesting of locks.
    This patch changes mz->lru_lock to be zone->lru_lock.
    Then, the above sequence can be written as

        spin_lock(&zone->lru_lock);   # in vmscan.c or swap.c via global LRU
        mem_cgroup_add/remove/etc_lru() {
                pc = lookup_page_cgroup(page);
                mz = page_cgroup_zoneinfo(pc);
                if (PageCgroupUsed(pc)) {
                        ....add to LRU
                }
        }
        spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

    This is much simpler.
    (*) We're safe even if we don't take lock_page_cgroup(pc). Because..
    1. When pc->mem_cgroup can be modified.
    - at charge.
    - at account_move().
    2. at charge
    the PCG_USED bit is not set before pc->mem_cgroup is fixed.
    3. at account_move()
    the page is isolated and not on LRU.

    Pros.
    - easy for maintenance.
    - memcg can make use of laziness of pagevec.
    - we don't have to duplicate the LRU/Active/Unevictable bits in page_cgroup.
    - LRU status of memcg will be synchronized with global LRU's one.
    - # of locks are reduced.
    - account_move() is simplified very much.
    Cons.
    - may increase cost of LRU rotation.
    (no impact if memcg is not configured.)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • For accounting swap, we need a record per swap entry, at least.

    This patch adds following function.
    - swap_cgroup_swapon() .... called from swapon
    - swap_cgroup_swapoff() ... called at the end of swapoff.

    - swap_cgroup_record() .... record information of swap entry.
    - swap_cgroup_lookup() .... lookup information of swap entry.

    This patch just implements "how to record information"; there is no actual
    method for limiting the usage of swap. These routines use a flat table for
    record and lookup. A "wise" lookup system like a radix tree requires memory
    allocation for new records, but swap-out is usually called under memory
    shortage (or when memcg hits its limit). So I used static allocation.
    (Maybe dynamic allocation is not very hard, but it adds additional memory
    allocation to the memory-shortage path.)

    Note1: In this patch, we use a pointer to record the information, which
    means 8 bytes per swap entry. I think we can reduce this once we create an
    "id of cgroup" in the range of 0-65535 or 0-255.

    Reported-by: Daisuke Nishimura
    Reviewed-by: Daisuke Nishimura
    Tested-by: Daisuke Nishimura
    Reported-by: Hugh Dickins
    Reported-by: Balbir Singh
    Reported-by: Andrew Morton
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Pavel Emelianov
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In init_section_page_cgroup() the section a given pfn belongs to is
    calculated at the top of the function and, despite the fact that the
    pfn/section correspondence does not change, it is recalculated further
    down the same function. By computing this just once and reusing that
    value we save some bytes in the object file and do not waste CPU cycles.

    Signed-off-by: Fernando Luis Vazquez Cao
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fernando Luis Vazquez Cao
     

07 Jan, 2009

1 commit


11 Dec, 2008

1 commit


02 Dec, 2008

1 commit

  • Fixes for memcg/memory hotplug.

    While memory hotplug allocates/frees memmap, page_cgroup doesn't free
    page_cgroup at OFFLINE when page_cgroup was allocated via bootmem.
    (Because freeing bootmem requires special care.)

    Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated
    by memory hotplug, page_cgroup->page == page is no longer true.

    But current MEM_ONLINE handler doesn't check it and update
    page_cgroup->page if it's not necessary to allocate page_cgroup. (This
    was not found because memmap is not freed if SPARSEMEM_VMEMMAP is y.)

    And I noticed that MEM_ONLINE can be called against "part of section".
    So, freeing page_cgroup at CANCEL_ONLINE will cause trouble. (freeing
    used page_cgroup) Don't rollback at CANCEL.

    One more thing: the current memory hotplug notifier chain is stopped by
    slub because it sets NOTIFY_STOP_MASK in its return value, so
    page_cgroup's callback never gets called (it has lower priority than slub
    now).

    I think this slub behavior is not intentional (a BUG), and this fixes it.

    Another approach to consider for page_cgroup allocation:
    - free page_cgroup at OFFLINE even if it's from bootmem
      and remove the special handler. But that requires more changes.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Tested-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

01 Dec, 2008

1 commit