31 Mar, 2011

1 commit


24 Mar, 2011

3 commits

  • In struct page_cgroup, we have a full word for flags but only a few bits
    are reserved. Use the remaining upper bits to encode, depending on
    configuration, the node or the section, to enable page_cgroup-to-page
    lookups without a direct pointer (see the sketch after this entry).

    This saves a full word for every page in a system with memory cgroups
    enabled.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
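
    A minimal sketch of the encoding idea above, assuming a SPARSEMEM-style
    configuration; the width macro and helper names are illustrative, not the
    exact mainline interface.

    /* The upper bits of pc->flags hold the section (or node) id; the low
     * bits remain available for the ordinary PCG_* flag bits. */
    #define PCG_ARRAYID_WIDTH  SECTIONS_SHIFT
    #define PCG_ARRAYID_SHIFT  (BITS_PER_LONG - PCG_ARRAYID_WIDTH)

    static inline void set_page_cgroup_array_id(struct page_cgroup *pc,
                                                unsigned long id)
    {
            pc->flags &= (1UL << PCG_ARRAYID_SHIFT) - 1;   /* keep flag bits   */
            pc->flags |= id << PCG_ARRAYID_SHIFT;          /* store section id */
    }

    static inline unsigned long page_cgroup_array_id(struct page_cgroup *pc)
    {
            return pc->flags >> PCG_ARRAYID_SHIFT;         /* recover section id */
    }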
     
  • It is one logical function; there is no need to have it split up.

    Also, get rid of some checks in the inner function that only ensured the
    sanity of the outer function.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Instead of passing a whole struct page_cgroup to this function, let it
    take only what it really needs from it: the struct mem_cgroup and the
    page.

    This has the advantage that reading pc->mem_cgroup is now done at the same
    place where the ordering rules for this pointer are enforced and
    explained.

    It is also in preparation for removing the pc->page backpointer (see the
    sketch after this entry).

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
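
    A sketch of the new calling convention described above, using a
    hypothetical helper name; the per-zone info lookup it delegates to is an
    assumption, only the shape of the change is meant to be accurate.

    /* Before: the helper took a struct page_cgroup and dereferenced it.
     * After: it takes only what it needs -- the memcg and the page. */
    static struct mem_cgroup_per_zone *
    memcg_page_zoneinfo(struct mem_cgroup *memcg, struct page *page)
    {
            int nid = page_to_nid(page);
            int zid = page_zonenum(page);

            return mem_cgroup_zoneinfo(memcg, nid, zid);
    }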
     

14 Jan, 2011

3 commits

  • The flags added by commit db16d5ec1f87f17511599bc77857dd1662b5a22f
    have no users now. We believe they will be used soon, but for ease of
    patch review the change itself should be folded into the incoming
    set of "dirty ratio for memcg" patches.

    So it's better to drop this change from the current mainline tree.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Greg Thelen
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Introduce a new bit spin lock, PCG_MOVE_LOCK, to synchronize the page
    accounting and migration code. This reworks the locking scheme of
    _update_stat() and _move_account() by adding the new lock bit
    PCG_MOVE_LOCK, which is always taken with IRQs disabled (see the sketch
    after this entry).

    1. If pages are being migrated from a memcg, then updates to that
    memcg's page statistics are protected by grabbing PCG_MOVE_LOCK using
    move_lock_page_cgroup(). In an upcoming commit, memcg dirty page
    accounting will update memcg page accounting (specifically: the number
    of writeback pages) from IRQ context (softirq). Avoid a deadlocking
    nested spin lock attempt by disabling IRQs on the local processor when
    grabbing PCG_MOVE_LOCK.

    2. The lock taken in update_page_stat only avoids races with
    move_account(), so lock_page_cgroup() itself does not need to be
    IRQ-aware. The race is between mem_cgroup_update_page_stat() and
    mem_cgroup_move_account_page().

    Trade-off:
    * Changing lock_page_cgroup() to always disable IRQs (or
    local_bh) has some performance impact, and disabling IRQs
    when it is not necessary is undesirable.
    * Adding a new lock makes move_account() slower. Scores are
    below.

    Performance impact: moving an 8GB anon process.

    Before:
    real 0m0.792s
    user 0m0.000s
    sys 0m0.780s

    After:
    real 0m0.854s
    user 0m0.000s
    sys 0m0.842s

    This score is not great, but planned optimization patches can reduce
    the impact.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Greg Thelen
    Reviewed-by: Minchan Kim
    Acked-by: Daisuke Nishimura
    Cc: Andrea Righi
    Cc: Balbir Singh
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
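
    A minimal sketch of the locking pattern described above; whether the IRQ
    handling lives inside these helpers or at their call sites, and the exact
    names, are assumptions here.

    /* PCG_MOVE_LOCK is a bit in pc->flags, taken as a bit spin lock with
     * IRQs disabled so that a statistics update from softirq context can
     * never deadlock against a move in progress on the same CPU. */
    static inline void move_lock_page_cgroup(struct page_cgroup *pc,
                                             unsigned long *flags)
    {
            local_irq_save(*flags);
            bit_spin_lock(PCG_MOVE_LOCK, &pc->flags);
    }

    static inline void move_unlock_page_cgroup(struct page_cgroup *pc,
                                               unsigned long *flags)
    {
            bit_spin_unlock(PCG_MOVE_LOCK, &pc->flags);
            local_irq_restore(*flags);
    }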
     
  • This patchset provides the ability for each cgroup to have independent
    dirty page limits.

    Limiting dirty memory is like fixing the max amount of dirty (hard to
    reclaim) page cache used by a cgroup. So, in case of multiple cgroup
    writers, they will not be able to consume more than their designated share
    of dirty pages and will be forced to perform write-out if they cross that
    limit.

    The patches are based on a series proposed by Andrea Righi in Mar 2010.

    Overview:

    - Add page_cgroup flags to record when pages are dirty, in writeback, or nfs
    unstable.

    - Extend mem_cgroup to record the total number of pages in each of the
    interesting dirty states (dirty, writeback, unstable_nfs).

    - Add dirty parameters similar to the system-wide /proc/sys/vm/dirty_*
    limits to mem_cgroup. The mem_cgroup dirty parameters are accessible
    via cgroupfs control files.

    - Consider both system and per-memcg dirty limits in page writeback when
    deciding to queue background writeback or block for foreground writeback.

    Known shortcomings:

    - When a cgroup dirty limit is exceeded, then bdi writeback is employed to
    write back dirty inodes. Bdi writeback considers inodes from any cgroup, not
    just inodes contributing dirty pages to the cgroup exceeding its limit.

    - When memory.use_hierarchy is set, then dirty limits are disabled. This is an
    implementation detail. An enhanced implementation is needed to check the
    chain of parents to ensure that no dirty limit is exceeded.

    Performance data:
    - A page fault microbenchmark workload was used to measure performance; it
    can be called in read or write mode:

          f = open(foo.$cpu)
          truncate(f, 4096)
          alarm(60)
          while (1) {
                  p = mmap(f, 4096)
                  if (write)
                          *p = 1
                  else
                          x = *p
                  munmap(p)
          }

    - The workload was called for several points in the patch series in different
    modes:
    - s_read is a single threaded reader
    - s_write is a single threaded writer
    - p_read is a 16 thread reader, each operating on a different file
    - p_write is a 16 thread writer, each operating on a different file

    - Measurements were collected on a 16 core non-numa system using "perf stat
    --repeat 3". The -a option was used for parallel (p_*) runs.

    - All numbers are page fault rate (M/sec). Higher is better.

    - To compare the performance of a kernel without memcg, compare the first and
    last rows; neither has memcg configured. The first row does not include any
    of these memcg patches.

    - To compare the performance of using memcg dirty limits, compare the baseline
    (2nd row, titled "w/ memcg") with the code and memcg enabled (2nd-to-last
    row, titled "all patches").

                          root_cgroup                      child_cgroup
                      s_read s_write p_read p_write   s_read s_write p_read p_write
    mmotm w/o memcg   0.428  0.390   0.429  0.388
    mmotm w/ memcg    0.411  0.378   0.391  0.362     0.412  0.377   0.385  0.363
    all patches       0.384  0.360   0.370  0.348     0.381  0.363   0.368  0.347
    all patches       0.431  0.402   0.427  0.395
      w/o memcg

    This patch:

    Add additional flags to page_cgroup to track dirty pages within a
    mem_cgroup (see the sketch after this entry).

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrea Righi
    Signed-off-by: Greg Thelen
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
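
    A sketch of the per-page dirty-state tracking this patch introduces, using
    the bit-flag accessor pattern page_cgroup already relies on; the exact flag
    and accessor names here are assumptions.

    enum {
            /* existing PCG_* bits omitted */
            PCG_FILE_DIRTY,         /* page is dirty                          */
            PCG_FILE_WRITEBACK,     /* page is under writeback                */
            PCG_FILE_UNSTABLE_NFS,  /* page sent to NFS but not yet committed */
    };

    #define TESTPCGFLAG(uname, lname)                                \
    static inline int PageCgroup##uname(struct page_cgroup *pc)      \
            { return test_bit(PCG_##lname, &pc->flags); }

    TESTPCGFLAG(FileDirty, FILE_DIRTY)
    TESTPCGFLAG(FileWriteback, FILE_WRITEBACK)
    TESTPCGFLAG(FileUnstableNFS, FILE_UNSTABLE_NFS)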
     

25 Nov, 2010

1 commit

  • Fix this:

    kernel BUG at mm/memcontrol.c:2155!
    invalid opcode: 0000 [#1]
    last sysfs file:

    Pid: 18, comm: sh Not tainted 2.6.37-rc3 #3 /Bochs
    EIP: 0060:[] EFLAGS: 00000246 CPU: 0
    EIP is at mem_cgroup_move_account+0xe2/0xf0
    EAX: 00000004 EBX: c6f931d4 ECX: c681c300 EDX: c681c000
    ESI: c681c300 EDI: ffffffea EBP: c681c000 ESP: c46f3e30
    DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
    Process sh (pid: 18, ti=c46f2000 task=c6826e60 task.ti=c46f2000)
    Stack:
    00000155 c681c000 0805f000 c46ee180 c46f3e5c c7058820 c1074d37 00000000
    08060000 c46db9a0 c46ec080 c7058820 0805f000 08060000 c46f3e98 c1074c50
    c106c75e c46f3e98 c46ec080 08060000 0805ffff c46db9a0 c46f3e98 c46e0340
    Call Trace:
    [] ? mem_cgroup_move_charge_pte_range+0xe7/0x130
    [] ? mem_cgroup_move_charge_pte_range+0x0/0x130
    [] ? walk_page_range+0xee/0x1d0
    [] ? mem_cgroup_move_task+0x66/0x90
    [] ? mem_cgroup_move_charge_pte_range+0x0/0x130
    [] ? mem_cgroup_move_task+0x0/0x90
    [] ? cgroup_attach_task+0x136/0x200
    [] ? cgroup_tasks_write+0x48/0xc0
    [] ? cgroup_file_write+0xde/0x220
    [] ? do_page_fault+0x17d/0x3f0
    [] ? alloc_fd+0x2d/0xd0
    [] ? cgroup_file_write+0x0/0x220
    [] ? vfs_write+0x92/0xc0
    [] ? sys_write+0x41/0x70
    [] ? syscall_call+0x7/0xb
    Code: 03 00 74 09 8b 44 24 04 e8 1c f1 ff ff 89 73 04 8d 86 b0 00 00 00 b9 01 00 00 00 89 da 31 ff e8 65 f5 ff ff e9 4d ff ff ff 0f 0b 0b 0f 0b 0f 0b 90 8d b4 26 00 00 00 00 83 ec 10 8b 0d f4 e3
    EIP: [] mem_cgroup_move_account+0xe2/0xf0 SS:ESP 0068:c46f3e30
    ---[ end trace 7daa1582159b6532 ]---

    lock_page_cgroup and unlock_page_cgroup are implemented using
    bit_spinlock. bit_spinlock doesn't touch the bit if we are on a non-SMP
    machine, so we can't use the bit to check whether the lock was taken.

    Let's introduce is_page_cgroup_locked(), based on bit_spin_is_locked()
    instead of PageCgroupLocked(), to fix it (see the sketch after this entry).

    [akpm@linux-foundation.org: s/is_page_cgroup_locked/page_is_cgroup_locked/]
    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
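
    A sketch of the fix described above: query the lock state through
    bit_spin_is_locked() so the check also works on !CONFIG_SMP, where the
    flag bit itself is never set. (After akpm's rename the helper is
    page_is_cgroup_locked(); the body shown here is the obvious shape, and
    PCG_LOCK follows the usual page_cgroup flag naming.)

    static inline int page_is_cgroup_locked(struct page_cgroup *pc)
    {
            /* On !SMP, bit_spin_is_locked() does not rely on the bit,
             * which bit_spin_lock() never sets in that configuration. */
            return bit_spin_is_locked(PCG_LOCK, &pc->flags);
    }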
     

28 May, 2010

1 commit

  • FILE_MAPPED per memcg of migrated file cache is not properly updated,
    because our hook in page_add_file_rmap() can't know to which memcg
    FILE_MAPPED should be counted.

    Basically, this patch is for fixing the bug, but it includes some big
    changes to fix up other messes.

    Now, when migrating a mapped file, events happen in the following sequence.

    1. allocate a new page.
    2. get the memcg of the old page.
    3. charge against the new page before migration. But at this point,
    no changes are made to the new page's page_cgroup, and the charge is not
    committed. (IOW, the PCG_USED bit is not set.)
    4. page migration replaces the radix-tree entry, old-page and new-page.
    5. page migration remaps the new page if the old page was mapped.
    6. Here, the new page is unlocked.
    7. memcg commits the charge for the new page, marking the new page's
    page_cgroup as PCG_USED.

    Because "commit" happens after page-remap, we cannot count FILE_MAPPED
    at step 5, because page_cgroup->mem_cgroup must not be trusted while the
    PCG_USED bit is unset.
    (Note: memcg's LRU removal code does that, but LRU-isolation logic is used
    to help it. When we overwrite page_cgroup->mem_cgroup, the page_cgroup is
    not on the LRU or page_cgroup->mem_cgroup is NULL.)

    We can lose file_mapped accounting information at step 5 because FILE_MAPPED
    is updated only when the mapcount changes 0->1. So we should catch it.

    BTW, historically, the above implementation comes from the migration-failure
    handling of anonymous pages. Because we charge both the old page and the new
    page with mapcount=0, we can't catch
    - the page being really freed before remap.
    - migration failing but the page being freed before remap.
    or ..... other corner cases.

    The new migration sequence with memcg is:

    1. allocate a new page.
    2. mark PageCgroupMigration on the old page.
    3. charge against the new page onto the old page's memcg. (here, the new
    page's pc is marked as PageCgroupUsed.)
    4. page migration replaces radix-tree, page table, etc...
    5. At remapping, the new page's page_cgroup is now marked as "USED".
    We can catch the 0->1 event and FILE_MAPPED will be properly updated.

    6. After this page is unlocked and freed by unmap(), its SWAPOUT event
    can also be caught.

    7. Clear PageCgroupMigration of the old page.

    So, FILE_MAPPED will be correctly updated.

    Then, what is the MIGRATION flag for?
    Without it, at migration failure, we may have to charge the old page again
    because it may have been fully unmapped. "Charging" means we may have to dive
    into memory reclaim or something similarly complicated, so it's better to
    avoid charging it again. Before this patch, __commit_charge() was working on
    both the old and the new page and fixed everything up, but this technique had
    some race conditions around FILE_MAPPED, SWAPOUT, etc.
    Now the kernel uses the MIGRATION flag and does not uncharge the old page
    until the end of migration (see the sketch after this entry).

    I hope this change makes memcg's page migration much simpler. This
    page migration has caused several troubles; it is worth adding a flag
    for simplification.

    Reviewed-by: Daisuke Nishimura
    Tested-by: Daisuke Nishimura
    Reported-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Christoph Lameter
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
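
    A minimal sketch of the scheme above: the old page keeps its charge and
    carries a MIGRATION marker until migration ends, so a failed migration
    never needs to re-charge it. Names and structure are illustrative
    assumptions, not the exact mainline code.

    static void memcg_prepare_migration(struct page *oldpage)
    {
            struct page_cgroup *pc = lookup_page_cgroup(oldpage);

            lock_page_cgroup(pc);
            SetPageCgroupMigration(pc);     /* do not uncharge while this is set */
            unlock_page_cgroup(pc);
    }

    static void memcg_end_migration(struct page *oldpage)
    {
            struct page_cgroup *pc = lookup_page_cgroup(oldpage);

            lock_page_cgroup(pc);
            ClearPageCgroupMigration(pc);   /* the old page may be uncharged now */
            unlock_page_cgroup(pc);
    }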
     

07 Apr, 2010

1 commit

  • Presently, memcg's FILE_MAPPED accounting has the following race with
    move_account() (which happens at rmdir()).

    increment page->mapcount (rmap.c)
    mem_cgroup_update_file_mapped()       move_account()
                                          lock_page_cgroup()
                                          check page_mapped(); if
                                          page_mapped(page) > 1 {
                                              FILE_MAPPED -1 from old memcg
                                              FILE_MAPPED +1 to new memcg
                                          }
                                          .....
                                          overwrite pc->mem_cgroup
                                          unlock_page_cgroup()
    lock_page_cgroup()
    FILE_MAPPED +1 to pc->mem_cgroup
    unlock_page_cgroup()

    Then,
    old memcg (-1 file mapped)
    new memcg (+2 file mapped)

    This happens because move_account() looks at page_mapped(), which is not
    guarded by lock_page_cgroup(). This patch adds a FILE_MAPPED flag to
    page_cgroup and moves the accounting information based on it, so all
    checks are now synchronized by lock_page_cgroup() (see the sketch after
    this entry).

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Balbir Singh
    Reviewed-by: Daisuke Nishimura
    Cc: Andrea Righi
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
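
    A minimal sketch of the flag-based accounting described above; it follows
    the usual page_cgroup flag conventions, but the helper shape and names are
    assumptions.

    void mem_cgroup_update_file_mapped(struct page *page, int val)
    {
            struct page_cgroup *pc = lookup_page_cgroup(page);

            lock_page_cgroup(pc);
            if (!PageCgroupUsed(pc)) {
                    unlock_page_cgroup(pc);
                    return;
            }
            if (val > 0) {
                    SetPageCgroupFileMapped(pc);    /* remember where we charged */
                    /* FILE_MAPPED +1 on pc->mem_cgroup */
            } else {
                    ClearPageCgroupFileMapped(pc);
                    /* FILE_MAPPED -1 on pc->mem_cgroup */
            }
            unlock_page_cgroup(pc);
            /* move_account() checks the same flag under the same lock, so
             * the transfer and the update can no longer race. */
    }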
     

13 Mar, 2010

1 commit

  • This patch is another core part of this move-charge-at-task-migration
    feature. It enables moving charges of anonymous swaps.

    To move the charge of swap, we need to exchange swap_cgroup's record.

    In the current implementation, swap_cgroup's record is protected by:

    - page lock: if the entry is in the swap cache.
    - swap_lock: if the entry is not in the swap cache.

    This works well for usual swap-in/out activity.

    But this behavior makes the feature of moving swap charges have to check
    many conditions to exchange swap_cgroup's record safely.

    So I changed the modification of swap_cgroup's record (swap_cgroup_record())
    to use xchg, and defined a new function to cmpxchg swap_cgroup's record
    (see the sketch after this entry).

    This patch also enables moving the charge of swap caches that are not
    pte_present but not yet uncharged, which can exist on the swap-out path,
    by getting the target pages via find_get_page() as do_mincore() does.

    [kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
    [akpm@linux-foundation.org: fix typos]
    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Daisuke Nishimura
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
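
    A sketch of the cmpxchg-style helper described above, assuming the
    2-byte-id-per-swap-entry layout of swap_cgroup and a hypothetical lookup
    helper; the real code also holds the table's lock around the compare.

    /* Exchange the memcg id recorded for @ent, but only if it still holds
     * @old; returns @old on success, 0 if somebody changed it first. */
    static unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
                                              unsigned short old,
                                              unsigned short new)
    {
            struct swap_cgroup *sc = lookup_swap_cgroup(ent);  /* hypothetical */
            unsigned short retval = sc->id;

            if (retval == old)
                    sc->id = new;           /* success: record the new owner */
            else
                    retval = 0;             /* lost the race */
            return retval;
    }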
     

16 Dec, 2009

1 commit

  • mem_cgroup_move_parent() calls try_charge first and cancel_charge on
    failure. IMHO, charge/uncharge (especially charge) is a high-cost
    operation, so we should avoid it as far as possible.

    This patch tries to delay try_charge in mem_cgroup_move_parent() by
    re-ordering the checks it does.

    It also renames mem_cgroup_move_account() to
    __mem_cgroup_move_account(), changes the return value of
    __mem_cgroup_move_account() from int to void, and adds a new
    wrapper (mem_cgroup_move_account()) which checks whether a @pc is valid
    for moving the account and then calls __mem_cgroup_move_account()
    (see the sketch after this entry).

    This patch also removes the last caller of trylock_page_cgroup(), and so
    removes its definition too.

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
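
    A sketch of the wrapper split described above; the validity checks shown
    are illustrative.

    /* Wrapper: verify that @pc is still a valid candidate under the lock,
     * then hand off to the (now void) worker. */
    static int mem_cgroup_move_account(struct page_cgroup *pc,
                                       struct mem_cgroup *from,
                                       struct mem_cgroup *to)
    {
            int ret = -EINVAL;

            lock_page_cgroup(pc);
            if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
                    __mem_cgroup_move_account(pc, from, to);
                    ret = 0;
            }
            unlock_page_cgroup(pc);
            return ret;
    }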
     

24 Sep, 2009

1 commit

  • Change the memory cgroup to remove the overhead associated with accounting
    all pages in the root cgroup. As a side-effect, we can no longer set a
    memory hard limit in the root cgroup.

    A new flag to track whether the page has been accounted or not has been
    added as well. Flags are now set atomically for page_cgroup;
    pcg_default_flags is now obsolete and has been removed.

    [akpm@linux-foundation.org: fix a few documentation glitches]
    Signed-off-by: Balbir Singh
    Signed-off-by: Daisuke Nishimura
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Li Zefan
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     

20 Sep, 2009

1 commit


12 Jun, 2009

1 commit

  • Now that SLAB is configured at a very early stage, it can be used in
    init routines.

    But replacing alloc_bootmem() in FLATMEM/DISCONTIGMEM's page_cgroup
    initialization breaks the allocation now.
    (It works well in the SPARSEMEM case, which supports MEMORY_HOTPLUG and
    where each page_cgroup allocation stays a reasonable size, i.e.
    < 1 << MAX_ORDER.)

    This patch revives FLATMEM + memory cgroup by using alloc_bootmem.

    In the future, we will either stop supporting FLATMEM (if it has no users)
    or rewrite the flatmem code completely, but that would add messier code
    and more overhead.

    Reported-by: Li Zefan
    Tested-by: Li Zefan
    Tested-by: Ingo Molnar
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Pekka Enberg

    KAMEZAWA Hiroyuki
     

03 Apr, 2009

1 commit

  • Try to use the CSS ID for records in swap_cgroup. With this, on a 64-bit
    machine, the size of swap_cgroup goes down from 8 bytes to 2 bytes.

    This means, when 2GB of swap is equipped (assuming a page size of 4096 bytes):

    Before: size of swap_cgroup = 2G/4k * 8 = 4Mbytes.
    After:  size of swap_cgroup = 2G/4k * 2 = 1Mbytes.

    The reduction is large. Of course, there are trade-offs: the CSS ID lookup
    adds overhead to swap-in/swap-out/swap-free.

    But in general,
    - swap is a resource which users tend to avoid using.
    - If swap is never used, the swap_cgroup area is not used.
    - Going by traditional manuals, the size of swap should be proportional to
    the size of memory, and machine memory sizes keep increasing.

    I think reducing the size of swap_cgroup makes sense (see the sketch after
    this entry).

    Note:
    - The ID->CSS lookup routine has no locks; it runs under an RCU read-side
    section.
    - A memcg can become obsolete at rmdir() but is not freed while a refcount
    from swap_cgroup is still held.

    Changelog v4->v5:
    - reworked on top of memcg-charge-swapcache-to-proper-memcg.patch
    Changelog ->v4:
    - fixed not-configured case.
    - deleted unnecessary comments.
    - fixed NULL pointer bug.
    - fixed message in dmesg.

    [nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Paul Menage
    Cc: Hugh Dickins
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
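
    A minimal sketch of the space saving described above: one 2-byte css id
    per swap entry instead of an 8-byte mem_cgroup pointer. The struct and
    helper below are illustrative assumptions.

    struct swap_cgroup {
            unsigned short id;      /* css id of the owning mem_cgroup, 0 = none */
    };

    /* Record @id for a swap entry and return the previous owner's id.
     * Callers serialize access (page lock or swap_lock). */
    static unsigned short swap_cgroup_record_id(struct swap_cgroup *sc,
                                                unsigned short id)
    {
            unsigned short old = sc->id;

            sc->id = id;
            return old;
    }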
     

09 Jan, 2009

2 commits

  • A big patch for changing memcg's LRU semantics.

    Now,
    - page_cgroup is linked to mem_cgroup's own LRU (per zone).

    - The LRU of page_cgroup is not synchronized with the global LRU.

    - page and page_cgroup are one-to-one and statically allocated.

    - To find which LRU a page_cgroup is on, you have to check pc->mem_cgroup, as in
      lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

    - SwapCache is handled.

    And when we handle the LRU list of page_cgroup, we do the following.

    pc = lookup_page_cgroup(page);
    lock_page_cgroup(pc); .....................(1)
    mz = page_cgroup_zoneinfo(pc);
    spin_lock(&mz->lru_lock);
    .....add to LRU
    spin_unlock(&mz->lru_lock);
    unlock_page_cgroup(pc);

    But (1) is a spin lock and we have to worry about deadlock with zone->lru_lock.
    So, trylock() is used at (1) now. Without (1), we can't trust that "mz" is correct.

    This is a trial to remove this dirty nesting of locks.
    This patch changes mz->lru_lock to be zone->lru_lock.
    Then, the above sequence will be written as

    spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
    mem_cgroup_add/remove/etc_lru() {
    pc = lookup_page_cgroup(page);
    mz = page_cgroup_zoneinfo(pc);
    if (PageCgroupUsed(pc)) {
    ....add to LRU
    }
    }
    spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

    This is much simpler.
    (*) We're safe even if we don't take lock_page_cgroup(pc), because:
    1. pc->mem_cgroup can only be modified
    - at charge.
    - at account_move().
    2. At charge,
    the PCG_USED bit is not set before pc->mem_cgroup is fixed.
    3. At account_move(),
    the page is isolated and not on the LRU.

    Pros.
    - easy for maintenance.
    - memcg can make use of the laziness of pagevec.
    - we don't have to duplicate the LRU/Active/Unevictable bits in page_cgroup.
    - the LRU status of memcg will be synchronized with the global LRU's.
    - the number of locks is reduced.
    - account_move() is simplified very much.
    Cons.
    - may increase the cost of LRU rotation.
    (no impact if memcg is not configured.)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • For accounting swap, we need a record per swap entry, at least.

    This patch adds the following functions:
    - swap_cgroup_swapon() .... called from swapon.
    - swap_cgroup_swapoff() ... called at the end of swapoff.

    - swap_cgroup_record() .... record information for a swap entry.
    - swap_cgroup_lookup() .... look up information for a swap entry.

    This patch just implements "how to record information"; no actual method
    for limiting the usage of swap is added. These routines use a flat table
    for record and lookup (see the sketch after this entry). A "wise" lookup
    system like a radix tree requires memory allocation when inserting new
    records, but swap-out is usually called under memory shortage (or when
    memcg hits its limit), so I used static allocation. (Dynamic allocation
    is probably not very hard, but it would add another memory allocation on
    the memory-shortage path.)

    Note1: In this patch, we use a pointer to record information, which means
    8 bytes per swap entry. I think we can reduce this once we
    create an "id of cgroup" in the range of 0-65535 or 0-255.

    Reported-by: Daisuke Nishimura
    Reviewed-by: Daisuke Nishimura
    Tested-by: Daisuke Nishimura
    Reported-by: Hugh Dickins
    Reported-by: Balbir Singh
    Reported-by: Andrew Morton
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Pavel Emelianov
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
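
    A minimal sketch of the flat-table record/lookup described above, assuming
    one pointer-sized slot per swap entry indexed by swap type and offset; the
    real code allocates the table in page-sized chunks per swap type at
    swapon time.

    static struct mem_cgroup **swap_record[MAX_SWAPFILES];  /* one table per swap type */

    /* Remember which memcg owns @ent; return the previous owner. */
    struct mem_cgroup *swap_cgroup_record(swp_entry_t ent, struct mem_cgroup *memcg)
    {
            struct mem_cgroup **slot =
                    &swap_record[swp_type(ent)][swp_offset(ent)];
            struct mem_cgroup *old = *slot;

            *slot = memcg;
            return old;
    }

    struct mem_cgroup *swap_cgroup_lookup(swp_entry_t ent)
    {
            return swap_record[swp_type(ent)][swp_offset(ent)];
    }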
     

01 Dec, 2008

1 commit


23 Oct, 2008

1 commit

  • page_cgroup_init() is called from mem_cgroup_init(), but at that
    point we cannot call alloc_bootmem() any more.
    (This caused a panic at boot.)

    This patch moves page_cgroup_init() to init/main.c.

    The boot-time ordering is as follows:
    ==
    parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
    ....
    cgroup_init_early() # "early" init of cgroup.
    ....
    setup_arch() # memmap is allocated.
    ...
    page_cgroup_init();
    mem_init(); # we cannot call alloc_bootmem after this.
    ....
    cgroup_init() # mem_cgroup is initialized.
    ==

    Before page_cgroup_init(), mem_map must be initialized. So
    page_cgroup_init() was added to init/main.c directly.

    (*) Maybe this is not very clean, but
    - cgroup_init_early() is too early, and
    - in cgroup_init(), we would have to use vmalloc instead of alloc_bootmem().
    The vmalloc area on x86-32 is precious and we should avoid very large
    vmalloc() calls there. So we want to use alloc_bootmem(), and page_cgroup_init()
    was added directly to init/main.c.

    [akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
    [akpm@linux-foundation.org: fix build]
    Acked-by: Balbir Singh
    Tested-by: Balbir Singh
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

20 Oct, 2008

1 commit

  • Allocate all page_cgroup at boot and remove the page_cgroup pointer from
    struct page. This patch adds an interface (sketched after this entry):

    struct page_cgroup *lookup_page_cgroup(struct page*)

    All of FLATMEM/DISCONTIGMEM/SPARSEMEM and MEMORY_HOTPLUG are supported.

    Removing the page_cgroup pointer reduces the amount of memory by
    - 4 bytes per page (32-bit), or
    - 8 bytes per page (64-bit)
    if the memory controller is disabled (even if it is configured in).

    On a usual 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.
    On my x86-64 server with 48GB of memory, this saves 96MB of memory.
    I think this reduction makes sense.

    By pre-allocating, kmalloc/kfree in charge/uncharge are removed.
    This means
    - we no longer need to worry about kmalloc failure
    (which can happen depending on the gfp_mask used).
    - we avoid calling kmalloc/kfree.
    - we avoid allocating tons of small objects that can become fragmented.
    - we know up front how much memory will be used for this extra-LRU handling.

    I added printk messages:

    "allocated %ld bytes of page_cgroup"
    "please try cgroup_disable=memory option if you don't want"

    which should be informative enough for users.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
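
    A minimal sketch of the lookup for the FLATMEM/DISCONTIGMEM case, where a
    boot-allocated page_cgroup array hangs off each node's pglist_data and is
    indexed by pfn; the SPARSEMEM variant indexes a per-mem_section array
    instead. The field names are assumptions.

    struct page_cgroup *lookup_page_cgroup(struct page *page)
    {
            unsigned long pfn = page_to_pfn(page);
            struct pglist_data *pgdat = NODE_DATA(page_to_nid(page));

            /* One statically allocated page_cgroup per page in the node. */
            return pgdat->node_page_cgroup + (pfn - pgdat->node_start_pfn);
    }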