09 Apr, 2008

1 commit

  • This should be N_NORMAL_MEMORY.

    N_NORMAL_MEMORY is "true" if a node has memory for the kernel. N_HIGH_MEMORY
    is "true" if a node has memory for HIGHMEM. (If CONFIG_HIGHMEM=n, always
    "true")

    This check is used for testing whether we can use kmalloc_node() on a node.
    Then, if there is a node which only contains HIGHMEM, the system will call
    kmalloc_node() on a node which has no memory for the kernel. If that happens
    under SLUB, the kernel will panic. I think this only happens on x86_32-numa.
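
    A minimal sketch of the intended test (node_state() and the node state masks
    are the standard nodemask helpers; nid, size and p are illustrative locals):

    if (node_state(nid, N_NORMAL_MEMORY))
            /* the node has memory for the kernel: kmalloc_node() is safe */
            p = kmalloc_node(size, GFP_KERNEL, nid);
    else
            /* HIGHMEM-only node: let the allocator pick a suitable node */
            p = kmalloc(size, GFP_KERNEL);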

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

05 Apr, 2008

1 commit

  • A boot option for the memory controller was discussed on lkml. It is a good
    idea to add it, since it saves memory for people who want to turn off the
    memory controller.

    By default the option is on for the following two reasons:

    1. It provides compatibility with the current scheme where the memory
    controller turns on if the config option is enabled
    2. It allows for wider testing of the memory controller, once the config
    option is enabled

    We still allow the create and destroy callbacks to succeed, since they are
    not aware of boot options. We do not populate the directory with memory
    resource controller-specific files.
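
    For reference, in mainline kernels the switch ended up on the kernel command
    line; assuming the cgroup_disable= form used there (an assumption, not
    spelled out in this log), turning the controller off at boot looks like:

    cgroup_disable=memory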

    Signed-off-by: Balbir Singh
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Sudhir Kumar
    Cc: YAMAMOTO Takashi
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     

20 Mar, 2008

1 commit

  • The check t->tgid == t->pid is not the blessed way to check whether a task
    is a group leader.

    This is not only about code beauty, but about the pid namespace fixes -
    both the tgid and the pid fields on the task_struct are (slowly :( )
    becoming deprecated.

    Besides, the thread_group_leader() macro makes only one dereference :)
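
    The change amounts to this kind of substitution (a sketch; the surrounding
    memcontrol code is omitted):

    /* before: open-coded group-leader test on soon-to-be-deprecated fields */
    if (p->tgid == p->pid)
            ...

    /* after: the blessed helper, a single dereference */
    if (thread_group_leader(p))
            ...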

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

05 Mar, 2008

12 commits

  • While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
    called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
    I couldn't see what racing tasks on other cpus were doing, but surmise that
    another must have been in mem_cgroup_charge_common on the same page, between
    its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
    which I'd almost changed to a kmalloc).

    Normally such a race cannot happen, the ref_cnt prevents it, the final
    uncharge cannot race with the initial charge. But force_empty buggers the
    ref_cnt, that's what it's all about; and thereafter forced pages are
    vulnerable to races such as this (just think of a shared page also mapped into
    an mm of another mem_cgroup than that just emptied). And they remain
    vulnerable until they're freed, which may be indefinitely later.

    This patch just fixes the oops by moving the unlock_page_cgroups down below
    adding to and removing from the list (only possible given the previous patch);
    and while we're at it, we might as well make it an invariant that
    page->page_cgroup is always set while pc is on lru.

    But this behaviour of force_empty seems highly unsatisfactory to me: why have
    a ref_cnt if we always have to cope with it being violated (as in the earlier
    page migration patch)? We may prefer force_empty to move pages to an orphan
    mem_cgroup (could be the root, but better not), from which other cgroups could
    recover them; we might need to reverse the locking again; but no time now for
    such concerns.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
    As for force_empty, though this may not be the main topic here,
    mem_cgroup_force_empty_list() can be implemented more simply. It is possible
    to make the function just call mem_cgroup_uncharge_page() instead of
    releasing page_cgroups by itself. The trick is to call get_page() before
    invoking mem_cgroup_uncharge_page(), so the page won't be released during
    this function.
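
    A sketch of that shape of loop body (illustrative; the real list walking and
    locking in mem_cgroup_force_empty_list() are not shown):

    page = pc->page;
    get_page(page);                  /* pin the page so uncharge cannot free it */
    mem_cgroup_uncharge_page(page);  /* releases the page_cgroup for us */
    put_page(page);                  /* drop the temporary reference */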

    Kamezawa-san points out that by the time mem_cgroup_uncharge_page() uncharges,
    the page might have been reassigned to an lru of a different mem_cgroup, and
    now be emptied from that; but Hugh claims that's okay, the end state is the
    same as when it hasn't gone to another list.

    And once force_empty stops taking lock_page_cgroup within mz->lru_lock,
    mem_cgroup_move_lists() can be simplified to take mz->lru_lock directly while
    holding page_cgroup lock (but still has to use try_lock_page_cgroup).

    Signed-off-by: Hirokazu Takahashi
    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hirokazu Takahashi
     
  • Ever since the VM_BUG_ON(page_get_page_cgroup(page)) (now Bad page state) went
    into page freeing, I've hit it from time to time in testing on some machines,
    sometimes only after many days. Recently found a machine which could usually
    produce it within a few hours, which got me there at last.

    The culprit is mem_cgroup_move_lists, whose locking is inadequate; and the
    arrangement of structures was such that you got page_cgroups from the lru list
    neatly put on to SLUB's freelist. Kamezawa-san identified the same hole
    independently.

    The main problem was that it was missing the lock_page_cgroup it needs to
    safely page_get_page_cgroup; but it's tricky to go beyond that too, and I
    couldn't do it with SLAB_DESTROY_BY_RCU as I'd expected. See the code for
    comments on the constraints.

    This patch immediately gets replaced by a simpler one from Hirokazu-san; but
    is it just foolish pride that tells me to put this one on record, in case we
    need to come back to it later?

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • mem_cgroup_uncharge_page does css_put on the mem_cgroup before uncharging from
    it, and before removing page_cgroup from one of its lru lists: isn't there a
    danger that struct mem_cgroup memory could be freed and reused before
    completing that, so corrupting something? Never seen it, and for all I know
    there may be other constraints which make it impossible; but let's be
    defensive and reverse the ordering there.

    mem_cgroup_force_empty_list is safe because there's an extra css_get around
    all its works; but even so, change its ordering the same way round, to help
    get in the habit of doing it like this.
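
    In other words, the defensive ordering is roughly (a sketch; details of the
    uncharge path and local declarations are omitted):

    /* finish all work that touches the mem_cgroup first ... */
    spin_lock_irqsave(&mz->lru_lock, flags);
    __mem_cgroup_remove_list(pc);               /* off its per-zone lru list */
    spin_unlock_irqrestore(&mz->lru_lock, flags);
    res_counter_uncharge(&mem->res, PAGE_SIZE); /* give back the charge */

    /* ... and only then drop the reference that keeps the mem_cgroup alive */
    css_put(&mem->css);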

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove clear_page_cgroup: it's an unhelpful helper, see for example how
    mem_cgroup_uncharge_page had to unlock_page_cgroup just in order to call it
    (serious races from that? I'm not sure).

    Once that's gone, you can see it's pointless for page_cgroup's ref_cnt to be
    atomic: it's always manipulated under lock_page_cgroup, except where
    force_empty unilaterally reset it to 0 (and how does uncharge's
    atomic_dec_and_test protect against that?).

    Simplify this page_cgroup locking: if you've got the lock and the pc is
    attached, then the ref_cnt must be positive: VM_BUG_ONs to check that, and to
    check that pc->page matches page (we're on the way to finding why sometimes it
    doesn't, but this patch doesn't fix that).
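
    So the pattern the VM_BUG_ONs guard becomes, roughly (a sketch of the
    invariant rather than the exact diff):

    lock_page_cgroup(page);
    pc = page_get_page_cgroup(page);
    if (pc) {
            VM_BUG_ON(pc->page != page);    /* pc must point back at page */
            VM_BUG_ON(pc->ref_cnt <= 0);    /* attached implies positive ref_cnt */
            ...
    }
    unlock_page_cgroup(page);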

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • More cleanup to memcontrol.c, this time changing some of the code generated.
    Let the compiler decide what to inline (except for page_cgroup_locked which is
    only used when CONFIG_DEBUG_VM): the __always_inline on lock_page_cgroup etc.
    was quite a waste since bit_spin_lock etc. are inlines in a header file; made
    mem_cgroup_force_empty and mem_cgroup_write_strategy static.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Sorry, before getting down to more important changes, I'd like to do some
    cleanup in memcontrol.c. This patch doesn't change the code generated, but
    cleans up whitespace, moves up a double declaration, removes an unused enum,
    removes void returns, removes misleading comments, that kind of thing.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page (a
    trivial wrapper around it) and mem_cgroup_end_migration (which does the same
    as mem_cgroup_uncharge_page). And it often ends up having to lock just to let
    its caller unlock. Remove it (but leave the silly locking until a later
    patch).

    Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • My memcgroup patch to fix hang with shmem/tmpfs added NULL page handling to
    mem_cgroup_charge_common. It seemed convenient at the time, but hard to
    justify now: there's a perfectly appropriate swappage to charge and uncharge
    instead, this is not on any hot path through shmem_getpage, and no performance
    hit was observed from the slight extra overhead.

    So revert that NULL page handling from mem_cgroup_charge_common; and make it
    clearer by bringing page_cgroup_assign_new_page_cgroup into its body - that
    was a helper I found more of a hindrance to understanding.

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
    page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
    were set here, it'd likely cause corruption when the page is reused.

    Don't use page_assign_page_cgroup to clear it: that should be private to
    memcontrol.c, and always called with the lock taken; and memmap_init_zone
    doesn't need it either - like page->mapping and other pointers throughout the
    kernel, Linux assumes pointers in zeroed structures are NULL pointers.

    Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.
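
    The intent in free_hot_cold_page() is roughly this (a sketch; the message
    text and the exact reporting path are illustrative):

    if (unlikely(page_get_page_cgroup(page) != NULL)) {
            /* complain, as for any other bad page state ... */
            printk(KERN_ALERT "Bad page state: page_cgroup still set\n");
            /* ... then clear the field so the page can be reused safely */
            page_reset_bad_cgroup(page);
    }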

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
    it's more convenient if it acts upon the page itself not the page_cgroup; and
    in a later patch this becomes important to handle within memcontrol.c.
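
    The interface change is roughly (a sketch; the second argument selects the
    active or inactive target list):

    /* before: every caller looked the page_cgroup up itself */
    mem_cgroup_move_lists(page_get_page_cgroup(page), true);

    /* after: pass the page and let memcontrol.c find (and lock) its page_cgroup */
    mem_cgroup_move_lists(page, true);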

    Signed-off-by: Hugh Dickins
    Cc: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
    it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.

    Signed-off-by: Hugh Dickins
    Acked-by: David Rientjes
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hirokazu Takahashi
    Cc: YAMAMOTO Takashi
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

24 Feb, 2008

2 commits

  • Cgroup requires the subsystem to return a negative error code on error in
    the create method.

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • Remove this VM_BUG_ON(), as Balbir stated:

    We used to have a for loop with !list_empty() as a termination condition
    and VM_BUG_ON(!pc) is a spill over. With the new loop, VM_BUG_ON(!pc) does
    not make sense.

    Signed-off-by: Li Zefan
    Acked-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

10 Feb, 2008

1 commit

  • mm_cgroup() is exclusively used to test whether an mm's mem_cgroup pointer
    is pointing to a specific cgroup. Instead of returning the pointer, we can
    just do the test itself in a new macro:

    vm_match_cgroup(mm, cgroup)

    returns non-zero if the mm's mem_cgroup points to cgroup. Otherwise it
    returns zero.
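
    A sketch of what such a macro can look like (assuming the mm carries a
    mem_cgroup pointer directly, as it did at the time; the exact definition
    may differ):

    #define vm_match_cgroup(mm, cgroup) \
            ((cgroup) == rcu_dereference((mm)->mem_cgroup))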

    Signed-off-by: David Rientjes
    Cc: Balbir Singh
    Cc: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

08 Feb, 2008

22 commits

  • Based on the discussion at http://lkml.org/lkml/2007/12/20/383, it was felt
    that control_type might not be a good thing to implement right away. We
    can add this flexibility at a later point when required.

    Signed-off-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Now, the lru is per-zone.

    Then, the lru_lock can be (should be) per-zone, too.
    This patch implements a per-zone lru lock.

    lru_lock is placed into the mem_cgroup_per_zone struct.

    The lock can be accessed by
    mz = mem_cgroup_zoneinfo(mem_cgroup, node, zone);
    &mz->lru_lock

    or
    mz = page_cgroup_zoneinfo(page_cgroup);
    &mz->lru_lock
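
    Typical usage then becomes (a sketch; flags is a local unsigned long for the
    saved irq state):

    mz = page_cgroup_zoneinfo(pc);
    spin_lock_irqsave(&mz->lru_lock, flags);
    /* safe to add/remove pc on this zone's cgroup lru lists here */
    spin_unlock_irqrestore(&mz->lru_lock, flags);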

    Signed-off-by: KAMEZAWA hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch implements a per-zone lru for the memory cgroup.
    It makes use of the mem_cgroup_per_zone struct for the per-zone lru.

    The LRU can be accessed by

    mz = mem_cgroup_zoneinfo(mem_cgroup, node, zone);
    &mz->active_list
    &mz->inactive_list

    or
    mz = page_cgroup_zoneinfo(page_cgroup);
    &mz->active_list
    &mz->inactive_list

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • … pages to be scanned per cgroup

    Define a function for calculating the number of scan targets on each zone/LRU.

    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Herbert Poetzl <herbert@13thfloor.at>
    Cc: Kirill Korotaev <dev@sw.ru>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Paul Menage <menage@google.com>
    Cc: Pavel Emelianov <xemul@openvz.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    KAMEZAWA Hiroyuki
     
  • Functions to remember reclaim priority per cgroup (as zone->prev_priority)

    [akpm@linux-foundation.org: build fixes]
    [akpm@linux-foundation.org: more build fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • …ve imbalance per cgroup

    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Herbert Poetzl <herbert@13thfloor.at>
    Cc: Kirill Korotaev <dev@sw.ru>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Paul Menage <menage@google.com>
    Cc: Pavel Emelianov <xemul@openvz.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    KAMEZAWA Hiroyuki
     
    Define a function for calculating mapped_ratio in a memory cgroup.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch adds per-zone status to the memory cgroup. These values are often
    read (as per-zone values) by page reclaim.

    In the current design, a per-zone stat is just an unsigned long, not an
    atomic value, because it is modified only under lru_lock. (So, atomic ops
    are not necessary.)

    This patch adds ACTIVE and INACTIVE per-zone status values.

    For handling per-zone status, this patch adds
    struct mem_cgroup_per_zone {
    ...
    }
    and some helper functions. This will be useful for adding per-zone objects
    to mem_cgroup.

    This patch sets the memory controller's early_init to 0 so that kmalloc()
    can be called during initialization.

    Acked-by: Balbir Singh
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add macros to get the node_id and zone_id of a page_cgroup. These will be
    used in the per-zone-xxx patches and others.
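
    A sketch of such helpers (the macro names are illustrative; page_to_nid()
    and page_zonenum() are the standard page helpers):

    #define page_cgroup_nid(pc)  page_to_nid((pc)->page)
    #define page_cgroup_zid(pc)  page_zonenum((pc)->page)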

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add a pre_destroy handler for mem_cgroup and try to make the mem_cgroup
    empty at rmdir().
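
    Roughly, the wiring looks like this (a sketch; the callback signature and
    the mem_cgroup_from_cont() helper of that era are assumed):

    static void mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
                                       struct cgroup *cont)
    {
            struct mem_cgroup *mem = mem_cgroup_from_cont(cont);

            /* drop remaining charges so that the later rmdir() can succeed */
            mem_cgroup_force_empty(mem);
    }

    struct cgroup_subsys mem_cgroup_subsys = {
            ...
            .pre_destroy = mem_cgroup_pre_destroy,
    };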

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Cc: David Rientjes
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Pavel Emelianov
    Cc: Peter Zijlstra
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Show the accounted information of a memory cgroup via the memory.stat file.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix printk warning]
    Signed-off-by: YAMAMOTO Takashi
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add statistics accounting infrastructure for the memory controller. All
    accounting information is stored per-cpu, and callers do not have to take a
    lock or use atomic ops. This will be used by the memory.stat file later.

    CACHE includes swapcache now. I'd like to divide it into
    PAGECACHE and SWAPCACHE later.

    This patch adds 3 functions for accounting.
    * __mem_cgroup_stat_add() ... for the usual case.
    * __mem_cgroup_stat_add_safe() ... for calling under an irq-disabled section.
    * mem_cgroup_read_stat() ... for reading a stat value.
    * renamed PAGECACHE to CACHE (because it may include swapcache *now*)
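
    The per-cpu state behind these helpers is something like (a sketch; the
    struct and index names are illustrative):

    struct mem_cgroup_stat_cpu {
            s64 count[MEM_CGROUP_STAT_NSTATS];
    } ____cacheline_aligned_in_smp;

    static void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat *stat,
                                           enum mem_cgroup_stat_index idx, int val)
    {
            /* caller runs with irqs disabled, so the cpu cannot change under us */
            int cpu = smp_processor_id();

            stat->cpustat[cpu].count[idx] += val;
    }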

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix smp_processor_id-in-preemptible]
    [akpm@linux-foundation.org: uninline things]
    [akpm@linux-foundation.org: remove dead code]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: YAMAMOTO Takashi
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Remember whether a page_cgroup is on the active_list or not in
    page_cgroup->flags.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: YAMAMOTO Takashi
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The memcgroup regime relies upon a cgroup reclaiming pages from itself within
    add_to_page_cache: which may involve some waiting. Whereas shmem and tmpfs
    rely upon using add_to_page_cache while holding a spinlock: when it cannot
    wait. The consequence is that when a cgroup reaches its limit, shmem_getpage
    just hangs - unless there is outside memory pressure too, neither kswapd nor
    radix_tree_preload get it out of the retry loop.

    In most cases we can mem_cgroup_cache_charge the page waitably first, to
    attach the page_cgroup in advance, so add_to_page_cache will do no more than
    increment a count; then mem_cgroup_uncharge_page after (in both success and
    failure cases) to balance the books again.

    And where there used to be a congestion_wait for kswapd (recently made
    redundant by radix_tree_preload), use mem_cgroup_cache_charge with NULL page
    to go through a cycle of allocation and freeing, without accounting to any
    particular page, and without updating the statistics vector. This brings the
    cgroup below its limit so the next try usually succeeds.
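
    The shape of the fix in shmem_getpage() is roughly (a sketch; error handling
    is compressed, and page, mapping, idx, gfp are illustrative locals):

    /* charge first, while we are still allowed to wait and reclaim */
    error = mem_cgroup_cache_charge(page, current->mm, gfp);
    if (error)
            goto failed;

    /* under the spinlock this is now little more than a counted insert */
    error = add_to_page_cache(page, mapping, idx, gfp & ~__GFP_WAIT);

    /* balance the books again, whether the insert succeeded or failed */
    mem_cgroup_uncharge_page(page);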

    Signed-off-by: Hugh Dickins
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Tidy up mem_cgroup_charge_common before extending it. Adjust some comments,
    but mainly clean up its loop: I've an aversion to loops full of continues,
    then a break or a goto at the bottom. And the is_atomic test should be on the
    __GFP_WAIT bit, not GFP_ATOMIC bits.
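
    That is, the test becomes something like (a sketch; the label is
    illustrative):

    /* before: is_atomic deduced from GFP_ATOMIC bits */
    /* after:  only __GFP_WAIT says whether we may sleep to reclaim */
    if (!(gfp_mask & __GFP_WAIT))
            goto nomem;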

    Signed-off-by: Hugh Dickins
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Hugh Dickins noticed that we were using rcu_dereference() without
    rcu_read_lock() in the cache charging routine. The patch below fixes
    this problem.
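
    The fixed pattern is along these lines (a sketch, assuming the mm carries
    the mem_cgroup pointer directly, as it did at the time):

    rcu_read_lock();
    mem = rcu_dereference(mm->mem_cgroup);
    css_get(&mem->css);     /* pin it before leaving the RCU read section */
    rcu_read_unlock();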

    Signed-off-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Add a flag to page_cgroup to remember "this page is charged as cache."
    Cache here includes page cache and swap cache. This is useful for
    implementing precise accounting in the memory cgroup.

    TODO: distinguish page-cache and swap-cache

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: YAMAMOTO Takashi
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch adds an interface, "memory.force_empty". Any write to this file
    will drop all charges in the cgroup if it has no tasks.

    %echo 1 > /....../memory.force_empty

    will drop all charges of the memory cgroup if the cgroup has no tasks.

    This is useful for making rmdir() against a memory cgroup succeed.

    Tested and works well on an x86_64/fake-NUMA system.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • mem_cgroup_charge_common shows a tendency to OOM without good reason, when
    a memhog goes well beyond its rss limit but with plenty of swap available.
    Seen on x86 but not on PowerPC; seen when the next patch omits swapcache
    from memcgroup, but we presume it can happen without.

    mem_cgroup_isolate_pages is not quite satisfying reclaim's criteria for OOM
    avoidance. Already it has to scan beyond the nr_to_scan limit when it finds
    a !LRU page, or an active page when handling inactive, or an inactive page
    when handling active. It needs to do exactly the same when it finds a page
    from the wrong zone (the x86 tests had two zones, the PowerPC tests had
    only one).

    Don't increment scan and then decrement it in these cases; just move the
    increment down. Fix a recent off-by-one when checking against nr_to_scan.
    Cut out "Check if the meta page went away from under us", presumably left
    over from early debugging: no amount of such checks could save us if this
    list really were being updated without locking.

    This change does make the unlimited scan while holding two spinlocks
    even worse - bad for latency and bad for containment; but that's a
    separate issue which is better left to be fixed a little later.

    Signed-off-by: Hugh Dickins
    Cc: Pavel Emelianov
    Acked-by: Balbir Singh
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • This patch makes mem_cgroup_isolate_pages():

    - ignore !PageLRU pages.
    - fix the bug that isolation makes no progress once it finds a page with
      page_zone(page) != zone (just increment scan in this case).

    kswapd and memory migration remove a page from the list when they handle
    a page for reclaim/migration.

    Because __isolate_lru_page() doesn't move !PageLRU pages, it is safe to
    avoid touching a !PageLRU() page and its page_cgroup.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • While using the memory control cgroup, page migration under it works as
    follows.
    ==
    1. uncharge all refs at try_to_unmap().
    2. charge refs again at remove_migration_ptes().
    ==
    This is simple but has the following problems.
    ==
    The page is uncharged and charged back again only if it is *mapped*.
    - This means that the cgroup before migration can be different from the one
      after migration.
    - If a page is not mapped but is charged as page cache, the charge is just
      ignored (because it is not mapped, it will not be uncharged before
      migration). This is a memory leak.
    ==
    This patch tries to keep the memory cgroup across page migration by holding
    one extra refcnt during it. 3 functions are added.

    mem_cgroup_prepare_migration() --- increase refcnt of page->page_cgroup
    mem_cgroup_end_migration() --- decrease refcnt of page->page_cgroup
    mem_cgroup_page_migration() --- copy page->page_cgroup from old page to
    new page.

    During migration
    - the old page is under PG_locked.
    - the new page is under PG_locked, too.
    - neither the old page nor the new page is on the LRU.

    These 3 facts guarantee that page_cgroup migration has no races.

    Tested and works well on an x86_64/fake-NUMA box.
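
    The call order during migration is roughly (a sketch of where the hooks sit;
    which page the final call is passed, and the exact prototypes, may differ):

    mem_cgroup_prepare_migration(page);       /* take the extra refcnt */
    /* ... try_to_unmap(), copy, remove_migration_ptes() ... */
    mem_cgroup_page_migration(page, newpage); /* move page_cgroup to the new page */
    mem_cgroup_end_migration(page);           /* drop the extra refcnt */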

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch adds the following functions.
    - clear_page_cgroup(page, pc)
    - page_cgroup_assign_new_page_cgroup(page, pc)

    Mainly for cleanup.

    The manner of "check page->cgroup again after lock_page_cgroup()" is
    implemented in a straightforward way.

    A comment in mem_cgroup_uncharge() will be removed by the force-empty patch.
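
    The "check again after taking the lock" manner looks roughly like this (a
    sketch; the label and the kfree() of the prepared page_cgroup are
    illustrative):

    lock_page_cgroup(page);
    if (page_get_page_cgroup(page) != NULL) {
            /* raced: the page was charged while we were allocating pc */
            unlock_page_cgroup(page);
            kfree(pc);      /* throw away the page_cgroup we prepared */
            goto done;
    }
    /* otherwise attach pc, account it, and unlock as usual */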

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: "Eric W. Biederman"
    Cc: Nick Piggin
    Cc: Kirill Korotaev
    Cc: Herbert Poetzl
    Cc: David Rientjes
    Cc: Vaidyanathan Srinivasan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki