Eric Lee / smarc-fsl-linux-kernel

03 May, 2009

1 commit

00a62ce91 mm: fix Committed_AS underflow on large NR_CPUS environment ... Browse Code »
43

The Committed_AS field can underflow in certain situations:

> # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
> 1 Committed_AS: 18446744073709323392 kB
> 11 Committed_AS: 18446744073709455488 kB
> 6 Committed_AS: 35136 kB
> 5 Committed_AS: 18446744073709454400 kB
> 7 Committed_AS: 35904 kB
> 3 Committed_AS: 18446744073709453248 kB
> 2 Committed_AS: 34752 kB
> 9 Committed_AS: 18446744073709453248 kB
> 8 Committed_AS: 34752 kB
> 3 Committed_AS: 18446744073709320960 kB
> 7 Committed_AS: 18446744073709454080 kB
> 3 Committed_AS: 18446744073709320960 kB
> 5 Committed_AS: 18446744073709454080 kB
> 6 Committed_AS: 18446744073709320960 kB

Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does
not check for underflow.

But NR_CPUS proportional isn't good calculation. In general,
possibility of lock contention is proportional to the number of online
cpus, not theorical maximum cpus (NR_CPUS).

The current kernel has generic percpu-counter stuff. using it is right
way. it makes code simplify and percpu_counter_read_positive() don't
make underflow issue.

Reported-by: Dave Hansen
Signed-off-by: KOSAKI Motohiro
Cc: Eric B Munson
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: [All kernel versions]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-05-03 06:36:10 +0800

03 Apr, 2009

1 commit

266cf658e FS-Cache: Recruit a page flags for cache management ... Browse Code »

Recruit a page flag to aid in cache management. The following extra flag is
defined:

(1) PG_fscache (PG_private_2)

The marked page is backed by a local cache and is pinning resources in the
cache driver.

If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() had been added to make the checks for both
PG_private and PG_private_2 at the same time.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:36 +0800

01 Apr, 2009

1 commit

d1d748717 mm: remove pagevec_swap_free() ... Browse Code »

pagevec_swap_free() is now unused.

Signed-off-by: KOSAKI Motohiro
Cc: Johannes Weiner
Cc: Rik van Riel
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-04-01 23:59:13 +0800

09 Jan, 2009

3 commits

3e2f41f1f memcg: add zone_reclaim_stat ... Browse Code »

Introduce mem_cgroup_per_zone::reclaim_stat member and its statics
collecting function.

Now, get_scan_ratio() can calculate correct value on memcg reclaim.

[hugh@veritas.com: avoid reclaim_stat oops when disabled]
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Rik van Riel
Signed-off-by: KOSAKI Motohiro
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: Hugh Dickins
Cc: KOSAKI Motohiro
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-01-09 00:31:08 +0800
6e9015716 mm: introduce zone_reclaim struct ... Browse Code »

Add zone_reclam_stat struct for later enhancement.

A later patch uses this. This patch doesn't any behavior change (yet).

Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: KOSAKI Motohiro
Acked-by: Rik van Riel
Cc: Balbir Singh
Cc: Daisuke Nishimura
Cc: Hugh Dickins
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-01-09 00:31:07 +0800
08e552c69 memcg: synchronized LRU ... Browse Code »
43

A big patch for changing memcg's LRU semantics.

Now,
- page_cgroup is linked to mem_cgroup's its own LRU (per zone).

- LRU of page_cgroup is not synchronous with global LRU.

- page and page_cgroup is one-to-one and statically allocated.

- To find page_cgroup is on what LRU, you have to check pc->mem_cgroup as
- lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

- SwapCache is handled.

And, when we handle LRU list of page_cgroup, we do following.

pc = lookup_page_cgroup(page);
lock_page_cgroup(pc); .....................(1)
mz = page_cgroup_zoneinfo(pc);
spin_lock(&mz->lru_lock);
.....add to LRU
spin_unlock(&mz->lru_lock);
unlock_page_cgroup(pc);

But (1) is spin_lock and we have to be afraid of dead-lock with zone->lru_lock.
So, trylock() is used at (1), now. Without (1), we can't trust "mz" is correct.

This is a trial to remove this dirty nesting of locks.
This patch changes mz->lru_lock to be zone->lru_lock.
Then, above sequence will be written as

spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
mem_cgroup_add/remove/etc_lru() {
pc = lookup_page_cgroup(page);
mz = page_cgroup_zoneinfo(pc);
if (PageCgroupUsed(pc)) {
....add to LRU
}
spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

This is much simpler.
(*) We're safe even if we don't take lock_page_cgroup(pc). Because..
1. When pc->mem_cgroup can be modified.
- at charge.
- at account_move().
2. at charge
the PCG_USED bit is not set before pc->mem_cgroup is fixed.
3. at account_move()
the page is isolated and not on LRU.

Pros.
- easy for maintenance.
- memcg can make use of laziness of pagevec.
- we don't have to duplicated LRU/Active/Unevictable bit in page_cgroup.
- LRU status of memcg will be synchronized with global LRU's one.
- # of locks are reduced.
- account_move() is simplified very much.
Cons.
- may increase cost of LRU rotation.
(no impact if memcg is not configured.)

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:05 +0800

07 Jan, 2009

3 commits

a2c43eed8 mm: try_to_free_swap replaces remove_exclusive_swap_page ... Browse Code »

remove_exclusive_swap_page(): its problem is in living up to its name.

It doesn't matter if someone else has a reference to the page (raised
page_count); it doesn't matter if the page is mapped into userspace
(raised page_mapcount - though that hints it may be worth keeping the
swap): all that matters is that there be no more references to the swap
(and no writeback in progress).

swapoff (try_to_unuse) has been removing pages from swapcache for years,
with no concern for page count or page mapcount, and we used to have a
comment in lookup_swap_cache() recognizing that: if you go for a page of
swapcache, you'll get the right page, but it could have been removed from
swapcache by the time you get page lock.

So, give up asking for exclusivity: get rid of
remove_exclusive_swap_page(), and remove_exclusive_swap_page_ref() and
remove_exclusive_swap_page_count() which were spawned for the recent LRU
work: replace them by the simpler try_to_free_swap() which just checks
page_swapcount().

Similarly, remove the page_count limitation from free_swap_and_count(),
but assume that it's worth holding on to the swap if page is mapped and
swap nowhere near full. Add a vm_swap_full() test in free_swap_cache()?
It would be consistent, but I think we probably have enough for now.

Signed-off-by: Hugh Dickins
Cc: Lee Schermerhorn
Cc: Rik van Riel
Cc: Nick Piggin
Cc: KAMEZAWA Hiroyuki
Cc: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-01-07 07:59:03 +0800
b5934c531 mm: add_active_or_unevictable into rmap ... Browse Code »

lru_cache_add_active_or_unevictable() and page_add_new_anon_rmap() always
appear together. Save some symbol table space and some jumping around by
removing lru_cache_add_active_or_unevictable(), folding its code into
page_add_new_anon_rmap(): like how we add file pages to lru just after
adding them to page cache.

Remove the nearby "TODO: is this safe?" comments (yes, it is safe), and
change page_add_new_anon_rmap()'s address BUG_ON to VM_BUG_ON as
originally intended.

Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Cc: Lee Schermerhorn
Cc: Nick Piggin
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-01-07 07:59:02 +0800
1b0bd1188 mm: get rid of pagevec_release_nonlru() ... Browse Code »

speculative page references patch (commit:
e286781d5f2e9c846e012a39653a166e9d31777d) removed last
pagevec_release_nonlru() caller.

So this function can be removed now.

This patch doesn't have any functional change.

Signed-off-by: KOSAKI Motohiro
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-01-07 07:59:00 +0800

11 Dec, 2008

1 commit

6841c8e26 mm: remove UP version of lru_add_drain_all() ... Browse Code »

Currently, lru_add_drain_all() has two version.
(1) use schedule_on_each_cpu()
(2) don't use schedule_on_each_cpu()

Gerald Schaefer reported it doesn't work well on SMP (not NUMA) S390
machine.

offline_pages() calls lru_add_drain_all() followed by drain_all_pages().
While drain_all_pages() works on each cpu, lru_add_drain_all() only runs
on the current cpu for architectures w/o CONFIG_NUMA. This let us run
into the BUG_ON(!PageBuddy(page)) in __offline_isolated_pages() during
memory hotplug stress test on s390. The page in question was still on the
pcp list, because of a race with lru_add_drain_all() and drain_all_pages()
on different cpus.

Actually, Almost machine has CONFIG_UNEVICTABLE_LRU=y. Then almost machine use
(1) version lru_add_drain_all although the machine is UP.

Then this ifdef is not valueable.
simple removing is better.

Signed-off-by: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Acked-by: Gerald Schaefer
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2008-12-11 00:01:53 +0800

03 Dec, 2008

1 commit

9ff473b9a vmscan: evict streaming IO first ... Browse Code »

Count the insertion of new pages in the statistics used to drive the
pageout scanning code. This should help the kernel quickly evict
streaming file IO.

We count on the fact that new file pages start on the inactive file LRU
and new anonymous pages start on the active anon list. This means
streaming file IO will increment the recent scanned file statistic, while
leaving the recent rotated file statistic alone, driving pageout scanning
to the file LRUs.

Pageout activity does its own list manipulation.

Signed-off-by: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Tested-by: Gene Heskett
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2008-12-03 07:50:40 +0800

20 Oct, 2008

7 commits

64d6519dd swap: cull unevictable pages in fault path ... Browse Code »

In the fault paths that install new anonymous pages, check whether the
page is evictable or not using lru_cache_add_active_or_unevictable(). If
the page is evictable, just add it to the active lru list [via the pagevec
cache], else add it to the unevictable list.

This "proactive" culling in the fault path mimics the handling of mlocked
pages in Nick Piggin's series to keep mlocked pages off the lru lists.

Notes:

1) This patch is optional--e.g., if one is concerned about the
additional test in the fault path. We can defer the moving of
nonreclaimable pages until when vmscan [shrink_*_list()]
encounters them. Vmscan will only need to handle such pages
once, but if there are a lot of them it could impact system
performance.

2) The 'vma' argument to page_evictable() is require to notice that
we're faulting a page into an mlock()ed vma w/o having to scan the
page's rmap in the fault path. Culling mlock()ed anon pages is
currently the only reason for this patch.

3) We can't cull swap pages in read_swap_cache_async() because the
vma argument doesn't necessarily correspond to the swap cache
offset passed in by swapin_readahead(). This could [did!] result
in mlocking pages in non-VM_LOCKED vmas if [when] we tried to
cull in this path.

4) Move set_pte_at() to after where we add page to lru to keep it
hidden from other tasks that might walk the page table.
We already do it in this order in do_anonymous() page. And,
these are COW'd anon pages. Is this safe?

[riel@redhat.com: undo an overzealous code cleanup]
Signed-off-by: Lee Schermerhorn
Signed-off-by: Rik van Riel
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2008-10-20 23:52:31 +0800
b291f0003 mlock: mlocked pages are unevictable ... Browse Code »

Make sure that mlocked pages also live on the unevictable LRU, so kswapd
will not scan them over and over again.

This is achieved through various strategies:

1) add yet another page flag--PG_mlocked--to indicate that
the page is locked for efficient testing in vmscan and,
optionally, fault path. This allows early culling of
unevictable pages, preventing them from getting to
page_referenced()/try_to_unmap(). Also allows separate
accounting of mlock'd pages, as Nick's original patch
did.

Note: Nick's original mlock patch used a PG_mlocked
flag. I had removed this in favor of the PG_unevictable
flag + an mlock_count [new page struct member]. I
restored the PG_mlocked flag to eliminate the new
count field.

2) add the mlock/unevictable infrastructure to mm/mlock.c,
with internal APIs in mm/internal.h. This is a rework
of Nick's original patch to these files, taking into
account that mlocked pages are now kept on unevictable
LRU list.

3) update vmscan.c:page_evictable() to check PageMlocked()
and, if vma passed in, the vm_flags. Note that the vma
will only be passed in for new pages in the fault path;
and then only if the "cull unevictable pages in fault
path" patch is included.

4) add try_to_unlock() to rmap.c to walk a page's rmap and
ClearPageMlocked() if no other vmas have it mlocked.
Reuses as much of try_to_unmap() as possible. This
effectively replaces the use of one of the lru list links
as an mlock count. If this mechanism let's pages in mlocked
vmas leak through w/o PG_mlocked set [I don't know that it
does], we should catch them later in try_to_unmap(). One
hopes this will be rare, as it will be relatively expensive.

Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
Signed-off-by: Nick Piggin

splitlru: introduce __get_user_pages():

New munlock processing need to GUP_FLAGS_IGNORE_VMA_PERMISSIONS.
because current get_user_pages() can't grab PROT_NONE pages theresore it
cause PROT_NONE pages can't munlock.

[akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch]
[akpm@linux-foundation.org: untangle patch interdependencies]
[akpm@linux-foundation.org: fix things after out-of-order merging]
[hugh@veritas.com: fix page-flags mess]
[lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm']
[kosaki.motohiro@jp.fujitsu.com: build fix]
[kosaki.motohiro@jp.fujitsu.com: fix truncate race and sevaral comments]
[kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()]
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Rik van Riel
Signed-off-by: Lee Schermerhorn
Cc: Nick Piggin
Cc: Dave Hansen
Cc: Matt Mackall
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2008-10-20 23:52:30 +0800
894bc3104 Unevictable LRU Infrastructure ... Browse Code »

When the system contains lots of mlocked or otherwise unevictable pages,
the pageout code (kswapd) can spend lots of time scanning over these
pages. Worse still, the presence of lots of unevictable pages can confuse
kswapd into thinking that more aggressive pageout modes are required,
resulting in all kinds of bad behaviour.

Infrastructure to manage pages excluded from reclaim--i.e., hidden from
vmscan. Based on a patch by Larry Woodman of Red Hat. Reworked to
maintain "unevictable" pages on a separate per-zone LRU list, to "hide"
them from vmscan.

Kosaki Motohiro added the support for the memory controller unevictable
lru list.

Pages on the unevictable list have both PG_unevictable and PG_lru set.
Thus, PG_unevictable is analogous to and mutually exclusive with
PG_active--it specifies which LRU list the page is on.

The unevictable infrastructure is enabled by a new mm Kconfig option
[CONFIG_]UNEVICTABLE_LRU.

A new function 'page_evictable(page, vma)' in vmscan.c tests whether or
not a page may be evictable. Subsequent patches will add the various
!evictable tests. We'll want to keep these tests light-weight for use in
shrink_active_list() and, possibly, the fault path.

To avoid races between tasks putting pages [back] onto an LRU list and
tasks that might be moving the page from non-evictable to evictable state,
the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()'
-- tests the "evictability" of a page after placing it on the LRU, before
dropping the reference. If the page has become unevictable,
putback_lru_page() will redo the 'putback', thus moving the page to the
unevictable list. This way, we avoid "stranding" evictable pages on the
unevictable list.

[akpm@linux-foundation.org: fix fallout from out-of-order merge]
[riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build]
[nishimura@mxp.nes.nec.co.jp: remove redundant mapping check]
[kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework]
[kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c]
[kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure]
[kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch]
[kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch]
Signed-off-by: Lee Schermerhorn
Signed-off-by: Rik van Riel
Signed-off-by: KOSAKI Motohiro
Debugged-by: Benjamin Kidwell
Signed-off-by: Daisuke Nishimura
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2008-10-20 23:50:26 +0800
4f98a2fee vmscan: split LRU lists into anon & file sets ... Browse Code »

Split the LRU lists in two, one set for pages that are backed by real file
systems ("file") and one for pages that are backed by memory and swap
("anon"). The latter includes tmpfs.

The advantage of doing this is that the VM will not have to scan over lots
of anonymous pages (which we generally do not want to swap out), just to
find the page cache pages that it should evict.

This patch has the infrastructure and a basic policy to balance how much
we scan the anon lists and how much we scan the file lists. The big
policy changes are in separate patches.

[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
[hugh@veritas.com: memcg swapbacked pages active]
[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
[akpm@linux-foundation.org: fix /proc/vmstat units]
[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
Signed-off-by: Rik van Riel
Signed-off-by: Lee Schermerhorn
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Hugh Dickins
Signed-off-by: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2008-10-20 23:50:25 +0800
68a22394c vmscan: free swap space on swap-in/activation ... Browse Code »

If vm_swap_full() (swap space more than 50% full), the system will free
swap space at swapin time. With this patch, the system will also free the
swap space in the pageout code, when we decide that the page is not a
candidate for swapout (and just wasting swap space).

Signed-off-by: Rik van Riel
Signed-off-by: Lee Schermerhorn
Signed-off-by: MinChan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2008-10-20 23:50:25 +0800
f04e9ebbe swap: use an array for the LRU pagevecs ... Browse Code »

Turn the pagevecs into an array just like the LRUs. This significantly
cleans up the source code and reduces the size of the kernel by about 13kB
after all the LRU lists have been created further down in the split VM
patch series.

Signed-off-by: Rik van Riel
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2008-10-20 23:50:25 +0800
b69408e88 vmscan: Use an indexed array for LRU variables ... Browse Code »

Currently we are defining explicit variables for the inactive and active
list. An indexed array can be more generic and avoid repeating similar
code in several places in the reclaim code.

We are saving a few bytes in terms of code size:

Before:

text data bss dec hex filename
4097753 573120 4092484 8763357 85b7dd vmlinux

After:

text data bss dec hex filename
4097729 573120 4092484 8763333 85b7c5 vmlinux

Having an easy way to add new lru lists may ease future work on the
reclaim code.

Signed-off-by: Rik van Riel
Signed-off-by: Lee Schermerhorn
Signed-off-by: Christoph Lameter
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2008-10-20 23:50:25 +0800

05 Aug, 2008

1 commit

529ae9aaa mm: rename page trylock ... Browse Code »

Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

This also facilitates lockdeping of page lock.

Signed-off-by: Nick Piggin
Acked-by: KOSAKI Motohiro
Acked-by: Peter Zijlstra
Acked-by: Andrew Morton
Acked-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds

Nick Piggin
2008-08-05 12:31:34 +0800

31 Jul, 2008

1 commit

ab33dc09a swap: update function comment of release_pages ... Browse Code »

Signed-off-by: Fernando Luis Vazquez Cao
Cc: Hugh Dickins
Cc: Nick Piggin
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fernando Luis Vazquez Cao
2008-07-31 00:41:46 +0800

25 Jul, 2008

1 commit

f84f9504b mm: remove initialization of static per-cpu variables ... Browse Code »

This was required by some old, no-longer-used gcc on sparc.

Signed-off-by: Vegard Nossum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vegard Nossum
2008-07-25 01:47:21 +0800

25 May, 2008

1 commit

80119ef5c mm: fix atomic_t overflow in vm ... Browse Code »

The atomic_t type is 32bit but a 64bit system can have more than 2^32
pages of virtual address space available. Without this we overflow on
ludicrously large mappings

Signed-off-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2008-05-25 00:56:09 +0800

28 Apr, 2008

1 commit

ac6aadb24 mm: rotate_reclaimable_page() cleanup ... Browse Code »

Clean up messy conditional calling of test_clear_page_writeback() from both
rotate_reclaimable_page() and end_page_writeback().

The only user of rotate_reclaimable_page() is end_page_writeback() so this is
OK.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-28 23:58:20 +0800

20 Mar, 2008

1 commit

7682486b3 mm: fix various kernel-doc comments ... Browse Code »

Fix various kernel-doc notation in mm/:

filemap.c: add function short description; convert 2 to kernel-doc
fremap.c: change parameter 'prot' to @prot
pagewalk.c: change "-" in function parameters to ":"
slab.c: fix short description of kmem_ptr_validate()
swap.c: fix description & parameters of put_pages_list()
swap_state.c: fix function parameters
vmalloc.c: change "@returns" to "Returns:" since that is not a parameter

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2008-03-20 09:53:35 +0800

05 Mar, 2008

1 commit

427d5416f memcg: move_lists on page not page_cgroup ... Browse Code »

Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
it's more convenient if it acts upon the page itself not the page_cgroup; and
in a later patch this becomes important to handle within memcontrol.c.

Signed-off-by: Hugh Dickins
Cc: David Rientjes
Acked-by: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Cc: Hirokazu Takahashi
Cc: YAMAMOTO Takashi
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2008-03-05 08:35:14 +0800

08 Feb, 2008

1 commit

66e1707bc Memory controller: add per cgroup LRU and reclaim ... Browse Code »

Add the page_cgroup to the per cgroup LRU. The reclaim algorithm has
been modified to make the isolate_lru_pages() as a pluggable component. The
scan_control data structure now accepts the cgroup on behalf of which
reclaims are carried out. try_to_free_pages() has been extended to become
cgroup aware.

[akpm@linux-foundation.org: fix warning]
[Lee.Schermerhorn@hp.com: initialize all scan_control's isolate_pages member]
[bunk@kernel.org: make do_try_to_free_pages() static]
[hugh@veritas.com: memcgroup: fix try_to_free order]
[kamezawa.hiroyu@jp.fujitsu.com: this unlock_page_cgroup() is unnecessary]
Signed-off-by: Pavel Emelianov
Signed-off-by: Balbir Singh
Cc: Paul Menage
Cc: Peter Zijlstra
Cc: "Eric W. Biederman"
Cc: Nick Piggin
Cc: Kirill Korotaev
Cc: Herbert Poetzl
Cc: David Rientjes
Cc: Vaidyanathan Srinivasan
Signed-off-by: Lee Schermerhorn
Signed-off-by: Hugh Dickins
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Balbir Singh
2008-02-08 00:42:18 +0800

06 Feb, 2008

1 commit

920c7a5d0 mm: remove fastcall from mm/ ... Browse Code »

fastcall is always defined to be empty, remove it

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Harvey Harrison
2008-02-06 01:44:18 +0800

20 Oct, 2007

1 commit

183ff22bb spelling fixes: mm/ ... Browse Code »

Spelling fixes in mm/.

Signed-off-by: Simon Arlott
Signed-off-by: Adrian Bunk

Simon Arlott
2007-10-20 07:27:18 +0800

17 Oct, 2007

3 commits

e0bf68dde mm: bdi init hooks ... Browse Code »

provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-10-17 23:42:45 +0800
902aaed0d mm: use pagevec to rotate reclaimable page ... Browse Code »

While running some memory intensive load, system response deteriorated just
after swap-out started.

The cause of this problem is that when a PG_reclaim page is moved to the tail
of the inactive LRU list in rotate_reclaimable_page(), lru_lock spin lock is
acquired every page writeback . This deteriorates system performance and
makes interrupt hold off time longer when swap-out started.

Following patch solves this problem. I use pagevec in rotating reclaimable
pages to mitigate LRU spin lock contention and reduce interrupt hold off time.

I did a test that allocating and touching pages in multiple processes, and
pinging to the test machine in flooding mode to measure response under memory
intensive load.

The test result is:

-2.6.23-rc5
--- testmachine ping statistics ---
3000 packets transmitted, 3000 received, 0% packet loss, time 53222ms
rtt min/avg/max/mdev = 0.074/0.652/172.228/7.176 ms, pipe 11, ipg/ewma
17.746/0.092 ms

-2.6.23-rc5-patched
--- testmachine ping statistics ---
3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma
17.314/0.091 ms

Max round-trip-time was improved.

The test machine spec is that 4CPU(3.16GHz, Hyper-threading enabled)
8GB memory , 8GB swap.

I did ping test again to observe performance deterioration caused by taking
a ref.

-2.6.23-rc6-with-modifiedpatch
--- testmachine ping statistics ---
3000 packets transmitted, 3000 received, 0% packet loss, time 53386ms
rtt min/avg/max/mdev = 0.074/0.110/4.716/0.147 ms, pipe 2, ipg/ewma 17.801/0.129 ms

The result for my original patch is as follows.

-2.6.23-rc5-with-originalpatch
--- testmachine ping statistics ---
3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma 17.314/0.091 ms

The influence to response was small.

[akpm@linux-foundation.org: fix uninitalised var warning]
[hugh@veritas.com: fix locking]
[randy.dunlap@oracle.com: fix function declaration]
[hugh@veritas.com: fix BUG at include/linux/mm.h:220!]
[hugh@veritas.com: kill redundancy in rotate_reclaimable_page]
[hugh@veritas.com: move_tail_pages into lru_add_drain]
Signed-off-by: Hisashi Hifumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hisashi Hifumi
2007-10-17 00:42:54 +0800
43fac94dd Clean up duplicate includes in mm/ ... Browse Code »

This patch cleans up duplicate includes in
mm/

Signed-off-by: Jesper Juhl
Acked-by: Paul Mundt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2007-10-17 00:42:52 +0800

10 May, 2007

1 commit

8bb784428 Add suspend-related notifications for CPU hotplug ... Browse Code »

Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki
Cc: Gautham R Shenoy
Cc: Pavel Machek
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2007-05-10 03:30:56 +0800

08 May, 2007

1 commit

d85f33855 Make page->private usable in compound pages ... Browse Code »

If we add a new flag so that we can distinguish between the first page and the
tail pages then we can avoid to use page->private in the first page.
page->private == page for the first page, so there is no real information in
there.

Freeing up page->private makes the use of compound pages more transparent.
They become more usable like real pages. Right now we have to be careful f.e.
if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
can then no longer use the private field. This is one of the issues that
cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information in
the page struct. I can probably avoid the 16 bit ints that I have in there
right now.

Also if page->private is available then a compound page may be equipped with
buffer heads. This may free up the way for filesystems to support larger
blocks than page size.

We add PageTail as an alias of PageReclaim. Compound pages cannot currently
be reclaimed. Because of the alias one needs to check PageCompound first.

The RFC for the this approach was discussed at
http://marc.info/?t=117574302800001&r=1&w=2

[nacc@us.ibm.com: fix hugetlbfs]
Signed-off-by: Christoph Lameter
Signed-off-by: Nishanth Aravamudan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-05-08 03:12:53 +0800

08 Dec, 2006

2 commits

023160678 [PATCH] hotplug CPU: clean up hotcpu_notifier() use ... Browse Code »

There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
generating compiler warnings of unused symbols, hence forcing people to add
#ifdefs.

the compiler can skip truly unused functions just fine:

text data bss dec hex filename
1624412 728710 3674856 6027978 5bfaca vmlinux.before
1624412 728710 3674856 6027978 5bfaca vmlinux.after

[akpm@osdl.org: topology.c fix]
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-12-08 00:39:39 +0800
33f2ef89f [PATCH] mm: make compound page destructor handling explicit ... Browse Code »

Currently we we use the lru head link of the second page of a compound page
to hold its destructor. This was ok when it was purely an internal
implmentation detail. However, hugetlbfs overrides this destructor
violating the layering. Abstract this out as explicit calls, also
introduce a type for the callback function allowing them to be type
checked. For each callback we pre-declare the function, causing a type
error on definition rather than on use elsewhere.

[akpm@osdl.org: cleanups]
Signed-off-by: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andy Whitcroft
2006-12-08 00:39:25 +0800

22 Nov, 2006

1 commit

c4028958b WorkStruct: make allyesconfig ... Browse Code »

Fix up for make allyesconfig.

Signed-Off-By: David Howells

David Howells
2006-11-22 22:57:56 +0800

26 Sep, 2006

2 commits

b221385bc [PATCH] mm/: make functions static ... Browse Code »

This patch makes the following needlessly global functions static:
- slab.c: kmem_find_general_cachep()
- swap.c: __page_cache_release()
- vmalloc.c: __vmalloc_node()

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-09-26 23:48:45 +0800
725d704ec [PATCH] mm: VM_BUG_ON ... Browse Code »

Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM. Use this
in the lightweight, inline refcounting functions; PageLRU and PageActive
checks in vmscan, because they're pretty well confined to vmscan. And in
page allocate/free fastpaths which can be the hottest parts of the kernel
for kbuilds.

Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with
side-effects, and should not be used outside core mm code.

Signed-off-by: Nick Piggin
Cc: Hugh Dickins
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2006-09-26 23:48:44 +0800

15 Aug, 2006

1 commit

1d7ea7324 [PATCH] fuse: fix error case in fuse_readpages ... Browse Code »

Don't let fuse_readpages leave the @pages list not empty when exiting
on error.

[akpm@osdl.org: kernel-doc fixes]
Signed-off-by: Alexander Zarochentsev
Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

Alexander Zarochentsev
2006-08-15 03:54:29 +0800

01 Jul, 2006

1 commit

f8891e5e1 [PATCH] Light weight event counters ... Browse Code »

The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat. They have no
essential function for the VM.

We use a simple increment of per cpu variables. In order to avoid the most
severe races we disable preempt. Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter. However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter. For many architectures this will be reduced by the compiler to a
single instruction. This single instruction is atomic for i386 and x86_64.
And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-07-01 02:25:36 +0800