17 Jun, 2009

40 commits

  • The file argument is a leftover from address_space's readpage from a long
    time ago.

    We don't use it any more, so remove the unnecessary argument.

    Signed-off-by: Minchan Kim
    Acked-by: Hugh Dickins
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Hugh removed add_to_swap's gfp_mask argument ("mm: remove gfp_mask from
    add_to_swap"), so remove the gfp_mask annotation from the function as well.

    Signed-off-by: Minchan Kim
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • SRAT tables may contain nodes of very small size. The arch code may
    decide not to activate such a node. However, the early boot code currently
    sets N_HIGH_MEMORY for such nodes, so they appear to be active even though
    they have no present pages.

    For 64 bit, N_HIGH_MEMORY == N_NORMAL_MEMORY, so this works there too.

    Signed-off-by: Yinghai Lu
    Tested-by: Jack Steiner
    Acked-by: Christoph Lameter
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • Remove __invalidate_mapping_pages atomic variant now that its sole caller
    can sleep (fixed in eccb95cee4f0d56faa46ef22fb94dd4a3578d3eb ("vfs: fix
    lock inversion in drop_pagecache_sb()")).

    This fixes softlockups that can occur while in the drop_caches path.

    Signed-off-by: Mike Waychison
    Cc: Jan Kara
    Cc: Wu Fengguang
    Cc: Dave Chinner
    Cc: Nick Piggin
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Waychison
     
  • The oom killer must be invoked regardless of the order if the allocation
    is __GFP_NOFAIL, otherwise it will loop forever when reclaim fails to free
    some memory.

    Cc: Nick Piggin
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Dave Hansen
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • This moves the check for OOM_DISABLE to the badness heuristic so it is
    only necessary to hold task_lock() once. If the mm is OOM_DISABLE, the
    score is 0, which is also correctly exported via /proc/pid/oom_score.
    This requires that tasks with badness scores of 0 are prohibited from
    being oom killed, which makes sense since they would not allow for future
    memory freeing anyway.

    Since the oom_adj value is a characteristic of an mm and not a task, it is
    no longer necessary to check the oom_adj value for threads sharing the
    same memory (except when simply issuing SIGKILLs for threads in other
    thread groups).
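
    A rough sketch of the resulting check in badness() (names follow the
    description above; the details are paraphrased, not quoted from the patch):

        unsigned long badness(struct task_struct *p, unsigned long uptime)
        {
                unsigned long points;
                struct mm_struct *mm;

                task_lock(p);
                mm = p->mm;
                if (!mm || mm->oom_adj == OOM_DISABLE) {
                        task_unlock(p);
                        return 0;       /* score 0: never selected for oom kill */
                }
                points = mm->total_vm;  /* base score: total VM size */
                task_unlock(p);
                /* ... further adjustments elided ... */
                return points;
        }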

    Cc: Nick Piggin
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The per-task oom_adj value is a characteristic of its mm more than the
    task itself since it's not possible to oom kill any thread that shares the
    mm. If a task were to be killed while attached to an mm that could not be
    freed because another thread was set to OOM_DISABLE, it would have
    needlessly been terminated since there is no potential for future memory
    freeing.

    This patch moves oomkilladj (now more appropriately named oom_adj) from
    struct task_struct to struct mm_struct. This requires task_lock() on a
    task to check its oom_adj value to protect against exec, but it's already
    necessary to take the lock when dereferencing the mm to find the total VM
    size for the badness heuristic.

    This fixes a livelock if the oom killer chooses a task and another thread
    sharing the same memory has an oom_adj value of OOM_DISABLE. This occurs
    because oom_kill_task() repeatedly returns 1 and refuses to kill the
    chosen task while select_bad_process() will repeatedly choose the same
    task during the next retry.

    Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in
    oom_kill_task() to check for threads sharing the same memory will be
    removed in the next patch in this series where it will no longer be
    necessary.

    Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since
    these threads are immune from oom killing already. They simply report an
    oom_adj value of OOM_DISABLE.

    Cc: Nick Piggin
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Presently we can tell via swap_map that a swap entry is only used as
    SwapCache, without looking up the swap cache.

    This gives us a chance to reuse swap-cache-only swap entries in
    get_swap_pages().

    This patch tries to free swap-cache-only swap entries when swap space is
    running short.

    Note: we hit the following path when the swap_cluster code cannot find a
    free cluster. Then, vm_swap_full() is not the only condition that allows
    the kernel to reclaim unused swap.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Tested-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This is part of the patches fixing memcg's swap accounting leak, but IMHO
    it is not a bad patch even without memcg.

    There are two kinds of references to swap:
    - a reference from a swap entry
    - a reference from the swap cache

    Then,

    - if there is a swap cache entry and swap's refcnt is 1, there is only
      the swap cache.
      (*) swapcount(entry) == 1 && find_get_page(swapper_space, entry) != NULL

    This counting logic has worked well for a long time. But considering that
    swap_map[] cannot tell us whether a _real_ reference exists or not, the
    current use of the counter is not very good.

    This patch adds a flag, SWAP_HAS_CACHE, and records whether a swap entry
    has a cache or not. This removes the -1 magic used in swapfile.c and
    helps avoid unnecessary find_get_page() calls.
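
    As a rough illustration of the idea (the bit value and helper names here
    are hypothetical, not taken from the patch), a swap_map entry can carry a
    has-cache flag alongside the reference count:

        /* hypothetical encoding: top bit of the count marks "has swap cache" */
        #define SWAP_HAS_CACHE  0x8000          /* flag bit in swap_map[offset] */
        #define SWAP_COUNT_MASK 0x7fff          /* remaining bits: real references */

        static inline int swap_has_cache(unsigned short ent)
        {
                return ent & SWAP_HAS_CACHE;    /* cache reference present? */
        }

        static inline int swap_count(unsigned short ent)
        {
                return ent & SWAP_COUNT_MASK;   /* real (pte/shmem) references */
        }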

    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In a following patch, swap cache usage is recorded in swap_map. This
    patch makes the interface changes necessary for that.

    Two interfaces:

    - swapcache_prepare()
    - swapcache_free()

    are added for taking/dropping a swap-cache reference on existing swap
    entries. The implementation itself is not changed by this patch. With
    the addition of swapcache_free(), memcg's hook code is moved under
    swapcache_free(), which is better than using scattered hooks.
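
    A minimal sketch of how a swap-cache add path might use the new pair (the
    error-handling details are illustrative, not copied from the patch):

        /*
         * Take a swap-cache reference on the entry before inserting the page
         * into the swap cache; drop it again if the insertion fails.
         */
        err = swapcache_prepare(entry);
        if (err)
                goto out;
        err = add_to_swap_cache(page, entry, gfp_mask);
        if (err)
                swapcache_free(entry, NULL);    /* no page reached the cache */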

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this
    configurability is unnecessary.

    Signed-off-by: KOSAKI Motohiro
    Cc: Johannes Weiner
    Cc: Andi Kleen
    Acked-by: Minchan Kim
    Cc: David Woodhouse
    Cc: Matt Mackall
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Solve two problems.

    Whenever memory hotplug succeeds, zone->present_pages has to be
    updated.

    1) Memory hotplug currently calls setup_per_zone_wmark_min only when
    online_pages is called, not offline_pages.

    This breaks the watermark balance.

    2) If zone->present_pages changes, we also have to recalculate
    zone->inactive_ratio, because inactive_ratio depends on
    zone->present_pages.

    Signed-off-by: Minchan Kim
    Acked-by: Yasunori Goto
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Factor the per-zone arithmetic inside setup_per_zone_inactive_ratio()'s
    loop into a separate function, calculate_zone_inactive_ratio(). This
    function will be used in a later patch.
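
    For reference, the per-zone arithmetic being factored out derives the
    inactive_ratio from the zone size; a small standalone demo of that formula
    (reproduced from memory, so treat the details as approximate):

        #include <math.h>
        #include <stdio.h>

        /* inactive_ratio ~ sqrt(10 * zone size in GB), minimum 1 */
        static unsigned int zone_inactive_ratio(unsigned long present_pages,
                                                unsigned int page_shift)
        {
                unsigned long gb = present_pages >> (30 - page_shift);
                unsigned int ratio = gb ? (unsigned int)sqrt(10.0 * gb) : 1;
                return ratio;
        }

        int main(void)
        {
                /* e.g. a 4 GB zone with 4 KiB pages -> ratio ~ 6 */
                printf("%u\n", zone_inactive_ratio(1UL << 20, 12));
                return 0;
        }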

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Minchan Kim
    Reviewed-by: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Change the names of two functions. It doesn't affect behavior.

    Presently, setup_per_zone_pages_min() changes the zone's low and high
    watermarks as well as min, so a better name is setup_per_zone_wmarks().
    That's because Mel changed zone->pages_[high/low/min] to the
    zone->watermark array in "page allocator: replace the watermark-related
    union in struct zone with a watermark[] array".

    * setup_per_zone_pages_min => setup_per_zone_wmarks

    Of course, we have to change init_per_zone_pages_min too, since there is
    no pages_min any more.

    * init_per_zone_pages_min => init_per_zone_wmark_min

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Minchan Kim
    Acked-by: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • …lags passed to the page allocator

    This simplifies the code in gfp_zone() and also keeps the ability of the
    compiler to use constant folding to get rid of gfp_zone processing.

    The lookup of the zone is done using a bitfield stored in an integer. So
    the code in gfp_zone is a simple extraction of bits from a constant
    bitfield. The compiler generates a load of a constant into a register and
    then performs a shift and mask operation to get the zone from a gfp_t. No
    cachelines are touched and no branches need to be predicted.

    We are doing some macro tricks here to convince the compiler to always do
    the constant folding if possible.
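
    The principle, as a small standalone sketch (the table contents and bit
    widths here are illustrative; the real GFP_ZONE_TABLE is built from the
    zone modifier flags in the kernel headers):

        #include <stdio.h>

        #define ZONES_SHIFT   2                   /* bits per table entry */
        #define GFP_ZONEMASK  0x3u                /* low flag bits select the entry */

        /* hypothetical packed table: entry i holds the zone for modifier combo i */
        #define GFP_ZONE_TABLE 0xe4u              /* entries 3,2,1,0 packed 2 bits each */

        static inline unsigned int gfp_zone(unsigned int flags)
        {
                unsigned int bit = flags & GFP_ZONEMASK;
                /* constant shift + mask: no memory access, no branches */
                return (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) &
                       ((1u << ZONES_SHIFT) - 1);
        }

        int main(void)
        {
                printf("%u\n", gfp_zone(2));      /* -> entry 2 of the table */
                return 0;
        }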

    Signed-off-by: Christoph Lameter
    Cc: KAMEZAWA Hiroyuki
    Reviewed-by: Mel Gorman
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • If you're using a non-highmem architecture, passing an argument with the
    wrong type to kunmap() doesn't give you a warning because the ifdef
    doesn't check the type.

    Using a static inline function solves the problem nicely.
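
    A minimal sketch of the fix for the !CONFIG_HIGHMEM case (paraphrasing the
    shape of include/linux/highmem.h rather than quoting it):

        /*
         * Previously kunmap() was a no-op macro here, so passing an argument
         * of the wrong type compiled silently. A static inline keeps the no-op
         * behaviour but lets the compiler check the argument type.
         */
        static inline void kunmap(struct page *page)
        {
        }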

    Reported-by: David Woodhouse
    Signed-off-by: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • shrink_zone() can deactivate active anon pages even if we don't have a
    swap device. Many embedded products don't have a swap device, so the
    deactivation of anon pages is unnecessary there.

    This patch prevents unnecessary deactivation of anon lru pages. It does
    not, however, prevent the aging of anon pages for swap-out.

    Signed-off-by: Minchan Kim
    Acked-by: KOSAKI Motohiro
    Cc: Johannes Weiner
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • migrate_prep() is fairly expensive (72us on 16-core barcelona 1.9GHz).
    Commit 3140a2273009c01c27d316f35ab76a37e105fdd8 improved move_pages()
    throughput by breaking it into chunks, but it also made migrate_prep() be
    called once per chunk (every 128 pages or so) instead of once per
    move_pages().

    This patch reverts to calling migrate_prep() only once per move_pages() as
    we did before 2.6.29. It is also a followup to commit
    0aedadf91a70a11c4a3e7c7d99b21e5528af8d5d ("mm: move migrate_prep out from
    under mmap_sem").

    This improves migration throughput on the above machine from 600MB/s to
    750MB/s.

    Signed-off-by: Brice Goglin
    Acked-by: Christoph Lameter
    Cc: KOSAKI Motohiro
    Cc: Heiko Carstens
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brice Goglin
     
  • Currently, the following scenario appears to be possible in theory:

    * Tasks are frozen for hibernation or suspend.
    * Free pages are almost exhausted.
    * A certain piece of code in the suspend code path attempts to allocate
      some memory using GFP_KERNEL and an allocation order less than or
      equal to PAGE_ALLOC_COSTLY_ORDER.
    * __alloc_pages_internal() cannot find a free page, so it invokes the
      OOM killer.
    * The OOM killer attempts to kill a task, but the task is frozen, so
      it doesn't die immediately.
    * __alloc_pages_internal() jumps to 'restart', unsuccessfully tries
      to find a free page, and invokes the OOM killer.
    * No progress can be made.

    Although it is now hard to trigger during hibernation due to the memory
    shrinking carried out by the hibernation code, it is theoretically
    possible to trigger during suspend after the memory shrinking has been
    removed from that code path. Moreover, since memory allocations are
    going to be used for the hibernation memory shrinking, it will be even
    more likely to happen during hibernation.

    To prevent it from happening, introduce the oom_killer_disabled switch
    that will cause __alloc_pages_internal() to fail in the situations in
    which the OOM killer would have been called and make the freezer set
    this switch after tasks have been successfully frozen.

    [akpm@linux-foundation.org: be nicer to the namespace]
    Signed-off-by: Rafael J. Wysocki
    Cc: Fengguang Wu
    Cc: David Rientjes
    Acked-by: Pavel Machek
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The posix_madvise() function succeeds (and does nothing) when called with
    parameters (NULL, 0, -1); according to LSB tests, it should fail with
    EINVAL because -1 is not a valid flag.

    When called with a valid address and size, it correctly fails.

    So perform an initial check for valid flags first.
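
    Roughly, the fix moves a behavior-validity check ahead of the address
    range checks in sys_madvise(); a sketch of that check (helper name and
    flag list abridged, not quoted from the patch):

        static int madvise_behavior_valid(int behavior)
        {
                switch (behavior) {
                case MADV_DOFORK:
                case MADV_DONTFORK:
                case MADV_NORMAL:
                case MADV_SEQUENTIAL:
                case MADV_RANDOM:
                case MADV_WILLNEED:
                case MADV_DONTNEED:
                /* ... remaining valid MADV_* values ... */
                        return 1;
                default:
                        return 0;       /* e.g. -1 from the posix_madvise() test */
                }
        }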

    Reported-by: Jiri Dluhos
    Signed-off-by: Nick Piggin
    Reviewed-and-Tested-by: WANG Cong
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • __GFP_NOFAIL is a bad fiction. Allocations _can_ fail, and callers should
    detect and suitably handle this (and not by lamely moving the infinite
    loop up to the caller level either).

    Attempting to use __GFP_NOFAIL for a higher-order allocation is even
    worse, so add a once-off runtime check for this to slap people around for
    even thinking about trying it.
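
    The added check lives in the allocator slow path; approximately (the exact
    order threshold is from memory):

        /*
         * __GFP_NOFAIL cannot really be honoured for large orders: reclaim may
         * never produce a contiguous block that big, so complain loudly once.
         */
        if (gfp_mask & __GFP_NOFAIL)
                WARN_ON_ONCE(order > 1);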

    Cc: David Rientjes
    Acked-by: Mel Gorman
    Acked-by: Peter Zijlstra
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Since videobuf-dma-contig is designed to handle physically contiguous
    memory, this patch modifies the videobuf-dma-contig code to only accept a
    user space pointer to physically contiguous memory. For now only
    VM_PFNMAP vmas are supported, so forget hotplug.

    On SuperH Mobile we use this with our sh_mobile_ceu_camera driver together
    with various multimedia accelerator blocks that are exported to user space
    using UIO. The UIO kernel code exports physically contiguous memory to
    user space and lets the user space application mmap() this memory and pass
    a pointer using the USERPTR interface for V4L2 zero copy operation.

    With this approach we support zero copy capture, hardware scaling and
    various forms of hardware encoding and decoding.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Magnus Damm
    Cc: Johannes Weiner
    Cc: Paul Mundt
    Acked-by: Mauro Carvalho Chehab
    Cc: Hans Verkuil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Magnus Damm
     
  • Analogous to follow_phys(), add a helper that looks up the PFN at a
    user virtual address in an IO mapping or a raw PFN mapping.
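
    A minimal usage sketch of the new helper (the caller-side details are
    illustrative):

        unsigned long pfn;
        int ret;

        /* look up the PFN backing a user virtual address in a VM_PFNMAP vma */
        ret = follow_pfn(vma, user_addr, &pfn);
        if (ret)
                return ret;     /* not a PFN/IO mapping, or nothing mapped */
        /* pfn now identifies the physical page frame behind user_addr */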

    Signed-off-by: Johannes Weiner
    Cc: Christoph Hellwig
    Acked-by: Magnus Damm
    Cc: Hans Verkuil
    Cc: Paul Mundt
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Signed-off-by: Johannes Weiner
    Cc: Christoph Hellwig
    Acked-by: Magnus Damm
    Cc: Hans Verkuil
    Cc: Paul Mundt
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • A generic readonly page table lookup helper to map an address space and an
    address from it to a pte.

    Signed-off-by: Johannes Weiner
    Cc: Christoph Hellwig
    Acked-by: Magnus Damm
    Cc: Hans Verkuil
    Cc: Paul Mundt
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The caller of setup_per_zone_inactive_ratio is an __init function, so
    there is no need to keep the callee around after init completes either.
    Also fix a comment.

    Acked-by: David Rientjes
    Signed-off-by: Cyrill Gorcunov
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • int_sqrt() returns 0 if its argument is zero, so only call it when needed.

    Signed-off-by: Cyrill Gorcunov
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • This effectively lifts the unit of updates to nr_inactive_* and
    pgdeactivate from PAGEVEC_SIZE=14 to SWAP_CLUSTER_MAX=32, or
    MAX_ORDER_NR_PAGES=1024 for reclaim_zone().

    Cc: Johannes Weiner
    Acked-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Signed-off-by: Wu Fengguang
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: KOSAKI Motohiro
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • The lru->nr_saved_scan's are not meaningful counters, even for kernel
    developers. They are typically smaller than 32 and are always 0 for large
    lists. So remove them from /proc/zoneinfo.

    Hopefully this interface change won't break too many scripts.
    /proc/zoneinfo is too unstructured to be script friendly, and I suspect
    the affected scripts - if there are any - are still bleeding from the
    not-long-ago commit "vmscan: split LRU lists into anon & file sets", which
    also touched the "scanned" line :)

    If we are to re-export accumulated vmscan counts in the future, they can
    go on new lines in /proc/zoneinfo instead of the current form, or into
    /sys/devices/system/node/node0/meminfo?

    Signed-off-by: Wu Fengguang
    Acked-by: Rik van Riel
    Cc: Nick Piggin
    Acked-by: Christoph Lameter
    Cc: Peter Zijlstra
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • The vmscan batching logic is convoluted. Move it into a standalone
    function, nr_scan_try_batch(), and document it. No behavior change.
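
    The factored-out helper looks roughly like this (paraphrased from the
    patch, with comments added):

        /*
         * Accumulate per-zone scan requests until they reach the batch size;
         * only then release the whole batch to the scanner.
         */
        static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
                                               unsigned long *nr_saved_scan,
                                               unsigned long swap_cluster_max)
        {
                unsigned long nr;

                *nr_saved_scan += nr_to_scan;
                nr = *nr_saved_scan;

                if (nr >= swap_cluster_max)
                        *nr_saved_scan = 0;     /* batch released, reset the buffer */
                else
                        nr = 0;                 /* keep saving, scan nothing yet */

                return nr;
        }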

    Signed-off-by: Wu Fengguang
    Acked-by: Rik van Riel
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Acked-by: Peter Zijlstra
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • When the file LRU lists are dominated by streaming IO pages, evict those
    pages first, before considering evicting other pages.

    This should be safe from deadlocks or performance problems
    because only three things can happen to an inactive file page:

    1) referenced twice and promoted to the active list
    2) evicted by the pageout code
    3) under IO, after which it will get evicted or promoted

    The pages freed in this way can either be reused for streaming IO, or
    allocated for something else. If the pages are used for streaming IO,
    this pageout pattern continues. Otherwise, we will fall back to the
    normal pageout pattern.
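
    Conceptually, the new policy only deactivates active file pages when the
    inactive file list has become relatively small; a rough sketch of that
    condition (helper name and accounting details are paraphrased):

        /*
         * With plenty of inactive (use-once) file pages around, leave the
         * active file list alone and evict the streaming IO pages first.
         */
        static int inactive_file_is_low(struct zone *zone)
        {
                unsigned long active = zone_page_state(zone, NR_ACTIVE_FILE);
                unsigned long inactive = zone_page_state(zone, NR_INACTIVE_FILE);

                return active > inactive;
        }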

    Signed-off-by: Rik van Riel
    Reported-by: Elladan
    Cc: KOSAKI Motohiro
    Cc: Peter Zijlstra
    Cc: Lee Schermerhorn
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Add page-types, a handy tool for querying page flags.

    It will expand some of the overloaded flags:
    PG_slob_free = PG_private
    PG_slub_frozen = PG_active
    PG_slub_debug = PG_error
    PG_readahead = PG_reclaim

    and mask out obscure flags except in -raw mode:
    PG_reserved
    PG_mlocked
    PG_mappedtodisk
    PG_private
    PG_private_2
    PG_owner_priv_1
    PG_arch_1
    PG_uncached
    PG_compound* for non hugeTLB pages

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Matt Mackall
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Also add short descriptions for all of the 20 exported page flags.

    Signed-off-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Matt Mackall
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Some bit ranges were inclusive and some not. Fix them to be consistently
    inclusive.

    Signed-off-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Matt Mackall
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Export all page flags faithfully in /proc/kpageflags.

    11. KPF_MMAP (pseudo flag) memory mapped page
    12. KPF_ANON (pseudo flag) memory mapped page (anonymous)
    13. KPF_SWAPCACHE page is in swap cache
    14. KPF_SWAPBACKED page is swap/RAM backed
    15. KPF_COMPOUND_HEAD (*)
    16. KPF_COMPOUND_TAIL (*)
    17. KPF_HUGE hugeTLB pages
    18. KPF_UNEVICTABLE page is in the unevictable LRU list
    19. KPF_HWPOISON(TBD) hardware detected corruption
    20. KPF_NOPAGE (pseudo flag) no page frame at the address
    32-39. more obscure flags for kernel developers

    (*) For compound pages, exporting _both_ head/tail info enables
    users to tell where a compound page starts/ends, and its order.

    The accompanying page-types tool will handle the details like decoupling
    overloaded flags and hiding obscure flags to normal users.

    Thanks to KOSAKI and Andi for their valuable recommendations!

    Signed-off-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Matt Mackall
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Move increments of pfn/out to bottom of the loop.

    Signed-off-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Andi Kleen
    Acked-by: Matt Mackall
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • A series of patches to enhance the /proc/pagemap interface and to add a
    userspace executable which can be used to present the pagemap data.

    Export 10 more flags to end users (and more for kernel developers):

    11. KPF_MMAP (pseudo flag) memory mapped page
    12. KPF_ANON (pseudo flag) memory mapped page (anonymous)
    13. KPF_SWAPCACHE page is in swap cache
    14. KPF_SWAPBACKED page is swap/RAM backed
    15. KPF_COMPOUND_HEAD (*)
    16. KPF_COMPOUND_TAIL (*)
    17. KPF_HUGE hugeTLB pages
    18. KPF_UNEVICTABLE page is in the unevictable LRU list
    19. KPF_HWPOISON hardware detected corruption
    20. KPF_NOPAGE (pseudo flag) no page frame at the address

    (*) For compound pages, exporting _both_ head/tail info enables
    users to tell where a compound page starts/ends, and its order.

    A simple demo of the page-types tool:

    # ./page-types -h
    page-types [options]
          -r|--raw              Raw mode, for kernel developers
          -a|--addr addr-spec   Walk a range of pages
          -b|--bits bits-spec   Walk pages with specified bits
          -l|--list             Show page details in ranges
          -L|--list-each        Show page details one by one
          -N|--no-summary       Don't show summary info
          -h|--help             Show this usage message
    addr-spec:
          N                     one page at offset N (unit: pages)
          N+M                   pages range from N to N+M-1
          N,M                   pages range from N to M-1
          N,                    pages range from N to end
          ,M                    pages range from 0 to M
    bits-spec:
          bit1,bit2             (flags & (bit1|bit2)) != 0
          bit1,bit2=bit1        (flags & (bit1|bit2)) == bit1
          bit1,~bit2            (flags & (bit1|bit2)) == bit1
          =bit1,bit2            flags == (bit1|bit2)
    bit-names:
          locked          error           referenced      uptodate
          dirty           lru             active          slab
          writeback       reclaim         buddy           mmap
          anonymous       swapcache       swapbacked      compound_head
          compound_tail   huge            unevictable     hwpoison
          nopage          reserved(r)     mlocked(r)      mappedtodisk(r)
          private(r)      private_2(r)    owner_private(r) arch(r)
          uncached(r)     readahead(o)    slob_free(o)    slub_frozen(o)
          slub_debug(o)
          (r) raw mode bits  (o) overloaded bits

    # ./page-types
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 487369 1903 _________________________________
    0x0000000000000014 5 0 __R_D____________________________ referenced,dirty
    0x0000000000000020 1 0 _____l___________________________ lru
    0x0000000000000024 34 0 __R__l___________________________ referenced,lru
    0x0000000000000028 3838 14 ___U_l___________________________ uptodate,lru
    0x0001000000000028 48 0 ___U_l_______________________I___ uptodate,lru,readahead
    0x000000000000002c 6478 25 __RU_l___________________________ referenced,uptodate,lru
    0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead
    0x0000000000000040 8344 32 ______A__________________________ active
    0x0000000000000060 1 0 _____lA__________________________ lru,active
    0x0000000000000068 348 1 ___U_lA__________________________ uptodate,lru,active
    0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead
    0x000000000000006c 988 3 __RU_lA__________________________ referenced,uptodate,lru,active
    0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead
    0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked
    0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked
    0x0000000000000400 503 1 __________B______________________ buddy
    0x0000000000000804 1 0 __R________M_____________________ referenced,mmap
    0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap
    0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead
    0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap
    0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead
    0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap
    0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead
    0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap
    0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead
    0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked
    0x0000000000001000 492 1 ____________a____________________ anonymous
    0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 30 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 513968 2007

    # ./page-types -r
    flags page-count MB symbolic-flags long-symbolic-flags
    0x0000000000000000 468002 1828 _________________________________
    0x0000000100000000 19102 74 _____________________r___________ reserved
    0x0000000000008000 41 0 _______________H_________________ compound_head
    0x0000000000010000 188 0 ________________T________________ compound_tail
    0x0000000000008014 1 0 __R_D__________H_________________ referenced,dirty,compound_head
    0x0000000000010014 4 0 __R_D___________T________________ referenced,dirty,compound_tail
    0x0000000000000020 1 0 _____l___________________________ lru
    0x0000000800000024 34 0 __R__l__________________P________ referenced,lru,private
    0x0000000000000028 3794 14 ___U_l___________________________ uptodate,lru
    0x0001000000000028 46 0 ___U_l_______________________I___ uptodate,lru,readahead
    0x0000000400000028 44 0 ___U_l_________________d_________ uptodate,lru,mappedtodisk
    0x0001000400000028 2 0 ___U_l_________________d_____I___ uptodate,lru,mappedtodisk,readahead
    0x000000000000002c 6434 25 __RU_l___________________________ referenced,uptodate,lru
    0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead
    0x000000040000002c 14 0 __RU_l_________________d_________ referenced,uptodate,lru,mappedtodisk
    0x000000080000002c 30 0 __RU_l__________________P________ referenced,uptodate,lru,private
    0x0000000800000040 8124 31 ______A_________________P________ active,private
    0x0000000000000040 219 0 ______A__________________________ active
    0x0000000800000060 1 0 _____lA_________________P________ lru,active,private
    0x0000000000000068 322 1 ___U_lA__________________________ uptodate,lru,active
    0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead
    0x0000000400000068 13 0 ___U_lA________________d_________ uptodate,lru,active,mappedtodisk
    0x0000000800000068 12 0 ___U_lA_________________P________ uptodate,lru,active,private
    0x000000000000006c 977 3 __RU_lA__________________________ referenced,uptodate,lru,active
    0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead
    0x000000040000006c 5 0 __RU_lA________________d_________ referenced,uptodate,lru,active,mappedtodisk
    0x000000080000006c 3 0 __RU_lA_________________P________ referenced,uptodate,lru,active,private
    0x0000000c0000006c 3 0 __RU_lA________________dP________ referenced,uptodate,lru,active,mappedtodisk,private
    0x0000000c00000068 1 0 ___U_lA________________dP________ uptodate,lru,active,mappedtodisk,private
    0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked
    0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked
    0x0000000000000400 538 2 __________B______________________ buddy
    0x0000000000000804 1 0 __R________M_____________________ referenced,mmap
    0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap
    0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead
    0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap
    0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead
    0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap
    0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead
    0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap
    0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead
    0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked
    0x0000000000001000 492 1 ____________a____________________ anonymous
    0x0000000000005008 2 0 ___U________a_b__________________ uptodate,anonymous,swapbacked
    0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked
    0x000000000000580c 1 0 __RU_______Ma_b__________________ referenced,uptodate,mmap,anonymous,swapbacked
    0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked
    0x000000000000586c 29 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
    total 513968 2007

    # ./page-types --raw --list --no-summary --bits reserved
    offset count flags
    0 15 _____________________r___________
    31 4 _____________________r___________
    159 97 _____________________r___________
    4096 2067 _____________________r___________
    6752 2390 _____________________r___________
    9355 3 _____________________r___________
    9728 14526 _____________________r___________

    This patch:

    Introduce PageHuge(), which identifies huge/gigantic pages by their
    dedicated compound destructor functions.

    Also move prep_compound_gigantic_page() to hugetlb.c and make
    __free_pages_ok() non-static.
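
    A sketch of the new test (paraphrased; huge/gigantic pages are recognized
    by their compound destructor rather than by a page flag):

        int PageHuge(struct page *page)
        {
                compound_page_dtor *dtor;

                if (!PageCompound(page))
                        return 0;

                page = compound_head(page);
                dtor = get_compound_page_dtor(page);

                /* hugetlb pages are freed through free_huge_page() */
                return dtor == free_huge_page;
        }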

    Signed-off-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Matt Mackall
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • alloc_large_system_hash() has logic for freeing pages at the end of an
    excessively large power-of-two buffer that is a duplicate of what is in
    alloc_pages_exact(). This patch converts alloc_large_system_hash() to use
    alloc_pages_exact().
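
    A minimal sketch of the helper being reused (signatures as in
    include/linux/gfp.h; the GFP flags here are illustrative):

        /*
         * Allocate a physically contiguous buffer of at least 'size' bytes;
         * alloc_pages_exact() itself frees the unused tail pages of the
         * power-of-two allocation, which is exactly the logic that was
         * duplicated in alloc_large_system_hash().
         */
        void *table = alloc_pages_exact(size, GFP_ATOMIC);
        if (!table)
                return NULL;
        /* ... use the table ...; later: free_pages_exact(table, size); */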

    Signed-off-by: Mel Gorman
    Acked-by: Hugh Dickins
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Callers may speculatively call different allocators in order of preference
    trying to allocate a buffer of a given size. The order needed to allocate
    this may be larger than what the page allocator can normally handle.
    While the allocator mostly does the right thing, it should not enter
    direct reclaim or wake up kswapd with a bogus order. This patch sanity
    checks the order in the slow path and returns NULL if it is too large.
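
    The added check, approximately (placed at the top of the slow path; the
    exact form of the warning condition is from memory):

        /*
         * Orders of MAX_ORDER or more can never be satisfied, so fail early
         * instead of entering reclaim or waking kswapd for them.
         */
        if (order >= MAX_ORDER) {
                WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
                return NULL;
        }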

    Signed-off-by: Mel Gorman
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Currently, free_page_mlock() is only called from page_alloc.c. Thus, we
    can move it to page_alloc.c.

    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Dave Hansen
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro