30 May, 2012

40 commits

  • Update Documentation/vm/transhuge.txt and
    Documentation/filesystems/proc.txt with some information on monitoring
    transparent huge page usage and the associated overhead.
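
    For illustration (the patch itself only touches documentation), a
    minimal user-space reader of one such counter, AnonHugePages in
    /proc/meminfo:

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
                char line[256];
                FILE *f = fopen("/proc/meminfo", "r");

                if (!f)
                        return 1;
                /* AnonHugePages counts memory backed by THP */
                while (fgets(line, sizeof(line), f))
                        if (strncmp(line, "AnonHugePages:", 14) == 0)
                                fputs(line, stdout);
                fclose(f);
                return 0;
        }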

    Signed-off-by: Mel Gorman
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • - make pageflag_names[] const

    - remove null termination of pageflag_names[]

    Cc: Johannes Weiner
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • String tables with names of enum items are always prone to go out of
    sync with the enums themselves. Ensure during compile time that the
    name table of page flags has the same size as the page flags enum.
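
    A user-space analogue of the technique (a sketch with a made-up
    enum, not the kernel code): the build breaks as soon as the name
    table and the enum disagree in size.

        #include <stdio.h>

        #define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))
        #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

        enum color { RED, GREEN, BLUE, NR_COLORS };
        static const char * const color_names[] = { "red", "green", "blue" };

        int main(void)
        {
                /* fails to compile if color_names[] goes out of sync */
                BUILD_BUG_ON(ARRAY_SIZE(color_names) != NR_COLORS);
                printf("%s\n", color_names[GREEN]);
                return 0;
        }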

    Signed-off-by: Johannes Weiner
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The array pageflag_names[] converts page flags into their
    corresponding names so that a meaningful representation of each page
    flag can be printed. This mechanism is used while dumping page
    frames. However, the array missed PG_compound_lock, so that flag
    would be printed as a raw number instead of a meaningful string.

    The patch fixes that and prints "compound_lock" for the PG_compound_lock
    page flag.

    Signed-off-by: Gavin Shan
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • It is better to define readahead(2) in mm/readahead.c than in
    mm/filemap.c.

    Signed-off-by: Cong Wang
    Cc: Fengguang Wu
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cong Wang
     
  • It's quite easy for tmpfs to scan the radix_tree to support llseek's new
    SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are still
    on my mind (in particular, the !PageUptodate-ness of pages fallocated but
    still unwritten).

    But I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
    would be of any use to them on tmpfs. This code adds 92 lines and 752
    bytes on x86_64 - is that bloat or worthwhile?
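
    For reference, a minimal sketch of the interface from user space
    (assuming /dev/shm is tmpfs and /dev/shm/f exists):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
                int fd = open("/dev/shm/f", O_RDONLY);

                if (fd < 0)
                        return 1;
                /* first data and first hole at or after offset 0;
                 * lseek returns -1 (ENXIO) if there is none */
                printf("data: %lld hole: %lld\n",
                       (long long)lseek(fd, 0, SEEK_DATA),
                       (long long)lseek(fd, 0, SEEK_HOLE));
                close(fd);
                return 0;
        }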

    [akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Josef Bacik
    Cc: Andi Kleen
    Cc: Andreas Dilger
    Cc: Dave Chinner
    Cc: Marco Stornelli
    Cc: Jeff liu
    Cc: Chris Mason
    Cc: Sunil Mushran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • As it stands, a large fallocate() on tmpfs is liable to fill memory with
    pages, freed on failure except when they run into swap, at which point
    they become fixed into the file despite the failure. That feels quite
    wrong, to be consuming resources precisely when they're in short supply.

    Go the other way instead: have shmem_fallocate() indicate to
    shmem_writepage() the range it has fallocated, keeping count of the
    pages it is allocating; have shmem_writepage() reactivate, instead
    of swapping out, pages fallocated by this syscall (while happily
    swapping out those from earlier occasions), keeping count; and have
    shmem_fallocate() compare the counts and give up once the
    reactivated pages have started coming back to writepage
    (approximately: some zones would in fact recycle faster than
    others).

    This is a little unusual, but works well: although we could consider the
    failure to swap as a bug, and fix it later with SWAP_MAP_FALLOC handling
    added in swapfile.c and memcontrol.c, I doubt that we shall ever want to.

    (If there's no swap, an over-large fallocate() on tmpfs is limited in the
    same way as writing: stopped by rlimit, or by tmpfs mount size if that was
    set sensibly, or by __vm_enough_memory() heuristics if OVERCOMMIT_GUESS or
    OVERCOMMIT_NEVER. If OVERCOMMIT_ALWAYS, then it is liable to OOM-kill
    others as writing would, but stops and frees if interrupted.)

    Now that everything is freed on failure, we can then skip updating ctime.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • In the previous episode, we left the already-fallocated pages attached to
    the file when shmem_fallocate() fails part way through.

    Now try to do better, by extending the earlier optimization of !Uptodate
    pages (then always under page lock) to !Uptodate pages (outside of page
    lock), representing fallocated pages. And don't waste time clearing them
    at the time of fallocate(), leave that until later if necessary.

    Adapt shmem_truncate_range() to shmem_undo_range(), so that a failing
    fallocate can recognize and remove precisely those !Uptodate allocations
    which it added (and were not independently allocated by racing tasks).

    But unless we start playing with swapfile.c and memcontrol.c too, once one
    of our fallocated pages reaches shmem_writepage(), we do then have to
    instantiate it as an ordinarily allocated page, before swapping out. This
    is unsatisfactory, but improved in the next episode.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The systemd plumbers expressed a wish that tmpfs support preallocation.
    Cong Wang wrote a patch, but several kernel guys expressed scepticism:
    https://lkml.org/lkml/2011/11/18/137

    Christoph Hellwig: What for exactly? Please explain why preallocating on
    tmpfs would make any sense.

    Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files
    on the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or
    -EOPNOTSUPP] on fallocate is just ugly.

    Hugh Dickins: If tmpfs is going to support
    fallocate(FALLOC_FL_PUNCH_HOLE), it would seem perverse to permit
    the deallocation but fail the allocation.

    Christoph Hellwig: Agreed.

    Now that we do have shmem_fallocate() for hole-punching, plumb in basic
    support for preallocation mode too. It's fairly straightforward (though
    quite a few details needed attention), except for when it fails part way
    through. What a pity that fallocate(2) was not specified to return the
    length allocated, permitting short fallocations!

    As it is, when it fails part way through, we ought to free what has just
    been allocated by this system call; but must be very sure not to free any
    allocated earlier, or any allocated by racing accesses (not all excluded
    by i_mutex).

    But we cannot distinguish them: so in this patch simply leak allocations
    on partial failure (they will be freed later if the file is removed).

    An attractive alternative approach would have been for fallocate() not to
    allocate pages at all, but note reservations by entries in the radix-tree.
    But that would give less assurance, and, critically, would be hard to fit
    with mem cgroups (who owns the reservations?): allocating pages lets
    fallocate() behave in just the same way as write().
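
    A sketch of the newly supported call from user space (the file name
    is assumed): plain preallocation, mode 0, which tmpfs previously
    rejected.

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>

        int main(void)
        {
                int fd = open("/dev/shm/prealloc", O_RDWR | O_CREAT, 0600);

                if (fd < 0)
                        return 1;
                /* reserve 1MiB of backing pages up front */
                if (fallocate(fd, 0, 0, 1 << 20) != 0)
                        perror("fallocate");
                return 0;
        }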

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove vmtruncate_range(), and remove the truncate_range method from
    struct inode_operations: only tmpfs ever supported it, and tmpfs has now
    converted over to using the fallocate method of file_operations.

    Update Documentation accordingly, adding (setlease and) fallocate lines.
    And while we're in mm.h, remove duplicate declarations of shmem_lock() and
    shmem_file_setup(): everyone is now using the ones in shmem_fs.h.

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Now that tmpfs supports hole-punching via fallocate(), switch
    madvise_remove() to use do_fallocate() instead of
    vmtruncate_range(), which extends madvise(,,MADV_REMOVE) support
    from tmpfs to ext4, ocfs2 and xfs.

    There is one more user of vmtruncate_range() in our tree,
    staging/android's ashmem_shrink(): convert it to use do_fallocate() too
    (but if its unpinned areas are already unmapped - I don't know - then it
    would do better to use shmem_truncate_range() directly).
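
    A sketch of the user-visible effect (the path is assumed; with this
    patch, "f" may live on ext4, ocfs2 or xfs, not just tmpfs):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
                int fd = open("/mnt/test/f", O_RDWR | O_CREAT, 0600);
                char *p;

                if (fd < 0 || ftruncate(fd, 4096) != 0)
                        return 1;
                p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
                if (p == MAP_FAILED)
                        return 1;
                /* drop the backing store for this page-aligned range */
                return madvise(p, 4096, MADV_REMOVE) ? 1 : 0;
        }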

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Colin Cross
    Cc: John Stultz
    Cc: Greg Kroah-Hartman
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Dave Chinner
    Cc: Ben Myers
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • tmpfs has supported hole-punching since 2.6.16, via
    madvise(,,MADV_REMOVE).

    But nowadays fallocate(,FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,,) is
    the agreed way to punch holes.

    So add shmem_fallocate() to support that, and tweak shmem_truncate_range()
    to support partial pages at both the beginning and end of range (never
    needed for madvise, which demands rounded addr and rounds up length).
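
    The agreed interface, as a sketch (the file name is assumed):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <unistd.h>

        int main(void)
        {
                int fd = open("/dev/shm/f", O_RDWR | O_CREAT, 0600);

                if (fd < 0 || ftruncate(fd, 8192) != 0)
                        return 1;
                /* deallocate the second page; file size is unchanged */
                return fallocate(fd, FALLOC_FL_PUNCH_HOLE |
                                 FALLOC_FL_KEEP_SIZE, 4096, 4096) ? 1 : 0;
        }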

    Based-on-patch-by: Cong Wang
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Nick proposed years ago that tmpfs should avoid clearing its pages where
    write will overwrite them with new data, as ramfs has long done. But I
    messed it up and just got bad data. Tried again recently, it works
    fine.

    Here's time output for writing 4GiB 16 times on this Core i5 laptop:

    before: real 0m21.169s user 0m0.028s sys 0m21.057s
    real 0m21.382s user 0m0.016s sys 0m21.289s
    real 0m21.311s user 0m0.020s sys 0m21.217s

    after: real 0m18.273s user 0m0.032s sys 0m18.165s
    real 0m18.354s user 0m0.020s sys 0m18.265s
    real 0m18.440s user 0m0.032s sys 0m18.337s

    ramfs: real 0m16.860s user 0m0.028s sys 0m16.765s
    real 0m17.382s user 0m0.040s sys 0m17.273s
    real 0m17.133s user 0m0.044s sys 0m17.021s

    Yes, I have done perf reports, but they need more explanation than they
    deserve: in summary, clear_page vanishes, its cache loading shifts into
    copy_user_generic_unrolled; shmem_getpage_gfp goes down, and
    surprisingly mark_page_accessed goes way up - I think because they are
    respectively where the cache gets to be reloaded after being purged by
    clear or copy.

    Suggested-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Let tmpfs into the NOSEC optimization (avoiding file_remove_suid()
    overhead on most common writes): set MS_NOSEC on its superblocks.
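
    The change itself is essentially one line in tmpfs's superblock
    setup (a sketch; placement in shmem_fill_super() is an assumption):

        sb->s_flags |= MS_NOSEC;   /* skip file_remove_suid() checks */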

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Cc: Andi Kleen
    Cc: Al Viro
    Cc: Cong Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The GMA500 GPU driver uses GEM shmem objects, but with a new twist: the
    backing RAM has to be below 4GB. Not a problem while the boards
    supported only 4GB: but now Intel's D2700MUD boards support 8GB, and
    their GMA3600 is managed by the GMA500 driver.

    shmem/tmpfs has never pretended to support hardware restrictions on the
    backing memory, but it might have appeared to do so before v3.1, and
    even now it works fine until a page is swapped out then back in. When
    read_cache_page_gfp() supplied a freshly allocated page for copy, that
    compensated for whatever choice might have been made by earlier swapin
    readahead; but swapoff was likely to destroy the illusion.

    We'd like to continue to support GMA500, so now add a new
    shmem_should_replace_page() check on the zone when about to move a page
    from swapcache to filecache (in swapin and swapoff cases), with
    shmem_replace_page() to allocate and substitute a suitable page (given
    gma500/gem.c's mapping_set_gfp_mask GFP_KERNEL | __GFP_DMA32).
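
    A sketch of what the new check might look like (the exact form in
    the patch may differ): a swapcache page must be replaced when its
    zone is higher than the mapping's gfp mask allows.

        static bool shmem_should_replace_page(struct page *page, gfp_t gfp)
        {
                /* e.g. a page above 4GB for a __GFP_DMA32 mapping */
                return page_zonenum(page) > gfp_zone(gfp);
        }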

    This does involve a minor extension to mem_cgroup_replace_page_cache()
    (the page may or may not have already been charged); and I've removed a
    comment and call to mem_cgroup_uncharge_cache_page(), which in fact is
    always a no-op while PageSwapCache.

    Also removed optimization of an unlikely path in shmem_getpage_gfp(),
    now that we need to check PageSwapCache more carefully (a racing caller
    might already have made the copy). And at one point shmem_unuse_inode()
    needs to use the hitherto private page_swapcount(), to guard against
    racing with inode eviction.

    It would make sense to extend shmem_should_replace_page(), to cover
    cpuset and NUMA mempolicy restrictions too, but set that aside for now:
    needs a cleanup of shmem mempolicy handling, and more testing, and ought
    to handle swap faults in do_swap_page() as well as shmem.

    Signed-off-by: Hugh Dickins
    Cc: Christoph Hellwig
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Alan Cox
    Cc: Stephane Marchesin
    Cc: Andi Kleen
    Cc: Dave Airlie
    Cc: Daniel Vetter
    Cc: Rob Clark
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When MIGRATE_UNMOVABLE pages are freed from a MIGRATE_UNMOVABLE
    pageblock (and some MIGRATE_MOVABLE pages are left in it), waiting
    until an allocation takes ownership of the block may take too long.
    The type of the pageblock remains unchanged, so the pageblock cannot
    be used as a migration target during compaction.

    Fix it by:

    * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE] and
    COMPACT_SYNC; sketched after this list) and converting the sync
    field in struct compact_control to use it.

    * Adding a nr_pageblocks_skipped field to struct compact_control and
    tracking how many destination pageblocks were of MIGRATE_UNMOVABLE
    type. If COMPACT_ASYNC_MOVABLE mode compaction ran fully in
    try_to_compact_pages() (COMPACT_COMPLETE), it implies that there is
    no suitable page for the allocation. In that case, check whether
    there were enough MIGRATE_UNMOVABLE pageblocks to justify a second
    pass in COMPACT_ASYNC_UNMOVABLE mode.

    * Scanning the MIGRATE_UNMOVABLE pageblocks (during the COMPACT_SYNC
    and COMPACT_ASYNC_UNMOVABLE compaction modes) and counting pages
    that are PageBuddy, have page_count(page) == 0, or are PageLRU. If
    all pages within a MIGRATE_UNMOVABLE pageblock are in one of those
    three sets, change the whole pageblock's type to MIGRATE_MOVABLE.
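
    The new modes, as named above (a sketch; the exact definition in the
    patch may differ):

        enum compact_mode {
                COMPACT_ASYNC_MOVABLE,
                COMPACT_ASYNC_UNMOVABLE,
                COMPACT_SYNC,
        };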

    My particular test case (on an ARM EXYNOS4 device with 512 MiB, which means
    131072 standard 4KiB pages in 'Normal' zone) is to:

    - allocate 120000 pages for kernel's usage
    - free every second page (60000 pages) of memory just allocated
    - allocate and use 60000 pages from user space
    - free remaining 60000 pages of kernel memory
    (now we have fragmented memory occupied mostly by user space pages)
    - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage

    The results:
    - with compaction disabled I get 11 successful allocations
    - with compaction enabled - 14 successful allocations
    - with this patch I'm able to get all 100 successful allocations

    NOTE: If we can make kswapd aware of order-0 requests during
    compaction, we can enhance it by switching to COMPACT_ASYNC_FULL
    mode (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE). Please see
    the following thread:

    http://marc.info/?l=linux-mm&m=133552069417068&w=2

    [minchan@kernel.org: minor cleanups]
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Marek Szyprowski
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Kyungmin Park
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     
  • alloc_bootmem_section() derives allocation area constraints from the
    specified sparsemem section. This is a bit specific for a generic memory
    allocator like bootmem, though, so move it over to sparsemem.

    As __alloc_bootmem_node_nopanic() already retries failed allocations with
    relaxed area constraints, the fallback code in sparsemem.c can be removed
    and the code becomes a bit more compact overall.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pass the node descriptor instead of the more specific bootmem node
    descriptor down the call stack, like nobootmem does; there is no
    good reason for the two to be different.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the non-panicking
    one, like the node-agnostic interface, so they always behave the same way
    apart from how to deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • __alloc_bootmem_node and __alloc_bootmem_low_node documentation claims
    the functions panic on allocation failure. Do it.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the
    non-panicking one, like the node-agnostic interface, so they always behave
    the same way apart from how to deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Match the nobootmem version of __alloc_bootmem_node. Try to satisfy both
    the node and the goal, then just the goal, then just the node, then
    allocate anywhere before panicking.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Matching the desired goal to the right node is one thing; dropping the
    goal when it cannot be satisfied is another. Split this into separate
    functions so that subsequent patches can use the node-finding but drop and
    handle the goal fallback on their own terms.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Callsites need to provide a bootmem_data_t *; make the naming more
    descriptive.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned chunk of BITS_PER_LONG pages to
    the page allocator, it checks the bitmap for pages in the chunk that
    are still unreserved (set bits), but it also checks whether the
    offset into the chunk already indicates BITS_PER_LONG loop
    iterations.

    But since the consulted bitmap is only a one-word excerpt of the
    full per-node bitmap, there cannot be more than BITS_PER_LONG bits
    set in it. The additional offset check is unnecessary.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned chunk of memory at the beginning of a
    node to the page allocator, it iterates from that unaligned PFN but
    checks an aligned word of the page bitmap. The checked bits do not
    correspond to the PFNs and, as a result, reserved pages can be freed.

    Properly shift the bitmap word so that the lowest bit corresponds to the
    starting PFN before entering the freeing loop.
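
    A sketch of the fix (variable names assumed):

        /* one word of the bootmem bitmap, covering BITS_PER_LONG pages */
        unsigned long vec = ~map[idx / BITS_PER_LONG];

        /* align the word with the unaligned start, so that bit 0
         * corresponds to the page at PFN 'start' */
        vec >>= start & (BITS_PER_LONG - 1);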

    This bug has been around since commit 41546c17418f ("bootmem: clean up
    free_all_bootmem_core") (2.6.27) without known reports.

    Signed-off-by: Gavin Shan
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
  • set_pageblock_order() has always been broken: one version takes an
    unsigned int and the other version takes no arguments. This bug was
    hidden because one version of set_pageblock_order() was a macro
    which doesn't evaluate its argument.

    Simplify it all and remove pageblock_default_order() altogether.

    Reported-by: rajman mekaco
    Cc: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Cc: Tejun Heo
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • When transparent_hugepage_enabled() is used outside mm/, such as in
    arch/x86/mm/tlb.c:

    + if (!cpu_has_invlpg || vma->vm_flags & VM_HUGETLB
    + || transparent_hugepage_enabled(vma)) {
    + flush_tlb_mm(vma->vm_mm);

    is_vma_temporary_stack() isn't declared in huge_mm.h, so this fails
    to compile:

    arch/x86/mm/tlb.c: In function `flush_tlb_range':
    arch/x86/mm/tlb.c:324:4: error: implicit declaration of function `is_vma_temporary_stack' [-Werror=implicit-function-declaration]

    Since is_vma_temporary_stack() is used only in rmap.c and
    huge_memory.c, it is better to move its declaration from rmap.h to
    huge_mm.h to avoid such errors.

    Signed-off-by: Alex Shi
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Shi
     
  • Compiling page-types.c with a recent compiler produces many
    warnings, mostly related to signed/unsigned comparisons. This patch
    cleans up most of them.

    One remaining warning is about an unused parameter. The file
    doesn't define a __unused macro (or the like) yet. This can be addressed
    later.

    Signed-off-by: Ulrich Drepper
    Acked-by: KOSAKI Motohiro
    Acked-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Programs using /proc/kpageflags need to know about the various
    flags. The <linux/kernel-page-flags.h> header provides them, and the
    comments in the file indicate that it is supposed to be used by
    user-level code. But the file is not installed.

    Install the header and mark the unstable flags as out-of-bounds.
    The page-types tool is also adjusted to not duplicate the
    definitions.

    Signed-off-by: Ulrich Drepper
    Acked-by: KOSAKI Motohiro
    Acked-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • Print physical address info in a style consistent with the %pR style used
    elsewhere in the kernel. For example:

    -Zone PFN ranges:
    +Zone ranges:
    - DMA32 0x00000010 -> 0x00100000
    + DMA32 [mem 0x00010000-0xffffffff]
    - Normal 0x00100000 -> 0x01080000
    + Normal [mem 0x100000000-0x107fffffff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Print swiotlb info in a style consistent with the %pR style used elsewhere
    in the kernel. For example:

    -Placing 64MB software IO TLB between ffff88007a662000 - ffff88007e662000
    -software IO TLB at phys 0x7a662000 - 0x7e662000
    +software IO TLB [mem 0x7a662000-0x7e661fff] (64MB) mapped at [ffff88007a662000-ffff88007e661fff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Print physical address info in a style consistent with the %pR style used
    elsewhere in the kernel. For example:

    -found SMP MP-table at [ffff8800000fce90] fce90
    +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90]
    -initial memory mapped : 0 - 20000000
    +initial memory mapped: [mem 0x00000000-0x1fffffff]
    -Base memory trampoline at [ffff88000009c000] 9c000 size 8192
    +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000]
    -SRAT: Node 0 PXM 0 0-80000000
    +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Print physical address info in a style consistent with the %pR style used
    elsewhere in the kernel. For example:

    -BIOS-provided physical RAM map:
    +e820: BIOS-provided physical RAM map:
    - BIOS-e820: 0000000000000100 - 000000000009e000 (usable)
    +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable
    -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
    +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
    -reserve RAM buffer: 000000000009e000 - 000000000009ffff
    +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]

    Signed-off-by: Bjorn Helgaas
    Cc: Yinghai Lu
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Even with CONFIG_DEBUG_VM=n, gcc generates code for some VM_BUG_ON()
    invocations.

    For example, VM_BUG_ON(!PageCompound(page) || !PageHead(page)); in
    do_huge_pmd_wp_page() generates 114 bytes of code.

    But it mostly disappears when I split this VM_BUG_ON into two:

    -VM_BUG_ON(!PageCompound(page) || !PageHead(page));
    +VM_BUG_ON(!PageCompound(page));
    +VM_BUG_ON(!PageHead(page));

    Weird... but anyway, after this patch the code disappears completely.

    add/remove: 0/0 grow/shrink: 7/97 up/down: 135/-1784 (-1649)

    Signed-off-by: Konstantin Khlebnikov
    Cc: Linus Torvalds
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: Cong Wang
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Sometimes we want to check an expression's correctness at compile
    time. "(void)(e);" or "if (e);" can be dangerous if the expression
    has side-effects, and gcc sometimes generates a lot of code even
    when the expression has no effect.

    This patch introduces the macro BUILD_BUG_ON_INVALID() for such
    checks: it forces a compilation error if the expression is invalid,
    without generating any extra code.

    [Cast to "long" required because sizeof does not work for bit-fields.]
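
    A sketch of such a macro (modulo kernel annotations): the operand is
    examined only by sizeof(), so it is never evaluated and no code is
    generated, but an invalid expression still breaks the build.

        #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))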

    Signed-off-by: Konstantin Khlebnikov
    Cc: Linus Torvalds
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: Cong Wang
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Add a Kconfig option to allow people who don't want cross memory attach to
    not have it included in their build.

    Signed-off-by: Chris Yeoh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     
  • The hierarchical versions of the per-memcg counters in memory.stat
    are all calculated the same way and all carry the total_ prefix.

    Documenting the pattern is easier for maintenance than listing each
    counter twice.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Acked-by: KOSAKI Motohiro
    Acked-by: Ying Han
    Randy Dunlap
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • mm->page_table_lock is hotly contended in page fault tests, and it
    need not be held across mem_cgroup_uncharge_page() in
    do_huge_pmd_wp_page().

    Signed-off-by: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: Andrea Arcangeli
    Acked-by: Johannes Weiner
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Andrew pointed out that is_mlocked_vma() is misnamed: a function
    with a name like that would be expected to return a bool and have no
    side-effects.

    Since it is called on the fault path for a new page, rename it in
    this patch.

    Signed-off-by: Ying Han
    Reviewed-by: Rik van Riel
    Acked-by: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    [akpm@linux-foundation.org: s/mlock_vma_newpage/mlocked_vma_newpage/, per Minchan]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han