25 May, 2011

40 commits

  • This function has been superseded by gather_hugetbl_stats() and is no
    longer needed.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • Improve the prototype of gather_stats() to take a struct numa_maps as
    argument instead of a generic void *. Update all callers to make the
    required type explicit.

    Since gather_stats() is not needed before its definition and is scheduled
    to be moved out of mempolicy.c, the declaration is removed as well.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
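
    A minimal freestanding C sketch of the refactoring pattern described in
    this entry; the struct numa_maps field, the pte_size parameter and the
    function bodies are illustrative assumptions, not the kernel's actual
    code.

    /* Illustrative only: stand-in types, not the kernel's definitions. */
    struct page;

    struct numa_maps {
            unsigned long pages;    /* assumed accumulator field */
    };

    /* Before: an opaque pointer forced the callee (and callers) to cast. */
    static int gather_stats_old(struct page *page, void *private, int pte_size)
    {
            struct numa_maps *md = private;

            (void)page;
            md->pages += pte_size;
            return 0;
    }

    /* After: the accumulator type is explicit, so the compiler checks it. */
    static int gather_stats(struct page *page, struct numa_maps *md, int pte_size)
    {
            (void)page;
            md->pages += pte_size;
            return 0;
    }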
     
  • Mapping statistics in a NUMA environment are now computed using the generic
    walk_page_range() logic. Remove the old, equivalent functionality.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • Converting show_numa_map() to use the generic routine decouples the
    function from mempolicy.c, allowing it to be moved out of the mm subsystem
    and into fs/proc.

    Also, include KSM pages in /proc/pid/numa_maps statistics. The pagewalk
    logic implemented by check_pte_range() failed to account for such pages as
    they were not applicable to the page migration case.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • In commit 48fce3429d ("mempolicies: unexport get_vma_policy()")
    get_vma_policy() was marked static as all clients were local to
    mempolicy.c.

    However, the decision to generate /proc/pid/numa_maps in the numa memory
    policy code and outside the procfs subsystem introduces an artificial
    interdependency between the two systems. Exporting get_vma_policy() once
    again is the first step to clean up this interdependency.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     
  • Remove noMMU declaration of shmem_get_unmapped_area() from mm.h: it fell
    out of use in 2.6.21 and ceased to exist in 2.6.29.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Implement generic xattrs for tmpfs filesystems. The Fedora project, while
    trying to replace suid apps with file capabilities, realized that tmpfs,
    which is used on the build systems, does not support file capabilities and
    thus cannot be used to build packages which use file capabilities. Xattrs
    are also needed for overlayfs.

    The xattr interface is a bit odd. If a filesystem does not implement any
    {get,set,list}xattr functions the VFS will call into some random LSM hooks
    and the running LSM can then implement some method for handling xattrs.
    SELinux for example provides a method to support security.selinux but no
    other security.* xattrs.

    As it stands today, when one enables CONFIG_TMPFS_POSIX_ACL, tmpfs will
    have xattr handler routines specifically to handle ACLs. Because of this
    tmpfs would lose the VFS/LSM helpers that support the running LSM. To
    make up for that, tmpfs had stub functions that did nothing but call into
    the LSM hooks which implement the helpers.

    This new patch does not use the LSM fallback functions and instead just
    implements a native get/set/list xattr feature for the full security.* and
    trusted.* namespace like a normal filesystem. This means that tmpfs can
    now support both security.selinux and security.capability, which was not
    previously possible.

    The basic implementation attaches a

    struct shmem_xattr {
            struct list_head list;  /* anchored by shmem_inode_info->xattr_list */
            char *name;
            size_t size;
            char value[0];
    };

    to the struct shmem_inode_info for each xattr that is set. This
    implementation could easily support the user.* namespace as well, except
    some care needs to be taken to prevent large amounts of unswappable memory
    being allocated for unprivileged users.

    [mszeredi@suse.cz: new config option, support trusted.*, support symlinks]
    Signed-off-by: Eric Paris
    Signed-off-by: Miklos Szeredi
    Acked-by: Serge Hallyn
    Tested-by: Serge Hallyn
    Cc: Kyle McMartin
    Acked-by: Hugh Dickins
    Tested-by: Jordi Pujol
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
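
    A small userspace illustration of what this change enables, using the
    standard xattr syscalls; the /dev/shm path (assumed to be tmpfs) and the
    trusted.example attribute name are arbitrary examples, and trusted.*
    requires CAP_SYS_ADMIN.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/xattr.h>
    #include <unistd.h>

    int main(void)
    {
            const char *path = "/dev/shm/xattr-test";  /* assumed tmpfs file */
            const char value[] = "hello";
            char buf[64];
            ssize_t len;
            int fd = open(path, O_CREAT | O_RDWR, 0600);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (fsetxattr(fd, "trusted.example", value, sizeof(value), 0)) {
                    perror("fsetxattr");
                    return 1;
            }
            len = fgetxattr(fd, "trusted.example", buf, sizeof(buf));
            if (len < 0) {
                    perror("fgetxattr");
                    return 1;
            }
            printf("trusted.example = %.*s\n", (int)len, buf);
            close(fd);
            unlink(path);
            return 0;
    }

    Before this series, the trusted.* call above and storing
    security.capability both failed on tmpfs; the latter is what blocked the
    Fedora file-capabilities work described above.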
     
  • The bootmem wrapper with memblock supports top-down now, so we no longer
    need this trick.

    Signed-off-by: Yinghai LU
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Olaf Hering
    Cc: Tejun Heo
    Cc: Lucas De Marchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • The bootmem wrapper with memblock supports top-down now, so we do not need
    to set the low limit to __pa(MAX_DMA_ADDRESS).

    The logic should be: it is good to allocate above __pa(MAX_DMA_ADDRESS),
    but it is OK if we cannot find memory above 16M on systems that have a
    small amount of RAM.

    Signed-off-by: Yinghai LU
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Olaf Hering
    Cc: Tejun Heo
    Cc: Lucas De Marchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • The problem with having two different types of counters is that developers
    adding new code need to keep in mind whether it's safe to use both the
    atomic and non-atomic implementations. For example, when adding new
    callers of the *_mm_counter() functions a developer needs to ensure that
    those paths are always executed with page_table_lock held, in case we're
    using the non-atomic implementation of mm counters.

    Hugh Dickins introduced the atomic mm counters in commit f412ac08c986
    ("[PATCH] mm: fix rss and mmlist locking"). When asked why he left the
    non-atomic counters around he said,

    | The only reason was to avoid adding costly atomic operations into a
    | configuration that had no need for them there: the page_table_lock
    | sufficed.
    |
    | Certainly it would be simpler just to delete the non-atomic variant.
    |
    | And I think it's fair to say that any configuration on which we're
    | measuring performance to that degree (rather than "does it boot fast?"
    | type measurements), would already be going the split ptlocks route.

    Removing the non-atomic counters eases the maintenance burden because
    developers no longer have to be mindful of the two implementations when
    using *_mm_counter().

    Note that all architectures provide a means of atomically updating
    atomic_long_t variables, even if they have to revert to the generic
    spinlock implementation because they don't support 64-bit atomic
    instructions (see lib/atomic64.c).

    Signed-off-by: Matt Fleming
    Acked-by: Dave Hansen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Fleming
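
    A freestanding C11 sketch of the trade-off discussed in this entry: one
    counter that is safe to update from any context and one that is only safe
    under a lock. The pthread mutex stands in for page_table_lock and
    atomic_long for atomic_long_t; this is an analogy, not kernel code.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long rss_atomic;      /* analogue of the atomic counters    */
    static long rss_locked;             /* analogue of the non-atomic variant */
    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

    static void inc_atomic(void)
    {
            atomic_fetch_add(&rss_atomic, 1);   /* no locking rule to remember */
    }

    static void inc_locked(void)
    {
            pthread_mutex_lock(&table_lock);    /* caller must know the rule   */
            rss_locked++;
            pthread_mutex_unlock(&table_lock);
    }

    int main(void)
    {
            inc_atomic();
            inc_locked();
            printf("%ld %ld\n", atomic_load(&rss_atomic), rss_locked);
            return 0;
    }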
     
  • The page allocator will improperly return a page from ZONE_NORMAL even
    when __GFP_DMA is passed if CONFIG_ZONE_DMA is disabled. The caller
    expects DMA memory, perhaps for ISA devices with 16-bit address registers,
    and may get higher memory resulting in undefined behavior.

    This patch causes the page allocator to return NULL in such circumstances
    with a warning emitted to the kernel log on the first occurrence.

    Signed-off-by: David Rientjes
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
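
    A hedged, freestanding sketch of the behaviour this entry describes (warn
    once, then fail the request); the macro names only mimic the kernel's
    flags and the function is a stand-in, not the real page allocator.

    #include <stdbool.h>
    #include <stdio.h>

    #define MY_GFP_DMA 0x01u                /* stand-in for __GFP_DMA        */
    #define ZONE_DMA_CONFIGURED false       /* pretend CONFIG_ZONE_DMA is n  */

    static void *alloc_pages_sketch(unsigned int gfp_mask)
    {
            static bool warned;

            if (!ZONE_DMA_CONFIGURED && (gfp_mask & MY_GFP_DMA)) {
                    if (!warned) {
                            fprintf(stderr,
                                    "__GFP_DMA set but no DMA zone configured\n");
                            warned = true;
                    }
                    return NULL;    /* fail instead of returning non-DMA memory */
            }
            /* ... a real allocator would continue here ... */
            return NULL;
    }

    int main(void)
    {
            return alloc_pages_sketch(MY_GFP_DMA) == NULL ? 0 : 1;
    }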
     
  • Do not define PFN_SECTION_SHIFT if !CONFIG_SPARSEMEM.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • pfn_to_section_nr()/section_nr_to_pfn() are valid only in a
    CONFIG_SPARSEMEM context. Move them to the proper place.

    Signed-off-by: Daniel Kiper
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • set_page_section() is meaningful only in a CONFIG_SPARSEMEM and
    !CONFIG_SPARSEMEM_VMEMMAP context. Move it to the proper place and amend
    the functions that use it accordingly.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • online_pages() is only compiled for CONFIG_MEMORY_HOTPLUG_SPARSE, so there
    is no need to support CONFIG_FLATMEM code within it.

    This patch removes code that is never used.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • It is pointless for deactivate_page() to operate on unevictable pages.
    This patch removes unnecessary overhead, which can matter when the system
    has many unevictable pages (e.g. an mprotect workload).

    [akpm@linux-foundation.org: tidy up comment]
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Previously, mmap sequential readahead was triggered by updating
    ra->prev_pos on each page fault and comparing it with the current page
    offset.

    That dirties the cache line on each _minor_ page fault. So remove the
    ra->prev_pos recording, and instead tag PG_readahead to trigger the
    possible sequential readahead. It is not only simpler, but will also work
    more reliably and reduce cache line bouncing on concurrent page faults on
    shared struct file.

    In the mosbench exim benchmark which does multi-threaded page faults on
    shared struct file, the ra->mmap_miss and ra->prev_pos updates are found
    to cause excessive cache line bouncing on tmpfs, which actually disabled
    readahead totally (shmem_backing_dev_info.ra_pages == 0).

    Signed-off-by: Wu Fengguang
    Tested-by: Tim Chen
    Reported-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • The original INT_MAX is too large; reduce it to:

    - avoid unnecessarily dirtying/bouncing the cache line

    - restore mmap read-around faster on changed access pattern

    Background: in the mosbench exim benchmark which does multi-threaded page
    faults on shared struct file, the ra->mmap_miss updates are found to cause
    excessive cache line bouncing on tmpfs. The ra state updates are needless
    for tmpfs because it actually disabled readahead totally
    (shmem_backing_dev_info.ra_pages == 0).

    Tested-by: Tim Chen
    Signed-off-by: Andi Kleen
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Reduce readahead overheads by returning early in do_sync_mmap_readahead().

    tmpfs has ra_pages=0 and it can page fault really fast (not constrained
    by IO when not swapping).

    Signed-off-by: Wu Fengguang
    Tested-by: Tim Chen
    Reported-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
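
    A freestanding sketch of the early return this entry describes;
    struct file_ra_state is reduced to the two fields needed for the
    illustration, so treat the layout as an assumption.

    /* Illustrative stand-in for struct file_ra_state. */
    struct file_ra_state_sketch {
            unsigned long ra_pages;    /* readahead window; 0 means disabled */
            unsigned long mmap_miss;
    };

    static void do_sync_mmap_readahead_sketch(struct file_ra_state_sketch *ra)
    {
            if (!ra->ra_pages)
                    return;            /* e.g. tmpfs: skip all ra bookkeeping */

            ra->mmap_miss++;           /* ...normal readahead logic follows... */
    }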
     
  • Change each shrinker's API by consolidating the existing parameters into
    a shrink_control struct. This simplifies adding further features without
    having to touch every shrinker.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix warning]
    [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
    [akpm@linux-foundation.org: fix xfs warning]
    [akpm@linux-foundation.org: update gfs2]
    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
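
    A hedged sketch of the consolidation. The two shrink_control fields shown
    match my reading of this series, but treat the exact layout and both
    callback signatures as assumptions; gfp_t is typedef'd here only so the
    sketch compiles on its own.

    typedef unsigned int gfp_t;        /* stand-in for the kernel typedef */

    struct shrink_control {
            gfp_t gfp_mask;            /* reclaim context                 */
            unsigned long nr_to_scan;  /* objects to try to reclaim       */
    };

    struct shrinker;

    /* Old style: adding a parameter meant touching every shrinker in tree. */
    typedef int (*old_shrink_fn)(struct shrinker *s, int nr_to_scan,
                                 gfp_t gfp_mask);

    /* New style: future fields go into shrink_control; callbacks stay put. */
    typedef int (*new_shrink_fn)(struct shrinker *s, struct shrink_control *sc);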
     
  • Consolidate the existing parameters to shrink_slab() into a new
    shrink_control struct. This is needed later to pass the same struct to
    shrinkers.

    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     
  • Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.

    Readahead page allocations are completely optional. They are OK to fail
    and, in particular, shall not trigger OOM on their own.

    Reported-by: Dave Young
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Wu Fengguang
    Reviewed-by: Minchan Kim
    Reviewed-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
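
    A tiny freestanding sketch of the flag combination; the MY_GFP_* macros
    are stand-ins for the kernel's gfp flags, not the real definitions.

    #define MY_GFP_NORETRY 0x01u   /* give up early; don't invoke the OOM killer */
    #define MY_GFP_NOWARN  0x02u   /* suppress the allocation-failure warning    */

    /* Readahead pages are optional, so make their allocation quiet and cheap. */
    static inline unsigned int readahead_gfp(unsigned int base_gfp)
    {
            return base_gfp | MY_GFP_NORETRY | MY_GFP_NOWARN;
    }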
     
  • This fixes a problem where the first pageblock got marked MIGRATE_RESERVE
    even though it only had a few free pages. For example, on the current ARM
    port the kernel starts at offset 0x8000 to leave room for boot parameters,
    and that memory is freed later.

    This in turn caused no contiguous memory to be reserved and frequent
    kswapd wakeups that emptied the caches to get more contiguous memory.

    Unfortunately, ARM needs an order-2 allocation for the pgd (see
    arm/mm/pgd.c#pgd_alloc()), so the issue is neither minor nor easily
    avoidable.

    [kosaki.motohiro@jp.fujitsu.com: added some explanation]
    [kosaki.motohiro@jp.fujitsu.com: add !pfn_valid_within() to check]
    [minchan.kim@gmail.com: check end_pfn in pageblock_is_reserved]
    Signed-off-by: John Stultz
    Signed-off-by: Arve Hjønnevåg
    Signed-off-by: KOSAKI Motohiro
    Acked-by: Mel Gorman
    Acked-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arve Hjønnevåg
     
  • For m32r, N_NORMAL_MEMORY represents all nodes that have present memory
    since it does not support HIGHMEM. This patch sets the bit at the time
    the node is initialized.

    If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
    uses this nodemask to set up per-cache kmem_cache_node data structures.

    Signed-off-by: David Rientjes
    Cc: Hirokazu Takata
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
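
    A hedged sketch of the kind of one-line change described here and in the
    alpha entry below; node_set_state() mirrors the real kernel interface,
    but the stand-in bodies and the init-hook name are illustrative only.

    /* Freestanding stand-ins that mirror the kernel's nodemask interface. */
    enum node_states { N_NORMAL_MEMORY };

    static void node_set_state(int node, enum node_states state)
    {
            (void)node;        /* the real helper sets the bit in  */
            (void)state;       /* node_states[state] for this node */
    }

    /* Illustrative arch init hook: mark every node that has present memory. */
    static void arch_node_init_sketch(int nid, unsigned long present_pages)
    {
            if (present_pages)
                    node_set_state(nid, N_NORMAL_MEMORY);
    }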
     
  • For alpha, N_NORMAL_MEMORY represents all nodes that have present memory
    since it does not support HIGHMEM. This patch sets the bit at the time
    the node is initialized.

    If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
    uses this nodemask to set up per-cache kmem_cache_node data structures.

    Signed-off-by: David Rientjes
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • isolate_lru_page() must be called only with a stable reference to the
    page; this is what the comment above it says, and it is reasonable.

    Current isolate_lru_page() users and the sources of their extra page
    references:

    mm/huge_memory.c:
    __collapse_huge_page_isolate() - reference from pte

    mm/memcontrol.c:
    mem_cgroup_move_parent() - get_page_unless_zero()
    mem_cgroup_move_charge_pte_range() - reference from pte

    mm/memory-failure.c:
    soft_offline_page() - fixed, reference from get_any_page()
    delete_from_lru_cache() - reference from caller or get_page_unless_zero()
    [this looks like a bug, because __memory_failure() can call
    page_action() for hugepage tails, but it is OK for isolate_lru_page():
    the tail has a reference taken and is not on the LRU]

    mm/memory_hotplug.c:
    do_migrate_range() - fixed, get_page_unless_zero()

    mm/mempolicy.c:
    migrate_page_add() - reference from pte

    mm/migrate.c:
    do_move_page_to_node_array() - reference from follow_page()

    mlock.c: - various external references

    mm/vmscan.c:
    putback_lru_page() - reference from isolate_lru_page()

    It seems that all isolate_lru_page() users are now ready for this
    restriction. So, let's replace the redundant get_page_unless_zero() with
    get_page() and add a check of the page's initial reference count with
    VM_BUG_ON().

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
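
    A freestanding sketch of the calling convention this entry argues for;
    get_page(), put_page() and isolate_lru_page() are trivial stand-ins for
    the kernel functions of the same names, and the migration caller is
    hypothetical.

    struct page { int count; int on_lru; };

    static void get_page(struct page *page) { page->count++; }
    static void put_page(struct page *page) { page->count--; }

    static int isolate_lru_page(struct page *page)
    {
            /* with the new rule the real function can assert page_count != 0 */
            if (!page->on_lru)
                    return -1;
            page->on_lru = 0;
            return 0;
    }

    static int migrate_one_page_sketch(struct page *page)
    {
            /* The caller already holds a stable reference (e.g. from a pte),
             * so a plain get_page() suffices; get_page_unless_zero() would
             * be redundant here. */
            get_page(page);
            if (isolate_lru_page(page)) {
                    put_page(page);    /* not on the LRU; back off */
                    return -1;
            }
            /* ... hand the isolated page to migration, then drop the ref ... */
            put_page(page);
            return 0;
    }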
     
  • Drop the first page reference only after calling isolate_lru_page(), so
    the page keeps a stable reference while it is being isolated.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • isolate_lru_page() must be called only with a stable reference to the
    page, so grab a normal page reference.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • I was tracking down a page allocation failure that ended up in vmalloc().
    Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
    of memory, we'll still get a warning with "order:0" in it. That's not
    very useful.

    During recovery, vmalloc() also nicely frees all of the memory that it got
    up to the point of the failure. That is wonderful, but it also quickly
    hides any issues. We have a much different situation if vmalloc()
    repeatedly fails 10GB into:

    vmalloc(100 * 1<<30);

    The failure path now produces a trace such as:

    warn_alloc_failed+0x146/0x170
    ? printk+0x6c/0x70
    ? alloc_pages_current+0x94/0xe0
    __vmalloc_node_range+0x237/0x290
    ...

    The 'order' variable is added for clarity when calling warn_alloc_failed()
    to avoid having an unexplained '0' as an argument.

    The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take
    us over 80 columns for the alloc_pages_node() call. If we are going to
    add a line, it might as well be one that makes the sucker easier to read.

    As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
    solely on an unverified value passed in from userspace. Granted, it's
    under CAP_SYS_ADMIN, but it still frightens me a bit.

    Signed-off-by: Dave Hansen
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Michal Nazarewicz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This originally started as a simple patch to give vmalloc() some more
    verbose output on failure on top of the plain page allocator messages.
    Johannes suggested that it might be nicer to lead with the vmalloc() info
    _before_ the page allocator messages.

    But, I do think there's a lot of value in what __alloc_pages_slowpath()
    does with its filtering and so forth.

    This patch creates a new function which other allocators can call instead
    of relying on the internal page allocator warnings. It also gives this
    function private rate-limiting which separates it from other
    printk_ratelimit() users.

    Signed-off-by: Dave Hansen
    Cc: Johannes Weiner
    Cc: David Rientjes
    Cc: Michal Nazarewicz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
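
    A hedged sketch of how a non-page-allocator caller might use the new
    helper. The warn_alloc_failed() signature is my best recollection and the
    body below is a userspace stand-in; only the overall shape (gfp mask,
    order, printf-style message, rate limiting) comes from the entries above.

    #include <stdarg.h>
    #include <stdio.h>

    typedef unsigned int gfp_t;    /* stand-in for the kernel typedef */

    static void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...)
    {
            va_list args;

            /* the real helper also applies its own private rate limiting */
            fprintf(stderr, "alloc failure (gfp 0x%x, order %d): ",
                    gfp_mask, order);
            va_start(args, fmt);
            vfprintf(stderr, fmt, args);
            va_end(args);
    }

    /* Illustrative vmalloc-style caller reporting how far it got. */
    static void report_vmalloc_failure(gfp_t gfp_mask, unsigned long done,
                                       unsigned long want)
    {
            warn_alloc_failed(gfp_mask, 0,
                              "vmalloc: allocated %lu of %lu bytes\n",
                              done, want);
    }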
     
  • cpumask_t is a very big struct and cpu_vm_mask is placed at the wrong
    position in mm_struct, which may reduce the cache hit ratio.

    This patch makes two changes:
    1) Move the cpumask to the end of mm_struct, because usually only the
    front bits of the cpumask are accessed when the system has cpu-hotplug
    capability.
    2) Convert cpu_vm_mask into cpumask_var_t, which may help reduce the
    memory footprint if cpumask_size() uses nr_cpumask_bits properly in the
    future.

    In addition, this patch renames cpu_vm_mask to cpu_vm_mask_var, which may
    help detect out-of-tree cpu_vm_mask users.

    This patch has no functional change.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KOSAKI Motohiro
    Cc: David Howells
    Cc: Koichi Yasutake
    Cc: Hugh Dickins
    Cc: Chris Metcalf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
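
    A freestanding sketch of the layout idea: hot fields up front, the
    potentially large mask at the end and behind a cpumask_var_t-style
    pointer. Everything here is a stand-in, not the real mm_struct.

    typedef unsigned long *cpumask_var_sketch_t;  /* stand-in for cpumask_var_t */

    struct mm_struct_sketch {
            unsigned long hot_counter_a;          /* frequently accessed fields */
            unsigned long hot_counter_b;
            /* ... many other members ... */
            cpumask_var_sketch_t cpu_vm_mask_var; /* renamed, moved to the end  */
    };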
     
  • We don't need to hold the mmap_sem through mem_cgroup_newpage_charge();
    the mmap_sem is only held to keep the vma stable, and we don't need the
    vma to be stable anymore after we return from alloc_hugepage_vma().

    Signed-off-by: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Some of these functions have grown beyond inline sanity, move them
    out-of-line.

    Signed-off-by: Peter Zijlstra
    Requested-by: Andrew Morton
    Requested-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Optimize the page_lock_anon_vma() fast path to be one atomic op, instead
    of two.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Straightforward conversion of anon_vma->lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Convert page_lock_anon_vma() over to use refcounts. This is done to
    prepare for the conversion of anon_vma from spinlock to mutex.

    Sadly this increases the cost of page_lock_anon_vma() from one atomic op
    to two; a follow-up patch addresses this, so let's keep it simple for now.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • A slightly more verbose comment to go along with the trickery in
    page_lock_anon_vma().

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • It's beyond ugly and gets in the way.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Namhyung Kim
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Straightforward conversion of i_mmap_lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Hugh says:
    "The only significant loser, I think, would be page reclaim (when
    concurrent with truncation): could spin for a long time waiting for
    the i_mmap_mutex it expects would soon be dropped?"

    Counterpoints:
    - cpu contention makes the spin stop (need_resched())
    - zapping pages should free pages at a higher rate than reclaim
    ever can

    I think the simplification of the truncate code is definitely worth it.

    Effectively reverts: 2aa15890f3c ("mm: prevent concurrent
    unmap_mapping_range() on the same inode") and takes out the code that
    caused its problem.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra