04 Jul, 2013

40 commits

  • Concentrate the code that modifies totalram_pages into the mm core, so
    arch memory initialization code doesn't need to take care of it. With
    these changes applied, only the following functions from the mm core
    modify the global variable totalram_pages: free_bootmem_late(),
    free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much easier for us to keep
    totalram_pages and zone->managed_pages consistent.
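
    As a hedged before-and-after sketch (not a verbatim diff from any one
    architecture), an arch mem_init() no longer has to accumulate the
    return value of free_all_bootmem() into totalram_pages itself:

    #include <linux/bootmem.h>
    #include <linux/init.h>

    void __init mem_init(void)
    {
            /* before this series:
             *      totalram_pages += free_all_bootmem();
             * after: the mm core updates totalram_pages internally */
            free_all_bootmem();
    }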

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Enhance adjust_managed_page_count() to adjust totalhigh_pages for
    highmem pages, and change code that directly adjusts totalram_pages to
    use adjust_managed_page_count() instead, because it adjusts
    totalram_pages, totalhigh_pages and zone->managed_pages together in a
    safe way.

    Remove inc_totalhigh_pages() and dec_totalhigh_pages() from the
    xen/balloon driver because adjust_managed_page_count() already adjusts
    totalhigh_pages.

    This patch also fixes two bugs:

    1) it enhances the virtio_balloon driver to adjust totalhigh_pages when
    reserving and unreserving pages.
    2) it enhances memory_hotplug.c to adjust totalhigh_pages when
    hot-removing memory.

    We still need to deal with modifications of totalram_pages in
    arch/powerpc/platforms/pseries/cmm.c, but that needs help from PPC
    experts.
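
    As a rough illustration of the conversion (the helper names below are
    hypothetical, not taken from any driver), a balloon-style driver that
    used to open-code the counters can now rely on the single helper:

    #include <linux/mm.h>

    static void balloon_page_removed(struct page *page)
    {
            /* was: totalram_pages--; plus dec_totalhigh_pages() for highmem */
            adjust_managed_page_count(page, -1);
    }

    static void balloon_page_returned(struct page *page)
    {
            /* was: totalram_pages++; plus inc_totalhigh_pages() for highmem */
            adjust_managed_page_count(page, 1);
    }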

    [akpm@linux-foundation.org: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei]
    [akpm@linux-foundation.org: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c]
    Signed-off-by: Jiang Liu
    Cc: Chris Metcalf
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: "H. Peter Anvin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yinghai Lu
    Cc: Russell King
    Cc: Sergei Shtylyov
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • In order to simplify management of totalram_pages and
    zone->managed_pages, make __free_pages_bootmem() only available at boot
    time. With this change applied, __free_pages_bootmem() will only be
    used by bootmem.c and nobootmem.c at boot time, so mark it as __init.
    Other callers of __free_pages_bootmem() have been converted to use
    free_reserved_page(), which handles totalram_pages and
    zone->managed_pages in a safer way.
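
    In prototype form, the split this enforces looks roughly like the
    following (a sketch, not the exact header declarations):

    /* boot time only, used by bootmem.c/nobootmem.c, hence __init */
    void __init __free_pages_bootmem(struct page *page, unsigned int order);

    /* runtime callers use this instead; it also keeps totalram_pages and
     * zone->managed_pages up to date */
    void free_reserved_page(struct page *page);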

    This patch also fixes a bug in free_pagetable() for x86_64, which should
    increase zone->managed_pages instead of zone->present_pages when freeing
    reserved pages.

    And now we have managed_page_count_lock to protect totalram_pages and
    zone->managed_pages, so remove the redundant ppb_lock in
    put_page_bootmem(). This greatly simplifies the locking rules.

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Will Deacon
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Currently lock_memory_hotplug()/unlock_memory_hotplug() are used to
    protect totalram_pages and zone->managed_pages. Other than the memory
    hotplug driver, totalram_pages and zone->managed_pages may also be
    modified at runtime by other drivers, such as the Xen balloon and
    virtio_balloon drivers. For those cases, the memory hotplug lock is a
    little too heavy, so introduce a dedicated lock to protect
    totalram_pages and zone->managed_pages.

    Now we have simplified locking rules for totalram_pages and
    zone->managed_pages:

    1) no locking for read accesses because they are unsigned long.
    2) no locking for write accesses at boot time in single-threaded context.
    3) serialize write accesses at runtime by acquiring the dedicated
    managed_page_count_lock.

    Also adjust zone->managed_pages when freeing reserved pages into the
    buddy system, to keep totalram_pages and zone->managed_pages
    consistent.
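
    A sketch of the resulting write path, close to (but not necessarily
    identical with) the merged code:

    static DEFINE_SPINLOCK(managed_page_count_lock);

    void adjust_managed_page_count(struct page *page, long count)
    {
            spin_lock(&managed_page_count_lock);
            page_zone(page)->managed_pages += count;
            totalram_pages += count;
    #ifdef CONFIG_HIGHMEM
            if (PageHighMem(page))
                    totalhigh_pages += count;
    #endif
            spin_unlock(&managed_page_count_lock);
    }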

    [akpm@linux-foundation.org: don't export adjust_managed_page_count to modules (for now)]
    Signed-off-by: Jiang Liu
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
    that all highmem pages will be freed into the buddy system by function
    mem_init(). But that's not always true, some architectures may reserve
    some highmem pages during boot. For example PPC may allocate highmem
    pages for giagant HugeTLB pages, and several architectures have code to
    check PageReserved flag to exclude highmem pages allocated during boot
    when freeing highmem pages into the buddy system.

    So treat highmem pages in the same way as normal pages, that is:
    1) reset zone->managed_pages to zero in mem_init().
    2) recalculate managed_pages when freeing pages into the buddy system.
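
    A hedged sketch of the resulting pattern on the arch side (the pfn
    range parameters are placeholders for whatever the architecture uses):

    #include <linux/mm.h>
    #include <linux/init.h>

    static void __init free_boot_highpages(unsigned long start_pfn,
                                           unsigned long end_pfn)
    {
            unsigned long pfn;

            /* zone->managed_pages was reset to 0 earlier in mem_init() */
            for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                    struct page *page = pfn_to_page(pfn);

                    /* skip highmem pages reserved during boot */
                    if (!PageReserved(page))
                            free_highmem_page(page);  /* recounts managed_pages */
            }
    }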

    Signed-off-by: Jiang Liu
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use zone->managed_pages instead of zone->present_pages to calculate the
    default zonelist order, because managed_pages counts allocatable pages.

    Signed-off-by: Jiang Liu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Marek Szyprowski
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Fix some trivial typos in comments.

    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Marek Szyprowski
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use common helper functions to free reserved pages.
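
    A minimal sketch of the kind of conversion this performs in the affected
    architectures (the wrapper name is hypothetical):

    #include <linux/mm.h>

    static void arch_release_reserved_page(struct page *page)
    {
            /* before:
             *      ClearPageReserved(page);
             *      init_page_count(page);
             *      __free_page(page);
             *      totalram_pages++;
             * after: one common helper that also adjusts the counters */
            free_reserved_page(page);
    }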

    Signed-off-by: Jiang Liu
    Cc: Chris Metcalf
    Cc: Wen Congyang
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use the common helper function free_reserved_area() to simplify code.

    Signed-off-by: Jiang Liu
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Tang Chen
    Cc: Wen Congyang
    Cc: Jianguo Wu
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Jeremy Fitzhardinge
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tejun Heo
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Use free_reserved_area() to poison initmem memory pages and kill
    poison_init_mem() on ARM64.

    Signed-off-by: Jiang Liu
    Acked-by: Catalin Marinas
    Cc: Will Deacon
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Address more review comments from the last round of code review.
    1) Enhance free_reserved_area() to support poisoning freed memory with
    pattern '0'. This could be used to get rid of poison_init_mem() on
    ARM64.
    2) A previous patch disabled memory poisoning for initmem on s390 by
    mistake, so restore the original behavior.
    3) Remove the redundant PAGE_ALIGN() when calling free_reserved_area().

    Signed-off-by: Jiang Liu
    Cc: Geert Uytterhoeven
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Change the signature of free_reserved_area() according to Russell King's
    suggestion, to fix the following build warnings:

    arch/arm/mm/init.c: In function 'mem_init':
    arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
    ^
    In file included from include/linux/mman.h:4:0,
    from arch/arm/mm/init.c:15:
    include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
    extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

    mm/page_alloc.c: In function 'free_reserved_area':
    >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
    In file included from arch/mips/include/asm/page.h:49:0,
    from include/linux/mmzone.h:20,
    from include/linux/gfp.h:4,
    from include/linux/mm.h:8,
    from mm/page_alloc.c:18:
    arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
    mm/page_alloc.c: In function 'free_area_init_nodes':
    mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

    Also address some minor code review comments.
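
    Reconstructed from the description above, the reworked helper looks
    roughly like this (a sketch; the in-tree code may differ in detail):

    unsigned long free_reserved_area(void *start, void *end, int poison,
                                     char *s)
    {
            void *pos;
            unsigned long pages = 0;

            start = (void *)PAGE_ALIGN((unsigned long)start);
            end = (void *)((unsigned long)end & PAGE_MASK);
            for (pos = start; pos < end; pos += PAGE_SIZE, pages++) {
                    if ((unsigned int)poison <= 0xFF)
                            memset(pos, poison, PAGE_SIZE);
                    free_reserved_page(virt_to_page(pos));
            }

            if (pages && s)
                    pr_info("Freeing %s memory: %ldK\n",
                            s, pages << (PAGE_SHIFT - 10));

            return pages;
    }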

    Signed-off-by: Jiang Liu
    Reported-by: Arnd Bergmann
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Considering the use cases where the swap device supports discard:
    a) and can do it quickly;
    b) but it's slow to do in small granularities (or concurrent with other
    I/O);
    c) but the implementation is so horrendous that you don't even want to
    send one down;

    And assuming that the sysadmin considers it useful to send the discards down
    at all, we would (probably) want the following solutions:

    i. do the fine-grained discards for freed swap pages, if device is
    capable of doing so optimally;
    ii. do single-time (batched) swap area discards, either at swapon
    or via something like fstrim (not implemented yet);
    iii. allow doing both single-time and fine-grained discards; or
    iv. turn it off completely (default behavior)

    As implemented today, one can only enable or disable discards for swap,
    but one cannot select, for instance, solution (ii) on a swap device like
    (b), even though the single-time discard is regarded as interesting or
    necessary to the workload, because enabling discard would also imply
    (i), and the device is not capable of performing that optimally.

    This patch addresses the scenario depicted above by introducing a way to
    ensure the (probably) wanted solutions (i, ii, iii and iv) can be
    flexibly flagged through swapon(8), allowing a sysadmin to select the
    most suitable swap discard policy according to system constraints.

    This patch introduces the new flags SWAP_FLAG_DISCARD_PAGES and
    SWAP_FLAG_DISCARD_ONCE to allow more flexible swap discard policies to
    be flagged through swapon(8). The default behavior is to keep both
    single-time (batched) area discards (SWAP_FLAG_DISCARD_ONCE) and
    fine-grained discards for page-clusters (SWAP_FLAG_DISCARD_PAGES)
    enabled, in order to keep consistency with older kernel behavior, as
    well as maintain compatibility with older swapon(8). However, through
    the newly introduced flags, the most suitable discard policy can be
    selected according to any given swap device constraint.
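
    A hedged user-space sketch of selecting one of the policies above (the
    device path is made up, and the flag values are mirrored locally because
    libc headers of the time did not expose the new ones):

    #include <sys/swap.h>

    #ifndef SWAP_FLAG_DISCARD
    #define SWAP_FLAG_DISCARD       0x10000 /* enable discard for swap */
    #endif
    #ifndef SWAP_FLAG_DISCARD_ONCE
    #define SWAP_FLAG_DISCARD_ONCE  0x20000 /* discard swap area at swapon time */
    #endif
    #ifndef SWAP_FLAG_DISCARD_PAGES
    #define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */
    #endif

    int main(void)
    {
            /* solution (ii): batched swapon-time discard only */
            return swapon("/dev/sdb2",
                          SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE);
    }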

    [akpm@linux-foundation.org: tweak comments]
    Signed-off-by: Rafael Aquini
    Acked-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: Shaohua Li
    Cc: Karel Zak
    Cc: Jeff Moyer
    Cc: Rik van Riel
    Cc: Larry Woodman
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • Currently the per-cpu counter's batch size for memory accounting is
    configured as twice the number of cpus in the system. However, for
    systems with very large memory, it is more appropriate to make it
    proportional to the memory size per cpu in the system.

    For example, for an x86_64 system with 64 cpus and 128 GB of memory, the
    batch size is only 2*64 pages (0.5 MB). So any memory accounting change
    of more than 0.5 MB will overflow the per-cpu counter into the global
    counter. Instead, with the new scheme, the batch size is configured to
    be 0.4% of the memory per cpu = 8 MB (128 GB / 64 / 256), which is more
    in line with the memory size.
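
    A sketch of the sizing rule described above (close to, but not
    necessarily identical with, the merged code; the real code may also
    apply an upper cap):

    #include <linux/mm.h>
    #include <linux/cpumask.h>
    #include <linux/kernel.h>

    s32 vm_committed_as_batch = 32;

    static void mm_compute_batch(void)
    {
            s32 nr = num_present_cpus();
            s32 batch = max_t(s32, nr * 2, 32);    /* old 2*cpus value as a floor */
            u64 memsized_batch;

            /* 0.4% of the memory per cpu, in pages */
            memsized_batch = (u64)totalram_pages / nr / 256;
            vm_committed_as_batch = max_t(s32, memsized_batch, batch);
    }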

    I've done a repeated brk test of 800KB (from will-it-scale test suite)
    with 80 concurrent processes on a 4 socket Westmere machine with a total
    of 40 cores. Without the patch, about 80% of cpu is spent on spin-lock
    contention within the vm_committed_as counter. With the patch, there's
    a 73x speedup on the benchmark and the lock contention drops off almost
    entirely.

    [akpm@linux-foundation.org: fix section mismatch]
    Signed-off-by: Tim Chen
    Cc: Tejun Heo
    Cc: Eric Dumazet
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
     
  • Use the existing interface huge_page_shift() instead of h->order +
    PAGE_SHIFT.
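
    Illustration only (the wrapper is hypothetical): huge_page_shift()
    already encapsulates the computation, so prefer the accessor.

    #include <linux/hugetlb.h>

    static unsigned long hstate_size_bytes(struct hstate *h)
    {
            /* instead of: 1UL << (h->order + PAGE_SHIFT) */
            return 1UL << huge_page_shift(h);
    }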

    Signed-off-by: Wanpeng Li
    Cc: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: Benjamin Herrenschmidt
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • hugetlb_prefault() is not used any more; this patch removes it.

    Signed-off-by: Wanpeng Li
    Reviewed-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • get_pageblock_flags and set_pageblock_flags are not used any more; this
    patch removes them.

    Signed-off-by: Wanpeng Li
    Reviewed-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • The logical memory remove code fails to correctly account Total High
    Memory when a memory block which contains High Memory is offlined, as
    shown in the example below. The following patch fixes it.

    Before logical memory remove:

    MemTotal: 7603740 kB
    MemFree: 6329612 kB
    Buffers: 94352 kB
    Cached: 872008 kB
    SwapCached: 0 kB
    Active: 626932 kB
    Inactive: 519216 kB
    Active(anon): 180776 kB
    Inactive(anon): 222944 kB
    Active(file): 446156 kB
    Inactive(file): 296272 kB
    Unevictable: 0 kB
    Mlocked: 0 kB
    HighTotal: 7294672 kB
    HighFree: 5704696 kB
    LowTotal: 309068 kB
    LowFree: 624916 kB

    After logical memory remove:

    MemTotal: 7079452 kB
    MemFree: 5805976 kB
    Buffers: 94372 kB
    Cached: 872000 kB
    SwapCached: 0 kB
    Active: 626936 kB
    Inactive: 519236 kB
    Active(anon): 180780 kB
    Inactive(anon): 222944 kB
    Active(file): 446156 kB
    Inactive(file): 296292 kB
    Unevictable: 0 kB
    Mlocked: 0 kB
    HighTotal: 7294672 kB
    HighFree: 5181024 kB
    LowTotal: 4294752076 kB
    LowFree: 624952 kB

    [mhocko@suse.cz: fix CONFIG_HIGHMEM=n build]
    Signed-off-by: Wanpeng Li
    Reviewed-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: [2.6.24+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • During early boot-up, iomem_resource is set up from the boot descriptor
    table, such as EFI Memory Table and e820. Later,
    acpi_memory_device_add() calls add_memory() for each ACPI memory device
    object as it enumerates ACPI namespace. This add_memory() call is
    expected to fail in register_memory_resource() at boot since
    iomem_resource has been set up from EFI/e820. As a result, add_memory()
    returns -EEXIST, which acpi_memory_device_add() handles as the normal
    case.

    This scheme works fine, but the following error message is logged for
    every ACPI memory device object during boot-up.

    "System RAM resource %pR cannot be added\n"

    This patch changes register_memory_resource() to use pr_debug() for the
    message, since it shows up in the normal case.
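
    A sketch of the changed path, reconstructed from the description (not a
    verbatim diff):

    static struct resource *register_memory_resource(u64 start, u64 size)
    {
            struct resource *res = kzalloc(sizeof(*res), GFP_KERNEL);

            if (!res)
                    return NULL;
            res->name = "System RAM";
            res->start = start;
            res->end = start + size - 1;
            res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
            if (request_resource(&iomem_resource, res) < 0) {
                    /* expected at boot: EFI/e820 already populated iomem_resource */
                    pr_debug("System RAM resource %pR cannot be added\n", res);
                    kfree(res);
                    res = NULL;
            }
            return res;
    }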

    Signed-off-by: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • After a successful page migration by soft offlining, the source page is
    not properly freed and is never reusable even if we unpoison it
    afterward.

    This is caused by a race between freeing the page and setting
    PG_hwpoison. In successful soft offlining, the source page is put (and
    its refcount becomes 0) by putback_lru_page() in unmap_and_move(), where
    it is linked to a pagevec and the actual freeing back to the buddy
    system is delayed. So if PG_hwpoison is set for the page before
    freeing, the freeing does not function as expected (in such a case
    freeing aborts at the free_pages_prepare() check).

    This patch tries to make sure to free the source page before setting
    PG_hwpoison on it. To avoid reallocation, the page keeps
    MIGRATE_ISOLATE until after PG_hwpoison is set.

    This patch also removes obsolete comments about "keeping an elevated
    refcount" because what they say is not true. Unlike memory_failure(),
    soft_offline_page() uses no special page isolation code, and the
    soft-offlined pages have no elevated refcount.

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • vwrite() checks for overflow. vread() should do the same thing.

    Since vwrite() checks the source buffer address, vread() should check
    the destination buffer address.
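
    A hedged sketch of the added guard (expressed here as a stand-alone
    helper; in vread() it is an inline check on the destination buffer):

    /* clamp the requested length so that buf + count cannot wrap around
     * the address space, mirroring the existing check in vwrite() */
    static unsigned long clamp_read_count(char *buf, unsigned long count)
    {
            if ((unsigned long)buf + count < count)
                    count = -(unsigned long)buf;
            return count;
    }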

    Signed-off-by: Chen Gang
    Cc: Al Viro
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • - check the length of the procfs data before copying it into a fixed
    size array.

    - when __parse_numa_zonelist_order() fails, save the error code for
    return.

    - 'char*' --> 'char *' coding style fix

    Signed-off-by: Chen Gang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • Similar to __pagevec_lru_add, this patch removes the LRU parameter from
    __lru_cache_add and lru_cache_add_lru, as the caller does not control
    the exact LRU the page gets added to. lru_cache_add_lru gets renamed to
    lru_cache_add, as the name is silly without the lru parameter. With the
    parameter removed, the caller must indicate whether they want the page
    added to the active or inactive list by setting or clearing PageActive
    respectively.
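
    In sketch form, the new calling convention looks like this (the wrapper
    names are hypothetical):

    #include <linux/swap.h>
    #include <linux/page-flags.h>

    static void add_page_active(struct page *page)
    {
            SetPageActive(page);    /* target the active list */
            lru_cache_add(page);
    }

    static void add_page_inactive(struct page *page)
    {
            ClearPageActive(page);  /* target the inactive list */
            lru_cache_add(page);
    }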

    [akpm@linux-foundation.org: Suggested the patch]
    [gang.chen@asianux.com: fix used-uninitialized warning]
    Signed-off-by: Mel Gorman
    Signed-off-by: Chen Gang
    Cc: Jan Kara
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Now that the LRU to add a page to is decided at LRU-add time, remove the
    misleading lru parameter from __pagevec_lru_add. A consequence of this
    is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar
    helpers are misleading as the caller no longer has direct control over
    what LRU the page is added to. Unused helpers are removed by this patch
    and existing users of pagevec_lru_add_file() are converted to use
    lru_cache_add_file() directly and use the per-cpu pagevecs instead of
    creating their own pagevec.

    Signed-off-by: Mel Gorman
    Reviewed-by: Jan Kara
    Reviewed-by: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
    may fail to move the page to the active list as expected. Now that the
    LRU is selected at LRU drain time, mark pages PageActive if they are on
    the local pagevec, so they get moved to the correct list at LRU drain
    time. Using a debugging patch it was found that, for a simple git
    checkout based workload, pages were never added to the active file list
    in practice, but with this patch applied they are.

                                before      after
    LRU Add Active File              0     750583
    LRU Add Active Anon        2640587    2702818
    LRU Add Inactive File      8833662    8068353
    LRU Add Inactive Anon          207        200

    Note that only pages on the local pagevec are considered on purpose. A
    !PageLRU page could be in the process of being released, reclaimed,
    migrated or on a remote pagevec that is currently being drained.
    Marking it PageActive is vulnerable to races where the PageLRU and
    Active bits are checked at the wrong time. Page reclaim will trigger
    VM_BUG_ONs but, depending on when the race hits, it could also free a
    PageActive page to the page allocator and trigger a bad_page warning.
    Similarly, a potential race exists between a per-cpu drain on a pagevec
    list and an activation on a remote CPU:

    lru_add_drain_cpu
      __pagevec_lru_add
        lru = page_lru(page);
                                        mark_page_accessed
                                          if (PageLRU(page))
                                              activate_page
                                          else
                                              SetPageActive
        SetPageLRU(page);
        add_page_to_lru_list(page, lruvec, lru);

    In this case a PageActive page is added to the inactive list and later
    the inactive/active stats will get skewed. While the PageActive checks
    in vmscan could be removed and potentially dealt with, a skew in the
    statistics would be very difficult to detect. Hence this patch deals
    only with the common case, where a page being marked accessed has just
    been added to the local pagevec.

    Signed-off-by: Mel Gorman
    Cc: Jan Kara
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • mark_page_accessed() cannot activate an inactive page that is located on
    an inactive LRU pagevec. Hints from filesystems may be ignored as a
    result. In preparation for fixing that problem, this patch removes the
    per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
    added to is deferred until the pagevec is drained.

    This means that fewer pagevecs are available and potentially there is
    greater contention on the LRU lock. However, this only applies in the
    case where there is an almost perfect mix of file, anon, active and
    inactive pages being added to the LRU. In practice I expect that we are
    adding a stream of pages of a particular type and that the changes in
    contention will barely be measurable.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Jan Kara
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Andrew Perepechko reported a problem whereby pages are being prematurely
    evicted as the mark_page_accessed() hint is ignored for pages that are
    currently on a pagevec --
    http://www.spinics.net/lists/linux-ext4/msg37340.html .

    Alexey Lyahkov and Robin Dong have also reported problems recently that
    could be due to hot pages reaching the end of the inactive list too
    quickly and being reclaimed.

    Rather than addressing this on a per-filesystem basis, this series aims
    to fix the mark_page_accessed() interface by deferring the decision on
    which LRU a page is added to until pagevec drain time, and by allowing
    mark_page_accessed() to call SetPageActive on a pagevec page.

    Patch 1 adds two tracepoints for LRU page activation and insertion.
    Using these tracepoints it's possible to build a model of pages in the
    LRU that can be processed offline.

    Patch 2 defers making the decision on what LRU to add a page to until when
    the pagevec is drained.

    Patch 3 searches the local pagevec for pages to mark PageActive on
    mark_page_accessed. The changelog explains why only the local
    pagevec is examined.

    Patches 4 and 5 tidy up the API.

    postmark, a dd-based test and fs-mark, in both single and threaded mode,
    were run, but none of them showed any performance degradation or gain as
    a result of the patch.

    Using patch 1, I built a *very* basic model of the LRU to examine
    offline what the average ages of different page types on the LRU were
    in milliseconds. Of course, capturing the trace distorts the test as
    it's written to local disk, but that does not matter for the purposes
    of this test. The average ages of pages in milliseconds were:

                                    vanilla    deferdrain
    Average age mapped anon:           1454          1250
    Average age mapped file:         127841        155552
    Average age unmapped anon:           85           235
    Average age unmapped file:        73633         38884
    Average age unmapped buffers:     74054        116155

    The LRU activity was mostly files which you'd expect for a dd-based
    workload. Note that the average age of buffer pages is increased by the
    series and it is expected this is due to the fact that the buffer pages
    are now getting added to the active list when drained from the pagevecs.
    Note that the average age of the unmapped file data is decreased as they
    are still added to the inactive list and are reclaimed before the
    buffers.

    There is no guarantee this is a universal win for all workloads and it
    would be nice if the filesystem people gave some thought as to whether
    this decision is generally a win or a loss.

    This patch:

    Using these tracepoints it is possible to model LRU activity and the
    average residency of pages of different types. This can be used to
    debug problems related to premature reclaim of pages of particular
    types.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Jan Kara
    Cc: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • hugetlb cgroup has already been implemented.

    Signed-off-by: Li Zefan
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Rob Landley
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • This patch introduces mmap_vmcore().

    Don't permit writable or executable mappings even with mprotect(),
    because this mmap() is aimed at reading crash dump memory. A
    non-writable mapping is also a requirement of remap_pfn_range() when
    mapping linear pages on non-consecutive physical pages; see
    is_cow_mapping().

    Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range() and by
    remap_vmalloc_range_partial() at the same time for a single vma.
    do_munmap() can correctly clean up a partially remapped vma with the two
    functions in the abnormal case. See zap_pte_range(), vm_normal_page()
    and their comments for details.

    On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
    limitation comes from the fact that the third argument of
    remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.

    [akpm@linux-foundation.org: use min(), switch to conventional error-unwinding approach]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Tested-by: Maxim Uvarov
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • The previous patches newly added holes before each chunk of memory and
    the holes need to be counted in the vmcore file size. There are two
    ways to count the file size in such a scheme:

    1) suppose m is a pointer to the last vmcore object in vmcore_list;
    then the file size is (m->offset + m->size), or

    2) calculate the sum of the sizes of the buffers for the ELF header,
    program headers, ELF note segments and objects in vmcore_list.

    Although 1) is more direct and simpler than 2), 2) seems better in that
    it reflects the internal object structure of /proc/vmcore. Thus, this
    patch changes get_vmcore_size_elf{64, 32} so that they calculate the
    size in the way of 2).

    As a result, both get_vmcore_size_elf{64, 32} have the same definition.
    Merge them as get_vmcore_size.

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Now the ELF note segment is copied into a buffer on vmalloc memory. To
    allow a user process to remap the ELF note segment buffer with
    remap_vmalloc_range(), the corresponding VM area object has to have the
    VM_USERMAP flag set.

    [akpm@linux-foundation.org: use the conventional comment layout]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • The reason why we don't allocate the ELF note segment in the 1st kernel
    (old memory) on a page boundary is to keep backward compatibility for
    old kernels, and because doing so would waste not a little memory, due
    to the round-up operation needed to fit the memory to a page boundary,
    since most of the buffers are in the per-cpu area.

    ELF notes are per-cpu, so the total size of the ELF note segments
    depends on the number of CPUs. The current maximum number of CPUs on
    x86_64 is 5192, and there's already a system with 4192 CPUs in SGI,
    where the total size amounts to 1MB. This can be larger in the near
    future, or possibly even now on another architecture that has a larger
    note size per CPU. Thus, to avoid the case where memory allocation for
    a large block fails, we allocate vmcore objects on vmalloc memory.

    This patch adds the elfnotes_buf and elfnotes_sz variables to keep a
    pointer to the ELF note segment buffer and its size. There's no longer
    a vmcore object that corresponds to the ELF note segment in vmcore_list.
    Accordingly, read_vmcore() has a new case for the ELF note segment, and
    set_vmcore_list_offsets_elf{64,32}() and other helper functions start
    calculating the offset from the sum of the size of the ELF headers and
    the size of the ELF note segment.

    [akpm@linux-foundation.org: use min(), fix error-path vzalloc() leaks]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • We want to allocate the ELF note segment buffer in the 2nd kernel in
    vmalloc space and remap it to user-space, in order to reduce the risk
    that memory allocation fails on a system with a huge number of CPUs and
    hence a huge ELF note segment that exceeds the order-11 block size.

    Although there's already remap_vmalloc_range for the purpose of
    remapping vmalloc memory to user-space, we need to specify the
    user-space range via a vma. Mmap on /proc/vmcore needs to remap a range
    across multiple objects, so an interface that requires the vma to cover
    the full range is problematic.

    This patch introduces remap_vmalloc_range_partial, which receives the
    user-space range as a pair of base address and size and can be used for
    the mmap on /proc/vmcore case.

    remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
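
    The interface as described above, plus a hedged usage sketch (the
    wrapper and its variable names are illustrative, not the exact
    mmap_vmcore() code):

    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    /* int remap_vmalloc_range_partial(struct vm_area_struct *vma,
     *                                 unsigned long uaddr,
     *                                 void *kaddr, unsigned long size); */

    static int map_one_object(struct vm_area_struct *vma, unsigned long pos,
                              void *kaddr, unsigned long size)
    {
            /* map one object's worth of vmalloc memory at an offset inside
             * the vma; the vma need not cover only this range */
            return remap_vmalloc_range_partial(vma, vma->vm_start + pos,
                                               kaddr, size);
    }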

    [akpm@linux-foundation.org: use PAGE_ALIGNED()]
    Signed-off-by: HATAYAMA Daisuke
    Cc: KOSAKI Motohiro
    Cc: Vivek Goyal
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Currently, __find_vmap_area searches for the kernel VM area starting at
    a given address. This patch changes this behavior so that it searches
    for the kernel VM area to which the address belongs. This change is
    needed by remap_vmalloc_range_partial, to be introduced in a later
    patch, which receives any position within a kernel VM area as its
    target address.

    This patch changes the condition (addr > va->va_start) to
    (addr >= va->va_end), taking advantage of the fact that each kernel VM
    area is non-overlapping.
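
    Roughly what the lookup looks like after this change (reconstructed
    from the description, so details may differ):

    static struct vmap_area *__find_vmap_area(unsigned long addr)
    {
            struct rb_node *n = vmap_area_root.rb_node;

            while (n) {
                    struct vmap_area *va;

                    va = rb_entry(n, struct vmap_area, rb_node);
                    if (addr < va->va_start)
                            n = n->rb_left;
                    else if (addr >= va->va_end)   /* was: addr > va->va_start */
                            n = n->rb_right;
                    else
                            return va;    /* va_start <= addr < va_end */
            }

            return NULL;
    }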

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: KOSAKI Motohiro
    Cc: Vivek Goyal
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Treat memory chunks referenced by PT_LOAD program header entries on
    page-size boundaries in vmcore_list. Formally, for each range [start,
    end], we set up the corresponding vmcore object in vmcore_list to
    [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].

    This change affects layout of /proc/vmcore. The gaps generated by the
    rearrangement are newly made visible to applications as holes.
    Concretely, they are two ranges [rounddown(start, PAGE_SIZE), start] and
    [end, roundup(end, PAGE_SIZE)].

    Suppose variable m points at a vmcore object in vmcore_list, and
    variable phdr points at the program header of PT_LOAD type the variable
    m corresponds to. Then, pictorially:

    m->offset                      +---------------+
                                   |     hole      |
    phdr->p_offset =               +---------------+
    m->offset + (paddr - start)    |               |\
                                   | kernel memory | phdr->p_memsz
                                   |               |/
                                   +---------------+
                                   |     hole      |
    m->offset + m->size            +---------------+

    where m->offset and m->offset + m->size are always page-size aligned.

    Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
    Cc: Lisa Mitchell <lisa.mitchell@hp.com>
    Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    HATAYAMA Daisuke
     
  • Allocate ELF headers on a page-size boundary using __get_free_pages()
    instead of kmalloc().

    A later patch will merge the PT_NOTE entries into a single unique one
    and decrease the buffer size actually used. Keep the original buffer
    size in the variable elfcorebuf_sz_orig so the buffer can be freed
    later, and keep the actually used buffer size, rounded up to a
    page-size boundary, in the variable elfcorebuf_sz separately.
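
    A hedged sketch of the allocation side of this change (the wrapper is
    hypothetical; elfcorebuf_sz_orig is the variable described above):

    #include <linux/gfp.h>

    static void *alloc_elfcorebuf(unsigned long elfcorebuf_sz_orig)
    {
            /* page-aligned, zeroed buffer: unused tail space up to the page
             * boundary stays zero-filled and can be exported as-is */
            return (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                            get_order(elfcorebuf_sz_orig));
    }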

    The size of the part of the ELF buffer exported from /proc/vmcore is
    elfcorebuf_sz.

    The range corresponding to the merged-and-removed PT_NOTE entries, i.e.
    [elfcorebuf_sz, elfcorebuf_sz_orig], is filled with 0.

    Use size of the ELF headers as an initial offset value in
    set_vmcore_list_offsets_elf{64,32} and
    process_ptload_program_headers_elf{64,32} in order to indicate that the
    offset includes the holes towards the page boundary.

    As a result, both set_vmcore_list_offsets_elf{64,32} have the same
    definition. Merge them as set_vmcore_list_offsets.

    [akpm@linux-foundation.org: add free_elfcorebuf(), cleanups]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Rewrite the part of read_vmcore() that reads objects in vmcore_list in
    the same way as the part that reads the ELF headers, removing some
    duplicated and redundant code.

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Add a PAGE_ALIGNED() helper to test whether an address is aligned to
    PAGE_SIZE.
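
    The helper is a one-liner; it likely looks like this (sketch of the
    include/linux/mm.h addition):

    /* test whether an address (or pointer) is aligned to PAGE_SIZE */
    #define PAGE_ALIGNED(addr)  IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)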

    Cc: HATAYAMA Daisuke
    Cc: "Eric W. Biederman" ,
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as
    follows:

    * Must be held any time you expect node_start_pfn, node_present_pages
    * or node_spanned_pages stay constant. [...]

    So actually hold it when we update node_present_pages in __offline_pages().
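
    The pattern is small; a sketch of the kind of hunk this adds (the
    surrounding zone and offlined_pages come from __offline_pages() and are
    assumed here):

    unsigned long flags;
    pg_data_t *pgdat = zone->zone_pgdat;

    pgdat_resize_lock(pgdat, &flags);
    pgdat->node_present_pages -= offlined_pages;
    pgdat_resize_unlock(pgdat, &flags);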

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Cody P Schafer
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • mmzone.h documents node_size_lock (which pgdat_resize_lock() locks) as
    follows:

    * Must be held any time you expect node_start_pfn, node_present_pages
    * or node_spanned_pages stay constant. [...]

    So actually hold it when we update node_present_pages in online_pages().

    Signed-off-by: Cody P Schafer
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer