Eric Lee / smarc-fsl-linux-kernel

25 May, 2010

3 commits

4eaf3f643 mem-hotplug: fix potential race while building zonelist for new populated zone ... Browse Code »

Add global mutex zonelists_mutex to fix the possible race:

CPU0 CPU1 CPU2
(1) zone->present_pages += online_pages;
(2) build_all_zonelists();
(3) alloc_page();
(4) free_page();
(5) build_all_zonelists();
(6) __build_all_zonelists();
(7) zone->pageset = alloc_percpu();

In step (3,4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume too much memory
to fail the memory allocations in step (7).

Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
since there is a new fresh memory block added in step(6).

[haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
Signed-off-by: Haicheng Li
Signed-off-by: Wu Fengguang
Reviewed-by: Andi Kleen
Cc: Christoph Lameter
Cc: Mel Gorman
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Haicheng Li
2010-05-25 23:07:02 +0800
1f522509c mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset ... Browse Code »

For each new populated zone of hotadded node, need to update its pagesets
with dynamically allocated per_cpu_pageset struct for all possible CPUs:

1) Detach zone->pageset from the shared boot_pageset
at end of __build_all_zonelists().

2) Use mutex to protect zone->pageset when it's still
shared in onlined_pages()

Otherwises, multiple zones of different nodes would share same boot strapping
boot_pageset for same CPU, which will finally cause below kernel panic:

------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:1239!
invalid opcode: 0000 [#1] SMP
...
Call Trace:
[] __alloc_pages_nodemask+0x131/0x7b0
[] alloc_pages_current+0x87/0xd0
[] __page_cache_alloc+0x67/0x70
[] __do_page_cache_readahead+0x120/0x260
[] ra_submit+0x21/0x30
[] ondemand_readahead+0x166/0x2c0
[] page_cache_async_readahead+0x80/0xa0
[] generic_file_aio_read+0x364/0x670
[] nfs_file_read+0xca/0x130
[] do_sync_read+0xfa/0x140
[] vfs_read+0xb5/0x1a0
[] sys_read+0x51/0x80
[] system_call_fastpath+0x16/0x1b
RIP [] get_page_from_freelist+0x883/0x900
RSP
---[ end trace 4bda28328b9990db ]

[akpm@linux-foundation.org: merge fix]
Signed-off-by: Haicheng Li
Signed-off-by: Wu Fengguang
Reviewed-by: Andi Kleen
Reviewed-by: Christoph Lameter
Cc: Mel Gorman
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Haicheng Li
2010-05-25 23:07:01 +0800
cf23422b9 cpu/mem hotplug: enable CPUs online before local memory online ... Browse Code »

Enable users to online CPUs even if the CPUs belongs to a numa node which
doesn't have onlined local memory.

The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online. For a
numa node without onlined local memory, its zonelists are not initialized
at present. As a result, any memory allocation operations executed by
CPUs within this node will fail. In fact, an out-of-memory error is
triggered when attempt to online CPUs before memory comes to online.

This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can be fallback'ed to other nodes.

[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo
Cc: Minchan Kim
Cc: Yasunori Goto
Cc: Andi Kleen
Cc: Christoph Lameter
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

minskey guo
2010-05-25 23:07:00 +0800

13 Mar, 2010

1 commit

718a38211 mm: introduce dump_page() and print symbolic flag names ... Browse Code »

- introduce dump_page() to print the page info for debugging some error
condition.

- convert three mm users: bad_page(), print_bad_pte() and memory offline
failure.

- print an extra field: the symbolic names of page->flags

Example dump_page() output:

[ 157.521694] page:ffffea0000a7cba8 count:2 mapcount:1 mapping:ffff88001c901791 index:0x147
[ 157.525570] page flags: 0x100000000100068(uptodate|lru|active|swapbacked)

Signed-off-by: Wu Fengguang
Cc: Ingo Molnar
Cc: Alex Chiang
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-03-13 07:52:28 +0800

07 Mar, 2010

1 commit

d96ae5309 memory-hotplug: create /sys/firmware/memmap entry for new memory ... Browse Code »

A memmap is a directory in sysfs which includes 3 text files: start, end
and type. For example:

start: 0x100000
end: 0x7e7b1cff
type: System RAM

Interface firmware_map_add was not called explicitly. Remove it and add
function firmware_map_add_hotplug as hotplug interface of memmap.

Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
does not export memmap entry for it. We add a call in function add_memory
to function firmware_map_add_hotplug.

Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
will be called when initialize memmap and hot-add memory.

[akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
Signed-off-by: Shaohui Zheng
Acked-by: Andi Kleen
Acked-by: Yasunori Goto
Reviewed-by: Wu Fengguang
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@linux-foundation.org
2010-03-07 03:26:25 +0800

16 Dec, 2009

5 commits

23ce932a5 mm: fix section mismatch in memory_hotplug.c ... Browse Code »

__free_pages_bootmem() is a __meminit function - which has been called
from put_pages_bootmem thus causes a section mismatch warning.

We were warned by the following warning:

LD mm/built-in.o
WARNING: mm/built-in.o(.text+0x26b22): Section mismatch in reference
from the function put_page_bootmem() to the function
.meminit.text:__free_pages_bootmem()
The function put_page_bootmem() references
the function __meminit __free_pages_bootmem().
This is often because put_page_bootmem lacks a __meminit
annotation or the annotation of __free_pages_bootmem is wrong.

Signed-off-by: Rakib Mullick
Cc: Yasunori Goto
Cc: Badari Pulavarty
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rakib Mullick
2009-12-16 00:53:20 +0800
b4e655a4a mm: memory_hotplug: make offline_pages() static ... Browse Code »

It has no references outside memory_hotplug.c.

Cc: "Rafael J. Wysocki"
Cc: Andi Kleen
Cc: Gerald Schaefer
Cc: KOSAKI Motohiro
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2009-12-16 00:53:20 +0800
62b61f611 ksm: memory hotremove migration only ... Browse Code »

The previous patch enables page migration of ksm pages, but that soon gets
into trouble: not surprising, since we're using the ksm page lock to lock
operations on its stable_node, but page migration switches the page whose
lock is to be used for that. Another layer of locking would fix it, but
do we need that yet?

Do we actually need page migration of ksm pages? Yes, memory hotremove
needs to offline sections of memory: and since we stopped allocating ksm
pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
candidates for migration.

But KSM is currently unconscious of NUMA issues, happily merging pages
from different NUMA nodes: at present the rule must be, not to use
MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
ksm pages does not make sense yet.

So, to complete support for ksm swapping we need to make hotremove safe.
ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
are freed before migration reaches them, stable_nodes may be left still
pointing to struct pages which have been removed from the system: the
stable_node needs to identify a page by pfn rather than page pointer, then
it can safely prune them when MEM_OFFLINE.

And make NUMA migration skip PageKsm pages where it skips PageReserved.
But it's only when we reach unmap_and_move() that the page lock is taken
and we can be sure that raised pagecount has prevented a PageAnon from
being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
page when offlining (has sufficient locking) but reject it otherwise.

Signed-off-by: Hugh Dickins
Cc: Izik Eidus
Cc: Andrea Arcangeli
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-12-16 00:53:20 +0800
8fe23e057 mm: clear node in N_HIGH_MEMORY and stop kswapd when all memory is offlined ... Browse Code »

When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if
there are no present pages left.

In such a situation, kswapd must also be stopped since it has nothing left
to do.

Signed-off-by: David Rientjes
Signed-off-by: Lee Schermerhorn
Cc: Christoph Lameter
Cc: Yasunori Goto
Cc: Mel Gorman
Cc: Rafael J. Wysocki
Cc: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: Lee Schermerhorn
Cc: Mel Gorman
Cc: Randy Dunlap
Cc: Nishanth Aravamudan
Cc: Andi Kleen
Cc: David Rientjes
Cc: Adam Litke
Cc: Andy Whitcroft
Cc: Eric Whitney
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2009-12-16 00:53:13 +0800
6d9c285a6 mm: move inc_zone_page_state(NR_ISOLATED) to just isolated place ... Browse Code »

Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
in right after isolate_page().

This patch does it.

Reviewed-by: Christoph Lameter
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-12-16 00:53:12 +0800

18 Nov, 2009

2 commits

6ad696d2c mm: allow memory hotplug and hibernation in the same kernel ... Browse Code »
44

Allow memory hotplug and hibernation in the same kernel

Memory hotplug and hibernation were exclusive in Kconfig. This is
obviously a problem for distribution kernels who want to support both in
the same image.

After some discussions with Rafael and others the only problem is with
parallel memory hotadd or removal while a hibernation operation is in
process. It was also working for s390 before.

This patch removes the Kconfig level exclusion, and simply makes the
memory add / remove functions grab the pm_mutex to exclude against
hibernation.

Fixes a regression - old kernels didn't exclude memory hotadd and
hibernation.

Signed-off-by: Andi Kleen
Cc: Gerald Schaefer
Cc: KOSAKI Motohiro
Cc: Yasunori Goto
Acked-by: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2009-11-18 09:40:33 +0800
e13193319 mm/memory_hotplug: fix section mismatch ... Browse Code »

With CONFIG_MEMORY_HOTPLUG I got following warning:

WARNING: vmlinux.o(.text+0x1276b0): Section mismatch in reference from
the function hotadd_new_pgdat() to the function
.meminit.text:free_area_init_node()
The function hotadd_new_pgdat() references
the function __meminit free_area_init_node().
This is often because hotadd_new_pgdat lacks a __meminit
annotation or the annotation of free_area_init_node is wrong.

Use __ref to fix this.

Signed-off-by: Hidetoshi Seto
Cc: Minchan Kim
Cc: Mel Gorman
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hidetoshi Seto
2009-11-18 09:40:33 +0800

23 Sep, 2009

1 commit

908eedc61 walk system ram range ... Browse Code »

Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range. For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.

But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
check strict to find out busy "System RAM".

Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.

And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: WANG Cong
Cc: Américo Wang
Cc: David Rientjes
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-09-23 22:39:41 +0800

22 Sep, 2009

2 commits

4738e1b9c memory hotplug: fix updating of num_physpages for hot plugged memory ... Browse Code »

Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages. The amount of what actually is usable as storage
should instead be used as a basis here.

In line with that, the memory hotplug code should update num_physpages in
a way that it retains its original (post-boot) meaning; in particular,
decreasing the value should at best be done with great care - this patch
doesn't try to ever decrease this value at all as it doesn't really seem
meaningful to do so.

Signed-off-by: Jan Beulich
Acked-by: Rusty Russell
Cc: Yasunori Goto
Cc: Badari Pulavarty
Cc: Minchan Kim
Cc: Mel Gorman
Cc: Dave Hansen
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2009-09-22 22:17:38 +0800
112067f09 memory hotplug: update zone pcp at memory online ... Browse Code »

In my test, 128M memory is hot added, but zone's pcp batch is 0, which is
an obvious error. When pages are onlined, zone pcp should be updated
accordingly.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Shaohua Li
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Yakui Zhao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2009-09-22 22:17:25 +0800

17 Jun, 2009

2 commits

bce7394a3 page-allocator: reset wmark_min and inactive ratio of zone when hotplug happens ... Browse Code »

Solve two problems.

Whenever memory hotplug sucessfully happens, zone->present_pages
have to be changed.

1) Now memory hotplug calls setup_per_zone_wmark_min only when
online_pages called, not offline_pages.

It breaks balance.

2) If zone->present_pages is changed, we also have to change
zone->inactive_ratio. That's because inactive_ratio depends on
zone->present_pages.

Signed-off-by: Minchan Kim
Acked-by: Yasunori Goto
Cc: Rik van Riel
Cc: KOSAKI Motohiro
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2009-06-17 10:47:42 +0800
bc75d33f0 page-allocator: clean up functions related to pages_min ... Browse Code »

Change the names of two functions. It doesn't affect behavior.

Presently, setup_per_zone_pages_min() changes low, high of zone as well as
min. So a better name is setup_per_zone_wmarks(). That's because Mel
changed zone->pages_[hig/low/min] to zone->watermark array in "page
allocator: replace the watermark-related union in struct zone with a
watermark[] array".

* setup_per_zone_pages_min => setup_per_zone_wmarks

Of course, we have to change init_per_zone_pages_min, too. There are not
pages_min any more.

* init_per_zone_pages_min => init_per_zone_wmark_min

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Minchan Kim
Acked-by: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Rik van Riel
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2009-06-17 10:47:41 +0800

07 Jan, 2009

2 commits

3c1d43787 mm: remove GFP_HIGHUSER_PAGECACHE ... Browse Code »

GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
that harder to track down: remove it, and its out-of-work brothers
GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

Since we're making that improvement to hotremove_migrate_alloc(), I think
we can now also remove one of the "o"s from its comment.

Signed-off-by: Hugh Dickins
Acked-by: Mel Gorman
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-01-07 07:59:01 +0800
c04fc586c mm: show node to memory section relationship with symlinks in sysfs ... Browse Code »

Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or node hot-add assist script
onlines _all_ offlined sections due to user or script inability
to identify the specific memory sections located on the hot-added
node. The consequences of reintroducing the defective memory
could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade
Signed-off-by: Badari Pulavarty
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gary Hade
2009-01-07 07:59:00 +0800

01 Dec, 2008

1 commit

31168481c meminit section warnings ... Browse Code »

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2008-12-01 02:03:35 +0800

20 Nov, 2008

1 commit

f481891fd cpuset: update top cpuset's mems after adding a node ... Browse Code »

After adding a node into the machine, top cpuset's mems isn't updated.

By reviewing the code, we found that the update function

cpuset_track_online_nodes()

was invoked after node_states[N_ONLINE] changes. It is wrong because
N_ONLINE just means node has pgdat, and if node has/added memory, we use
N_HIGH_MEMORY. So, We should invoke the update function after
node_states[N_HIGH_MEMORY] changes, just like its commit says.

This patch fixes it. And we use notifier of memory hotplug instead of
direct calling of cpuset_track_online_nodes().

Signed-off-by: Miao Xie
Acked-by: Yasunori Goto
Cc: David Rientjes
Cc: Paul Menage
Signed-off-by: Linus Torvalds

Miao Xie
2008-11-20 10:49:58 +0800

20 Oct, 2008

3 commits

de7f0cba9 memory hotplug: release memory regions in PAGES_PER_SECTION chunks ... Browse Code »

During hotplug memory remove, memory regions should be released on a
PAGES_PER_SECTION size chunks. This mirrors the code in add_memory where
resources are requested on a PAGES_PER_SECTION size.

Attempting to release the entire memory region fails because there is not
a single resource for the total number of pages being removed. Instead
the resources for the pages are split in PAGES_PER_SECTION size chunks as
requested during memory add.

Signed-off-by: Nathan Fontenot
Signed-off-by: Badari Pulavarty
Acked-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nathan Fontenot
2008-10-20 23:52:32 +0800
62695a84e vmscan: move isolate_lru_page() to vmscan.c ... Browse Code »

On large memory systems, the VM can spend way too much time scanning
through pages that it cannot (or should not) evict from memory. Not only
does it use up CPU time, but it also provokes lock contention and can
leave large systems under memory presure in a catatonic state.

This patch series improves VM scalability by:

1) putting filesystem backed, swap backed and unevictable pages
onto their own LRUs, so the system only scans the pages that it
can/should evict from memory

2) switching to two handed clock replacement for the anonymous LRUs,
so the number of pages that need to be scanned when the system
starts swapping is bound to a reasonable number

3) keeping unevictable pages off the LRU completely, so the
VM does not waste CPU time scanning them. ramfs, ramdisk,
SHM_LOCKED shared memory segments and mlock()ed VMA pages
are keept on the unevictable list.

This patch:

isolate_lru_page logically belongs to be in vmscan.c than migrate.c.

It is tough, because we don't need that function without memory migration
so there is a valid argument to have it in migrate.c. However a
subsequent patch needs to make use of it in the core mm, so we can happily
move it to vmscan.c.

Also, make the function a little more generic by not requiring that it
adds an isolated page to a given list. Callers can do that.

Note that we now have '__isolate_lru_page()', that does
something quite different, visible outside of vmscan.c
for use with memory controller. Methinks we need to
rationalize these names/purposes. --lts

[akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
Signed-off-by: Nick Piggin
Signed-off-by: Rik van Riel
Signed-off-by: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2008-10-20 23:50:25 +0800
71088785c mm: cleanup to make remove_memory() arch-neutral ... Browse Code »

There is nothing architecture specific about remove_memory().
remove_memory() function is common for all architectures which support
hotplug memory remove. Instead of duplicating it in every architecture,
collapse them into arch neutral function.

[akpm@linux-foundation.org: fix the export]
Signed-off-by: Badari Pulavarty
Cc: Yasunori Goto
Cc: Gary Hade
Cc: Mel Gorman
Cc: Yasunori Goto
Cc: "Luck, Tony"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2008-10-20 23:50:25 +0800

25 Jul, 2008

5 commits

5c755e9fd memory-hotplug: add sysfs removable attribute for hotplug memory remove ... Browse Code »

Memory may be hot-removed on a per-memory-block basis, particularly on
POWER where the SPARSEMEM section size often matches the memory-block
size. A user-level agent must be able to identify which sections of
memory are likely to be removable before attempting the potentially
expensive operation. This patch adds a file called "removable" to the
memory directory in sysfs to help such an agent. In this patch, a memory
block is considered removable if;

o It contains only MOVABLE pageblocks
o It contains only pageblocks with free pages regardless of pageblock type

On the other hand, a memory block starting with a PageReserved() page will
never be considered removable. Without this patch, the user-agent is
forced to choose a memory block to remove randomly.

Sample output of the sysfs files:

./memory/memory0/removable: 0
./memory/memory1/removable: 0
./memory/memory2/removable: 0
./memory/memory3/removable: 0
./memory/memory4/removable: 0
./memory/memory5/removable: 0
./memory/memory6/removable: 0
./memory/memory7/removable: 1
./memory/memory8/removable: 0
./memory/memory9/removable: 0
./memory/memory10/removable: 0
./memory/memory11/removable: 0
./memory/memory12/removable: 0
./memory/memory13/removable: 0
./memory/memory14/removable: 0
./memory/memory15/removable: 0
./memory/memory16/removable: 0
./memory/memory17/removable: 1
./memory/memory18/removable: 1
./memory/memory19/removable: 1
./memory/memory20/removable: 1
./memory/memory21/removable: 1
./memory/memory22/removable: 1

Signed-off-by: Badari Pulavarty
Signed-off-by: Mel Gorman
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2008-07-25 01:47:21 +0800
2f7f24eca memory-hotplug: don't calculate vm_total_pages twice when rebuilding zonelists in online_pages() ... Browse Code »

If zonelist is required to be rebuilt in online_pages(), there is no need
to recalculate vm_total_pages in that function, as it has been updated in
the call build_all_zonelists().

Signed-off-by: Kent Liu
Acked-by: KAMEZAWA Hiroyuki
Cc: Yasunori Goto
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Liu
2008-07-25 01:47:21 +0800
af370fb8c memory hotplug: small fixes to bootmem freeing for memory hotremove ... Browse Code »

- Change some naming
* Magic -> types
* MIX_INFO -> MIX_SECTION_INFO
* Change definition of bootmem type from direct hex value

- __free_pages_bootmem() becomes __meminit.

Signed-off-by: Yasunori Goto
Cc: Andy Whitcroft
Cc: Badari Pulavarty
Cc: Yinghai Lu
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2008-07-25 01:47:21 +0800
d92bc3185 mm: make register_page_bootmem_info_section() static ... Browse Code »

Make the needlessly global register_page_bootmem_info_section() static.

Signed-off-by: Adrian Bunk
Acked-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-07-25 01:47:21 +0800
9109fb7b3 mm: drop unneeded pgdat argument from free_area_init_node() ... Browse Code »

free_area_init_node() gets passed in the node id as well as the node
descriptor. This is redundant as the function can trivially get the node
descriptor itself by means of NODE_DATA() and the node's id.

I checked all the users and NODE_DATA() seems to be usable everywhere
from where this function is called.

Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2008-07-25 01:47:16 +0800

15 May, 2008

3 commits

76cdd58e5 memory_hotplug: always initialize pageblock bitmap ... Browse Code »

Trying to online a new memory section that was added via memory hotplug
sometimes results in crashes when the new pages are added via __free_page.
Reason for that is that the pageblock bitmap isn't initialized and hence
contains random stuff. That means that get_pageblock_migratetype()
returns also random stuff and therefore

list_add(&page->lru,
&zone->free_area[order].free_list[migratetype]);

in __free_one_page() tries to do a list_add to something that isn't even
necessarily a list.

This happens since 86051ca5eaf5e560113ec7673462804c54284456 ("mm: fix
usemap initialization") which makes sure that the pageblock bitmap gets
only initialized for pages present in a zone. Unfortunately for hot-added
memory the zones "grow" after the memmap and the pageblock memmap have
been initialized. Which means that the new pages have an unitialized
bitmap. To solve this the calls to grow_zone_span() and grow_pgdat_span()
are moved to __add_zone() just before the initialization happens.

The patch also moves the two functions since __add_zone() is the only
caller and I didn't want to add a forward declaration.

Signed-off-by: Heiko Carstens
Cc: Andy Whitcroft
Cc: Dave Hansen
Cc: Gerald Schaefer
Cc: KAMEZAWA Hiroyuki
Cc: Yasunori Goto
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2008-05-15 10:11:15 +0800
fd8a4221a memory_hotplug: check for walk_memory_resource() failure in online_pages() ... Browse Code »

Add a check to online_pages() to test for failure of
walk_memory_resource(). This fixes a condition where a failure
of walk_memory_resource() can lead to online_pages() returning
success without the requested pages being onlined.

Signed-off-by: Geoff Levand
Cc: Yasunori Goto
Cc: KAMEZAWA Hiroyuki
Cc: Dave Hansen
Cc: Keith Mannthey
Cc: Christoph Lameter
Cc: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geoff Levand
2008-05-15 10:11:14 +0800
c3723ca38 memory hotplug: memmap_init_zone called twice ... Browse Code »

__add_zone calls memmap_init_zone twice if memory gets attached to an empty
zone. Once via init_currently_empty_zone and once explictly right after that
call.

Looks like this is currently not a bug, however the call is superfluous and
might lead to subtle bugs if memmap_init_zone gets changed. So make sure it
is called only once.

Cc: Yasunori Goto
Acked-by: KAMEZAWA Hiroyuki
Cc: Dave Hansen
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2008-05-15 10:11:14 +0800

29 Apr, 2008

1 commit

1e5ad9a3b mm/memory_hotplug.c must #include "internal.h" ... Browse Code »

This patch fixes the following compile error caused by commit
04753278769f3b6c3b79a080edb52f21d83bf6e2 ("memory hotplug: register
section/node id to free"):

CC mm/memory_hotplug.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c: In function ‘put_page_bootmem’:
/home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c:82: error: implicit declaration of function ‘__free_pages_bootmem’
/home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c: At top level:
/home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c:87: warning: no previous prototype for ‘register_page_bootmem_info_section’
make[2]: *** [mm/memory_hotplug.o] Error 1

[ Andrew: "Argh. The -mm-only memory-hotplug-add-removable-to-sysfs-
to-show-memblock-removability.patch debugging patch adds that include
so nobody hit this before. ]

Signed-off-by: Adrian Bunk
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-04-29 04:44:29 +0800

28 Apr, 2008

4 commits

0c0a4a517 memory hotplug: free memmaps allocated by bootmem ... Browse Code »

This patch is to free memmaps which is allocated by bootmem.

Freeing usemap is not necessary. The pages of usemap may be necessary for
other sections.

If removing section is last section on the node, its section is the final user
of usemap page. (usemaps are allocated on its section by previous patch.) But
it shouldn't be freed too, because the section must be logical offline state
which all pages are isolated against page allocater. If it is freed, page
alloctor may use it which will be removed physically soon. It will be
disaster. So, this patch keeps it as it is.

Signed-off-by: Yasunori Goto
Cc: Badari Pulavarty
Cc: Yinghai Lu
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2008-04-28 23:58:26 +0800
047532787 memory hotplug: register section/node id to free ... Browse Code »

This patch set is to free pages which is allocated by bootmem for
memory-hotremove. Some structures of memory management are allocated by
bootmem. ex) memmap, etc.

To remove memory physically, some of them must be freed according to
circumstance. This patch set makes basis to free those pages, and free
memmaps.

Basic my idea is using remain members of struct page to remember information
of users of bootmem (section number or node id). When the section is
removing, kernel can confirm it. By this information, some issues can be
solved.

1) When the memmap of removing section is allocated on other
section by bootmem, it should/can be free.
2) When the memmap of removing section is allocated on the
same section, it shouldn't be freed. Because the section has to be
logical memory offlined already and all pages must be isolated against
page allocater. If it is freed, page allocator may use it which will
be removed physically soon.
3) When removing section has other section's memmap,
kernel will be able to show easily which section should be removed
before it for user. (Not implemented yet)
4) When the above case 2), the page isolation will be able to check and skip
memmap's page when logical memory offline (offline_pages()).
Current page isolation code fails in this case because this page is
just reserved page and it can't distinguish this pages can be
removed or not. But, it will be able to do by this patch.
(Not implemented yet.)
5) The node information like pgdat has similar issues. But, this
will be able to be solved too by this.
(Not implemented yet, but, remembering node id in the pages.)

Fortunately, current bootmem allocator just keeps PageReserved flags,
and doesn't use any other members of page struct. The users of
bootmem doesn't use them too.

This patch:

This is to register information which is node or section's id. Kernel can
distinguish which node/section uses the pages allcated by bootmem. This is
basis for hot-remove sections or nodes.

Signed-off-by: Yasunori Goto
Cc: Badari Pulavarty
Cc: Yinghai Lu
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2008-04-28 23:58:25 +0800
180c06efc hotplug-memory: make online_page() common ... Browse Code »

All architectures use an effectively identical definition of online_page(), so
just make it common code. x86-64, ia64, powerpc and sh are actually
identical; x86-32 is slightly different.

x86-32's differences arise because it puts its hotplug pages in the highmem
zone. We can handle this in the generic code by inspecting the page to see if
its in highmem, and update the totalhigh_pages count appropriately. This
leaves init_32.c:free_new_highpage with a single caller, so I folded it into
add_one_highpage_init.

I also removed an incorrect comment referring to the NUMA case; any NUMA
details have already been dealt with by the time online_page() is called.

[akpm@linux-foundation.org: fix indenting]
Signed-off-by: Jeremy Fitzhardinge
Acked-by: Dave Hansen
Reviewed-by: KAMEZAWA Hiroyuki
Tested-by: KAMEZAWA Hiroyuki
Cc: Yasunori Goto
Cc: Christoph Lameter
Acked-by: Ingo Molnar
Acked-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeremy Fitzhardinge
2008-04-28 23:58:17 +0800
ea01ea937 hotplug memory remove: generic __remove_pages() support ... Browse Code »

Generic helper function to remove section mappings and sysfs entries for the
section of the memory we are removing. offline_pages() correctly adjusted
zone and marked the pages reserved.

TODO: Yasunori Goto is working on patches to free up allocations from bootmem.

Signed-off-by: Badari Pulavarty
Acked-by: Yasunori Goto
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2008-04-28 23:58:17 +0800

20 Apr, 2008

1 commit

da19cbcf7 driver core: memory: semaphore to mutex ... Browse Code »

Signed-off-by: Daniel Walker
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

Daniel Walker
2008-04-20 10:10:19 +0800

06 Feb, 2008

1 commit

9f8f21725 Page allocator: clean up pcp draining functions ... Browse Code »

- Add comments explaing how drain_pages() works.

- Eliminate useless functions

- Rename drain_all_local_pages to drain_all_pages(). It does drain
all pages not only those of the local processor.

- Eliminate useless interrupt off / on sequences. drain_pages()
disables interrupts on its own. The execution thread is
pinned to processor by the caller. So there is no need to
disable interrupts.

- Put drain_all_pages() declaration in gfp.h and remove the
declarations from suspend.h and from mm/memory_hotplug.c

- Make software suspend call drain_all_pages(). The draining
of processor local pages is may not the right approach if
software suspend wants to support SMP. If they call drain_all_pages
then we can make drain_pages() static.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Christoph Lameter
Acked-by: Mel Gorman
Cc: "Rafael J. Wysocki"
Cc: Daniel Walker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2008-02-06 01:44:17 +0800

15 Nov, 2007

1 commit

887c3cb18 Add IORESOUCE_BUSY flag for System RAM ... Browse Code »

i386 and x86-64 registers System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.

But ia64 registers it as IORESOURCE_MEM only.
In addition, memory hotplug code registers new memory as IORESOURCE_MEM too.

This difference causes a failure of memory unplug of x86-64. This patch
fixes it.

This patch adds IORESOURCE_BUSY to avoid potential overlap mapping by PCI
device.

Signed-off-by: Yasunori Goto
Signed-off-by: Badari Pulavarty
Cc: Luck, Tony"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2007-11-15 10:45:39 +0800