Doug / smarc-fsl-linux-kernel | Embedian Git Server

03 Dec, 2010

1 commit

20d6c96b5 mem-hotplug: introduce {un}lock_memory_hotplug() ... Browse Code »

Presently hwpoison is using lock_system_sleep() to prevent a race with
memory hotplug. However lock_system_sleep() is a no-op if
CONFIG_HIBERNATION=n. Therefore we need a new lock.

Signed-off-by: KOSAKI Motohiro
Cc: Andi Kleen
Cc: Kamezawa Hiroyuki
Suggested-by: Hugh Dickins
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-12-03 06:51:15 +0800

27 Oct, 2010

6 commits

f3ab2636c mm: do_migrate_range: reduce list_empty() check ... Browse Code »

Simple code for reducing list_empty(&source) check.

Signed-off-by: Bob Liu
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Wu Fengguang
Cc: KOSAKI Motohiro
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bob Liu
2010-10-27 07:52:11 +0800
809c44497 mm: do_migrate_range: exit loop if not_managed is true ... Browse Code »

If not_managed is true all pages will be putback to lru, so break the loop
earlier to skip other pages isolate.

Signed-off-by: Bob Liu
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Wu Fengguang
Cc: KOSAKI Motohiro
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bob Liu
2010-10-27 07:52:11 +0800
7bbc0905e mm/memory_hotplug.c: make scan_lru_pages() static ... Browse Code »

Reported-by: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2010-10-27 07:52:10 +0800
49ac82558 memory hotplug: unify is_removable and offline detection code ... Browse Code »

Now, sysfs interface of memory hotplug shows whether the section is
removable or not. But it checks only migrateype of pages and doesn't
check details of cluster of pages.

Next, memory hotplug's set_migratetype_isolate() has the same kind of
check, too.

This patch adds the function __count_unmovable_pages() and makes above 2
checks to use the same logic. Then, is_removable and hotremove code uses
the same logic. No changes in the hotremove logic itself.

TODO: need to find a way to check RECLAMABLE. But, considering bit,
calling shrink_slab() against a range before starting memory hotremove
sounds better. If so, this patch's logic doesn't need to be changed.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Reported-by: Michal Hocko
Cc: Wu Fengguang
Cc: Mel Gorman
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-10-27 07:52:06 +0800
cf608ac19 mm: compaction: fix COMPACTPAGEFAILED counting ... Browse Code »

Presently update_nr_listpages() doesn't have a role. That's because lists
passed is always empty just after calling migrate_pages. The
migrate_pages cleans up page list which have failed to migrate before
returning by aaa994b3.

[PATCH] page migration: handle freeing of pages in migrate_pages()

Do not leave pages on the lists passed to migrate_pages(). Seems that we will
not need any postprocessing of pages. This will simplify the handling of
pages by the callers of migrate_pages().

At that time, we thought we don't need any postprocessing of pages. But
the situation is changed. The compaction need to know the number of
failed to migrate for COMPACTPAGEFAILED stat

This patch makes new rule for caller of migrate_pages to call
putback_lru_pages. So caller need to clean up the lists so it has a
chance to postprocess the pages. [suggested by Christoph Lameter]

Signed-off-by: Minchan Kim
Cc: Hugh Dickins
Cc: Andi Kleen
Reviewed-by: Mel Gorman
Reviewed-by: Wu Fengguang
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2010-10-27 07:52:06 +0800
f8f72ad53 mm: fix return value of scan_lru_pages in memory unplug ... Browse Code »

scan_lru_pages returns pfn. So, it's type should be "unsigned long"
not "int".

Note: I guess this has been work until now because memory hotplug tester's
machine has not very big memory....
physical address < 32bit << PAGE_SHIFT.

Reported-by: KOSAKI Motohiro
Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: KOSAKI Motohiro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-10-27 07:52:03 +0800

19 Oct, 2010

1 commit

10ccd8469 memory_hotplug: drop spurious calls to flush_scheduled_work() ... Browse Code »

lru_add_drain_all() uses schedule_on_each_cpu() which is synchronous.
There is no reason to call flush_scheduled_work() after
lru_add_drain_all(). Drop the spurious calls.

This is to prepare for the deprecation and removal of
flush_scheduled_work().

Signed-off-by: Tejun Heo
Acked-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Acked-by: Mel Gorman

Tejun Heo
2010-10-19 17:08:41 +0800

10 Sep, 2010

1 commit

0dcc48c15 memory hotplug: fix next block calculation in is_removable ... Browse Code »

next_active_pageblock() is for finding next _used_ freeblock. It skips
several blocks when it finds there are a chunk of free pages lager than
pageblock. But it has 2 bugs.

1. We have no lock. page_order(page) - pageblock_order can be minus.
2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Michal Hocko
Cc: Wu Fengguang
Cc: Mel Gorman
Cc: KAMEZAWA Hiroyuki
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-09-10 09:57:24 +0800

25 May, 2010

3 commits

4eaf3f643 mem-hotplug: fix potential race while building zonelist for new populated zone ... Browse Code »

Add global mutex zonelists_mutex to fix the possible race:

CPU0 CPU1 CPU2
(1) zone->present_pages += online_pages;
(2) build_all_zonelists();
(3) alloc_page();
(4) free_page();
(5) build_all_zonelists();
(6) __build_all_zonelists();
(7) zone->pageset = alloc_percpu();

In step (3,4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume too much memory
to fail the memory allocations in step (7).

Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
since there is a new fresh memory block added in step(6).

[haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
Signed-off-by: Haicheng Li
Signed-off-by: Wu Fengguang
Reviewed-by: Andi Kleen
Cc: Christoph Lameter
Cc: Mel Gorman
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Haicheng Li
2010-05-25 23:07:02 +0800
1f522509c mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset ... Browse Code »

For each new populated zone of hotadded node, need to update its pagesets
with dynamically allocated per_cpu_pageset struct for all possible CPUs:

1) Detach zone->pageset from the shared boot_pageset
at end of __build_all_zonelists().

2) Use mutex to protect zone->pageset when it's still
shared in onlined_pages()

Otherwises, multiple zones of different nodes would share same boot strapping
boot_pageset for same CPU, which will finally cause below kernel panic:

------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:1239!
invalid opcode: 0000 [#1] SMP
...
Call Trace:
[] __alloc_pages_nodemask+0x131/0x7b0
[] alloc_pages_current+0x87/0xd0
[] __page_cache_alloc+0x67/0x70
[] __do_page_cache_readahead+0x120/0x260
[] ra_submit+0x21/0x30
[] ondemand_readahead+0x166/0x2c0
[] page_cache_async_readahead+0x80/0xa0
[] generic_file_aio_read+0x364/0x670
[] nfs_file_read+0xca/0x130
[] do_sync_read+0xfa/0x140
[] vfs_read+0xb5/0x1a0
[] sys_read+0x51/0x80
[] system_call_fastpath+0x16/0x1b
RIP [] get_page_from_freelist+0x883/0x900
RSP
---[ end trace 4bda28328b9990db ]

[akpm@linux-foundation.org: merge fix]
Signed-off-by: Haicheng Li
Signed-off-by: Wu Fengguang
Reviewed-by: Andi Kleen
Reviewed-by: Christoph Lameter
Cc: Mel Gorman
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Haicheng Li
2010-05-25 23:07:01 +0800
cf23422b9 cpu/mem hotplug: enable CPUs online before local memory online ... Browse Code »

Enable users to online CPUs even if the CPUs belongs to a numa node which
doesn't have onlined local memory.

The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online. For a
numa node without onlined local memory, its zonelists are not initialized
at present. As a result, any memory allocation operations executed by
CPUs within this node will fail. In fact, an out-of-memory error is
triggered when attempt to online CPUs before memory comes to online.

This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can be fallback'ed to other nodes.

[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo
Cc: Minchan Kim
Cc: Yasunori Goto
Cc: Andi Kleen
Cc: Christoph Lameter
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

minskey guo
2010-05-25 23:07:00 +0800

13 Mar, 2010

1 commit

718a38211 mm: introduce dump_page() and print symbolic flag names ... Browse Code »

- introduce dump_page() to print the page info for debugging some error
condition.

- convert three mm users: bad_page(), print_bad_pte() and memory offline
failure.

- print an extra field: the symbolic names of page->flags

Example dump_page() output:

[ 157.521694] page:ffffea0000a7cba8 count:2 mapcount:1 mapping:ffff88001c901791 index:0x147
[ 157.525570] page flags: 0x100000000100068(uptodate|lru|active|swapbacked)

Signed-off-by: Wu Fengguang
Cc: Ingo Molnar
Cc: Alex Chiang
Cc: Rik van Riel
Cc: Andi Kleen
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-03-13 07:52:28 +0800

07 Mar, 2010

1 commit

d96ae5309 memory-hotplug: create /sys/firmware/memmap entry for new memory ... Browse Code »

A memmap is a directory in sysfs which includes 3 text files: start, end
and type. For example:

start: 0x100000
end: 0x7e7b1cff
type: System RAM

Interface firmware_map_add was not called explicitly. Remove it and add
function firmware_map_add_hotplug as hotplug interface of memmap.

Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
does not export memmap entry for it. We add a call in function add_memory
to function firmware_map_add_hotplug.

Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
will be called when initialize memmap and hot-add memory.

[akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
Signed-off-by: Shaohui Zheng
Acked-by: Andi Kleen
Acked-by: Yasunori Goto
Reviewed-by: Wu Fengguang
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@linux-foundation.org
2010-03-07 03:26:25 +0800

16 Dec, 2009

5 commits

23ce932a5 mm: fix section mismatch in memory_hotplug.c ... Browse Code »

__free_pages_bootmem() is a __meminit function - which has been called
from put_pages_bootmem thus causes a section mismatch warning.

We were warned by the following warning:

LD mm/built-in.o
WARNING: mm/built-in.o(.text+0x26b22): Section mismatch in reference
from the function put_page_bootmem() to the function
.meminit.text:__free_pages_bootmem()
The function put_page_bootmem() references
the function __meminit __free_pages_bootmem().
This is often because put_page_bootmem lacks a __meminit
annotation or the annotation of __free_pages_bootmem is wrong.

Signed-off-by: Rakib Mullick
Cc: Yasunori Goto
Cc: Badari Pulavarty
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rakib Mullick
2009-12-16 00:53:20 +0800
b4e655a4a mm: memory_hotplug: make offline_pages() static ... Browse Code »

It has no references outside memory_hotplug.c.

Cc: "Rafael J. Wysocki"
Cc: Andi Kleen
Cc: Gerald Schaefer
Cc: KOSAKI Motohiro
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2009-12-16 00:53:20 +0800
62b61f611 ksm: memory hotremove migration only ... Browse Code »

The previous patch enables page migration of ksm pages, but that soon gets
into trouble: not surprising, since we're using the ksm page lock to lock
operations on its stable_node, but page migration switches the page whose
lock is to be used for that. Another layer of locking would fix it, but
do we need that yet?

Do we actually need page migration of ksm pages? Yes, memory hotremove
needs to offline sections of memory: and since we stopped allocating ksm
pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
candidates for migration.

But KSM is currently unconscious of NUMA issues, happily merging pages
from different NUMA nodes: at present the rule must be, not to use
MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
ksm pages does not make sense yet.

So, to complete support for ksm swapping we need to make hotremove safe.
ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
are freed before migration reaches them, stable_nodes may be left still
pointing to struct pages which have been removed from the system: the
stable_node needs to identify a page by pfn rather than page pointer, then
it can safely prune them when MEM_OFFLINE.

And make NUMA migration skip PageKsm pages where it skips PageReserved.
But it's only when we reach unmap_and_move() that the page lock is taken
and we can be sure that raised pagecount has prevented a PageAnon from
being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
page when offlining (has sufficient locking) but reject it otherwise.

Signed-off-by: Hugh Dickins
Cc: Izik Eidus
Cc: Andrea Arcangeli
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-12-16 00:53:20 +0800
8fe23e057 mm: clear node in N_HIGH_MEMORY and stop kswapd when all memory is offlined ... Browse Code »

When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if
there are no present pages left.

In such a situation, kswapd must also be stopped since it has nothing left
to do.

Signed-off-by: David Rientjes
Signed-off-by: Lee Schermerhorn
Cc: Christoph Lameter
Cc: Yasunori Goto
Cc: Mel Gorman
Cc: Rafael J. Wysocki
Cc: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: Lee Schermerhorn
Cc: Mel Gorman
Cc: Randy Dunlap
Cc: Nishanth Aravamudan
Cc: Andi Kleen
Cc: David Rientjes
Cc: Adam Litke
Cc: Andy Whitcroft
Cc: Eric Whitney
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2009-12-16 00:53:13 +0800
6d9c285a6 mm: move inc_zone_page_state(NR_ISOLATED) to just isolated place ... Browse Code »

Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
in right after isolate_page().

This patch does it.

Reviewed-by: Christoph Lameter
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-12-16 00:53:12 +0800

18 Nov, 2009

2 commits

6ad696d2c mm: allow memory hotplug and hibernation in the same kernel ... Browse Code »

Allow memory hotplug and hibernation in the same kernel

Memory hotplug and hibernation were exclusive in Kconfig. This is
obviously a problem for distribution kernels who want to support both in
the same image.

After some discussions with Rafael and others the only problem is with
parallel memory hotadd or removal while a hibernation operation is in
process. It was also working for s390 before.

This patch removes the Kconfig level exclusion, and simply makes the
memory add / remove functions grab the pm_mutex to exclude against
hibernation.

Fixes a regression - old kernels didn't exclude memory hotadd and
hibernation.

Signed-off-by: Andi Kleen
Cc: Gerald Schaefer
Cc: KOSAKI Motohiro
Cc: Yasunori Goto
Acked-by: Rafael J. Wysocki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2009-11-18 09:40:33 +0800
e13193319 mm/memory_hotplug: fix section mismatch ... Browse Code »

With CONFIG_MEMORY_HOTPLUG I got following warning:

WARNING: vmlinux.o(.text+0x1276b0): Section mismatch in reference from
the function hotadd_new_pgdat() to the function
.meminit.text:free_area_init_node()
The function hotadd_new_pgdat() references
the function __meminit free_area_init_node().
This is often because hotadd_new_pgdat lacks a __meminit
annotation or the annotation of free_area_init_node is wrong.

Use __ref to fix this.

Signed-off-by: Hidetoshi Seto
Cc: Minchan Kim
Cc: Mel Gorman
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hidetoshi Seto
2009-11-18 09:40:33 +0800

23 Sep, 2009

1 commit

908eedc61 walk system ram range ... Browse Code »

Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range. For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.

But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
check strict to find out busy "System RAM".

Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.

And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: WANG Cong
Cc: Américo Wang
Cc: David Rientjes
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-09-23 22:39:41 +0800

22 Sep, 2009

2 commits

4738e1b9c memory hotplug: fix updating of num_physpages for hot plugged memory ... Browse Code »

Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages. The amount of what actually is usable as storage
should instead be used as a basis here.

In line with that, the memory hotplug code should update num_physpages in
a way that it retains its original (post-boot) meaning; in particular,
decreasing the value should at best be done with great care - this patch
doesn't try to ever decrease this value at all as it doesn't really seem
meaningful to do so.

Signed-off-by: Jan Beulich
Acked-by: Rusty Russell
Cc: Yasunori Goto
Cc: Badari Pulavarty
Cc: Minchan Kim
Cc: Mel Gorman
Cc: Dave Hansen
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2009-09-22 22:17:38 +0800
112067f09 memory hotplug: update zone pcp at memory online ... Browse Code »

In my test, 128M memory is hot added, but zone's pcp batch is 0, which is
an obvious error. When pages are onlined, zone pcp should be updated
accordingly.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Shaohua Li
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Yakui Zhao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2009-09-22 22:17:25 +0800

17 Jun, 2009

2 commits

bce7394a3 page-allocator: reset wmark_min and inactive ratio of zone when hotplug happens ... Browse Code »

Solve two problems.

Whenever memory hotplug sucessfully happens, zone->present_pages
have to be changed.

1) Now memory hotplug calls setup_per_zone_wmark_min only when
online_pages called, not offline_pages.

It breaks balance.

2) If zone->present_pages is changed, we also have to change
zone->inactive_ratio. That's because inactive_ratio depends on
zone->present_pages.

Signed-off-by: Minchan Kim
Acked-by: Yasunori Goto
Cc: Rik van Riel
Cc: KOSAKI Motohiro
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2009-06-17 10:47:42 +0800
bc75d33f0 page-allocator: clean up functions related to pages_min ... Browse Code »

Change the names of two functions. It doesn't affect behavior.

Presently, setup_per_zone_pages_min() changes low, high of zone as well as
min. So a better name is setup_per_zone_wmarks(). That's because Mel
changed zone->pages_[hig/low/min] to zone->watermark array in "page
allocator: replace the watermark-related union in struct zone with a
watermark[] array".

* setup_per_zone_pages_min => setup_per_zone_wmarks

Of course, we have to change init_per_zone_pages_min, too. There are not
pages_min any more.

* init_per_zone_pages_min => init_per_zone_wmark_min

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Minchan Kim
Acked-by: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Rik van Riel
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2009-06-17 10:47:41 +0800

07 Jan, 2009

2 commits

3c1d43787 mm: remove GFP_HIGHUSER_PAGECACHE ... Browse Code »

GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
that harder to track down: remove it, and its out-of-work brothers
GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

Since we're making that improvement to hotremove_migrate_alloc(), I think
we can now also remove one of the "o"s from its comment.

Signed-off-by: Hugh Dickins
Acked-by: Mel Gorman
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-01-07 07:59:01 +0800
c04fc586c mm: show node to memory section relationship with symlinks in sysfs ... Browse Code »

Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or node hot-add assist script
onlines _all_ offlined sections due to user or script inability
to identify the specific memory sections located on the hot-added
node. The consequences of reintroducing the defective memory
could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade
Signed-off-by: Badari Pulavarty
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gary Hade
2009-01-07 07:59:00 +0800

01 Dec, 2008

1 commit

31168481c meminit section warnings ... Browse Code »

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2008-12-01 02:03:35 +0800

20 Nov, 2008

1 commit

f481891fd cpuset: update top cpuset's mems after adding a node ... Browse Code »

After adding a node into the machine, top cpuset's mems isn't updated.

By reviewing the code, we found that the update function

cpuset_track_online_nodes()

was invoked after node_states[N_ONLINE] changes. It is wrong because
N_ONLINE just means node has pgdat, and if node has/added memory, we use
N_HIGH_MEMORY. So, We should invoke the update function after
node_states[N_HIGH_MEMORY] changes, just like its commit says.

This patch fixes it. And we use notifier of memory hotplug instead of
direct calling of cpuset_track_online_nodes().

Signed-off-by: Miao Xie
Acked-by: Yasunori Goto
Cc: David Rientjes
Cc: Paul Menage
Signed-off-by: Linus Torvalds

Miao Xie
2008-11-20 10:49:58 +0800

20 Oct, 2008

3 commits

de7f0cba9 memory hotplug: release memory regions in PAGES_PER_SECTION chunks ... Browse Code »

During hotplug memory remove, memory regions should be released on a
PAGES_PER_SECTION size chunks. This mirrors the code in add_memory where
resources are requested on a PAGES_PER_SECTION size.

Attempting to release the entire memory region fails because there is not
a single resource for the total number of pages being removed. Instead
the resources for the pages are split in PAGES_PER_SECTION size chunks as
requested during memory add.

Signed-off-by: Nathan Fontenot
Signed-off-by: Badari Pulavarty
Acked-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nathan Fontenot
2008-10-20 23:52:32 +0800
62695a84e vmscan: move isolate_lru_page() to vmscan.c ... Browse Code »

On large memory systems, the VM can spend way too much time scanning
through pages that it cannot (or should not) evict from memory. Not only
does it use up CPU time, but it also provokes lock contention and can
leave large systems under memory presure in a catatonic state.

This patch series improves VM scalability by:

1) putting filesystem backed, swap backed and unevictable pages
onto their own LRUs, so the system only scans the pages that it
can/should evict from memory

2) switching to two handed clock replacement for the anonymous LRUs,
so the number of pages that need to be scanned when the system
starts swapping is bound to a reasonable number

3) keeping unevictable pages off the LRU completely, so the
VM does not waste CPU time scanning them. ramfs, ramdisk,
SHM_LOCKED shared memory segments and mlock()ed VMA pages
are keept on the unevictable list.

This patch:

isolate_lru_page logically belongs to be in vmscan.c than migrate.c.

It is tough, because we don't need that function without memory migration
so there is a valid argument to have it in migrate.c. However a
subsequent patch needs to make use of it in the core mm, so we can happily
move it to vmscan.c.

Also, make the function a little more generic by not requiring that it
adds an isolated page to a given list. Callers can do that.

Note that we now have '__isolate_lru_page()', that does
something quite different, visible outside of vmscan.c
for use with memory controller. Methinks we need to
rationalize these names/purposes. --lts

[akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
Signed-off-by: Nick Piggin
Signed-off-by: Rik van Riel
Signed-off-by: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2008-10-20 23:50:25 +0800
71088785c mm: cleanup to make remove_memory() arch-neutral ... Browse Code »

There is nothing architecture specific about remove_memory().
remove_memory() function is common for all architectures which support
hotplug memory remove. Instead of duplicating it in every architecture,
collapse them into arch neutral function.

[akpm@linux-foundation.org: fix the export]
Signed-off-by: Badari Pulavarty
Cc: Yasunori Goto
Cc: Gary Hade
Cc: Mel Gorman
Cc: Yasunori Goto
Cc: "Luck, Tony"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2008-10-20 23:50:25 +0800

25 Jul, 2008

5 commits

5c755e9fd memory-hotplug: add sysfs removable attribute for hotplug memory remove ... Browse Code »

Memory may be hot-removed on a per-memory-block basis, particularly on
POWER where the SPARSEMEM section size often matches the memory-block
size. A user-level agent must be able to identify which sections of
memory are likely to be removable before attempting the potentially
expensive operation. This patch adds a file called "removable" to the
memory directory in sysfs to help such an agent. In this patch, a memory
block is considered removable if;

o It contains only MOVABLE pageblocks
o It contains only pageblocks with free pages regardless of pageblock type

On the other hand, a memory block starting with a PageReserved() page will
never be considered removable. Without this patch, the user-agent is
forced to choose a memory block to remove randomly.

Sample output of the sysfs files:

./memory/memory0/removable: 0
./memory/memory1/removable: 0
./memory/memory2/removable: 0
./memory/memory3/removable: 0
./memory/memory4/removable: 0
./memory/memory5/removable: 0
./memory/memory6/removable: 0
./memory/memory7/removable: 1
./memory/memory8/removable: 0
./memory/memory9/removable: 0
./memory/memory10/removable: 0
./memory/memory11/removable: 0
./memory/memory12/removable: 0
./memory/memory13/removable: 0
./memory/memory14/removable: 0
./memory/memory15/removable: 0
./memory/memory16/removable: 0
./memory/memory17/removable: 1
./memory/memory18/removable: 1
./memory/memory19/removable: 1
./memory/memory20/removable: 1
./memory/memory21/removable: 1
./memory/memory22/removable: 1

Signed-off-by: Badari Pulavarty
Signed-off-by: Mel Gorman
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2008-07-25 01:47:21 +0800
2f7f24eca memory-hotplug: don't calculate vm_total_pages twice when rebuilding zonelists in online_pages() ... Browse Code »

If zonelist is required to be rebuilt in online_pages(), there is no need
to recalculate vm_total_pages in that function, as it has been updated in
the call build_all_zonelists().

Signed-off-by: Kent Liu
Acked-by: KAMEZAWA Hiroyuki
Cc: Yasunori Goto
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Liu
2008-07-25 01:47:21 +0800
af370fb8c memory hotplug: small fixes to bootmem freeing for memory hotremove ... Browse Code »

- Change some naming
* Magic -> types
* MIX_INFO -> MIX_SECTION_INFO
* Change definition of bootmem type from direct hex value

- __free_pages_bootmem() becomes __meminit.

Signed-off-by: Yasunori Goto
Cc: Andy Whitcroft
Cc: Badari Pulavarty
Cc: Yinghai Lu
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2008-07-25 01:47:21 +0800
d92bc3185 mm: make register_page_bootmem_info_section() static ... Browse Code »

Make the needlessly global register_page_bootmem_info_section() static.

Signed-off-by: Adrian Bunk
Acked-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-07-25 01:47:21 +0800
9109fb7b3 mm: drop unneeded pgdat argument from free_area_init_node() ... Browse Code »

free_area_init_node() gets passed in the node id as well as the node
descriptor. This is redundant as the function can trivially get the node
descriptor itself by means of NODE_DATA() and the node's id.

I checked all the users and NODE_DATA() seems to be usable everywhere
from where this function is called.

Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2008-07-25 01:47:16 +0800

15 May, 2008

2 commits

76cdd58e5 memory_hotplug: always initialize pageblock bitmap ... Browse Code »

Trying to online a new memory section that was added via memory hotplug
sometimes results in crashes when the new pages are added via __free_page.
Reason for that is that the pageblock bitmap isn't initialized and hence
contains random stuff. That means that get_pageblock_migratetype()
returns also random stuff and therefore

list_add(&page->lru,
&zone->free_area[order].free_list[migratetype]);

in __free_one_page() tries to do a list_add to something that isn't even
necessarily a list.

This happens since 86051ca5eaf5e560113ec7673462804c54284456 ("mm: fix
usemap initialization") which makes sure that the pageblock bitmap gets
only initialized for pages present in a zone. Unfortunately for hot-added
memory the zones "grow" after the memmap and the pageblock memmap have
been initialized. Which means that the new pages have an unitialized
bitmap. To solve this the calls to grow_zone_span() and grow_pgdat_span()
are moved to __add_zone() just before the initialization happens.

The patch also moves the two functions since __add_zone() is the only
caller and I didn't want to add a forward declaration.

Signed-off-by: Heiko Carstens
Cc: Andy Whitcroft
Cc: Dave Hansen
Cc: Gerald Schaefer
Cc: KAMEZAWA Hiroyuki
Cc: Yasunori Goto
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2008-05-15 10:11:15 +0800
fd8a4221a memory_hotplug: check for walk_memory_resource() failure in online_pages() ... Browse Code »

Add a check to online_pages() to test for failure of
walk_memory_resource(). This fixes a condition where a failure
of walk_memory_resource() can lead to online_pages() returning
success without the requested pages being onlined.

Signed-off-by: Geoff Levand
Cc: Yasunori Goto
Cc: KAMEZAWA Hiroyuki
Cc: Dave Hansen
Cc: Keith Mannthey
Cc: Christoph Lameter
Cc: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geoff Levand
2008-05-15 10:11:14 +0800