Eric Lee / smarc-fsl-linux-kernel

15 Apr, 2015

1 commit

30467e0b3 mm, hotplug: fix concurrent memory hot-add deadlock ... Browse Code »

There's a deadlock when concurrently hot-adding memory through the probe
interface and switching a memory block from offline to online.

When hot-adding memory via the probe interface, add_memory() first takes
mem_hotplug_begin() and then device_lock() is later taken when registering
the newly initialized memory block. This creates a lock dependency of (1)
mem_hotplug.lock (2) dev->mutex.

When switching a memory block from offline to online, dev->mutex is first
grabbed in device_online() when the write(2) transitions an existing
memory block from offline to online, and then online_pages() will take
mem_hotplug_begin().

This creates a lock inversion between mem_hotplug.lock and dev->mutex.
Vitaly reports that this deadlock can happen when kworker handling a probe
event races with systemd-udevd switching a memory block's state.

This patch requires the state transition to take mem_hotplug_begin()
before dev->mutex. Hot-adding memory via the probe interface creates a
memory block while holding mem_hotplug_begin(), there is no way to take
dev->mutex first in this case.

online_pages() and offline_pages() are only called when transitioning
memory block state. We now require that mem_hotplug_begin() is taken
before calling them -- this requires exporting the mem_hotplug_begin() and
mem_hotplug_done() to generic code. In all hot-add and hot-remove cases,
mem_hotplug_begin() is done prior to device_online(). This is all that is
needed to avoid the deadlock.

Signed-off-by: David Rientjes
Reported-by: Vitaly Kuznetsov
Tested-by: Vitaly Kuznetsov
Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"
Cc: "K. Y. Srinivasan"
Cc: Yasuaki Ishimatsu
Cc: Tang Chen
Cc: Vlastimil Babka
Cc: Zhang Zhen
Cc: Vladimir Davydov
Cc: Wang Nan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2015-04-15 07:49:00 +0800

10 Oct, 2014

1 commit

ed2f24009 memory-hotplug: add sysfs valid_zones attribute ... Browse Code »

Currently memory-hotplug has two limits:

1. If the memory block is in ZONE_NORMAL, you can change it to
ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.

2. If the memory block is in ZONE_MOVABLE, you can change it to
ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.

With this patch, we can easy to know a memory block can be onlined to
which zone, and don't need to know the above two limits.

Updated the related Documentation.

[akpm@linux-foundation.org: use conventional comment layout]
[akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n]
[akpm@linux-foundation.org: remove unused local zone_prev]
Signed-off-by: Zhang Zhen
Cc: Dave Hansen
Cc: David Rientjes
Cc: Toshi Kani
Cc: Yasuaki Ishimatsu
Cc: Naoya Horiguchi
Cc: Wang Nan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang Zhen
2014-10-10 10:25:52 +0800

07 Aug, 2014

2 commits

632644007 memory-hotplug: add zone_for_memory() for selecting zone for new memory ... Browse Code »

This series of patches fixes a problem when adding memory in bad manner.
For example: for a x86_64 machine booted with "mem=400M" and with 2GiB
memory installed, following commands cause problem:

# echo 0x40000000 > /sys/devices/system/memory/probe
[ 28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
# echo 0x48000000 > /sys/devices/system/memory/probe
[ 28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
# echo online_movable > /sys/devices/system/memory/memory9/state
# echo 0x50000000 > /sys/devices/system/memory/probe
[ 29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
# echo 0x58000000 > /sys/devices/system/memory/probe
[ 29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
# echo online_movable > /sys/devices/system/memory/memory11/state
# echo online> /sys/devices/system/memory/memory8/state
# echo online> /sys/devices/system/memory/memory10/state
# echo offline> /sys/devices/system/memory/memory9/state
[ 30.558819] Offlined Pages 32768
# free
total used free shared buffers cached
Mem: 780588 18014398509432020 830552 0 0 51180
-/+ buffers/cache: 18014398509380840 881732
Swap: 0 0 0

This is because the above commands probe higher memory after online a
section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL
for systems without ZONE_HIGHMEM) overlaps ZONE_MOVABLE.

After the second online_movable, the problem can be observed from
zoneinfo:

# cat /proc/zoneinfo
...
Node 0, zone Movable
pages free 65491
min 250
low 312
high 375
scanned 0
spanned 18446744073709518848
present 65536
managed 65536
...

This series of patches solve the problem by checking ZONE_MOVABLE when
choosing zone for new memory. If new memory is inside or higher than
ZONE_MOVABLE, makes it go there instead.

After applying this series of patches, following are free and zoneinfo
result (after offlining memory9):

bash-4.2# free
total used free shared buffers cached
Mem: 780956 80112 700844 0 0 51180
-/+ buffers/cache: 28932 752024
Swap: 0 0 0

bash-4.2# cat /proc/zoneinfo

Node 0, zone DMA
pages free 3389
min 14
low 17
high 21
scanned 0
spanned 4095
present 3998
managed 3977
nr_free_pages 3389
...
start_pfn: 1
inactive_ratio: 1
Node 0, zone DMA32
pages free 73724
min 341
low 426
high 511
scanned 0
spanned 98304
present 98304
managed 92958
nr_free_pages 73724
...
start_pfn: 4096
inactive_ratio: 1
Node 0, zone Normal
pages free 32630
min 120
low 150
high 180
scanned 0
spanned 32768
present 32768
managed 32768
nr_free_pages 32630
...
start_pfn: 262144
inactive_ratio: 1
Node 0, zone Movable
pages free 65476
min 241
low 301
high 361
scanned 0
spanned 98304
present 65536
managed 65536
nr_free_pages 65476
...
start_pfn: 294912
inactive_ratio: 1

This patch (of 7):

Introduce zone_for_memory() in arch independent code for
arch_add_memory() use.

Many arch_add_memory() function simply selects ZONE_HIGHMEM or
ZONE_NORMAL and add new memory into it. However, with the existance of
ZONE_MOVABLE, the selection method should be carefully considered: if
new, higher memory is added after ZONE_MOVABLE is setup, the default
zone and ZONE_MOVABLE may overlap each other.

should_add_memory_movable() checks the status of ZONE_MOVABLE. If it
has already contain memory, compare the address of new memory and
movable memory. If new memory is higher than movable, it should be
added into ZONE_MOVABLE instead of default zone.

Signed-off-by: Wang Nan
Cc: Zhang Yanfei
Cc: Dave Hansen
Cc: Ingo Molnar
Cc: Yinghai Lu
Cc: "Mel Gorman"
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: "Luck, Tony"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Chris Metcalf
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wang Nan
2014-08-07 09:01:21 +0800
4f7c6b49c mem-hotplug: introduce MMOP_OFFLINE to replace the hard coding -1 ... Browse Code »

In store_mem_state(), we have:

...
334 else if (!strncmp(buf, "offline", min_t(int, count, 7)))
335 online_type = -1;
...
355 case -1:
356 ret = device_offline(&mem->dev);
357 break;
...

Here, "offline" is hard coded as -1.

This patch does the following renaming:

ONLINE_KEEP -> MMOP_ONLINE_KEEP
ONLINE_KERNEL -> MMOP_ONLINE_KERNEL
ONLINE_MOVABLE -> MMOP_ONLINE_MOVABLE

and introduces MMOP_OFFLINE = -1 to avoid hard coding.

Signed-off-by: Tang Chen
Cc: Hu Tao
Cc: Greg Kroah-Hartman
Cc: Lai Jiangshan
Cc: Yasuaki Ishimatsu
Cc: Gu Zheng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tang Chen
2014-08-07 09:01:16 +0800

05 Jun, 2014

1 commit

bfc8c9013 mem-hotplug: implement get/put_online_mems ... Browse Code »

kmem_cache_{create,destroy,shrink} need to get a stable value of
cpu/node online mask, because they init/destroy/access per-cpu/node
kmem_cache parts, which can be allocated or destroyed on cpu/mem
hotplug. To protect against cpu hotplug, these functions use
{get,put}_online_cpus. However, they do nothing to synchronize with
memory hotplug - taking the slab_mutex does not eliminate the
possibility of race as described in patch 2.

What we need there is something like get_online_cpus, but for memory.
We already have lock_memory_hotplug, which serves for the purpose, but
it's a bit of a hammer right now, because it's backed by a mutex. As a
result, it imposes some limitations to locking order, which are not
desirable, and can't be used just like get_online_cpus. That's why in
patch 1 I substitute it with get/put_online_mems, which work exactly
like get/put_online_cpus except they block not cpu, but memory hotplug.

[ v1 can be found at https://lkml.org/lkml/2014/4/6/68. I NAK'ed it by
myself, because it used an rw semaphore for get/put_online_mems,
making them dead lock prune. ]

This patch (of 2):

{un}lock_memory_hotplug, which is used to synchronize against memory
hotplug, is currently backed by a mutex, which makes it a bit of a
hammer - threads that only want to get a stable value of online nodes
mask won't be able to proceed concurrently. Also, it imposes some
strong locking ordering rules on it, which narrows down the set of its
usage scenarios.

This patch introduces get/put_online_mems, which are the same as
get/put_online_cpus, but for memory hotplug, i.e. executing a code
inside a get/put_online_mems section will guarantee a stable value of
online nodes, present pages, etc.

lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether.

Signed-off-by: Vladimir Davydov
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Tang Chen
Cc: Zhang Yanfei
Cc: Toshi Kani
Cc: Xishi Qiu
Cc: Jiang Liu
Cc: Rafael J. Wysocki
Cc: David Rientjes
Cc: Wen Congyang
Cc: Yasuaki Ishimatsu
Cc: Lai Jiangshan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2014-06-05 07:53:59 +0800

13 Nov, 2013

2 commits

85b35feae mm/sparsemem: use PAGES_PER_SECTION to remove redundant nr_pages parameter ... Browse Code »

For below functions,

- sparse_add_one_section()
- kmalloc_section_memmap()
- __kmalloc_section_memmap()
- __kfree_section_memmap()

they are always invoked to operate on one memory section, so it is
redundant to always pass a nr_pages parameter, which is the page numbers
in one section. So we can directly use predefined macro PAGES_PER_SECTION
instead of passing the parameter.

Signed-off-by: Zhang Yanfei
Cc: Wen Congyang
Cc: Tang Chen
Cc: Toshi Kani
Cc: Yasuaki Ishimatsu
Cc: Yinghai Lu
Cc: Yasunori Goto
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang Yanfei
2013-11-13 11:09:06 +0800
01b0f1970 cpu/mem hotplug: add try_online_node() for cpu_up() ... Browse Code »

cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call
mem_online_node() to put its node online if offlined and then call
build_all_zonelists() to initialize the zone list.

These steps are specific to memory hotplug, and should be managed in
mm/memory_hotplug.c. lock_memory_hotplug() should also be held for the
whole steps.

For this reason, this patch replaces mem_online_node() with
try_online_node(), which performs the whole steps with
lock_memory_hotplug() held. try_online_node() is named after
try_offline_node() as they have similar purpose.

There is no functional change in this patch.

Signed-off-by: Toshi Kani
Reviewed-by: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Toshi Kani
2013-11-13 11:09:04 +0800

02 Jun, 2013

3 commits

aba6efc47 Memory hotplug: Move alternative function definitions to header ... Browse Code »

Move the definitions of offline_pages() and remove_memory()
for CONFIG_MEMORY_HOTREMOVE to memory_hotplug.h, where they belong,
and make them static inline.

Signed-off-by: Rafael J. Wysocki

Rafael J. Wysocki
2013-06-02 04:24:07 +0800
242831eb1 Memory hotplug / ACPI: Simplify memory removal ... Browse Code »

Now that the memory offlining should be taken care of by the
companion device offlining code in acpi_scan_hot_remove(), the
ACPI memory hotplug driver doesn't need to offline it in
remove_memory() any more. Moreover, since the return value of
remove_memory() is not used, it's better to make it be a void
function and trigger a BUG() if the memory scheduled for removal is
not offline.

Change the code in accordance with the above observations.

Signed-off-by: Rafael J. Wysocki
Reviewed-by: Toshi Kani

Rafael J. Wysocki
2013-06-02 03:37:10 +0800
ea50be593 Driver core / MM: Drop offline_memory_block() ... Browse Code »

Since offline_memory_block(mem) is functionally equivalent to
device_offline(&mem->dev), make the only caller of the former use
the latter instead and drop offline_memory_block() entirely.

Signed-off-by: Rafael J. Wysocki
Acked-by: Greg Kroah-Hartman
Acked-by: Toshi Kani

Rafael J. Wysocki
2013-06-02 03:37:10 +0800

12 May, 2013

1 commit

e2ff39400 ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes ... Browse Code »

During ACPI memory hotplug configuration bind memory blocks residing
in modules removable through the standard ACPI mechanism to struct
acpi_device objects associated with ACPI namespace objects
representing those modules. Accordingly, unbind those memory blocks
from the struct acpi_device objects when the memory modules in
question are being removed.

When "offline" operation for devices representing memory blocks is
introduced, this will allow the ACPI core's device hot-remove code to
use it to carry out remove_memory() for those memory blocks and check
the results of that before it actually removes the modules holding
them from the system.

Since walk_memory_range() is used for accessing all memory blocks
corresponding to a given ACPI namespace object, it is exported from
memory_hotplug.c so that the code in acpi_memhotplug.c can use it.

Signed-off-by: Rafael J. Wysocki
Tested-by: Vasilis Liaskovitis
Reviewed-by: Toshi Kani

Rafael J. Wysocki
2013-05-12 20:14:38 +0800

30 Apr, 2013

1 commit

4edd7ceff mm, hotplug: avoid compiling memory hotremove functions when disabled ... Browse Code »

__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC
pseries will return -EOPNOTSUPP if unsupported.

Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves in .text when disabled (it's disabled in
most defconfigs besides powerpc, including x86). remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.

Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.

Signed-off-by: David Rientjes
Acked-by: Toshi Kani
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Greg Kroah-Hartman
Cc: Wen Congyang
Cc: Tang Chen
Cc: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2013-04-30 06:54:37 +0800

24 Feb, 2013

5 commits

90b30cdc1 memory-hotplug: export the function try_offline_node() ... Browse Code »

try_offline_node() will be needed in the tristate
drivers/acpi/processor_driver.c.

The node will be offlined when all memory/cpu on the node have been
hotremoved. So we need the function try_offline_node() in cpu-hotplug
path.

If the memory-hotplug is disabled, and cpu-hotplug is enabled

1. no memory no the node
we don't online the node, and cpu's node is the nearest node.

2. the node contains some memory
the node has been onlined, and cpu's node is still needed
to migrate the sleep task on the cpu to the same node.

So we do nothing in try_offline_node() in this case.

[rientjes@google.com: export the function try_offline_node() fix]
Signed-off-by: Wen Congyang
Signed-off-by: Tang Chen
Cc: Yasuaki Ishimatsu
Cc: David Rientjes
Cc: Jiang Liu
Cc: Minchan Kim
Cc: KOSAKI Motohiro
Cc: Mel Gorman
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Peter Zijlstra
Cc: Len Brown
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wen Congyang
2013-02-24 09:50:13 +0800
60a5a19e7 memory-hotplug: remove sysfs file of node ... Browse Code »

Introduce a new function try_offline_node() to remove sysfs file of node
when all memory sections of this node are removed. If some memory
sections of this node are not removed, this function does nothing.

Signed-off-by: Wen Congyang
Signed-off-by: Tang Chen
Cc: KOSAKI Motohiro
Cc: Jiang Liu
Cc: Jianguo Wu
Cc: Kamezawa Hiroyuki
Cc: Lai Jiangshan
Cc: Wu Jianguo
Cc: Yasuaki Ishimatsu
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tang Chen
2013-02-24 09:50:13 +0800
46723bfa5 memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap ... Browse Code »

For removing memmap region of sparse-vmemmap which is allocated bootmem,
memmap region of sparse-vmemmap needs to be registered by
get_page_bootmem(). So the patch searches pages of virtual mapping and
registers the pages by get_page_bootmem().

NOTE: register_page_bootmem_memmap() is not implemented for ia64,
ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
and revert register_page_bootmem_info_node() when platform doesn't
support it.

It's implemented by adding a new Kconfig option named
CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
by memory-hotplug feature fully supported archs(currently only on
x86_64).

Since we have 2 config options called MEMORY_HOTPLUG and
MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
and codes in function register_page_bootmem_info_node() are only
used for collecting infomation for hot-remove, so reside it under
MEMORY_HOTREMOVE.

Besides page_isolation.c selected by MEMORY_ISOLATION under
MEMORY_HOTPLUG is also such case, move it too.

[mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
[linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
[mhocko@suse.cz: remove the arch specific functions without any implementation]
[linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
[rientjes@google.com: fix defined but not used warning]
Signed-off-by: Wen Congyang
Signed-off-by: Yasuaki Ishimatsu
Signed-off-by: Tang Chen
Reviewed-by: Wu Jianguo
Cc: KOSAKI Motohiro
Cc: Jiang Liu
Cc: Jianguo Wu
Cc: Kamezawa Hiroyuki
Cc: Lai Jiangshan
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Michal Hocko
Signed-off-by: Lin Feng
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasuaki Ishimatsu
2013-02-24 09:50:12 +0800
24d335ca3 memory-hotplug: introduce new arch_remove_memory() for removing page table ... Browse Code »

For removing memory, we need to remove page tables. But it depends on
architecture. So the patch introduce arch_remove_memory() for removing
page table. Now it only calls __remove_pages().

Note: __remove_pages() for some archtecuture is not implemented
(I don't know how to implement it for s390).

Signed-off-by: Wen Congyang
Signed-off-by: Tang Chen
Acked-by: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Jiang Liu
Cc: Jianguo Wu
Cc: Kamezawa Hiroyuki
Cc: Lai Jiangshan
Cc: Wu Jianguo
Cc: Yasuaki Ishimatsu
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wen Congyang
2013-02-24 09:50:12 +0800
6677e3eaf memory-hotplug: check whether all memory blocks are offlined or not when removing memory ... Browse Code »

We remove the memory like this:

1. lock memory hotplug
2. offline a memory block
3. unlock memory hotplug
4. repeat 1-3 to offline all memory blocks
5. lock memory hotplug
6. remove memory(TODO)
7. unlock memory hotplug

All memory blocks must be offlined before removing memory. But we don't
hold the lock in the whole operation. So we should check whether all
memory blocks are offlined before step6. Otherwise, kernel maybe
panicked.

Offlining a memory block and removing a memory device can be two
different operations. Users can just offline some memory blocks without
removing the memory device. For this purpose, the kernel has held
lock_memory_hotplug() in __offline_pages(). To reuse the code for
memory hot-remove, we repeat step 1-3 to offline all the memory blocks,
repeatedly lock and unlock memory hotplug, but not hold the memory
hotplug lock in the whole operation.

Signed-off-by: Wen Congyang
Signed-off-by: Yasuaki Ishimatsu
Signed-off-by: Tang Chen
Acked-by: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Jiang Liu
Cc: Jianguo Wu
Cc: Kamezawa Hiroyuki
Cc: Lai Jiangshan
Cc: Wu Jianguo
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasuaki Ishimatsu
2013-02-24 09:50:11 +0800

12 Dec, 2012

1 commit

511c2aba8 mm, memory-hotplug: dynamic configure movable memory and portion memory ... Browse Code »

Add online_movable and online_kernel for logic memory hotplug. This is
the dynamic version of "movablecore" & "kernelcore".

We have the same reason to introduce it as to introduce "movablecore" &
"kernelcore". It has the same motive as "movablecore" & "kernelcore", but
it is dynamic/running-time:

o We can configure memory as kernelcore or movablecore after boot.

Userspace workload is increased, we need more hugepage, we can't use
"online_movable" to add memory and allow the system use more
THP(transparent-huge-page), vice-verse when kernel workload is increase.

Also help for virtualization to dynamic configure host/guest's memory,
to save/(reduce waste) memory.

Memory capacity on Demand

o When a new node is physically online after boot, we need to use
"online_movable" or "online_kernel" to configure/portion it as we
expected when we logic-online it.

This configuration also helps for physically-memory-migrate.

o all benefit as the same as existed "movablecore" & "kernelcore".

o Preparing for movable-node, which is very important for power-saving,
hardware partitioning and high-available-system(hardware fault
management).

(Note, we don't introduce movable-node here.)

Action behavior:
When a memoryblock/memorysection is onlined by "online_movable", the kernel
will not have directly reference to the page of the memoryblock,
thus we can remove that memory any time when needed.

When it is online by "online_kernel", the kernel can use it.
When it is online by "online", the zone type doesn't changed.

Current constraints:
Only the memoryblock which is adjacent to the ZONE_MOVABLE
can be online from ZONE_NORMAL to ZONE_MOVABLE.

[akpm@linux-foundation.org: use min_t, cleanups]
Signed-off-by: Lai Jiangshan
Signed-off-by: Wen Congyang
Cc: Yasuaki Ishimatsu
Cc: Lai Jiangshan
Cc: Jiang Liu
Cc: KOSAKI Motohiro
Cc: Minchan Kim
Cc: Mel Gorman
Cc: David Rientjes
Cc: Yinghai Lu
Cc: Rusty Russell
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2012-12-12 09:22:28 +0800

09 Oct, 2012

2 commits

e90bdb7f5 memory-hotplug: update memory block's state and notify userspace ... Browse Code »

remove_memory() will be called when hot removing a memory device. But
even if offlining memory, we cannot notice it. So the patch updates the
memory block's state and sends notification to userspace.

Additionally, the memory device may contain more than one memory block.
If the memory block has been offlined, __offline_pages() will fail. So we
should try to offline one memory block at a time.

Thus remove_memory() also check each memory block's state. So there is no
need to check the memory block's state before calling remove_memory().

Signed-off-by: Wen Congyang
Signed-off-by: Yasuaki Ishimatsu
Cc: David Rientjes
Cc: Jiang Liu
Cc: Len Brown
Cc: Christoph Lameter
Cc: Minchan Kim
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wen Congyang
2012-10-09 15:23:02 +0800
a16cee10c memory-hotplug: preparation to notify memory block's state at memory hot remove ... Browse Code »

remove_memory() is called in two cases:
1. echo offline >/sys/devices/system/memory/memoryXX/state
2. hot remove a memory device

In the 1st case, the memory block's state is changed and the notification
that memory block's state changed is sent to userland after calling
remove_memory(). So user can notice memory block is changed.

But in the 2nd case, the memory block's state is not changed and the
notification is not also sent to userspcae even if calling
remove_memory(). So user cannot notice memory block is changed.

For adding the notification at memory hot remove, the patch just prepare
as follows:
1st case uses offline_pages() for offlining memory.
2nd case uses remove_memory() for offlining memory and changing memory block's
state and notifing the information.

The patch does not implement notification to remove_memory().

Signed-off-by: Wen Congyang
Signed-off-by: Yasuaki Ishimatsu
Cc: David Rientjes
Cc: Jiang Liu
Cc: Len Brown
Cc: Christoph Lameter
Cc: Minchan Kim
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wen Congyang
2012-10-09 15:23:02 +0800

05 Mar, 2012

1 commit

187f1882b BUG: headers with BUG/BUG_ON etc. need linux/bug.h ... Browse Code »

If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
other BUG variant in a static inline (i.e. not in a #define) then
that header really should be including and not just
expecting it to be implicitly present.

We can make this change risk-free, since if the files using these
headers didn't have exposure to linux/bug.h already, they would have
been causing compile failures/warnings.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-03-05 06:54:34 +0800

26 Jul, 2011

1 commit

9d0ad8ca4 mm: extend memory hotplug API to allow memory hotplug in virtual machines ... Browse Code »

This patch contains online_page_callback and apropriate functions for
registering/unregistering online page callbacks. It allows to do some
machine specific tasks during online page stage which is required to
implement memory hotplug in virtual machines. Currently this patch is
required by latest memory hotplug support for Xen balloon driver patch
which will be posted soon.

Additionally, originial online_page() function was splited into
following functions doing "atomic" operations:

- __online_page_set_limits() - set new limits for memory management code,
- __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
- __online_page_free() - free page to allocator.

It was done to:
- not duplicate existing code,
- ease hotplug code devolpment by usage of well defined interface,
- avoid stupid bugs which are unavoidable when the same code
(by design) is developed in many places.

[akpm@linux-foundation.org: use explicit indirect-call syntax]
Signed-off-by: Daniel Kiper
Reviewed-by: Konrad Rzeszutek Wilk
Cc: Ian Campbell
Cc: Jeremy Fitzhardinge
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Kiper
2011-07-26 11:57:08 +0800

15 Jan, 2011

1 commit

597fb188c Merge branch 'slub/hotplug' into slab/urgent Browse Code »

Pekka Enberg
2011-01-15 19:28:17 +0800

14 Jan, 2011

1 commit

5f24ce5fd thp: remove PG_buddy ... Browse Code »

PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
be added to page->flags without overflowing (because of the sparse section
bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
has to move the memory hotplug code from _mapcount to lru.next to avoid
any risk of clashes. We can't use lru.next for PG_buddy removal, but
memory hotplug can use lru.next even more easily than the mapcount
instead.

Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2011-01-14 09:32:43 +0800

11 Jan, 2011

1 commit

925268a06 memory hotplug: one more lock on memory hotplug ... Browse Code »

Now, memory_hotplug_(un)lock() is used for add/remove/offline pages
for avoiding races with hibernation. But this should be held in
online_pages(), too. It seems asymmetric.

There are cases where one has to avoid a race with memory hotplug
notifier and his own local code, and hotplug v.s. hotplug.
This will add a generic solution for avoiding races. In other view,
having lock here has no big impacts. online pages is tend to be
done by udev script at el against each memory section one by one.

Then, it's better to have lock here, too.

Cc: # 2.6.37
Reviewed-by: Christoph Lameter
Acked-by: David Rientjes
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Pekka Enberg

KAMEZAWA Hiroyuki
2011-01-11 23:09:50 +0800

03 Dec, 2010

1 commit

20d6c96b5 mem-hotplug: introduce {un}lock_memory_hotplug() ... Browse Code »
43

Presently hwpoison is using lock_system_sleep() to prevent a race with
memory hotplug. However lock_system_sleep() is a no-op if
CONFIG_HIBERNATION=n. Therefore we need a new lock.

Signed-off-by: KOSAKI Motohiro
Cc: Andi Kleen
Cc: Kamezawa Hiroyuki
Suggested-by: Hugh Dickins
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-12-03 06:51:15 +0800

27 Oct, 2010

1 commit

49ac82558 memory hotplug: unify is_removable and offline detection code ... Browse Code »

Now, sysfs interface of memory hotplug shows whether the section is
removable or not. But it checks only migrateype of pages and doesn't
check details of cluster of pages.

Next, memory hotplug's set_migratetype_isolate() has the same kind of
check, too.

This patch adds the function __count_unmovable_pages() and makes above 2
checks to use the same logic. Then, is_removable and hotremove code uses
the same logic. No changes in the hotremove logic itself.

TODO: need to find a way to check RECLAMABLE. But, considering bit,
calling shrink_slab() against a range before starting memory hotremove
sounds better. If so, this patch's logic doesn't need to be changed.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Reported-by: Michal Hocko
Cc: Wu Fengguang
Cc: Mel Gorman
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-10-27 07:52:06 +0800

25 May, 2010

1 commit

cf23422b9 cpu/mem hotplug: enable CPUs online before local memory online ... Browse Code »

Enable users to online CPUs even if the CPUs belongs to a numa node which
doesn't have onlined local memory.

The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online. For a
numa node without onlined local memory, its zonelists are not initialized
at present. As a result, any memory allocation operations executed by
CPUs within this node will fail. In fact, an out-of-memory error is
triggered when attempt to online CPUs before memory comes to online.

This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can be fallback'ed to other nodes.

[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo
Cc: Minchan Kim
Cc: Yasunori Goto
Cc: Andi Kleen
Cc: Christoph Lameter
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

minskey guo
2010-05-25 23:07:00 +0800

16 Dec, 2009

1 commit

b4e655a4a mm: memory_hotplug: make offline_pages() static ... Browse Code »

It has no references outside memory_hotplug.c.

Cc: "Rafael J. Wysocki"
Cc: Andi Kleen
Cc: Gerald Schaefer
Cc: KOSAKI Motohiro
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2009-12-16 00:53:20 +0800

23 Sep, 2009

1 commit

908eedc61 walk system ram range ... Browse Code »

Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range. For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.

But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
check strict to find out busy "System RAM".

Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.

And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: WANG Cong
Cc: Américo Wang
Cc: David Rientjes
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-09-23 22:39:41 +0800

07 Jan, 2009

1 commit

c04fc586c mm: show node to memory section relationship with symlinks in sysfs ... Browse Code »

Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX. For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
- Provides information needed to determine the specific node
on which a defective DIMM is located. This will reduce system
downtime when the node or defective DIMM is swapped out.
- Prevents unintended onlining of a memory section that was
previously offlined due to a defective DIMM. This could happen
during node hot-add when the user or node hot-add assist script
onlines _all_ offlined sections due to user or script inability
to identify the specific memory sections located on the hot-added
node. The consequences of reintroducing the defective memory
could be ugly.
- Provides information needed to vary the amount and distribution
of memory on specific nodes for testing or debugging purposes.
Future:
- Will provide information needed to identify the memory
sections that need to be offlined prior to physical removal
of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems. Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade
Signed-off-by: Badari Pulavarty
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gary Hade
2009-01-07 07:59:00 +0800

25 Jul, 2008

2 commits

5c755e9fd memory-hotplug: add sysfs removable attribute for hotplug memory remove ... Browse Code »

Memory may be hot-removed on a per-memory-block basis, particularly on
POWER where the SPARSEMEM section size often matches the memory-block
size. A user-level agent must be able to identify which sections of
memory are likely to be removable before attempting the potentially
expensive operation. This patch adds a file called "removable" to the
memory directory in sysfs to help such an agent. In this patch, a memory
block is considered removable if;

o It contains only MOVABLE pageblocks
o It contains only pageblocks with free pages regardless of pageblock type

On the other hand, a memory block starting with a PageReserved() page will
never be considered removable. Without this patch, the user-agent is
forced to choose a memory block to remove randomly.

Sample output of the sysfs files:

./memory/memory0/removable: 0
./memory/memory1/removable: 0
./memory/memory2/removable: 0
./memory/memory3/removable: 0
./memory/memory4/removable: 0
./memory/memory5/removable: 0
./memory/memory6/removable: 0
./memory/memory7/removable: 1
./memory/memory8/removable: 0
./memory/memory9/removable: 0
./memory/memory10/removable: 0
./memory/memory11/removable: 0
./memory/memory12/removable: 0
./memory/memory13/removable: 0
./memory/memory14/removable: 0
./memory/memory15/removable: 0
./memory/memory16/removable: 0
./memory/memory17/removable: 1
./memory/memory18/removable: 1
./memory/memory19/removable: 1
./memory/memory20/removable: 1
./memory/memory21/removable: 1
./memory/memory22/removable: 1

Signed-off-by: Badari Pulavarty
Signed-off-by: Mel Gorman
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2008-07-25 01:47:21 +0800
af370fb8c memory hotplug: small fixes to bootmem freeing for memory hotremove ... Browse Code »

- Change some naming
* Magic -> types
* MIX_INFO -> MIX_SECTION_INFO
* Change definition of bootmem type from direct hex value

- __free_pages_bootmem() becomes __meminit.

Signed-off-by: Yasunori Goto
Cc: Andy Whitcroft
Cc: Badari Pulavarty
Cc: Yinghai Lu
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2008-07-25 01:47:21 +0800

09 Jun, 2008

1 commit

0d5799449 [POWERPC] Make walk_memory_resource available with MEMORY_HOTPLUG=n ... Browse Code »

The ehea driver was recently changed[1] to use walk_memory_resource() to
detect the system's memory layout. However, walk_memory_resource() is
available only when memory hotplug is enabled. So CONFIG_EHEA was
made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
network driver to have such a dependency.

Make the declaration of walk_memory_resource() and its powerpc
implementation (ehea is powerpc-specific) unconditionally available.

[1] 48cfb14f8b89d4d5b3df6c16f08b258686fb12ad
"ehea: Add DLPAR memory remove support"

[2] fb7b6ca2b6b7c23b52be143bdd5f55a23b9780c8
"ehea: Add dependency to Kconfig"

Signed-off-by: Nathan Lynch
Acked-by: Badari Pulavarty
Signed-off-by: Paul Mackerras

Nathan Lynch
2008-06-09 09:32:41 +0800

28 Apr, 2008

2 commits

047532787 memory hotplug: register section/node id to free ... Browse Code »

This patch set is to free pages which is allocated by bootmem for
memory-hotremove. Some structures of memory management are allocated by
bootmem. ex) memmap, etc.

To remove memory physically, some of them must be freed according to
circumstance. This patch set makes basis to free those pages, and free
memmaps.

Basic my idea is using remain members of struct page to remember information
of users of bootmem (section number or node id). When the section is
removing, kernel can confirm it. By this information, some issues can be
solved.

1) When the memmap of removing section is allocated on other
section by bootmem, it should/can be free.
2) When the memmap of removing section is allocated on the
same section, it shouldn't be freed. Because the section has to be
logical memory offlined already and all pages must be isolated against
page allocater. If it is freed, page allocator may use it which will
be removed physically soon.
3) When removing section has other section's memmap,
kernel will be able to show easily which section should be removed
before it for user. (Not implemented yet)
4) When the above case 2), the page isolation will be able to check and skip
memmap's page when logical memory offline (offline_pages()).
Current page isolation code fails in this case because this page is
just reserved page and it can't distinguish this pages can be
removed or not. But, it will be able to do by this patch.
(Not implemented yet.)
5) The node information like pgdat has similar issues. But, this
will be able to be solved too by this.
(Not implemented yet, but, remembering node id in the pages.)

Fortunately, current bootmem allocator just keeps PageReserved flags,
and doesn't use any other members of page struct. The users of
bootmem doesn't use them too.

This patch:

This is to register information which is node or section's id. Kernel can
distinguish which node/section uses the pages allcated by bootmem. This is
basis for hot-remove sections or nodes.

Signed-off-by: Yasunori Goto
Cc: Badari Pulavarty
Cc: Yinghai Lu
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yasunori Goto
2008-04-28 23:58:25 +0800
ea01ea937 hotplug memory remove: generic __remove_pages() support ... Browse Code »

Generic helper function to remove section mappings and sysfs entries for the
section of the memory we are removing. offline_pages() correctly adjusted
zone and marked the pages reserved.

TODO: Yasunori Goto is working on patches to free up allocations from bootmem.

Signed-off-by: Badari Pulavarty
Acked-by: Yasunori Goto
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Badari Pulavarty
2008-04-28 23:58:17 +0800

17 Oct, 2007

4 commits

48e94196a fix memory hot remove not configured case. ... Browse Code »

Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.

- fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
- For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
which returns -EINVAL.
- removed remove_pages() only used in powerpc.
- removed no-op remove_memory() in i386, sh, sparc64, x86_64.

- only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
to return -EINVAL.

Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-10-17 00:43:02 +0800
0c0e61958 memory unplug: page offline ... Browse Code »

Logic.
- set all pages in [start,end) as isolated migration-type.
by this, all free pages in the range will be not-for-use.
- Migrate all LRU pages in the range.
- Test all pages in the range's refcnt is zero or not.

Todo:
- allocate migration destination page from better area.
- confirm page_count(page)== 0 && PageReserved(page) page is safe to be freed..
(I don't like this kind of page but..
- Find out pages which cannot be migrated.
- more running tests.
- Use reclaim for unplugging other memory type area.

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-10-17 00:43:02 +0800
75884fb1c memory unplug: memory hotplug cleanup ... Browse Code »

A clean up patch for "scanning memory resource [start, end)" operation.

Now, find_next_system_ram() function is used in memory hotplug, but this
interface is not easy to use and codes are complicated.

This patch adds walk_memory_resouce(start,len,arg,func) function.
The function 'func' is called per valid memory resouce range in [start,pfn).

[pbadari@us.ibm.com: Error handling in walk_memory_resource()]
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-10-17 00:43:01 +0800
39e91e433 Clean up duplicate includes in include/linux/memory_hotplug.h ... Browse Code »

This patch cleans up duplicate includes in
include/linux/memory_hotplug.h

Signed-off-by: Jesper Juhl
Acked-by: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jesper Juhl
2007-10-17 00:42:52 +0800