25 May, 2013

1 commit

  • Fix printk format warnings in mm/memory_hotplug.c by using "%pa":

    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat]
    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat]

    Signed-off-by: Randy Dunlap
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

30 Apr, 2013

4 commits

  • PFN_PHYS() evaluates to a phys_addr_t, which can be u32 or u64.
    Fix the build warning that occurs when phys_addr_t is u32.

    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'unsigned int' [-Wformat]: => 1685:3
    mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'unsigned int' [-Wformat]: => 1685:3

    Signed-off-by: Randy Dunlap
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC
    pseries will return -EOPNOTSUPP if unsupported.

    Adding an #ifdef makes several other functions it depends on unnecessary
    as well, which saves .text size when the option is disabled (it is
    disabled in most defconfigs besides powerpc, including x86).
    remove_memory_block() becomes static since it is not referenced outside
    of drivers/base/memory.c.

    Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
    and disabled.

    Signed-off-by: David Rientjes
    Acked-by: Toshi Kani
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Greg Kroah-Hartman
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Change __remove_pages() to call release_mem_region_adjustable(). This
    allows a requested memory range to be released from the iomem_resource
    table even if it does not exactly match a resource entry, as long as it
    fits within one. The resource entries initialized at bootup usually
    cover whole contiguous memory ranges and may not match the size of
    memory hot-delete requests.

    If release_mem_region_adjustable() fails, __remove_pages() emits a
    warning message and continues, as was the case with
    release_mem_region(). release_mem_region(), which is defined as
    __release_region(), emits a warning message and returns no error since
    it is a void function.

    Signed-off-by: Toshi Kani
    Reviewed-by: Yasuaki Ishimatsu
    Acked-by: David Rientjes
    Cc: Ram Pai
    Cc: T Makphaibulchoke
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Jiang Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • Fix a typo "end_pft" in the comment of walk_memory_range().

    Signed-off-by: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

23 Mar, 2013

1 commit


14 Mar, 2013

1 commit

  • remove_memory() calls walk_memory_range() with [start_pfn, end_pfn), where
    end_pfn is exclusive in this range. Therefore, end_pfn needs to be set to
    the next page of the end address.

    Signed-off-by: Toshi Kani
    Cc: Wen Congyang
    Cc: Tang Chen
    Cc: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     

24 Feb, 2013

21 commits

  • Replace open coded pgdat_end_pfn() with helper function.

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Remove open coding of ensure_zone_is_initialized().

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • ensure_zone_is_initialized() checks whether a zone is in an empty,
    uninitialized state (typically occurring after it is created during
    memory hotplug) and, if so, calls init_currently_empty_zone() to
    initialize the zone.

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • Add 2 helpers (zone_end_pfn() and zone_spans_pfn()) to reduce code
    duplication.

    This also switches to using them in compaction (where an additional
    variable needed to be renamed), page_alloc, vmstat, memory_hotplug, and
    kmemleak.

    Note that in compaction.c I avoid calling zone_end_pfn() repeatedly
    because I expect that at some point the synchronization issues with
    start_pfn & spanned_pages will need fixing, either by actually using the
    seqlock or by clever memory barrier usage.

    Signed-off-by: Cody P Schafer
    Cc: David Hansen
    Cc: Catalin Marinas
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cody P Schafer
     
  • No functional change, but the only purpose of the offlining argument to
    migrate_pages() etc. was to ensure that __unmap_and_move() could migrate
    a KSM page for memory hotremove (which took ksm_thread_mutex) but not
    for other callers. Now all cases are safe, so remove the argument.

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Cc: Petr Holasek
    Cc: Andrea Arcangeli
    Cc: Izik Eidus
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Function put_page_bootmem() is used to free pages allocated by the
    bootmem allocator, so it should increase totalram_pages when freeing
    pages into the buddy system.

    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Maciej Rutecki
    Cc: Chris Clayton
    Cc: "Rafael J . Wysocki"
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Jianguo Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • When a node is offlined, there is no memory or cpu on the node. If a
    sleeping task runs on a cpu of this node, it will be migrated to a cpu
    on another node. So we can clear the cpu-to-node mapping.

    [akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit]
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • try_offline_node() will be needed in the tristate
    drivers/acpi/processor_driver.c.

    The node will be offlined when all memory/cpus on the node have been
    hot-removed. So we need the function try_offline_node() in the
    cpu-hotplug path.

    If memory hotplug is disabled and cpu hotplug is enabled:

    1. No memory on the node:
    we don't online the node, and the cpu's node is the nearest node.

    2. The node contains some memory:
    the node has been onlined, and the cpu's node is still needed
    to migrate the sleeping task on the cpu to the same node.

    So we do nothing in try_offline_node() in this case.

    [rientjes@google.com: export the function try_offline_node() fix]
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Peter Zijlstra
    Cc: Len Brown
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • Since there is no way to guarantee that the address of pgdat/zone is not
    on the stack of any kernel thread or used by other kernel objects
    without reference counting or some other synchronizing method, we cannot
    reset node_data and free pgdat when offlining a node. Just reset pgdat
    to 0 and reuse the memory when the node is onlined again.

    The problem is suggested by Kamezawa Hiroyuki. The idea is from Wen
    Congyang.

    NOTE: If we don't reset pgdat to 0, the WARN_ON in free_area_init_node()
    will be triggered.

    [akpm@linux-foundation.org: fix warning when CONFIG_NEED_MULTIPLE_NODES=n]
    [akpm@linux-foundation.org: fix the warning again again]
    Signed-off-by: Tang Chen
    Reviewed-by: Wen Congyang
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • We call hotadd_new_pgdat() to allocate memory to store node_data. So we
    should free it when removing a node.

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Reviewed-by: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • Introduce a new function try_offline_node() to remove the sysfs file of
    a node when all memory sections of the node have been removed. If some
    memory sections of the node have not been removed, this function does
    nothing.

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • When memory is added, we update zone's and pgdat's start_pfn and
    spanned_pages in __add_zone(). So we should revert them when the memory
    is removed.

    The patch adds a new function __remove_zone() to do this.

    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • Currently __remove_section for SPARSEMEM_VMEMMAP does nothing. But even
    if we use SPARSEMEM_VMEMMAP, we can unregister the memory_section.

    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • In __remove_section(), we take pgdat_resize_lock when calling
    sparse_remove_one_section(). This lock disables irqs. But we don't need
    to hold it across the whole function. If we do any work to free page
    tables in free_section_usemap(), we need to call flush_tlb_all(), which
    requires irqs to be enabled. Otherwise the WARN_ON_ONCE() in
    smp_call_function_many() will be triggered.

    If we lock the whole sparse_remove_one_section(), then we come to this call trace:

    ------------[ cut here ]------------
    WARNING: at kernel/smp.c:461 smp_call_function_many+0xbd/0x260()
    Hardware name: PRIMEQUEST 1800E
    ......
    Call Trace:
    smp_call_function_many+0xbd/0x260
    smp_call_function+0x3b/0x50
    on_each_cpu+0x3b/0xc0
    flush_tlb_all+0x1c/0x20
    remove_pagetable+0x14e/0x1d0
    vmemmap_free+0x18/0x20
    sparse_remove_one_section+0xf7/0x100
    __remove_section+0xa2/0xb0
    __remove_pages+0xa0/0xd0
    arch_remove_memory+0x6b/0xc0
    remove_memory+0xb8/0xf0
    acpi_memory_device_remove+0x53/0x96
    acpi_device_remove+0x90/0xb2
    __device_release_driver+0x7c/0xf0
    device_release_driver+0x2f/0x50
    acpi_bus_remove+0x32/0x6d
    acpi_bus_trim+0x91/0x102
    acpi_bus_hot_remove_device+0x88/0x16b
    acpi_os_execute_deferred+0x27/0x34
    process_one_work+0x20e/0x5c0
    worker_thread+0x12e/0x370
    kthread+0xee/0x100
    ret_from_fork+0x7c/0xb0
    ---[ end trace 25e85300f542aa01 ]---

    Signed-off-by: Tang Chen
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • To remove a sparse-vmemmap memmap region that was allocated from
    bootmem, the region needs to be registered by get_page_bootmem(). So
    the patch searches the pages of the virtual mapping and registers them
    with get_page_bootmem().

    NOTE: register_page_bootmem_memmap() is not implemented for ia64,
    ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
    and revert register_page_bootmem_info_node() when the platform doesn't
    support it.

    It's implemented by adding a new Kconfig option named
    CONFIG_HAVE_BOOTMEM_INFO_NODE, which is automatically selected by
    architectures that fully support memory hotplug (currently only
    x86_64).

    Since we have two config options, MEMORY_HOTPLUG and MEMORY_HOTREMOVE,
    used for memory hot-add and hot-remove respectively, and the code in
    register_page_bootmem_info_node() is only used to collect information
    for hot-remove, move it under MEMORY_HOTREMOVE.

    page_isolation.c, selected by MEMORY_ISOLATION under MEMORY_HOTPLUG,
    is a similar case; move it too.

    [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
    [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
    [mhocko@suse.cz: remove the arch specific functions without any implementation]
    [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
    [rientjes@google.com: fix defined but not used warning]
    Signed-off-by: Wen Congyang
    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Tang Chen
    Reviewed-by: Wu Jianguo
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Michal Hocko
    Signed-off-by: Lin Feng
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • To remove memory, we need to remove its page tables. But this is
    architecture dependent, so the patch introduces arch_remove_memory()
    for removing page tables. For now it only calls __remove_pages().

    Note: __remove_pages() is not implemented for some architectures
    (I don't know how to implement it for s390).

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start,
    type} sysfs files are created. But there is no code to remove these
    files. This patch implements the function to remove them.

    We cannot free firmware_map_entry which is allocated by bootmem because
    there is no way to do so when the system is up. But we can at least
    remember the address of that memory and reuse the storage when the
    memory is added next time.

    This patch also introduces a new list map_entries_bootmem to link the
    map entries allocated by bootmem when they are removed, and a lock to
    protect it. And these entries will be reused when the memory is
    hot-added again.

    The idea was suggested by Andrew Morton.

    NOTE: It is unsafe to return an entry pointer and release the
    map_entries_lock. So we should not hold the map_entries_lock
    separately in firmware_map_find_entry() and
    firmware_map_remove_entry(). Hold the map_entries_lock across find
    and remove /sys/firmware/memmap/X operation.

    And also, users of these two functions need to be careful to
    hold the lock when using these two functions.

    [tangchen@cn.fujitsu.com: Hold spinlock across find|remove /sys operation]
    [tangchen@cn.fujitsu.com: fix the wrong comments of map_entries]
    [tangchen@cn.fujitsu.com: reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem]
    [tangchen@cn.fujitsu.com: fix section mismatch problem]
    [tangchen@cn.fujitsu.com: fix the doc format in drivers/firmware/memmap.c]
    Signed-off-by: Wen Congyang
    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Tang Chen
    Reviewed-by: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Tang Chen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Julian Calaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • Offlining memory blocks and checking whether memory blocks are offlined
    are very similar operations. This patch introduces a new function to
    remove the redundant code.

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Reviewed-by: Kamezawa Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • We remove the memory like this:

    1. lock memory hotplug
    2. offline a memory block
    3. unlock memory hotplug
    4. repeat 1-3 to offline all memory blocks
    5. lock memory hotplug
    6. remove memory(TODO)
    7. unlock memory hotplug

    All memory blocks must be offlined before removing memory. But we don't
    hold the lock across the whole operation, so we should check whether all
    memory blocks are offlined before step 6. Otherwise, the kernel may
    panic.

    Offlining a memory block and removing a memory device can be two
    different operations. Users can just offline some memory blocks without
    removing the memory device. For this purpose, the kernel already takes
    lock_memory_hotplug() in __offline_pages(). To reuse that code for
    memory hot-remove, we repeat steps 1-3 to offline all the memory blocks,
    repeatedly locking and unlocking memory hotplug, but do not hold the
    memory hotplug lock across the whole operation.

    Signed-off-by: Wen Congyang
    Signed-off-by: Yasuaki Ishimatsu
    Signed-off-by: Tang Chen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • Memory can't be offlined in arbitrary order when CONFIG_MEMCG is
    selected. For example: there is a memory device on node 1 with address
    range [1G, 1.5G). You will find 4 new directories, memory8, memory9,
    memory10, and memory11, under the directory
    /sys/devices/system/memory/.

    If CONFIG_MEMCG is selected, we allocate memory to store page cgroups
    when we online pages. When we online memory8, the memory storing its
    page cgroups is not provided by this memory device. But when we online
    memory9, the memory storing its page cgroups may be provided by
    memory8. So we can't offline memory8 first; we should offline the
    memory in the reverse order.

    When the memory device is hot-removed, we automatically offline the
    memory provided by the device. But we don't know which memory was
    onlined first, so offlining memory may fail. In that case, iterate
    twice to offline the memory. 1st iteration: offline every non-primary
    memory block. 2nd iteration: offline the primary (i.e. first added)
    memory block.

    This idea is suggested by KOSAKI Motohiro.

    Signed-off-by: Wen Congyang
    Signed-off-by: Tang Chen
    Cc: KOSAKI Motohiro
    Cc: Jiang Liu
    Cc: Jianguo Wu
    Cc: Kamezawa Hiroyuki
    Cc: Lai Jiangshan
    Cc: Wu Jianguo
    Cc: Yasuaki Ishimatsu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     
  • Remove one redundant check of res.

    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

19 Dec, 2012

1 commit


17 Dec, 2012

1 commit

  • Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
    "There are three implementations for NUMA balancing, this tree
    (balancenuma), numacore which has been developed in tip/master and
    autonuma which is in aa.git.

    In almost all respects balancenuma is the dumbest of the three because
    its main impact is on the VM side with no attempt to be smart about
    scheduling. In the interest of getting the ball rolling, it would be
    desirable to see this much merged for 3.8 with the view to building
    scheduler smarts on top and adapting the VM where required for 3.9.

    The most recent set of comparisons available from different people are

    mel: https://lkml.org/lkml/2012/12/9/108
    mingo: https://lkml.org/lkml/2012/12/7/331
    tglx: https://lkml.org/lkml/2012/12/10/437
    srikar: https://lkml.org/lkml/2012/12/10/397

    The results are a mixed bag. In my own tests, balancenuma does
    reasonably well. It's dumb as rocks and does not regress against
    mainline. On the other hand, Ingo's tests show that balancenuma is
    incapable of converging for the workloads driven by perf, which is bad
    but is potentially explained by the lack of scheduler smarts. Thomas'
    results show balancenuma improves on mainline but falls far short of
    numacore or autonuma. Srikar's results indicate we all suffer on a
    large machine with imbalanced node sizes.

    My own testing showed that recent numacore results have improved
    dramatically, particularly in the last week, but not universally.
    We've butted heads heavily on system CPU usage and high levels of
    migration even when it shows that overall performance is better.
    There are also cases where it regresses. Of interest is that for
    specjbb in some configurations it will regress for lower numbers of
    warehouses and show gains for higher numbers, which is not reported by
    the tool by default and sometimes missed in reports. Recently I
    reported for numacore that the JVM was crashing with
    NullPointerExceptions, but currently it's unclear what the source of
    this problem is. Initially I thought it was in how numacore batch
    handles PTEs but I no longer think this is the case. It's possible
    numacore is just able to trigger it due to higher rates of migration.

    These reports were quite late in the cycle so I/we would like to start
    with this tree as it contains much of the code we can agree on and has
    not changed significantly over the last 2-3 weeks."

    * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
    mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
    mm/rmap: Convert the struct anon_vma::mutex to an rwsem
    mm: migrate: Account a transhuge page properly when rate limiting
    mm: numa: Account for failed allocations and isolations as migration failures
    mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
    mm: numa: Add THP migration for the NUMA working set scanning fault case.
    mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
    mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
    mm: sched: numa: Control enabling and disabling of NUMA balancing
    mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
    mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely tasknode relationships
    mm: numa: migrate: Set last_nid on newly allocated page
    mm: numa: split_huge_page: Transfer last_nid on tail page
    mm: numa: Introduce last_nid to the page frame
    sched: numa: Slowly increase the scanning period as NUMA faults are handled
    mm: numa: Rate limit setting of pte_numa if node is saturated
    mm: numa: Rate limit the amount of memory that is migrated between nodes
    mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
    mm: numa: Migrate pages handled during a pmd_numa hinting fault
    mm: numa: Migrate on reference policy
    ...

    Linus Torvalds
     

14 Dec, 2012

2 commits

  • Merge misc VM changes from Andrew Morton:
    "The rest of most-of-MM. The other MM bits await a slab merge.

    This patch includes the addition of a huge zero_page. Not a
    performance boost, but it can save large amounts of physical memory in
    some situations.

    Also a bunch of Fujitsu engineers are working on memory hotplug.
    Which, as it turns out, was badly broken. About half of their patches
    are included here; the remainder are 3.8 material."

    However, this merge disables CONFIG_MOVABLE_NODE, which was totally
    broken. We don't add new features with "default y", nor do we add
    Kconfig questions that are incomprehensible to most people without any
    help text. Does the feature even make sense without compaction or
    memory hotplug?

    * akpm: (54 commits)
    mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
    mm/memory.c: remove unused code from do_wp_page()
    asm-generic, mm: pgtable: consolidate zero page helpers
    mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
    hwpoison, hugetlbfs: fix RSS-counter warning
    hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
    mm: protect against concurrent vma expansion
    memcg: do not check for mm in __mem_cgroup_count_vm_event
    tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
    mm: provide more accurate estimation of pages occupied by memmap
    fs/buffer.c: remove redundant initialization in alloc_page_buffers()
    fs/buffer.c: do not inline exported function
    writeback: fix a typo in comment
    mm: introduce new field "managed_pages" to struct zone
    mm, oom: remove statically defined arch functions of same name
    mm, oom: remove redundant sleep in pagefault oom handler
    mm, oom: cleanup pagefault oom handler
    memory_hotplug: allow online/offline memory to result movable node
    numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
    mm, memcg: avoid unnecessary function call when memcg is disabled
    ...

    Linus Torvalds
     
  • Pull trivial branch from Jiri Kosina:
    "Usual stuff -- comment/printk typo fixes, documentation updates, dead
    code elimination."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    HOWTO: fix double words typo
    x86 mtrr: fix comment typo in mtrr_bp_init
    propagate name change to comments in kernel source
    doc: Update the name of profiling based on sysfs
    treewide: Fix typos in various drivers
    treewide: Fix typos in various Kconfig
    wireless: mwifiex: Fix typo in wireless/mwifiex driver
    messages: i2o: Fix typo in messages/i2o
    scripts/kernel-doc: check that non-void fcts describe their return value
    Kernel-doc: Convention: Use a "Return" section to describe return values
    radeon: Fix typo and copy/paste error in comments
    doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
    various: Fix spelling of "asynchronous" in comments.
    Fix misspellings of "whether" in comments.
    eisa: Fix spelling of "asynchronous".
    various: Fix spelling of "registered" in comments.
    doc: fix quite a few typos within Documentation
    target: iscsi: fix comment typos in target/iscsi drivers
    treewide: fix typo of "suport" in various comments and Kconfig
    treewide: fix typo of "suppport" in various comments
    ...

    Linus Torvalds
     

13 Dec, 2012

3 commits

  • Currently a zone's present_pages is calculated as below, which is
    inaccurate and may cause trouble for memory hotplug.

    spanned_pages - absent_pages - memmap_pages - dma_reserve.

    While fixing bugs caused by the inaccurate zone->present_pages, we found
    that zone->present_pages has been abused. The field zone->present_pages
    may have different meanings in different contexts:

    1) pages existing in a zone.
    2) pages managed by the buddy system.

    For more discussions about the issue, please refer to:
    http://lkml.org/lkml/2012/11/5/866
    https://patchwork.kernel.org/patch/1346751/

    This patchset introduces a new field named "managed_pages" to struct
    zone, which counts "pages managed by the buddy system", and reverts
    zone->present_pages to counting "physical pages existing in a zone",
    which also keeps it consistent with pgdat->node_present_pages.

    We will set an initial value for zone->managed_pages in function
    free_area_init_core() and will adjust it later if the initial value is
    inaccurate.

    For DMA/normal zones, the initial value is set to:

    (spanned_pages - absent_pages - memmap_pages - dma_reserve)

    Later zone->managed_pages will be adjusted to the accurate value when the
    bootmem allocator frees all free pages to the buddy system in function
    free_all_bootmem_node() and free_all_bootmem().

    The bootmem allocator doesn't touch highmem pages, so highmem zones'
    managed_pages is set to the accurate value "spanned_pages - absent_pages"
    in function free_area_init_core() and won't be updated anymore.

    This patch also adds a new field "managed_pages" to /proc/zoneinfo
    and sysrq showmem.

    [akpm@linux-foundation.org: small comment tweaks]
    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: David Rientjes
    Cc: Maciej Rutecki
    Tested-by: Chris Clayton
    Cc: "Rafael J . Wysocki"
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Jianguo Wu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Now that memory management can handle movable nodes, or nodes which
    don't have any normal memory, we can dynamically configure and add a
    movable node by:

    onlining ZONE_MOVABLE memory from a previously offline node
    offlining the last normal memory, which results in a node without
    normal memory

    Movable nodes are very important for power saving, hardware
    partitioning, and high-availability systems (hardware fault
    management).

    Signed-off-by: Lai Jiangshan
    Tested-by: Yasuaki Ishimatsu
    Signed-off-by: Wen Congyang
    Cc: Jiang Liu
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Yinghai Lu
    Cc: Rusty Russell
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Update nodemasks management for N_MEMORY.

    [lliubbo@gmail.com: fix build]
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Christoph Lameter
    Cc: Hillf Danton
    Cc: Lin Feng
    Cc: David Rientjes
    Signed-off-by: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     

12 Dec, 2012

5 commits

  • The old memory hotplug code and the new online/movable support may leave
    an online node without any normal memory, but memory management misbehaves
    on nodes which are online but have no normal memory. For example, a task
    bound to such a node may fail every kernel allocation and thus be unable
    to create tasks or other kernel objects.

    So we disable non-normal-memory nodes here; we will enable them once we
    are prepared.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Lai Jiangshan
    Cc: Jiang Liu
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Yinghai Lu
    Cc: Rusty Russell
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Make online_movable/online_kernel able to empty a zone, or to move memory
    into an empty zone.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Lai Jiangshan
    Cc: Jiang Liu
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Yinghai Lu
    Cc: Rusty Russell
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Add online_movable and online_kernel for logical memory hotplug. This is
    the dynamic version of "movablecore" & "kernelcore".

    It is introduced for the same reasons as "movablecore" & "kernelcore",
    but it works dynamically, at runtime:

    o We can configure memory as kernelcore or movablecore after boot.

    When the userspace workload increases and we need more hugepages, we can
    use "online_movable" to add memory and let the system use more THP
    (transparent huge pages); vice versa when the kernel workload increases.

    This also helps virtualization dynamically configure host/guest memory,
    saving memory (reducing waste).

    Memory capacity on Demand

    o When a new node is physically onlined after boot, we need to use
    "online_movable" or "online_kernel" to configure/partition it as expected
    when we logically online it.

    This configuration also helps physical memory migration.

    o All the benefits of the existing "movablecore" & "kernelcore".

    o Preparing for movable-node, which is very important for power saving,
    hardware partitioning and high-availability systems (hardware fault
    management).

    (Note, we don't introduce movable-node here.)

    Action behavior:
    When a memory block/memory section is onlined by "online_movable", the
    kernel will not take direct references to pages in that block, so we can
    remove that memory whenever needed.

    When it is onlined by "online_kernel", the kernel can use it.
    When it is onlined by "online", the zone type doesn't change.

    Current constraints:
    Only a memory block which is adjacent to ZONE_MOVABLE can be onlined
    from ZONE_NORMAL to ZONE_MOVABLE.

    [akpm@linux-foundation.org: use min_t, cleanups]
    Signed-off-by: Lai Jiangshan
    Signed-off-by: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Lai Jiangshan
    Cc: Jiang Liu
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Yinghai Lu
    Cc: Rusty Russell
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • If we hot-remove memory only and leave the CPUs alive, the corresponding
    node will not be removed, but node_start_pfn and node_spanned_pages in
    pg_data will be reset to 0. In that case, when we hot-add the memory back
    next time, node_start_pfn will always stay 0 because no pfn is less than
    0. After that, if we hot-remove the memory again, it will cause a kernel
    panic in find_biggest_section_pfn() when it tries to scan all the pfns.

    The zone will also have the same problem.

    This patch sets start_pfn to the start_pfn of the section being added when
    spanned_pages of the zone or pg_data is 0.

    ---How to reproduce---

    1. hot-add a container with some memory and cpus;
    2. hot-remove the container's memory, and leave cpus there;
    3. hot-add these memory again;
    4. hot-remove them again;

    then, the kernel will panic.

    ---Call trace---

    BUG: unable to handle kernel paging request at 00000fff82a8cc38
    IP: [] find_biggest_section_pfn+0xe5/0x180
    ......
    Call Trace:
    [] __remove_zone+0x184/0x1b0
    [] __remove_section+0x8c/0xb0
    [] __remove_pages+0xe7/0x120
    [] arch_remove_memory+0x2c/0x80
    [] remove_memory+0x56/0x90
    [] acpi_memory_device_remove_memory+0x48/0x73
    [] acpi_memory_device_notify+0x153/0x274
    [] acpi_ev_notify_dispatch+0x41/0x5f
    [] acpi_os_execute_deferred+0x27/0x34
    [] process_one_work+0x219/0x680
    [] worker_thread+0x12e/0x320
    [] kthread+0xc6/0xd0
    [] kernel_thread_helper+0x4/0x10
    ......
    ---[ end trace 96d845dbf33fee11 ]---

    Signed-off-by: Tang Chen
    Cc: Yasuaki Ishimatsu
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • Currently memory_hotplug only manages node_states[N_HIGH_MEMORY]; it
    forgets to manage node_states[N_NORMAL_MEMORY], which may leave
    node_states[N_NORMAL_MEMORY] incorrect.

    For example, if a node is empty before onlining and we online memory that
    is in ZONE_NORMAL, then after onlining node_states[N_HIGH_MEMORY] is
    correct but node_states[N_NORMAL_MEMORY] is not: the online code doesn't
    add the newly onlined node to node_states[N_NORMAL_MEMORY].

    The same thing happens when offlining (the offline code doesn't clear the
    node from node_states[N_NORMAL_MEMORY] when needed). Some memory
    management code depends on node_states[N_NORMAL_MEMORY], so we have to
    fix it up.

    We add node_states_check_changes_online() and
    node_states_check_changes_offline() to detect whether
    node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] change during
    hotplug.

    We also add @status_change_nid_normal to struct memory_notify, so the
    memory hotplug callbacks know whether node_states[N_NORMAL_MEMORY] has
    changed. (We could add a @flags field and reuse @status_change_nid
    instead of introducing @status_change_nid_normal, but that would add much
    more complexity to the memory hotplug callback in every subsystem. So
    introducing @status_change_nid_normal is better, and it doesn't change
    the semantics of @status_change_nid.)

    Signed-off-by: Lai Jiangshan
    Cc: David Rientjes
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Yasuaki Ishimatsu
    Cc: Rob Landley
    Cc: Jiang Liu
    Cc: Kay Sievers
    Cc: Greg Kroah-Hartman
    Cc: Mel Gorman
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan