13 Nov, 2013

2 commits

  • The Linux kernel cannot migrate pages used by the kernel. As a result,
    kernel pages cannot be hot-removed. So we cannot allocate hotpluggable
    memory for the kernel.

    ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
    info. But before SRAT is parsed, memblock has already started to allocate
    memory for the kernel. So we need to prevent memblock from doing this.

    In a memory hotplug system, any NUMA node the kernel resides in should
    be treated as non-hotpluggable. And since a node on a modern server
    typically has at least 16GB of memory, memory near the kernel image is
    very likely non-hotpluggable.

    So the basic idea is: allocate memory upward, starting from the end of
    the kernel image. Since not much memory is allocated before SRAT is
    parsed, it is very likely to land in the same node as the kernel image.

    The current memblock can only allocate memory top-down, so this patch
    introduces a new bottom-up allocation mode. Later, when we allocate
    memory in this direction, we will limit the start address to above the
    kernel image.

    Signed-off-by: Tang Chen
    Signed-off-by: Zhang Yanfei
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Wanpeng Li
    Cc: Thomas Renninger
    Cc: Yinghai Lu
    Cc: Jiang Liu
    Cc: Wen Congyang
    Cc: Lai Jiangshan
    Cc: Yasuaki Ishimatsu
    Cc: Taku Izumi
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Kamezawa Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • [Problem]

    The current Linux cannot migrate pages used by the kernel because of the
    kernel direct mapping. In Linux kernel space, va = pa + PAGE_OFFSET.
    When the pa is changed, we cannot simply update the pagetable and keep the
    va unmodified. So the kernel pages are not migratable.

    There are also other issues that make kernel pages non-migratable. For
    example, a physical address may be cached somewhere for later use, and
    it is not easy to update all such caches.

    When doing memory hotplug in Linux, we first migrate all the pages in one
    memory device somewhere else, and then remove the device. But if pages
    are used by the kernel, they are not migratable. As a result, memory used
    by the kernel cannot be hot-removed.

    Modifying the kernel direct mapping mechanism is too difficult. It may
    also degrade kernel performance and stability. So we take the following
    approach to memory hotplug.

    [What we are doing]

    In Linux, memory in one numa node is divided into several zones. One of
    the zones is ZONE_MOVABLE, which the kernel won't use.

    In order to implement memory hotplug in Linux, we are going to arrange
    all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use
    this memory. To do this, we need ACPI's help.

    In ACPI, the SRAT (System Resource Affinity Table) contains NUMA info.
    The memory affinity structures in SRAT record every memory range in the
    system, along with flags specifying whether the range is hotpluggable.
    (Please refer to ACPI spec 5.0, section 5.2.16.)

    With the help of SRAT, we have to do the following two things to achieve our
    goal:

    1. When doing memory hot-add, allow users to arrange hotpluggable
    memory as ZONE_MOVABLE.
    (This has been done by the MOVABLE_NODE functionality in Linux.)

    2. When the system is booting, prevent the bootmem allocator from
    allocating hotpluggable memory for the kernel before memory
    initialization finishes.

    Problem 2 is the key problem we are going to solve. But before solving
    it, we need some preparation. Please see below.

    [Preparation]

    The bootloader has to load the kernel image into memory, and this
    memory must be non-hotpluggable; we cannot prevent that. So in a memory
    hotplug system, we can assume any node the kernel resides in is not
    hotpluggable.

    Before SRAT is parsed, we don't know which memory ranges are hotpluggable.
    But memblock has already started to work. In the current kernel,
    memblock allocates the following memory before SRAT is parsed:

    setup_arch()
    |->memblock_x86_fill() /* memblock is ready */
    |......
    |->early_reserve_e820_mpc_new() /* allocate memory under 1MB */
    |->reserve_real_mode() /* allocate memory under 1MB */
    |->init_mem_mapping() /* allocate page tables, about 2MB to map 1GB memory */
    |->dma_contiguous_reserve() /* specified by user, should be low */
    |->setup_log_buf() /* specified by user, several mega bytes */
    |->relocate_initrd() /* could be large, but will be freed after boot, should reorder */
    |->acpi_initrd_override() /* several mega bytes */
    |->reserve_crashkernel() /* could be large, should reorder */
    |......
    |->initmem_init() /* Parse SRAT */

    According to Tejun's advice, before SRAT is parsed, we should try our
    best to allocate memory near the kernel image. Since the whole node the
    kernel resides in won't be hotpluggable, and a node on a modern server
    may have at least 16GB of memory, allocating several megabytes around
    the kernel image won't cross into hotpluggable memory.

    [About this patchset]

    So this patchset is the preparation for problem 2 that we want to
    solve. It does the following:

    1. Make memblock able to allocate memory bottom-up.
       1) Keep all the memblock APIs' prototypes unmodified.
       2) When the direction is bottom-up, keep the start address above
          the end of the kernel image.

    2. Improve init_mem_mapping() to support allocating page tables in the
    bottom-up direction.

    3. Introduce a "movable_node" boot option to enable and disable this
    functionality.

    This patch (of 6):

    Create a new function, __memblock_find_range_top_down(), to factor the
    top-down allocation out of memblock_find_in_range_node(). This is
    preparation for the new bottom-up allocation mode introduced in the
    following patch.

    Signed-off-by: Tang Chen
    Signed-off-by: Zhang Yanfei
    Acked-by: Tejun Heo
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Wanpeng Li
    Cc: Thomas Renninger
    Cc: Yinghai Lu
    Cc: Jiang Liu
    Cc: Wen Congyang
    Cc: Lai Jiangshan
    Cc: Yasuaki Ishimatsu
    Cc: Taku Izumi
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Kamezawa Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

12 Sep, 2013

1 commit

    The current early_pfn_to_nid() on architectures that support memblock
    walks memblock.memory one entry at a time, so lookups take too many
    tries near the end of the array.

    We can use the existing memblock_search() to find the node id for a
    given pfn, which saves some time on bigger systems that have many
    entries in the memblock.memory array.

    Here are the timing differences for several machines. In each case,
    less time was spent in __early_pfn_to_nid() with the patch.

                              3.11-rc5   with patch   difference (%)
                              --------   ----------   --------------
        UV1: 256 nodes  9TB:    411.66       402.47    -9.19 (2.23%)
        UV2: 255 nodes 16TB:   1141.02      1138.12    -2.90 (0.25%)
        UV2:  64 nodes  2TB:    128.15       126.53    -1.62 (1.26%)
        UV2:  32 nodes  2TB:    121.87       121.07    -0.80 (0.66%)

    Time is in seconds.

    Signed-off-by: Yinghai Lu
    Cc: Tejun Heo
    Acked-by: Russ Anderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

10 Jul, 2013

1 commit


30 Apr, 2013

2 commits

    There is no comment for the nid parameter of memblock_insert_region().
    This patch adds one.

    Signed-off-by: Tang Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
    This came to light when calling the memblock allocator from the arc
    port (for copying the flattened DT). If a "0" alignment is passed, the
    allocator's round_up() call incorrectly rounds the size up to 0:

    round_up(num, alignto) => ((num - 1) | (alignto - 1)) + 1

    While the obvious allocation failure causes the kernel to panic, it is
    better to warn the caller so the code can be fixed.

    Tejun suggested that instead of BUG_ON(!align) - which might be
    ineffective due to pending console init and such - it is better to
    WARN_ON and continue the boot with a reasonable default alignment.

    A caller passing a bad @size need not be handled similarly, as the
    subsequent panic will indicate that anyhow.

    Signed-off-by: Vineet Gupta
    Cc: Yinghai Lu
    Cc: Wanpeng Li
    Cc: Ingo Molnar
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineet Gupta
     

03 Mar, 2013

1 commit

  • Tim found:

    WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
    Hardware name: S2600CP
    sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
    smpboot: Booting Node 1, Processors #1
    Modules linked in:
    Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
    Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

    Don Morris reproduced on a HP z620 workstation, and bisected it to
    commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
    is ready")

    It turns out movable_map has some problems, and it breaks several things

    1. numa_init is called several times, NOT just for srat, so the
       nodes_clear(numa_nodes_parsed)
       memset(&numa_meminfo, 0, sizeof(numa_meminfo))
       calls cannot simply be removed. The sequence to consider is:
       numaq, srat, amd, dummy, and the fallback path has to keep working.

    2. simply splitting acpi_numa_init into early_parse_srat:
       a. early_parse_srat is NOT called for ia64, so ia64 is broken.
       b. the loop
          for (i = 0; i < MAX_LOCAL_APIC; i++)
                  set_apicid_to_node(i, NUMA_NO_NODE)
          is still left in numa_init, so it just clears the result from
          early_parse_srat; it should be moved before that.
       c. it breaks ACPI_TABLE_OVERRIDE, as the acpi table scan is moved
          early, before the override from the INITRD is settled.

    3. the patch TITLE is totally misleading: there is NO x86 in the
       title, yet it changes critical x86 code. That caused the x86 folks
       not to pay attention and find the problem early. Those patches
       really should have been routed via tip/x86/mm.

    4. after that commit, the following ranges cannot use movable ram:
       a. real_mode code... well... funny, could legacy Node0 [0,1M) be
          hot-removed?
       b. initrd... it will be freed after booting, so it could be on
          movable ram...
       c. crashkernel for kdump: looks like we cannot put the kdump
          kernel above 4G anymore.
       d. init_mem_mapping: cannot put page tables high anymore.
       e. initmem_init: vmemmap cannot be on the high local node anymore.
          That is not good.

    If a node is hotpluggable, memory-related ranges like page tables and
    vmemmap could be on that node without problems, and should be on that
    node.

    We have workaround patches that could fix some of the problems, but
    some cannot be fixed.

    So just remove that offending commit and related ones including:

    f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

    01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

    27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

    e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

    fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")

    42f47e27e761 ("page_alloc: make movablemem_map have higher priority")

    6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

    34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")

    4d59a75125d5 ("x86: get pg_data_t's memory from other node")

    Later we should have patches that make sure the kernel puts page
    tables and vmemmap on local node ram instead of pushing them down to
    node0. We also need to find a way to put other kernel-used ram on
    local node ram.

    Reported-by: Tim Gardner
    Reported-by: Don Morris
    Bisected-by: Don Morris
    Tested-by: Don Morris
    Signed-off-by: Yinghai Lu
    Cc: Tony Luck
    Cc: Thomas Renninger
    Cc: Tejun Heo
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

24 Feb, 2013

2 commits

  • The definition of struct movablecore_map is protected by
    CONFIG_HAVE_MEMBLOCK_NODE_MAP but its use in memblock_overlaps_region()
    is not. So add CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect the use of
    movablecore_map in memblock_overlaps_region().

    Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Tang Chen
     
    Ensure the bootmem allocator will not allocate memory from areas that
    may be ZONE_MOVABLE. The map info comes from the movablecore_map boot
    option.

    Signed-off-by: Tang Chen
    Reviewed-by: Wen Congyang
    Reviewed-by: Lai Jiangshan
    Tested-by: Lin Feng
    Cc: Wu Jianguo
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

30 Jan, 2013

1 commit

    Use it to get the memory size under limit_pfn, replacing the local
    version in x86 reserved_initrd.

    -v2: remove an unneeded cast, as pointed out by HPA.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1359058816-7615-29-git-send-email-yinghai@kernel.org
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

12 Jan, 2013

1 commit

    The memmove span covers from (next+1) to the end of the array, and the
    index of next is (i+1), so the index of (next+1) is (i+2). The size of
    the remaining array elements is therefore (type->cnt - (i + 2)).

    Since the remaining elements of the memblock array are moved forward
    by one element, this bug only adds one extra element to the move, so
    there is no write overflow, only a read overflow: one element may be
    read past the end of the array if the array happens to be full.
    Commonly it doesn't matter at all, but if the array happens to be
    located at the end of a memblock, it may cause an invalid read of a
    physical address that doesn't exist.

    There are two *happens to be* conditions here, so I think the
    probability is quite low; I don't know if anyone has been bitten by
    this bug before.

    Mostly I think it's user-invisible.

    Signed-off-by: Lin Feng
    Acked-by: Tejun Heo
    Reviewed-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lin Feng
     

25 Oct, 2012

1 commit

    We will not map partial pages, so we need to make sure the memblock
    allocator does not hand those bytes out.

    Also, use for_each_mem_pfn_range() to loop over the memory ranges to
    map, to keep them consistent.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
    Acked-by: Jacob Shin
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

09 Oct, 2012

2 commits

    The following section mismatch warning is thrown during the build:

    WARNING: vmlinux.o(.text+0x32408f): Section mismatch in reference from the function memblock_type_name() to the variable .meminit.data:memblock
    The function memblock_type_name() references
    the variable __meminitdata memblock.
    This is often because memblock_type_name lacks a __meminitdata
    annotation or the annotation of memblock is wrong.

    This is because memblock_type_name references the memblock variable,
    which has the __meminitdata attribute. Hence the warning (even though
    the function is inline).

    [akpm@linux-foundation.org: remove inline]
    Signed-off-by: Raghavendra D Prabhu
    Cc: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raghavendra D Prabhu
     
    Use the existing interface function to set the NUMA node ID (NID) for
    regions, whether memory or reserved regions.

    Signed-off-by: Wanpeng Li
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Gavin Shan
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

05 Sep, 2012

1 commit


01 Aug, 2012

1 commit


12 Jul, 2012

1 commit

    memblock_free_reserved_regions() calls memblock_free(), but
    memblock_free() may itself double the reserved.regions array, which
    would free the old array range backing reserved.regions while it is
    still being used.

    Also, tj said there is another issue which could be related to this.

    | I don't think we're saving any noticeable
    | amount by doing this "free - give it to page allocator - reserve
    | again" dancing. We should just allocate regions aligned to page
    | boundaries and free them later when memblock is no longer in use.

    In that case, with DEBUG_PAGEALLOC, we get a panic:

    memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
    BUG: unable to handle kernel paging request at ffff88102febd948
    IP: [] __next_free_mem_range+0x9b/0x155
    PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU 0
    Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
    RIP: 0010:[] [] __next_free_mem_range+0x9b/0x155

    See the discussion at https://lkml.org/lkml/2012/6/13/469

    So try to allocate with PAGE_SIZE alignment and free it later.

    Reported-by: Sasha Levin
    Acked-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

21 Jun, 2012

2 commits

    __alloc_memory_core_early() asks memblock for a range of memory and
    then tries to reserve it. If the reserved region array lacks space for
    the new range, memblock_double_array() is called to allocate more
    space for the array. If memblock is used to allocate memory for the
    new array, it can end up using a range that overlaps with the range
    originally allocated in __alloc_memory_core_early(), leading to
    possible data corruption.

    With this patch memblock_double_array() now calls memblock_find_in_range()
    with a narrowed candidate range (in cases where the reserved.regions array
    is being doubled) so any memory allocated will not overlap with the
    original range that was being reserved. The range is narrowed by passing
    in the starting address and size of the previously allocated range. Then
    the range above the ending address is searched and if a candidate is not
    found, the range below the starting address is searched.

    Signed-off-by: Greg Pearson
    Signed-off-by: Yinghai Lu
    Acked-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Pearson
     
  • Fix kernel-doc warnings such as

    Warning(../mm/page_cgroup.c:432): No description found for parameter 'id'
    Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record'

    Signed-off-by: Wanpeng Li
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

08 Jun, 2012

1 commit

  • At first glance one would think that memblock_is_region_memory()
    and memblock_is_region_reserved() would be implemented in the
    same way. Unfortunately they aren't and the former returns
    whether the region specified is a subset of a memory bank while
    the latter returns whether the region specified intersects with
    reserved memory.

    Document the two functions so that users aren't tempted to
    make the implementation the same between them and to clarify the
    purpose of the functions.

    Signed-off-by: Stephen Boyd
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/1337845521-32755-1-git-send-email-sboyd@codeaurora.org
    Signed-off-by: Ingo Molnar

    Stephen Boyd
     

30 May, 2012

2 commits

    The overall memblock is organized into memory regions and reserved
    regions, initially stored in predetermined arrays of struct
    memblock_region. The arrays may be enlarged when regions are newly
    added but no free space is left, in which case a double-sized array is
    created, either by the slab allocator or by the memblock allocator.
    Unfortunately, we didn't free the old array, which might have been
    allocated through the slab allocator before. That caused a memory
    leak.

    The patch introduces 2 variables to track where (slab or memblock) the
    memory and reserved region arrays come from. The array is deallocated
    by kfree() if it was allocated by the slab allocator, which fixes the
    memory leak.

    Signed-off-by: Gavin Shan
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
    The overall memblock is organized into memory regions and reserved
    regions, initially stored in predetermined arrays of struct
    memblock_region. The arrays may be enlarged when regions are newly
    added but there is not enough space, in which case we create a
    double-sized array to meet the requirement. However, the original
    implementation converted the VA (Virtual Address) of the newly
    allocated array to a PA (Physical Address) and then translated it back
    when the new array was allocated from slab. That's actually
    unnecessary.

    The patch removes the duplicate VA/PA conversion.

    Signed-off-by: Gavin Shan
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     

21 Apr, 2012

1 commit

  • Commit 24aa07882b ("memblock, x86: Replace memblock_x86_reserve/
    free_range() with generic ones") replaced x86 specific memblock
    operations with the generic ones; unfortunately, it lost zero length
    operation handling in the process making the kernel panic if somebody
    tries to reserve zero length area.

    There isn't much to be gained by being cranky to zero length operations
    and panicking is almost the worst response. Drop the BUG_ON() in
    memblock_reserve() and update memblock_add_region/isolate_range() so
    that all zero length operations are handled as noops.

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org
    Reported-by: Valere Monseur
    Bisected-by: Joseph Freeman
    Tested-by: Joseph Freeman
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43098
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

01 Mar, 2012

1 commit

    The memblock allocator aligns @size to @align to reduce the amount
    of fragmentation. Commit:

    7bd0b0f0da ("memblock: Reimplement memblock allocation using reverse free area iterator")

    broke it by incorrectly relocating the @size alignment into
    memblock_find_in_range_node(). As the aligned size is not
    propagated back to memblock_alloc_base_nid(), the actually
    reserved size isn't aligned.

    While this increases memory use for the memblock reserved array,
    it shouldn't cause any critical failure; however, it seems
    that the size aligning was hiding a use-beyond-allocation bug in
    sparc64, and losing the aligning causes a boot failure.

    The underlying problem is currently being debugged, but this is a
    proper fix in itself; it's already pretty late in the -rc cycle for
    boot failures, and reverting the change for debugging isn't
    difficult. Restore the size aligning by moving it to
    memblock_alloc_base_nid().

    Reported-by: Meelis Roos
    Signed-off-by: Tejun Heo
    Cc: David S. Miller
    Cc: Grant Likely
    Cc: Rob Herring
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20120228205621.GC3252@dhcp-172-17-108-109.mtv.corp.google.com
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Tejun Heo
     

16 Jan, 2012

1 commit

    7bd0b0f0da ("memblock: Reimplement memblock allocation using
    reverse free area iterator") implemented a simple top-down
    allocator using a reverse memblock iterator. To avoid underflow
    in the allocator loop, it simply raised the lower boundary to
    the requested size, under the assumption that the requested size
    would be far smaller than the available memblocks.

    This causes early page table allocation failures under certain
    configurations in Xen. Fix it by checking for underflow directly
    instead of bumping up the lower bound.

    Signed-off-by: Tejun Heo
    Reported-by: Konrad Rzeszutek Wilk
    Cc: rjw@sisk.pl
    Cc: xen-devel@lists.xensource.com
    Cc: Benjamin Herrenschmidt
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20120113181412.GA11112@google.com
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

09 Dec, 2011

14 commits

    Now that all early memory information is in memblock when enabled, we
    can implement a reverse free area iterator and use it to implement a
    NUMA-aware allocator, which is then wrapped for the simpler variants,
    instead of the confusing and inefficient mending of information in a
    separate NUMA-aware allocator.

    Implement for_each_free_mem_range_reverse(), use it to reimplement
    memblock_find_in_range_node() which in turn is used by all allocators.

    The visible allocator interface is inconsistent and can probably use
    some cleanup too.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
    Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEMBLOCK_NODE_MAP -
    there's no user of early_node_map[] left. Kill early_node_map[] and
    replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP. Also,
    relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
    as page_alloc.c would no longer host an alternative implementation.

    This change is ultimately one to one mapping and shouldn't cause any
    observable difference; however, after the recent changes, there are
    some functions which now would fit memblock.c better than page_alloc.c
    and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
    doesn't make much sense on some of them. Further cleanups for
    functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.

    -v2: Fix compile bug introduced by mis-spelling
    CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
    mmzone.h. Reported by Stephen Rothwell.

    Signed-off-by: Tejun Heo
    Cc: Stephen Rothwell
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu
    Cc: Tony Luck
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Chen Liqin
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: "H. Peter Anvin"

    Tejun Heo
     
  • Implement memblock_add_node() which can add a new memblock memory
    region with specific node ID.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
  • The only function of memblock_analyze() is now allowing resize of
    memblock region arrays. Rename it to memblock_allow_resize() and
    update its users.

    * The following users remain the same other than renaming.

    arm/mm/init.c::arm_memblock_init()
    microblaze/kernel/prom.c::early_init_devtree()
    powerpc/kernel/prom.c::early_init_devtree()
    openrisc/kernel/prom.c::early_init_devtree()
    sh/mm/init.c::paging_init()
    sparc/mm/init_64.c::paging_init()
    unicore32/mm/init.c::uc32_memblock_init()

    * In the following users, analyze was used to update total size which
    is no longer necessary.

    powerpc/kernel/machine_kexec.c::reserve_crashkernel()
    powerpc/kernel/prom.c::early_init_devtree()
    powerpc/mm/init_32.c::MMU_init()
    powerpc/mm/tlb_nohash.c::__early_init_mmu()
    powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
    powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
    sh/kernel/machine_kexec.c::reserve_crashkernel()

    * x86/kernel/e820.c::memblock_x86_fill() was directly setting
    memblock_can_resize before populating memblock and calling analyze
    afterwards. Call memblock_allow_resize() before it starts populating.

    memblock_can_resize is now static inside memblock.c.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu
    Cc: Russell King
    Cc: Michal Simek
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Guan Xuetao
    Cc: "H. Peter Anvin"

    Tejun Heo
     
  • Total size of memory regions was calculated by memblock_analyze()
    requiring explicitly calling the function between operations which can
    change memory regions and possible users of total size, which is
    cumbersome and fragile.

    This patch makes each memblock_type track total size automatically
    with minor modifications to memblock manipulation functions and remove
    requirements on calling memblock_analyze(). [__]memblock_dump_all()
    now also dumps the total size of reserved regions.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
    With recent updates, the basic memblock operations are robust enough
    that there's no reason for memblock_enforce_memory_limit() to directly
    manipulate the memblock region arrays. Reimplement it using
    __memblock_remove().

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
    Allow memblock users to specify a range where @base + @size overflows
    and automatically cap it at the maximum. This makes the interface more
    robust and makes specifying till-the-end-of-memory easier.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
    __memblock_remove()'s open-coded region manipulation can be trivially
    replaced with memblock_isolate_range(). This increases code sharing
    and eases improving region tracking.

    This pulls memblock_isolate_range() out of HAVE_MEMBLOCK_NODE_MAP.
    Make it use memblock_get_region_node() instead of assuming rgn->nid is
    available.

    -v2: Fixed build failure on !HAVE_MEMBLOCK_NODE_MAP caused by direct
    rgn->nid access.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
    memblock_set_node() operates in three steps - break regions crossing
    boundaries, set nid, and merge back regions. This patch separates the
    first part into a separate function - memblock_isolate_range() - which
    breaks regions crossing range boundaries and returns the index range
    of the regions properly contained in the specified memory range.

    This doesn't introduce any behavior change and will be used to further
    unify region handling.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
  • memblock_init() initializes arrays for regions and memblock itself;
    however, all these can be done with struct initializers and
    memblock_init() can be removed. This patch kills memblock_init() and
    initializes memblock with struct initializer.

    The only difference is that the first dummy entries don't have .nid
    set to MAX_NUMNODES initially. This doesn't cause any behavior
    difference.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu
    Cc: Russell King
    Cc: Michal Simek
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Guan Xuetao
    Cc: "H. Peter Anvin"

    Tejun Heo
     
  • memblock no longer depends on having one more entry at the end during
    addition making the sentinel entries at the end of region arrays not
    too useful. Remove the sentinels. This eases further updates.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
  • Add __memblock_dump_all() which dumps memblock configuration whether
    memblock_debug is enabled or not.

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
  • Make memblock_double_array(), __memblock_alloc_base() and
    memblock_alloc_nid() use memblock_reserve() instead of calling
    memblock_add_region() with reserved array directly. This eases
    debugging and updates to memblock_add_region().

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     
    memblock_{add|remove|free|reserve}() return either 0 or -errno but had
    long as their return type. Change it to int. Also, drop 'extern' from
    all prototypes in memblock.h - they are unnecessary and used
    inconsistently (especially if mm.h is included in the picture).

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu

    Tejun Heo
     

29 Nov, 2011

1 commit

  • Conflicts & resolutions:

    * arch/x86/xen/setup.c

    dc91c728fd "xen: allow extra memory to be in multiple regions"
    24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

    conflicted on xen_add_extra_mem() updates. The resolution is
    trivial as the latter just want to replace
    memblock_x86_reserve_range() with memblock_reserve().

    * drivers/pci/intel-iommu.c

    166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
    5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."

    conflicted as the former moved the file under drivers/iommu/.
    Resolved by applying the changes from the latter on the moved
    file.

    * mm/Kconfig

    6661672053a "memblock: add NO_BOOTMEM config symbol"
    c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

    conflicted trivially. Both added config options. Just
    letting both add their own options resolves the conflict.

    * mm/memblock.c

    d1f0ece6cdc "mm/memblock.c: small function definition fixes"
    ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"

    conflicted. The former updates a function removed by the
    latter. The resolution is trivial.

    Signed-off-by: Tejun Heo

    Tejun Heo