
04 Jul, 2013

3 commits

  • Now nobody makes use of free_all_bootmem_node(); kill it.

    Signed-off-by: Jiang Liu
    Cc: Johannes Weiner
    Cc: "David S. Miller"
    Cc: Yinghai Lu
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Concentrate the code that modifies totalram_pages into the mm core, so
    the arch memory initialization code doesn't need to take care of it.
    With these changes applied, only the following functions from the mm
    core modify the global variable totalram_pages: free_bootmem_late(),
    free_all_bootmem(), free_all_bootmem_node(),
    adjust_managed_page_count().

    With this patch applied, it will be much easier for us to keep
    totalram_pages and zone->managed_pages consistent.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
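
    [editor's note: for illustration, a minimal sketch of the accounting
    helper this series funnels the updates through, modeled on
    mm/page_alloc.c of that era; treat the body as a sketch, not a quote
    of the patch]

        void adjust_managed_page_count(struct page *page, long count)
        {
                spin_lock(&managed_page_count_lock);
                page_zone(page)->managed_pages += count;
                totalram_pages += count;
        #ifdef CONFIG_HIGHMEM
                if (PageHighMem(page))
                        totalhigh_pages += count;
        #endif
                spin_unlock(&managed_page_count_lock);
        }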
     
  • Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
    that all highmem pages will be freed into the buddy system by function
    mem_init(). But that's not always true: some architectures may reserve
    some highmem pages during boot. For example, PPC may allocate highmem
    pages for giant HugeTLB pages, and several architectures have code to
    check the PageReserved flag to exclude highmem pages allocated during
    boot when freeing highmem pages into the buddy system.

    So treat highmem pages in the same way as normal pages, that is:
    1) reset zone->managed_pages to zero in mem_init().
    2) recalculate managed_pages when freeing pages into the buddy system.

    Signed-off-by: Jiang Liu
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
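
    [editor's note: a sketch of step 2 above, modeled on the
    free_highmem_page() helper of that era: a boot-reserved highmem page
    is handed to the buddy system late, and managed_pages is recalculated
    as each page is actually freed]

        void free_highmem_page(struct page *page)
        {
                __free_reserved_page(page);       /* clear PageReserved, free */
                totalram_pages++;
                page_zone(page)->managed_pages++; /* recalculate as we free */
                totalhigh_pages++;
        }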
     


12 Jan, 2013

1 commit

  • Currently free_all_bootmem_core ignores that node_min_pfn may not be a
    multiple of BITS_PER_LONG. E.g. commit 6dccdcbe2c3e ("mm: bootmem: fix
    checking the bitmap when finally freeing bootmem") shifts vec by the
    lower bits of start instead of the lower bits of idx. Also

    if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL)

    assumes that vec bit 0 corresponds to start pfn, which is only true when
    node_min_pfn is a multiple of BITS_PER_LONG. Also, the loop in the else
    clause can double-free pages (e.g. with node_min_pfn == start == 1 and
    map[0] == ~0 on a 32-bit machine, page 32 will be double-freed).

    This bug causes the following message during xtensa kernel boot:

    bootmem::free_all_bootmem_core nid=0 start=1 end=8000
    BUG: Bad page state in process swapper pfn:00001
    page:d04bd020 count:0 mapcount:-127 mapping: (null) index:0x2
    page flags: 0x0()
    Call Trace:
    bad_page+0x8c/0x9c
    free_pages_prepare+0x5e/0x88
    free_hot_cold_page+0xc/0xa0
    __free_pages+0x24/0x38
    __free_pages_bootmem+0x54/0x56
    free_all_bootmem_core$part$11+0xeb/0x138
    free_all_bootmem+0x46/0x58
    mem_init+0x25/0xa4
    start_kernel+0x11e/0x25c
    should_never_return+0x0/0x3be7

    The fix is the following:
    - always align vec so that its bit 0 corresponds to start
    - provide BITS_PER_LONG bits in vec, if those bits are available in the
    map
    - don't free pages past the next start position in the else clause.

    Signed-off-by: Max Filippov
    Cc: Gavin Shan
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Cc: Yinghai Lu
    Cc: Joonsoo Kim
    Cc: Prasad Koya
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Max Filippov
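
    [editor's note: a sketch of the aligned-vec construction the fix list
    describes, using the mm/bootmem.c variable names; bit 0 of vec now
    always corresponds to start, and up to BITS_PER_LONG bits are pulled
    from the map when available]

        idx = start - bdata->node_min_pfn;
        shift = idx & (BITS_PER_LONG - 1);
        /* vec holds at most BITS_PER_LONG map bits, bit 0 == start */
        vec = ~map[idx / BITS_PER_LONG];
        if (shift) {
                vec >>= shift;
                if (end - start >= BITS_PER_LONG)
                        vec |= ~map[idx / BITS_PER_LONG + 1] <<
                                (BITS_PER_LONG - shift);
        }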
     

13 Dec, 2012

4 commits

  • reserve_bootmem_generic() has no callers, so remove it.

    Signed-off-by: Lin Feng
    Acked-by: Johannes Weiner
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lin Feng
     
  • Currently a zone's present_pages is calculated as below, which is
    inaccurate and may cause trouble for memory hotplug.

    spanned_pages - absent_pages - memmap_pages - dma_reserve.

    During fixing bugs caused by inaccurate zone->present_pages, we found
    zone->present_pages has been abused. The field zone->present_pages may
    have different meanings in different contexts:

    1) pages existing in a zone.
    2) pages managed by the buddy system.

    For more discussions about the issue, please refer to:
    http://lkml.org/lkml/2012/11/5/866
    https://patchwork.kernel.org/patch/1346751/

    This patchset tries to introduce a new field named "managed_pages" to
    struct zone, which counts "pages managed by the buddy system", and
    reverts zone->present_pages to counting "physical pages existing in a
    zone", which also keeps it consistent with pgdat->node_present_pages.

    We will set an initial value for zone->managed_pages in function
    free_area_init_core() and will adjust it later if the initial value is
    inaccurate.

    For DMA/normal zones, the initial value is set to:

    (spanned_pages - absent_pages - memmap_pages - dma_reserve)

    Later zone->managed_pages will be adjusted to the accurate value when the
    bootmem allocator frees all free pages to the buddy system in function
    free_all_bootmem_node() and free_all_bootmem().

    The bootmem allocator doesn't touch highmem pages, so highmem zones'
    managed_pages is set to the accurate value "spanned_pages - absent_pages"
    in function free_area_init_core() and won't be updated anymore.

    This patch also adds a new field "managed_pages" to /proc/zoneinfo
    and sysrq showmem.

    [akpm@linux-foundation.org: small comment tweaks]
    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: David Rientjes
    Cc: Maciej Rutecki
    Tested-by: Chris Clayton
    Cc: "Rafael J . Wysocki"
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Jianguo Wu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
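
    [editor's note: the initial values described above, condensed into a
    rough sketch; the identifiers are illustrative of free_area_init_core()
    of that era, not a quote of the patch]

        unsigned long present = spanned_pages - absent_pages;

        if (is_highmem_idx(j))
                /* bootmem never touches highmem: this value is final */
                zone->managed_pages = present;
        else
                /* refined later, when bootmem frees pages to the buddy */
                zone->managed_pages = present - memmap_pages - dma_reserve;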
     
  • The name of this function is not suitable, and removing the function
    and open-coding it into each call site makes the code more
    understandable.

    Additionally, we shouldn't do an allocation from bootmem when
    slab_is_available(), so directly return kmalloc()'s return value.

    Signed-off-by: Joonsoo Kim
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Johannes Weiner
    Cc: FUJITA Tomonori
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
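
    [editor's note: the open-coded form at a call site, sketched; the code
    of that era used kzalloc() with GFP_NOWAIT here, which preserves
    bootmem's zeroing semantics]

        if (WARN_ON_ONCE(slab_is_available()))
                return kzalloc(size, GFP_NOWAIT);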
     
  • There is no implementation of bootmem_arch_preferred_node() and a call to
    this function will cause a compilation error. So remove it.

    Signed-off-by: Joonsoo Kim
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Johannes Weiner
    Cc: FUJITA Tomonori
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

12 Dec, 2012

1 commit

  • It is strange that alloc_bootmem() returns a virtual address while
    free_bootmem() requires a physical address. In any case, free_bootmem()'s
    first parameter should be a physical address.

    There are some call sites that pass free_bootmem() a virtual address,
    so fix them.

    [akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_late() documentation]
    Signed-off-by: Joonsoo Kim
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Johannes Weiner
    Cc: FUJITA Tomonori
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
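
    [editor's note: the shape of the fix at an affected call site, as an
    illustration]

        /* before: a virtual address, which free_bootmem() misinterprets */
        free_bootmem((unsigned long)ptr, size);

        /* after: convert to a physical address first */
        free_bootmem(__pa(ptr), size);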
     

17 Nov, 2012

1 commit

  • Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages")

    That patch tried to fix an issue when calculating zone->present_pages,
    but it caused a regression on 32-bit systems with HIGHMEM. With that
    change, reset_zone_present_pages() resets all zone->present_pages to
    zero, and fixup_zone_present_pages() is called to recalculate
    zone->present_pages when the boot allocator frees core memory pages
    into the buddy allocator. Because highmem pages are not freed by the
    bootmem allocator, all highmem zones' present_pages become zero.

    Various options for improving the situation are being discussed but for
    now, let's return to the 3.6 code.

    Cc: Jianguo Wu
    Cc: Jiang Liu
    Cc: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Acked-by: David Rientjes
    Tested-by: Chris Clayton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

09 Oct, 2012

1 commit

  • I think zone->present_pages indicates pages that the buddy system can
    manage; it should be:

    zone->present_pages = spanned pages - absent pages - bootmem pages,

    but is now:
    zone->present_pages = spanned pages - absent pages - memmap pages.

    spanned pages: total size, including holes.
    absent pages: holes.
    bootmem pages: pages used during system boot, managed by the bootmem
    allocator.
    memmap pages: pages used by page structs.

    This may cause zone->present_pages to be less than it should be. For
    example, numa node 1 has ZONE_NORMAL and ZONE_MOVABLE; its memmap and
    other bootmem data will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
    present_pages should be spanned pages - absent pages, but it now also
    subtracts memmap pages (free_area_init_core), which are actually
    allocated from ZONE_MOVABLE. When offlining all memory of a zone, this
    will cause zone->present_pages to go below 0. Because present_pages is
    of unsigned long type, it actually wraps around to a very large integer,
    which indirectly causes zone->watermark[WMARK_MIN] to become a large
    integer (setup_per_zone_wmarks()), which in turn causes
    totalreserve_pages to become a large integer
    (calculate_totalreserve_pages()), and finally causes memory allocation
    failures when forking processes (__vm_enough_memory()).

    [root@localhost ~]# dmesg
    -bash: fork: Cannot allocate memory

    I think the bug described in

    http://marc.info/?l=linux-mm&m=134502182714186&w=2

    is also caused by wrong zone present pages.

    This patch intends to fix up zone->present_pages when memory is freed
    to the buddy system on the x86_64 and IA64 platforms.

    Signed-off-by: Jianguo Wu
    Signed-off-by: Jiang Liu
    Reported-by: Petr Tesarik
    Tested-by: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
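
    [editor's note: a self-contained illustration of the unsigned
    wrap-around described above]

        unsigned long present = 100;

        present -= 200;  /* "below 0" wraps: present is now ULONG_MAX - 99,
                            i.e. 0xffffffffffffff9c on 64-bit, so the
                            watermark and totalreserve calculations see an
                            enormous value */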
     


18 Jul, 2012

1 commit

  • In reaction to commit 99ab7b19440a ("mm: sparse: fix usemap allocation
    above node descriptor section") Johannes said:
    | while backporting the below patch, I realised that your fix busted
    | f5bf18fa22f8 again. The problem was not a panicking version on
    | allocation failure but when the usemap size was too large such that
    | goal + size > limit triggers the BUG_ON in the bootmem allocator. So
    | we need a version that passes limit ONLY if the usemap is smaller than
    | the section.

    After checking the code, it turns out that the name of
    ___alloc_bootmem_node_nopanic() does not reflect its actual behavior.

    Make bootmem really not panic.

    Hopefully this will help kill bootmem sooner.

    Signed-off-by: Yinghai Lu
    Cc: Johannes Weiner
    Cc: [3.3.x, 3.4.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

12 Jul, 2012

1 commit

  • After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
    in alloc_bootmem_section"), usemap allocations may easily be placed
    outside the optimal section that holds the node descriptor, even if
    there is space available in that section. This results in unnecessary
    hotplug dependencies that need to have the node unplugged before the
    section holding the usemap.

    The reason is that the bootmem allocator doesn't guarantee a linear
    search starting from the passed allocation goal but may start out at a
    much higher address absent an upper limit.

    Fix this by trying the allocation with the limit at the section end,
    then retry without if that fails. This keeps the fix from f5bf18fa22f8
    of not panicking if the allocation does not fit in the section, but
    still makes sure to try to stay within the section at first.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Johannes Weiner
    Cc: [3.3.x, 3.4.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
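
    [editor's note: the retry pattern in sketch form; the names are
    illustrative of the sparsemem usemap allocation path of that era]

        limit = goal + (1UL << PA_SECTION_SHIFT);  /* end of the section */
    again:
        usemap = ___alloc_bootmem_node_nopanic(pgdat, size, SMP_CACHE_BYTES,
                                               goal, limit);
        if (!usemap && limit) {
                limit = 0;  /* didn't fit in the section: retry without */
                goto again;
        }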
     

30 May, 2012

9 commits

  • The objects of "struct bootmem_data_t" are linked together to form a
    doubly linked list, ordered by minimal page frame number.

    The current implementation only implicitly supports the following
    cases, which means the insertion point for the current bootmem data
    depends on how "list_for_each" works. That makes the code a little
    hard to read. Besides, "list_for_each" and "list_entry" can be
    replaced with "list_for_each_entry".

    - The linked list is empty.
    - No entry in the linked list has a minimal page frame number bigger
    than the current one.

    Signed-off-by: Gavin Shan
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
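
    [editor's note: the resulting insertion logic, sketched with
    list_for_each_entry as the commit suggests]

        static void __init link_bootmem(bootmem_data_t *bdata)
        {
                bootmem_data_t *ent;

                list_for_each_entry(ent, &bdata_list, list) {
                        if (bdata->node_min_pfn < ent->node_min_pfn) {
                                list_add_tail(&bdata->list, &ent->list);
                                return;
                        }
                }
                list_add_tail(&bdata->list, &bdata_list);
        }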
     
  • alloc_bootmem_section() derives allocation area constraints from the
    specified sparsemem section. This is a bit specific for a generic memory
    allocator like bootmem, though, so move it over to sparsemem.

    As __alloc_bootmem_node_nopanic() already retries failed allocations with
    relaxed area constraints, the fallback code in sparsemem.c can be removed
    and the code becomes a bit more compact overall.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pass the node descriptor instead of the more specific bootmem node
    descriptor down the call stack, like nobootmem does, since there is no
    good reason for the two to be different.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the
    non-panicking one, like the node-agnostic interface, so they always behave
    the same way apart from how to deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
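
    [editor's note: the unified shape in sketch form: the panicking
    variant is just the non-panicking one plus a panic on failure]

        void * __init ___alloc_bootmem_node(pg_data_t *pgdat,
                                            unsigned long size,
                                            unsigned long align,
                                            unsigned long goal,
                                            unsigned long limit)
        {
                void *ptr;

                ptr = ___alloc_bootmem_node_nopanic(pgdat, size, align,
                                                    goal, limit);
                if (ptr)
                        return ptr;

                printk(KERN_ALERT "bootmem alloc of %lu bytes failed!\n",
                       size);
                panic("Out of memory");
        }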
     
  • Match the nobootmem version of __alloc_bootmem_node. Try to satisfy both
    the node and the goal, then just the goal, then just the node, then
    allocate anywhere before panicking.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Matching the desired goal to the right node is one thing; dropping the
    goal when it cannot be satisfied is another. Split this into separate
    functions so that subsequent patches can use the node-finding but drop
    and handle the goal fallback on their own terms.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Call sites need to provide a bootmem_data_t *, so make the naming more
    descriptive.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned BITS_PER_LONG-pages chunk of memory
    to the page allocator, it checks the bitmap for whether there are
    still unreserved pages in the chunk (set bits), but also whether the
    offset in the chunk already indicates BITS_PER_LONG loop iterations.

    But since the consulted bitmap is only a one-word excerpt of the full
    per-node bitmap, there cannot be more than BITS_PER_LONG bits set in
    it. The additional offset check is unnecessary.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned chunk of memory at the beginning of a
    node to the page allocator, it iterates from that unaligned PFN but
    checks an aligned word of the page bitmap. The checked bits do not
    correspond to the PFNs and, as a result, reserved pages can be freed.

    Properly shift the bitmap word so that the lowest bit corresponds to the
    starting PFN before entering the freeing loop.

    This bug has been around since commit 41546c17418f ("bootmem: clean up
    free_all_bootmem_core") (2.6.27) without known reports.

    Signed-off-by: Gavin Shan
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
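
    [editor's note: the fix in sketch form; after the shift, bit 0 of the
    consulted word corresponds to the starting PFN. Note that the 12 Jan,
    2013 entry above later refines this to shift by the low bits of idx,
    for nodes whose node_min_pfn is itself unaligned]

        /* before */
        vec = ~map[idx / BITS_PER_LONG];
        /* after */
        vec = ~map[idx / BITS_PER_LONG] >> (start & (BITS_PER_LONG - 1));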
     

22 Mar, 2012

1 commit

  • While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
    Overcommit) on powerpc, we tripped the following:

    kernel BUG at mm/bootmem.c:483!
    cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
    pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
    lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
    sp: c000000000c03bc0
    msr: 8000000000021032
    current = 0xc000000000b0cce0
    paca = 0xc000000001d80000
    pid = 0, comm = swapper
    kernel BUG at mm/bootmem.c:483!
    enter ? for help
    [c000000000c03c80] c000000000a64bcc
    .sparse_early_usemaps_alloc_node+0x84/0x29c
    [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
    [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
    [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
    [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c

    This is

    BUG_ON(limit && goal + size > limit);

    and after some debugging, it seems that

    goal = 0x7ffff000000
    limit = 0x80000000000

    and sparse_early_usemaps_alloc_node ->
    sparse_early_usemaps_alloc_pgdat_section calls

    return alloc_bootmem_section(usemap_size() * count, section_nr);

    This is on a system with 8TB available via the AMS pool, and as a quirk
    of AMS in firmware, all of that memory shows up in node 0. So, we end
    up with an allocation that will fail the goal/limit constraints.

    In theory, we could "fall-back" to alloc_bootmem_node() in
    sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
    defined, we'll BUG_ON() instead. A simple solution appears to be to
    unconditionally remove the limit condition in alloc_bootmem_section,
    meaning allocations are allowed to cross section boundaries (necessary
    for systems of this size).

    Johannes Weiner pointed out that if alloc_bootmem_section() no longer
    guarantees section-locality, we need check_usemap_section_nr() to print
    possible cross-dependencies between node descriptors and the usemaps
    allocated through it. That makes the two loops in
    sparse_early_usemaps_alloc_node() identical, so re-factor the code a
    bit.

    [akpm@linux-foundation.org: code simplification]
    Signed-off-by: Nishanth Aravamudan
    Cc: Dave Hansen
    Cc: Anton Blanchard
    Cc: Paul Mackerras
    Cc: Ben Herrenschmidt
    Cc: Robert Jennings
    Acked-by: Johannes Weiner
    Acked-by: Mel Gorman
    Cc: [3.3.1]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
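
    [editor's note: the arithmetic of the trap, for concreteness:
    limit - goal = 0x80000000000 - 0x7ffff000000 = 0x1000000, i.e. only
    16 MiB of room below the limit, while the single usemap allocation for
    all sections of an 8 TB node was large enough that goal + size
    exceeded it, firing the BUG_ON.]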
     

11 Jan, 2012

3 commits

  • The loop that frees pages to the page allocator while bootstrapping tries
    to free higher-order blocks only when the starting address is aligned to
    that block size. Otherwise it will free all pages on that node
    one-by-one.

    Change it to free individual pages up to the first aligned block and then
    try higher-order frees from there.

    Signed-off-by: Johannes Weiner
    Cc: Uwe Kleine-König
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
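
    [editor's note: the idea in sketch form, not the literal patch: pick
    the largest block size that both divides the current address and fits
    before end, so single pages are freed until start becomes aligned and
    larger blocks take over from there; assumes start > 0]

        while (start < end) {
                /* largest order allowed by start's alignment... */
                int order = min_t(int, ilog2(BITS_PER_LONG), __ffs(start));

                /* ...that still fits before end */
                while (start + (1UL << order) > end)
                        order--;

                __free_pages_bootmem(pfn_to_page(start), order);
                start += 1UL << order;
        }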
     
  • The area node_bootmem_map represents is aligned to BITS_PER_LONG, and
    all bits in any aligned word of that map are valid. When the
    represented area extends beyond the end of the node, the non-existent
    pages will be marked as reserved.

    As a result, when freeing a page block, doing an explicit range check for
    whether that block is within the node's range is redundant as the bitmap
    is consulted anyway to see whether all pages in the block are unreserved.

    Signed-off-by: Johannes Weiner
    Cc: Uwe Kleine-König
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The first entry of bdata->node_bootmem_map holds the data for
    bdata->node_min_pfn up to bdata->node_min_pfn + BITS_PER_LONG - 1. So the
    test for freeing all pages of a single map entry can be slightly relaxed.

    Moreover, use DIV_ROUND_UP in another place instead of open-coding it.

    Signed-off-by: Uwe Kleine-König
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     


24 Mar, 2011

1 commit

  • …p_elfcorehdr and saved_max_pfn

    The Xen PV drivers in a crashed HVM guest can not connect to the dom0
    backend drivers because both frontend and backend drivers are still in
    connected state. To run the connection reset function only in case of a
    crashdump, the is_kdump_kernel() function needs to be available for the PV
    driver modules.

    Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
    kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel()
    usable for modules.

    Leave 'elfcorehdr' as early_param(). This changes powerpc from __setup()
    to early_param(). It adds an address range check from x86 also on ia64
    and powerpc.

    [akpm@linux-foundation.org: additional #includes]
    [akpm@linux-foundation.org: remove elfcorehdr_addr export]
    [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Cc: Russell King <rmk@arm.linux.org.uk>
    Cc: "Luck, Tony" <tony.luck@intel.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mundt <lethal@linux-sh.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Olaf Hering
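
    [editor's note: the check the PV drivers need, roughly as defined in
    include/linux/crash_dump.h at the time; a kdump (crash) kernel is
    marked by a valid ELF core header address having been passed in]

        static inline int is_kdump_kernel(void)
        {
                return elfcorehdr_addr != ELFCORE_ADDR_MAX;
        }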
     

24 Feb, 2011

2 commits

  • Now that bootmem.c and nobootmem.c are separate, it's cleaner to
    define contig_page_data in each file than in page_alloc.c with #ifdef.
    Move it.

    This patch doesn't introduce any behavior change.

    -v2: According to Andrew, fixed the struct layout.
    -tj: Updated commit description.

    Signed-off-by: Yinghai Lu
    Acked-by: Andrew Morton
    Signed-off-by: Tejun Heo

    Yinghai Lu
     
  • mm/bootmem.c contained code paths for both the bootmem and no-bootmem
    configurations. They implement about the same set of APIs in different
    ways, and as a result bootmem.c contains a massive amount of
    #ifdef CONFIG_NO_BOOTMEM.

    Separate out the CONFIG_NO_BOOTMEM code into mm/nobootmem.c. As the
    common part is relatively small, duplicate it in nobootmem.c instead
    of creating a common file or ifdef'ing in bootmem.c.

    The following are duplicated.

    * {min|max}_low_pfn, max_pfn, saved_max_pfn
    * free_bootmem_late()
    * ___alloc_bootmem()
    * __alloc_bootmem_low()

    The following are applicable only to nobootmem and are moved verbatim.

    * __free_pages_memory()
    * free_all_memory_core_early()

    The following are not applicable to nobootmem and are omitted in
    nobootmem.c.

    * reserve_bootmem_node()
    * reserve_bootmem()

    For the rest, the function bodies are split according to
    CONFIG_NO_BOOTMEM.

    The Makefile is updated so that either bootmem.c or nobootmem.c is
    built, according to CONFIG_NO_BOOTMEM.

    This patch doesn't introduce any behavior change.

    -tj: Rewrote commit description.

    Suggested-by: Ingo Molnar
    Signed-off-by: Yinghai Lu
    Acked-by: Andrew Morton
    Signed-off-by: Tejun Heo

    Yinghai Lu
     

28 Aug, 2010

3 commits

  • 1. Include linux/memblock.h directly, so we can later reduce e820.h
    references.
    2. This patch is done mainly by sed scripts.

    -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL

    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • 1. Replace find_e820_area with memblock_find_in_range.
    2. Replace reserve_early with memblock_x86_reserve_range.
    3. Replace free_early with memblock_x86_free_range.
    4. NO_BOOTMEM will switch to using memblock too.
    5. Use the _e820/_early wrappers in this patch; a following patch will
    replace them all.
    6. Because memblock_x86_free_range supports partial free, we can remove
    some special-case handling.
    7. Make sure that memblock_find_in_range() is called after
    memblock_x86_fill(), so adjust some calls later in
    setup.c::setup_arch() -- corruption_check and mptable_update.

    -v2: Move reserve_brk() early, before fill_memblock_area, to avoid an
    overlap between brk and memblock_find_in_range() that could happen
    when we have more than 128 RAM entries in the E820 tables, in which
    case memblock_x86_fill() could use memblock_find_in_range() to find a
    new place for the memblock.memory.region array. And since we don't
    need to use extend_brk() after fill_memblock_area(), move
    reserve_brk() early, before fill_memblock_area().
    -v3: Move find_smp_config early, to make sure memblock_find_in_range
    doesn't find the wrong place if the BIOS doesn't put the mptable in
    the right place.
    -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are
    already in memblock.reserved. Use __NOT_KEEP_MEMBLOCK to make sure
    memblock-related code can be freed later.
    -v5: The generic __memblock_find_in_range() goes from high to low, and
    the active_region for 32bit does include high pages, so replace the
    limit with memblock.default_alloc_limit, aka get_max_mapped().
    -v6: Use current_limit instead.
    -v7: Check with MEMBLOCK_ERROR instead of -1ULL or -1L.
    -v8: Set memblock_can_resize early to handle EFI with more RAM entries.
    -v9: Update after the kmemleak changes in mainline.

    Suggested-by: David S. Miller
    Suggested-by: Benjamin Herrenschmidt
    Suggested-by: Thomas Gleixner
    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • It will be used for the memblock_x86_to_bootmem conversion.

    It is a wrapper for reserve_bootmem, and x86 64-bit uses a special
    version of it.

    Also clean up that version for x86_64. We don't need to take care of
    the numa path for it; bootmem can handle that now.

    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

21 Jul, 2010

1 commit

  • Borislav Petkov reported that his 32-bit numa system has a problem:

    [ 0.000000] Reserving total of 4c00 pages for numa KVA remap
    [ 0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
    [ 0.000000] max_pfn = 238000
    [ 0.000000] 8202MB HIGHMEM available.
    [ 0.000000] 885MB LOWMEM available.
    [ 0.000000] mapped low ram: 0 - 375fe000
    [ 0.000000] low ram: 0 - 375fe000
    [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
    [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
    [ 0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
    [ 0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
    [ 0.000000] BUG: unable to handle kernel paging request at 40000000
    [ 0.000000] IP: [] __alloc_memory_core_early+0x147/0x1d6
    [ 0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
    ...
    [ 0.000000] Call Trace:
    [ 0.000000] [] ? __alloc_bootmem_node+0x216/0x22f
    [ 0.000000] [] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
    [ 0.000000] [] ? sparse_init+0x1dc/0x499
    [ 0.000000] [] ? paging_init+0x168/0x1df
    [ 0.000000] [] ? native_pagetable_setup_start+0xef/0x1bb

    It looks like it allocates bootmem from too high an address.

    Try to cap the limit with get_max_mapped().

    Reported-by: Borislav Petkov
    Tested-by: Conny Seidel
    Signed-off-by: Yinghai Lu
    Cc: [2.6.34.x]
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Johannes Weiner
    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

08 Apr, 2010

1 commit

  • …git/x86/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
    x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
    x86: Increase CONFIG_NODES_SHIFT max to 10
    ibft, x86: Change reserve_ibft_region() to find_ibft_region()
    x86, hpet: Fix bug in RTC emulation
    x86, hpet: Erratum workaround for read after write of HPET comparator
    bootmem, x86: Fix 32bit numa system without RAM on node 0
    nobootmem, x86: Fix 32bit numa system without RAM on node 0
    x86: Handle overlapping mptables
    x86: Make e820_remove_range to handle all covered case
    x86-32, resume: do a global tlb flush in S4 resume

    Linus Torvalds
     

02 Apr, 2010

1 commit

  • When 32-bit numa is used, free_all_bootmem() will still only iterate
    over node id 0.

    If node 0 doesn't have RAM installed, the lowest populated node
    becomes low RAM.

    This one fixes the BOOTMEM path by iterating over the bdata_list.

    -v3: add more comments, and fix the bootmem path too.
    -v4: separate from one big patch

    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
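
    [editor's note: the shape of the fix, sketched: instead of freeing
    only node 0's bootmem, walk every registered node on the bdata_list]

        unsigned long __init free_all_bootmem(void)
        {
                unsigned long total_pages = 0;
                bootmem_data_t *bdata;

                list_for_each_entry(bdata, &bdata_list, list)
                        total_pages += free_all_bootmem_core(bdata);

                return total_pages;
        }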