12 Oct, 2016

1 commit

  • Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
    physical address to a virtual one using __va(). However, such physical
    addresses may sometimes be located in highmem and using __va() is
    incorrect, leading to inconsistent object tracking in kmemleak.

    The following functions have been added to the kmemleak API and they take
    a physical address as the object pointer. They only perform the
    corresponding action if the address has a lowmem mapping:

    kmemleak_alloc_phys
    kmemleak_free_part_phys
    kmemleak_not_leak_phys
    kmemleak_ignore_phys

    The affected calling places have been updated to use the new kmemleak
    API.

    Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Reported-by: Vignesh R
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
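
    A minimal sketch of the idea behind these wrappers, assuming the lowmem
    test is a PFN comparison against max_low_pfn (the exact check and the
    __ref annotation are assumptions, not the committed code):

        /* Sketch: only pass the object to kmemleak if __va() is valid for it. */
        void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size,
                                       int min_count, gfp_t gfp)
        {
                if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
                        kmemleak_alloc(__va(phys), size, min_count, gfp);
                /* highmem addresses are ignored rather than misconverted */
        }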
     

08 Oct, 2016

2 commits

  • Commit b4def3509d18 ("mm, nobootmem: clean-up of free_low_memory_core_early()")
    removed the unnecessary nodeid argument; since then, this comment has
    become confusing. Move it to the right place.

    Fixes: b4def3509d18c1db9 ("mm, nobootmem: clean-up of free_low_memory_core_early()")
    Link: http://lkml.kernel.org/r/1473996082-14603-1-git-send-email-wanlong.gao@gmail.com
    Signed-off-by: Wanlong Gao
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanlong Gao
     
  • Fix the following bugs:

    - the same ARCH_LOW_ADDRESS_LIMIT statements are duplicated between the
    header and the relevant source file

    - an ARCH_LOW_ADDRESS_LIMIT defined by an architecture in asm/processor.h
    is not guaranteed to take precedence over the default in linux/bootmem.h,
    because the former header isn't included by the latter

    Link: http://lkml.kernel.org/r/e046aeaa-e160-6d9e-dc1b-e084c2fd999f@zoho.com
    Signed-off-by: zijun_hu
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zijun_hu
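
    A sketch of the intended pattern in linux/bootmem.h: pull in the
    architecture header first, then fall back to a default (the 4 GB default
    shown here is an assumption for illustration):

        /* linux/bootmem.h (sketch) */
        #include <asm/processor.h>      /* may define ARCH_LOW_ADDRESS_LIMIT */

        #ifndef ARCH_LOW_ADDRESS_LIMIT
        #define ARCH_LOW_ADDRESS_LIMIT  0xffffffffUL
        #endif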
     

18 Mar, 2016

1 commit

  • Most of the mm subsystem uses pr_<level>(), so make it consistent.

    Miscellanea:

    - Realign arguments
    - Add missing newline to format
    - kmemleak-test.c has a "kmemleak: " prefix added to the
    "Kmemleak testing" logging message via pr_fmt

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
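
    For illustration, the usual pr_fmt pattern that gives every pr_<level>()
    call in a file a common prefix; the init function here is a made-up
    example, only the prefix string comes from the commit above:

        /* must be defined before the first include that pulls in printk.h */
        #define pr_fmt(fmt) "kmemleak: " fmt

        #include <linux/printk.h>
        #include <linux/init.h>

        static int __init kmemleak_test_init(void)
        {
                pr_info("Kmemleak testing\n");  /* logs "kmemleak: Kmemleak testing" */
                return 0;
        }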
     

06 Dec, 2015

1 commit

  • max_possible_pfn will be used for tracking the maximum possible PFN for
    memory that isn't present in the E820 table and could be hotplugged later.

    By default max_possible_pfn is initialized with max_pfn, but later it may
    be updated with the highest PFN of the hotpluggable memory ranges declared
    in the ACPI SRAT table, if any are present.

    Signed-off-by: Igor Mammedov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akataria@vmware.com
    Cc: fujita.tomonori@lab.ntt.co.jp
    Cc: konrad.wilk@oracle.com
    Cc: pbonzini@redhat.com
    Cc: revers@redhat.com
    Cc: riel@redhat.com
    Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com
    Signed-off-by: Ingo Molnar

    Igor Mammedov
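
    A rough sketch of the intended use; the SRAT update site and the
    hot_range_end variable are assumptions for illustration:

        /* at boot, once the E820-derived max_pfn is known */
        max_possible_pfn = max_pfn;

        /* later, while parsing hotpluggable ranges from the ACPI SRAT table */
        max_possible_pfn = max(max_possible_pfn, PFN_UP(hot_range_end));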
     

01 Jul, 2015

2 commits

  • __free_pages_bootmem prepares a page for release to the buddy allocator
    and assumes that the struct page is initialised. Parallel initialisation
    of struct pages defers initialisation and __free_pages_bootmem can be
    called for struct pages that cannot yet map struct page to PFN. This
    patch passes PFN to __free_pages_bootmem with no other functional change.

    Signed-off-by: Mel Gorman
    Tested-by: Nate Zimmer
    Tested-by: Waiman Long
    Tested-by: Daniel J Blueman
    Acked-by: Pekka Enberg
    Cc: Robin Holt
    Cc: Nate Zimmer
    Cc: Dave Hansen
    Cc: Waiman Long
    Cc: Scott Norton
    Cc: "Luck, Tony"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Currently each page struct is set as reserved upon initialization. This
    patch leaves the reserved bit clear and only sets the reserved bit when it
    is known the memory was allocated by the bootmem allocator. This makes it
    easier to distinguish between uninitialised struct pages and reserved
    struct pages in later patches.

    Signed-off-by: Robin Holt
    Signed-off-by: Nathan Zimmer
    Signed-off-by: Mel Gorman
    Tested-by: Nate Zimmer
    Tested-by: Waiman Long
    Tested-by: Daniel J Blueman
    Acked-by: Pekka Enberg
    Cc: Robin Holt
    Cc: Dave Hansen
    Cc: Waiman Long
    Cc: Scott Norton
    Cc: "Luck, Tony"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Zimmer
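
    A hedged sketch of the new behaviour: leave pages unreserved when struct
    pages are initialised and mark only the memblock-reserved ranges (the
    helper name is an assumption based on the description above):

        /* Sketch: set PageReserved only on ranges the boot allocator reserved. */
        static void __init mark_bootmem_reserved(void)
        {
                phys_addr_t start, end;
                u64 i;

                for_each_reserved_mem_region(i, &start, &end) {
                        unsigned long pfn = PFN_DOWN(start);
                        unsigned long end_pfn = PFN_UP(end);

                        for (; pfn < end_pfn; pfn++)
                                SetPageReserved(pfn_to_page(pfn));
                }
        }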
     

25 Jun, 2015

2 commits

  • Try to allocate all boot time kernel data structures from mirrored
    memory.

    If we run out of mirrored memory print warnings, but fall back to using
    non-mirrored memory to make sure that we still boot.

    By number of bytes, most of what we allocate at boot time is the page
    structures. 64 bytes per 4K page on x86_64 ... or about 1.5% of total
    system memory. For workloads where the bulk of memory is allocated to
    applications this may represent a useful improvement to system
    availability since 1.5% of total memory might be a third of the memory
    allocated to the kernel.

    Signed-off-by: Tony Luck
    Cc: Xishi Qiu
    Cc: Hanjun Guo
    Cc: Xiexiuqi
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
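
    A hedged sketch of the fallback described above; the wrapper name
    alloc_boot_data() is hypothetical, only the MEMBLOCK_MIRROR flag and the
    warn-then-retry shape follow the series:

        static phys_addr_t __init alloc_boot_data(phys_addr_t size, phys_addr_t align,
                                                  phys_addr_t start, phys_addr_t end,
                                                  int nid)
        {
                unsigned long flags = MEMBLOCK_MIRROR;  /* prefer mirrored memory */
                phys_addr_t found;

        again:
                found = memblock_find_in_range_node(size, align, start, end, nid, flags);
                if (found || !(flags & MEMBLOCK_MIRROR))
                        return found;

                /* warn, then fall back to non-mirrored memory so we still boot */
                pr_warn("Could not allocate %pap bytes of mirrored memory\n", &size);
                flags &= ~MEMBLOCK_MIRROR;
                goto again;
        }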
     
  • Some high end Intel Xeon systems report uncorrectable memory errors as a
    recoverable machine check. Linux has included code for some time to
    process these and just signal the affected processes (or even recover
    completely if the error was in a read only page that can be replaced by
    reading from disk).

    But we have no recovery path for errors encountered during kernel code
    execution. Except for some very specific cases, we are unlikely to ever be
    able to recover.

    Enter memory mirroring. Actually, the 3rd generation of memory mirroring.

    Gen1: All memory is mirrored
      Pro: No s/w enabling - h/w just gets good data from the other side of
      the mirror
      Con: Halves effective memory capacity available to OS/applications

    Gen2: Partial memory mirror - just mirror memory behind some memory
    controllers
      Pro: Keep more of the capacity
      Con: Nightmare to enable. Have to choose between allocating from
      mirrored memory for safety vs. NUMA-local memory for performance

    Gen3: Address range partial memory mirror - some mirror on each memory
    controller
      Pro: Can tune the amount of mirror and keep NUMA performance
      Con: I have to write memory management code to implement it

    The current plan is just to use mirrored memory for kernel allocations.
    This has been broken into two phases:

    1) This patch series - find the mirrored memory, use it for boot time
    allocations

    2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
    unused mirrored memory from mm/memblock.c and only give it out to
    select kernel allocations (this is still being scoped because
    page_alloc.c is scary).

    This patch (of 3):

    Add extra "flags" to memblock to allow selection of memory based on
    attribute. No functional changes

    Signed-off-by: Tony Luck
    Cc: Xishi Qiu
    Cc: Hanjun Guo
    Cc: Xiexiuqi
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
     

14 Nov, 2014

1 commit

  • In free_area_init_core(), zone->managed_pages is set to an approximate
    value for lowmem, and will be adjusted when the bootmem allocator frees
    pages into the buddy system.

    But free_area_init_core() is also called by hotadd_new_pgdat() when
    hot-adding memory. As a result, zone->managed_pages of the newly added
    node's pgdat is set to an approximate value in the very beginning.

    Even if the memory on that node has not been onlined,
    /sys/device/system/node/nodeXXX/meminfo has the wrong values:

    hot-add node2 (memory not onlined)
    cat /sys/device/system/node/node2/meminfo
    Node 2 MemTotal: 33554432 kB
    Node 2 MemFree: 0 kB
    Node 2 MemUsed: 33554432 kB
    Node 2 Active: 0 kB

    This patch fixes the problem by resetting the node's managed pages to 0
    after hot-adding a new node.

    1. Move reset_managed_pages_done from reset_node_managed_pages() to
    reset_all_zones_managed_pages()
    2. Make reset_node_managed_pages() non-static
    3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
    is initialized

    Signed-off-by: Tang Chen
    Signed-off-by: Yasuaki Ishimatsu
    Cc: [3.16+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
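
    A hedged sketch of steps 2 and 3 above (the zone walk mirrors the
    description; the call site in hotadd_new_pgdat() is shown as a comment):

        /* Zero the managed_pages estimate for every zone of the new node. */
        void reset_node_managed_pages(pg_data_t *pgdat)
        {
                struct zone *z;

                for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
                        z->managed_pages = 0;
        }

        /* hotadd_new_pgdat(): call reset_node_managed_pages(pgdat) right after
         * free_area_init_node() has initialised the pgdat. */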
     

11 Sep, 2014

1 commit

  • Let memblock skip hotpluggable memory regions in __next_mem_range(); this
    is used to prevent memblock from allocating hotpluggable memory for the
    kernel early in boot. The code is the same as in __next_mem_range_rev().

    Clear the hotpluggable flag before releasing free pages to the buddy
    allocator. If we don't clear it in free_low_memory_core_early(), memory
    marked with the hotpluggable flag will never be freed to the buddy
    allocator, because __next_mem_range() will skip it.

    free_low_memory_core_early
        for_each_free_mem_range
            for_each_mem_range
                __next_mem_range

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Xishi Qiu
    Cc: Tejun Heo
    Cc: Tang Chen
    Cc: Zhang Yanfei
    Cc: Wen Congyang
    Cc: "Rafael J. Wysocki"
    Cc: "H. Peter Anvin"
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
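
    A hedged sketch of the two pieces described above, shown as fragments of
    the functions they land in:

        /* __next_mem_range() (sketch): skip hotpluggable regions early in boot */
        if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
                continue;

        /* free_low_memory_core_early() (sketch): clear the flag first, otherwise
         * the iterator above would skip these regions and never free them */
        memblock_clear_hotplug(0, -1);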
     

07 Jun, 2014

1 commit

  • Kmemleak could ignore memory blocks allocated via memblock_alloc()
    leading to false positives during scanning. This patch adds the
    corresponding callbacks and removes kmemleak_free_* calls in
    mm/nobootmem.c to avoid duplication.

    The kmemleak_alloc() in mm/nobootmem.c is kept since
    __alloc_memory_core_early() does not use memblock_alloc() directly.

    Signed-off-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

04 Apr, 2014

1 commit

  • Mark function as static in nobootmem.c because it is not used outside
    this file.

    This eliminates the following warning in mm/nobootmem.c:

    mm/nobootmem.c:324:15: warning: no previous prototype for `___alloc_bootmem_node' [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rashika Kheria
     

24 Jan, 2014

3 commits

  • get_allocated_memblock_reserved_regions_info() should work if it is
    compiled in. Extended the ifdef around
    get_allocated_memblock_memory_regions_info() to include
    get_allocated_memblock_reserved_regions_info() as well. Similar changes
    in nobootmem.c/free_low_memory_core_early() where the two functions are
    called.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Philipp Hachtmann
    Cc: qiuxishi
    Cc: David Howells
    Cc: Daeseok Youn
    Cc: Jiang Liu
    Acked-by: Yinghai Lu
    Cc: Zhang Yanfei
    Cc: Santosh Shilimkar
    Cc: Grygorii Strashko
    Cc: Tang Chen
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
     
  • When calling free_all_bootmem() the free areas under memblock's control
    are released to the buddy allocator. Additionally the reserved list is
    freed if it was reallocated by memblock. The same should apply for the
    memory list.

    Signed-off-by: Philipp Hachtmann
    Reviewed-by: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Tang Chen
    Cc: Toshi Kani
    Cc: Jianguo Wu
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
     
  • When memblock_reserve() fails because memblock.reserved.regions cannot
    be resized, the caller (e.g. alloc_bootmem()) is not informed of the
    failed allocation. Therefore alloc_bootmem() silently returns the same
    pointer again and again.

    This patch adds a check for the return value of memblock_reserve() in
    __alloc_memory_core().

    Signed-off-by: Philipp Hachtmann
    Reviewed-by: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Tang Chen
    Cc: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
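
    A hedged sketch of the check; the surrounding allocator is heavily
    abbreviated:

        static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
                                                       u64 goal, u64 limit)
        {
                u64 addr;

                addr = memblock_find_in_range_node(size, align, goal, limit, nid);
                if (!addr)
                        return NULL;

                /* propagate failure instead of silently reusing the same range */
                if (memblock_reserve(addr, size))
                        return NULL;

                return phys_to_virt(addr);
        }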
     

22 Jan, 2014

2 commits

  • It's recommended to use NUMA_NO_NODE everywhere to select "process any
    node" behavior or to indicate that "no node id specified".

    Hence, update the __next_free_mem_range*() APIs to accept both NUMA_NO_NODE
    and MAX_NUMNODES, but emit a one-time warning on MAX_NUMNODES, and correct
    the corresponding API documentation to describe the new behavior. Also,
    update other memblock/nobootmem APIs where MAX_NUMNODES is used
    directly.

    The change was suggested by Tejun Heo.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
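
    A hedged sketch of the compatibility shim: MAX_NUMNODES keeps working for
    now, but callers are warned once to switch to NUMA_NO_NODE:

        if (WARN_ONCE(nid == MAX_NUMNODES,
                      "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
                nid = NUMA_NO_NODE;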
     
  • Reorder parameters of memblock_find_in_range_node to be consistent with
    other memblock APIs.

    The change was suggested by Tejun Heo.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     

13 Nov, 2013

1 commit

  • On large memory machines it can take a few minutes to get through
    free_all_bootmem().

    Currently, when free_all_bootmem() calls __free_pages_memory(), the number
    of contiguous pages that __free_pages_memory() passes to the buddy
    allocator is limited to BITS_PER_LONG. BITS_PER_LONG was originally
    chosen to keep things similar to mm/nobootmem.c. But it is more efficient
    to limit it to MAX_ORDER.

             base    new     change
    8TB      202s    172s    30s
    16TB     401s    351s    50s

    That is around a 1%-3% improvement in total boot time.

    This patch was spun off from the boot time rfc Robin and I had been
    working on.

    Signed-off-by: Robin Holt
    Signed-off-by: Nathan Zimmer
    Cc: Robin Holt
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Mike Travis
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Acked-by: Johannes Weiner
    Reviewed-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
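
    A hedged sketch of freeing in the largest buddy-aligned chunks instead of
    fixed BITS_PER_LONG-sized ones:

        static void __init __free_pages_memory(unsigned long start, unsigned long end)
        {
                int order;

                while (start < end) {
                        /* largest order allowed by the alignment of 'start' and MAX_ORDER */
                        order = min(MAX_ORDER - 1UL, __ffs(start));

                        while (start + (1UL << order) > end)
                                order--;

                        __free_pages_bootmem(pfn_to_page(start), order);

                        start += 1UL << order;
                }
        }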
     

04 Jul, 2013

2 commits

  • Concentrate the code that modifies totalram_pages into the mm core, so the
    arch memory initialization code doesn't need to take care of it. With these
    changes applied, only the following functions from the mm core modify the
    global variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
    free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much easier for us to keep
    totalram_pages and zone->managed_pages consistent.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
    that all highmem pages will be freed into the buddy system by function
    mem_init(). But that's not always true, some architectures may reserve
    some highmem pages during boot. For example PPC may allocate highmem
    pages for giagant HugeTLB pages, and several architectures have code to
    check PageReserved flag to exclude highmem pages allocated during boot
    when freeing highmem pages into the buddy system.

    So treat highmem pages in the same way as normal pages, that is to:
    1) reset zone->managed_pages to zero in mem_init().
    2) recalculate managed_pages when freeing pages into the buddy system.

    Signed-off-by: Jiang Liu
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

30 Apr, 2013

2 commits


30 Jan, 2013

2 commits


13 Dec, 2012

1 commit

  • Currently a zone's present_pages is calculated as below, which is
    inaccurate and may cause trouble for memory hotplug.

    spanned_pages - absent_pages - memmap_pages - dma_reserve.

    While fixing bugs caused by the inaccurate zone->present_pages, we found
    that zone->present_pages has been abused. The field zone->present_pages
    may have different meanings in different contexts:

    1) pages existing in a zone.
    2) pages managed by the buddy system.

    For more discussions about the issue, please refer to:
    http://lkml.org/lkml/2012/11/5/866
    https://patchwork.kernel.org/patch/1346751/

    This patchset tries to introduce a new field named "managed_pages" to
    struct zone, which counts "pages managed by the buddy system", and reverts
    zone->present_pages to counting "physical pages existing in a zone", which
    is also consistent with pgdat->node_present_pages.

    We will set an initial value for zone->managed_pages in function
    free_area_init_core() and will adjust it later if the initial value is
    inaccurate.

    For DMA/normal zones, the initial value is set to:

    (spanned_pages - absent_pages - memmap_pages - dma_reserve)

    Later zone->managed_pages will be adjusted to the accurate value when the
    bootmem allocator frees all free pages to the buddy system in function
    free_all_bootmem_node() and free_all_bootmem().

    The bootmem allocator doesn't touch highmem pages, so highmem zones'
    managed_pages is set to the accurate value "spanned_pages - absent_pages"
    in function free_area_init_core() and won't be updated anymore.

    This patch also adds a new field "managed_pages" to /proc/zoneinfo
    and sysrq showmem.

    [akpm@linux-foundation.org: small comment tweaks]
    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: David Rientjes
    Cc: Maciej Rutecki
    Tested-by: Chris Clayton
    Cc: "Rafael J . Wysocki"
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Jianguo Wu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
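
    A hedged sketch of the initial values described above, as set in
    free_area_init_core() (variable names are illustrative):

        if (is_highmem_idx(j))
                /* bootmem never touches highmem: this value is already final */
                zone->managed_pages = spanned_pages - absent_pages;
        else
                /* estimate; adjusted when bootmem releases pages to the buddy system */
                zone->managed_pages = spanned_pages - absent_pages
                                      - memmap_pages - dma_reserve;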
     

18 Nov, 2012

1 commit

  • Now the NO_BOOTMEM version of free_all_bootmem_node() does not really do
    free_bootmem at all; it only calls register_page_bootmem_info_node() for
    online nodes instead.

    That is confusing.

    We can kill free_all_bootmem_node() after we kill its two callers in x86
    and sparc.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1353123563-3103-46-git-send-email-yinghai@kernel.org
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

17 Nov, 2012

1 commit

  • Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages")

    That patch tried to fix an issue when calculating zone->present_pages,
    but it caused a regression on 32bit systems with HIGHMEM. With that
    change, reset_zone_present_pages() resets all zone->present_pages to
    zero, and fixup_zone_present_pages() is called to recalculate
    zone->present_pages when the boot allocator frees core memory pages into
    the buddy allocator. Because highmem pages are not freed by the bootmem
    allocator, all highmem zones' present_pages become zero.

    Various options for improving the situation are being discussed but for
    now, let's return to the 3.6 code.

    Cc: Jianguo Wu
    Cc: Jiang Liu
    Cc: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Acked-by: David Rientjes
    Tested-by: Chris Clayton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

09 Oct, 2012

2 commits

  • I think zone->present_pages should indicate the pages that the buddy
    system can manage; it should be:

    zone->present_pages = spanned pages - absent pages - bootmem pages,

    but it is now:
    zone->present_pages = spanned pages - absent pages - memmap pages.

    spanned pages: total size, including holes.
    absent pages: holes.
    bootmem pages: pages used in system boot, managed by bootmem allocator.
    memmap pages: pages used by page structs.

    This may cause zone->present_pages to be less than it should be. For
    example, NUMA node 1 has ZONE_NORMAL and ZONE_MOVABLE; its memmap and
    other bootmem are allocated from ZONE_MOVABLE, so ZONE_NORMAL's
    present_pages should be spanned pages - absent pages, but it currently
    also subtracts the memmap pages (free_area_init_core), which are actually
    allocated from ZONE_MOVABLE. When offlining all memory of such a zone,
    this causes zone->present_pages to go below 0; because present_pages is of
    unsigned long type, it actually becomes a very large integer. This
    indirectly causes zone->watermark[WMARK_MIN] to become a large integer
    (setup_per_zone_wmarks()), which in turn causes totalreserve_pages to
    become a large integer (calculate_totalreserve_pages()), and finally
    causes memory allocation failures when forking processes
    (__vm_enough_memory()).

    [root@localhost ~]# dmesg
    -bash: fork: Cannot allocate memory

    I think the bug described in

    http://marc.info/?l=linux-mm&m=134502182714186&w=2

    is also caused by wrong zone present pages.

    This patch intends to fix up zone->present_pages when memory is freed to
    the buddy system on the x86_64 and IA64 platforms.

    Signed-off-by: Jianguo Wu
    Signed-off-by: Jiang Liu
    Reported-by: Petr Tesarik
    Tested-by: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
     
  • Commit 0ee332c14518 ("memblock: Kill early_node_map[]") removed
    early_node_map[]. Clean up the comments to comply with that change.

    Signed-off-by: Wanpeng Li
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Gavin Shan
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

12 Jul, 2012

2 commits

  • memblock_free_reserved_regions() calls memblock_free(), but memblock_free()
    may double reserved.regions as well, so we could end up freeing the old
    range backing reserved.regions.

    Also tj said there is another bug which could be related to this.

    | I don't think we're saving any noticeable
    | amount by doing this "free - give it to page allocator - reserve
    | again" dancing. We should just allocate regions aligned to page
    | boundaries and free them later when memblock is no longer in use.

    In that case, with DEBUG_PAGEALLOC enabled, we will get a panic:

    memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
    BUG: unable to handle kernel paging request at ffff88102febd948
    IP: [] __next_free_mem_range+0x9b/0x155
    PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU 0
    Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
    RIP: 0010:[] [] __next_free_mem_range+0x9b/0x155

    See the discussion at https://lkml.org/lkml/2012/6/13/469

    So try to allocate with PAGE_SIZE alignment and free it later.

    Reported-by: Sasha Levin
    Acked-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Yinghai Lu
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
    in alloc_bootmem_section"), usemap allocations may easily be placed
    outside the optimal section that holds the node descriptor, even if
    there is space available in that section. This results in unnecessary
    hotplug dependencies that need to have the node unplugged before the
    section holding the usemap.

    The reason is that the bootmem allocator doesn't guarantee a linear
    search starting from the passed allocation goal but may start out at a
    much higher address absent an upper limit.

    Fix this by trying the allocation with the limit at the section end,
    then retry without if that fails. This keeps the fix from f5bf18fa22f8
    of not panicking if the allocation does not fit in the section, but
    still makes sure to try to stay within the section at first.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Johannes Weiner
    Cc: [3.3.x, 3.4.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
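
    A hedged sketch of the retry: first constrain the allocation to the
    section holding the node descriptor, then drop the limit if that fails:

        /* goal points into the section that already holds the pgdat */
        limit = goal + (1UL << PA_SECTION_SHIFT);
        again:
        p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
                                          SMP_CACHE_BYTES, goal, limit);
        if (!p && limit) {
                limit = 0;      /* no room in that section: retry anywhere */
                goto again;
        }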
     

30 May, 2012

3 commits

  • alloc_bootmem_section() derives allocation area constraints from the
    specified sparsemem section. This is a bit specific for a generic memory
    allocator like bootmem, though, so move it over to sparsemem.

    As __alloc_bootmem_node_nopanic() already retries failed allocations with
    relaxed area constraints, the fallback code in sparsemem.c can be removed
    and the code becomes a bit more compact overall.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the non-panicking
    one, like the node-agnostic interface, so they always behave the same way
    apart from how they deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
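
    A hedged sketch of the resulting shape: the panicking variant is simply
    the non-panicking one plus the failure handling:

        void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
                                           unsigned long align, unsigned long goal)
        {
                void *ptr;

                ptr = __alloc_bootmem_node_nopanic(pgdat, size, align, goal);
                if (ptr)
                        return ptr;

                pr_alert("bootmem alloc of %lu bytes failed!\n", size);
                panic("Out of memory");
                return NULL;
        }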
     
  • __alloc_bootmem_node and __alloc_bootmem_low_node documentation claims
    the functions panic on allocation failure. Do it.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

11 May, 2012

1 commit

  • Systems with 8 TBytes of memory or greater can hit a problem where only
    the first 8 TB of memory shows up. This is due to "int i" being
    smaller than "unsigned long start_aligned", causing the high bits to be
    dropped.

    The fix is to change `i' to unsigned long to match start_aligned
    and end_aligned.

    Thanks to Jack Steiner for assistance tracking this down.

    Signed-off-by: Russ Anderson
    Cc: Jack Steiner
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Cc: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russ Anderson
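
    A hedged sketch of the shape of the loop and the fix:

        unsigned long i;        /* was "int i": assigning start_aligned truncated
                                 * PFNs beyond the first 8 TB */

        for (i = start_aligned; i + BITS_PER_LONG <= end_aligned;
             i += BITS_PER_LONG)
                __free_pages_bootmem(pfn_to_page(i), ilog2(BITS_PER_LONG));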
     

26 Apr, 2012

1 commit

  • The comments above __alloc_bootmem_node() claim that the code will
    first try the allocation using 'goal' and if that fails it will
    try again but with the 'goal' requirement dropped.

    Unfortunately, this is not what the code does, so fix it to do so.

    This is important for nobootmem conversions to architectures such
    as sparc where MAX_DMA_ADDRESS is infinity.

    On such architectures all of the allocations done by generic spots,
    such as the sparse-vmemmap implementation, will pass in:

    __pa(MAX_DMA_ADDRESS)

    as the goal, and with the limit given as "-1" this will always fail
    unless we add the appropriate fallback logic here.

    Signed-off-by: David S. Miller
    Acked-by: Yinghai Lu
    Signed-off-by: Linus Torvalds

    David Miller
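
    A hedged sketch of the fallback order after the fix (details abbreviated;
    the node-id constant follows later kernels):

        static void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
                                                           unsigned long size,
                                                           unsigned long align,
                                                           unsigned long goal,
                                                           unsigned long limit)
        {
                void *ptr;

        again:
                /* 1) node-local at/above the goal, 2) any node at/above the goal */
                ptr = __alloc_memory_core_early(pgdat->node_id, size, align, goal, limit);
                if (ptr)
                        return ptr;

                ptr = __alloc_memory_core_early(NUMA_NO_NODE, size, align, goal, limit);
                if (ptr)
                        return ptr;

                /* 3) drop the goal (e.g. __pa(MAX_DMA_ADDRESS) on sparc) and retry */
                if (goal) {
                        goal = 0;
                        goto again;
                }

                return NULL;
        }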
     

29 Nov, 2011

1 commit

  • Conflicts & resolutions:

    * arch/x86/xen/setup.c

    dc91c728fd "xen: allow extra memory to be in multiple regions"
    24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

    conflicted on xen_add_extra_mem() updates. The resolution is
    trivial as the latter just wants to replace
    memblock_x86_reserve_range() with memblock_reserve().

    * drivers/pci/intel-iommu.c

    166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
    5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."

    conflicted as the former moved the file under drivers/iommu/.
    Resolved by applying the changes from the latter to the moved
    file.

    * mm/Kconfig

    6661672053a "memblock: add NO_BOOTMEM config symbol"
    c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

    conflicted trivially. Both added config options. Just
    letting both add their own options resolves the conflict.

    * mm/memblock.c

    d1f0ece6cdc "mm/memblock.c: small function definition fixes"
    ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"

    conflicted. The former updates a function removed by the latter.
    Resolution is trivial.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

31 Oct, 2011

1 commit


15 Jul, 2011

1 commit

  • Other than sanity checks and debug messages, the x86-specific versions of
    the memblock reserve/free functions are simple wrappers around the generic
    versions - memblock_reserve()/memblock_free().

    This patch adds debug messages with caller identification to the generic
    versions, replaces the x86-specific ones with them, and kills the latter.
    arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty
    after this change and are removed.

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org
    Cc: Yinghai Lu
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: H. Peter Anvin

    Tejun Heo
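
    A hedged sketch of the generic function with the added caller-identifying
    debug message (the internal helper it forwards to is an assumption for
    that era of the code):

        int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
        {
                /* memblock_dbg() prints only when booted with memblock=debug;
                 * _RET_IP_ identifies the caller */
                memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
                             (unsigned long long)base,
                             (unsigned long long)(base + size),
                             (void *)_RET_IP_);

                return memblock_add_region(&memblock.reserved, base, size);
        }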