Eric Lee / smarc-fsl-linux-kernel

29 Mar, 2018

1 commit

99b6ead44 Revert "mm: page_alloc: skip over regions of invalid pfns where possible"

commit f59f1caf72ba00d519c793c3deb32cd3be32edc2 upstream.

This reverts commit b92df1de5d28 ("mm: page_alloc: skip over regions of
invalid pfns where possible"). That commit was meant to speed up boot-time
initialization by skipping the loop iterations in memmap_init_zone() for
invalid pfns.

But given some specific memory mappings on x86_64 (or, more generally,
theoretically anywhere except on arm with CONFIG_HAVE_ARCH_PFN_VALID), the
implementation also skips valid pfns, which is plain wrong and causes
'kernel BUG at mm/page_alloc.c:1389!'

crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010: move_freepages+0x15e/0x160
--
Call Trace:
move_freepages_block+0x73/0x80
__rmqueue+0x263/0x460
get_page_from_freelist+0x7e1/0x9e0
__alloc_pages_nodemask+0x176/0x420
--

crash> page_init_bug -v | grep RAM
1000 - 9bfff System RAM (620.00 KiB)
100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
7b788000 - 7b7fffff System RAM (480.00 KiB)
100000000 - 67fffffff System RAM ( 22.00 GiB)

crash> page_init_bug | head -6
7b788000 - 7b7fffff System RAM (480.00 KiB)
1fffff00000000 0 1 DMA32 4096 1048575
505736 505344 505855
0 0 0 DMA 1 4095
1fffff00000400 0 1 DMA32 4096 1048575
BUG, zones differ!

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 0 0 <<<<
ffffea0001ede1c0 7b787000 0 0 0 0
ffffea0001ede200 7b788000 0 0 1 1fffff00000000

Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek
Acked-by: Ard Biesheuvel
Acked-by: Michal Hocko
Reviewed-by: Andrew Morton
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Pavel Tatashin
Cc: Paul Burton
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Daniel Vacek
2018-03-29 00:24:39 +0800

15 Mar, 2018

1 commit

88b3e6acb mm/memblock.c: hardcode the end_pfn being -1

commit 379b03b7fa05f7db521b7732a52692448a3c34fe upstream.

This is just a cleanup. It aids handling the special end case in the
next commit.

[akpm@linux-foundation.org: make it work against current -linus, not against -mm]
[akpm@linux-foundation.org: make it work against current -linus, not against -mm some more]
Link: http://lkml.kernel.org/r/1ca478d4269125a99bcfb1ca04d7b88ac1aee924.1520011944.git.neelx@redhat.com
Signed-off-by: Daniel Vacek
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Pavel Tatashin
Cc: Paul Burton
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

Daniel Vacek
2018-03-15 17:54:32 +0800

26 Aug, 2017

1 commit

91b540f98 mm/memblock.c: reversed logic in memblock_discard()

In the recently introduced memblock_discard() there is a reversed logic bug.
The memory of the static array is freed instead of the dynamically allocated one.
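
The reversed check can be illustrated with a small userspace sketch. This is not the kernel code: struct region_table, init_regions and table_discard() are hypothetical stand-ins, and malloc()/free() replace the kernel's early allocator. The point is only that the backing array may be released when it is no longer the static bootstrap array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define INIT_REGIONS 4	/* stand-in for INIT_MEMBLOCK_REGIONS */

struct region {
	unsigned long base;
	unsigned long size;
};

struct region_table {
	struct region *regions;	/* points at init_regions until resized */
	unsigned long max;
};

/* Static bootstrap array; it must never be handed to free(). */
static struct region init_regions[INIT_REGIONS];

/*
 * Hypothetical helper: release the backing array of a table.
 * Returns true if a dynamically allocated array was freed.
 */
static bool table_discard(struct region_table *t)
{
	/* The reversed-logic bug corresponds to writing '==' here. */
	if (t->regions != init_regions) {
		free(t->regions);
		t->regions = NULL;
		t->max = 0;
		return true;
	}
	return false;
}
```

With this shape, a table that still uses the static array is left alone, and only a resized (heap-backed) table is freed.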

Link: http://lkml.kernel.org/r/1503511441-95478-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 3010f876500f ("mm: discard memblock data later")
Signed-off-by: Pavel Tatashin
Reported-by: Woody Suwalski
Tested-by: Woody Suwalski
Acked-by: Michal Hocko
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Tatashin
2017-08-26 07:12:46 +0800

19 Aug, 2017

1 commit

3010f8765 mm: discard memblock data later

There is an existing use-after-free bug when deferred struct pages are
enabled:

memblock_add() allocates memory for the memory array if more than
128 entries are needed. See the comment in e820__memblock_setup():

* The bootstrap memblock region count maximum is 128 entries
* (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
* than that - so allow memblock resizing.

This memblock memory is freed here:
free_low_memory_core_early()

We access the freed memblock.memory later in boot when deferred pages
are initialized in this path:

deferred_init_memmap()
for_each_mem_pfn_range()
__next_mem_pfn_range()
type = &memblock.memory;

One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been
exceeded, at least on systems where deferred struct pages were enabled.

Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
and verifying in qemu that this code is getting executed and that the
freed pages are sane.

Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
Signed-off-by: Pavel Tatashin
Reviewed-by: Steven Sistare
Reviewed-by: Daniel Jordan
Reviewed-by: Bob Picco
Acked-by: Michal Hocko
Cc: Mel Gorman
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Tatashin
2017-08-19 06:32:01 +0800

07 Jul, 2017

2 commits

4932381ee mm, memory_hotplug: move movable_node to the hotplug proper

movable_node_is_enabled is defined in memblock proper while it is
initialized from the memory hotplug proper. This is quite messy and it
makes a dependency between the two so move movable_node along with the
helper functions to memory_hotplug.

To make it more entertaining the kernel parameter is ignored unless
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y because we do not have the node
information for each memblock otherwise. So let's warn when the option
is disabled.

Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Reza Arbab
Acked-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Jerome Glisse
Cc: Yasuaki Ishimatsu
Cc: Xishi Qiu
Cc: Kani Toshimitsu
Cc: Chen Yucong
Cc: Joonsoo Kim
Cc: Andi Kleen
Cc: David Rientjes
Cc: Daniel Kiper
Cc: Igor Mammedov
Cc: Vitaly Kuznetsov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-07-07 07:24:35 +0800
f70029bba mm, memory_hotplug: drop CONFIG_MOVABLE_NODE

Commit 20b2f52b73fe ("numa: add CONFIG_MOVABLE_NODE for
movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
good explanation on why it is actually useful.

It makes a lot of sense to make the movable node semantics opt-in, but we
already have that because the feature has to be explicitly enabled on
the kernel command line. A config option on top only makes the
configuration space larger without a good reason. It also adds
additional ifdefery that pollutes the code.

Just drop the config option and make it de facto always enabled. This
shouldn't introduce any change to the semantics.

Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Reza Arbab
Acked-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Jerome Glisse
Cc: Yasuaki Ishimatsu
Cc: Xishi Qiu
Cc: Kani Toshimitsu
Cc: Chen Yucong
Cc: Joonsoo Kim
Cc: Andi Kleen
Cc: David Rientjes
Cc: Daniel Kiper
Cc: Igor Mammedov
Cc: Vitaly Kuznetsov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-07-07 07:24:35 +0800

03 Jun, 2017

1 commit

864b9a393 mm: consider memblock reservations for deferred memory initialization sizing

We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
kthreadd cpuset=/ mems_allowed=7
CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
Call Trace:
dump_stack+0xb0/0xf0 (unreliable)
dump_header+0xb0/0x258
out_of_memory+0x5f0/0x640
__alloc_pages_nodemask+0xa8c/0xc80
kmem_getpages+0x84/0x1a0
fallback_alloc+0x2a4/0x320
kmem_cache_alloc_node+0xc0/0x2e0
copy_process.isra.25+0x260/0x1b30
_do_fork+0x94/0x470
kernel_thread+0x48/0x60
kthreadd+0x264/0x330
ret_from_kernel_thread+0x5c/0xa4

Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:5 slab_unreclaimable:73
mapped:0 shmem:0 pagetables:0 bounce:0
free:0 free_pcp:0 free_cma:0
Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
0 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
819200 pages RAM
0 pages HighMem/MovableOnly
817481 pages reserved
0 pages cma reserved
0 pages hwpoisoned

The reason is that the managed memory is too low (only 110MB) while the
rest of the 50GB is still waiting for the deferred initialization to
be done. update_defer_init estimates the initial memory to initialize
to be at least 2GB, but it doesn't consider any memory allocated in that
range. In this particular case we've had

Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation. Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates over all reserved blocks and sums the sizes of all
that start below the given address. The cumulative size is then added on
top of the initial estimation. This is still not ideal because
reset_deferred_meminit doesn't consider holes, and so a reservation might
be above the initial estimation, which we ignore, but let's keep the
logic simple until we really need to handle more complicated cases.
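
The sizing rule described above can be sketched in userspace C. This is a simplified model, not the kernel helper: struct reserved_region, reserved_below() and initial_init_size() are hypothetical names, and the regions are a plain array rather than memblock.reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct reserved_region {
	uint64_t base;
	uint64_t size;
};

/* Sum the sizes of all reserved regions that start below 'limit'. */
static uint64_t reserved_below(const struct reserved_region *r, size_t n,
			       uint64_t limit)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (r[i].base < limit)
			total += r[i].size;
	return total;
}

/*
 * Grow the initial estimate by the reservations that start inside it,
 * so that roughly 'estimate' bytes remain usable after they are taken.
 */
static uint64_t initial_init_size(uint64_t estimate,
				  const struct reserved_region *r, size_t n)
{
	return estimate + reserved_below(r, n, estimate);
}
```

For the crashkernel case quoted above (4096MB reserved at 128MB, 2GB initial estimate), this model yields 6GB to initialize eagerly instead of 2GB.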

Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko
Acked-by: Mel Gorman
Tested-by: Srikar Dronamraju
Cc: [4.2+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2017-06-03 06:07:38 +0800

06 Apr, 2017

2 commits

c9ca9b4e2 memblock: add memblock_cap_memory_range()

Add memblock_cap_memory_range(), which removes all the memblock regions
except the memory range specified in the arguments. In addition,
memblock_mem_limit_remove_map() is reworked to be implemented on top of
memblock_cap_memory_range().

This function, like memblock_mem_limit_remove_map(), will not remove
memblocks with the MEMBLOCK_NOMAP attribute as they may be mapped and
accessed later as "device memory."
See commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
address the mem limit issue").

This function is used, in a succeeding patch in the series of arm64 kdump
support, to limit the range of usable memory, or System RAM, on the crash
dump kernel.
(Please note that the "mem=" parameter is of little use for this purpose.)

Signed-off-by: AKASHI Takahiro
Reviewed-by: Will Deacon
Acked-by: Catalin Marinas
Acked-by: Dennis Chen
Cc: linux-mm@kvack.org
Cc: Andrew Morton
Reviewed-by: Ard Biesheuvel
Signed-off-by: Catalin Marinas

AKASHI Takahiro
2017-04-06 01:26:50 +0800
4c546b8a3 memblock: add memblock_clear_nomap()

This function, in combination with memblock_mark_nomap(), will be used
in a later kdump patch for arm64 when it temporarily isolates some range
of memory from the other memory blocks in order to create a specific
kernel mapping at boot time.

Signed-off-by: AKASHI Takahiro
Reviewed-by: Ard Biesheuvel
Signed-off-by: Catalin Marinas

AKASHI Takahiro
2017-04-06 01:26:46 +0800

10 Mar, 2017

1 commit

c9a1b80da mm/memblock.c: fix memblock_next_valid_pfn()

Obviously, we should not access memblock.memory.regions[right] if
'right' is outside of [0, memblock.memory.cnt).

Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Link: http://lkml.kernel.org/r/20170303023745.9104-1-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro
Cc: Paul Burton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

AKASHI Takahiro
2017-03-10 09:01:10 +0800

25 Feb, 2017

3 commits

0262d9c84 memblock: embed memblock type name within struct memblock_type

Provide the name of each memblock type with struct memblock_type. This
allows us to get rid of the function memblock_type_name() and to stop
duplicating the type names in __memblock_dump_all().

The only memblock_type usage outside of mm/memblock.c seems to be
arch/s390/kernel/crash_dump.c. While at it, give it a name.

Link: http://lkml.kernel.org/r/20170120123456.46508-4-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens
Cc: Philipp Hachtmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2017-02-25 09:46:54 +0800
409efd4c9 memblock: also dump physmem list within __memblock_dump_all

Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the
memblock structure knows about a physical memory list.

The physical memory list should also be dumped if memblock_dump_all() is
called in case memblock_debug is switched on. This makes debugging a
bit easier.

Link: http://lkml.kernel.org/r/20170120123456.46508-3-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens
Cc: Philipp Hachtmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2017-02-25 09:46:54 +0800
7409c5f73 memblock: let memblock_type_name know about physmem type

Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the
memblock structure knows about a physical memory list.

memblock_type_name() should return "physmem" instead of "unknown" if the
name of the physmem memblock_type is being asked for.

Link: http://lkml.kernel.org/r/20170120123456.46508-2-heiko.carstens@de.ibm.com
Signed-off-by: Heiko Carstens
Cc: Philipp Hachtmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2017-02-25 09:46:53 +0800

23 Feb, 2017

4 commits

5d63f81c9 mm/memblock.c: remove unnecessary log and clean up

There is no variable named flags in memblock_add() and
memblock_reserve() so remove it from the log messages.

This patch also cleans up the type casting for phys_addr_t by using %pa
to print them.

Link: http://lkml.kernel.org/r/1484720165-25403-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miles Chen
2017-02-23 08:41:30 +0800
7d41c03e2 mm/memblock.c: check return value of memblock_reserve() in memblock_virt_alloc_internal()

memblock_reserve() adds a new range to memblock.reserved in case
the new range is not totally covered by any of the current
memblock.reserved ranges. If memblock.reserved is full and can't be
resized, memblock_reserve() fails.

This doesn't happen in the real world now; I observed it during code
review. Theoretically, though, it can happen. And if it
does, others would think this range of memory is still available and
may corrupt the memory.

This patch checks the return value and goes to "done" after it succeeds.

Link: http://lkml.kernel.org/r/1482363033-24754-3-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Wei Yang
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wei Yang
2017-02-23 08:41:29 +0800
ef415ef41 mm/memblock.c: trivial code refine in memblock_is_region_memory()

memblock_is_region_memory() invokes memblock_search() to see whether the
base address is in the memory region. If it fails, idx will be -1 and
the function returns 0.

If the memblock_search() returns a valid index, it means the base
address is guaranteed to be in the range memblock.memory.regions[idx].
Because of this, it is not necessary to check the base again.

This patch removes the check on "base".
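
Why the extra check on "base" is redundant can be seen in a small userspace model. The names below (struct mem_region, region_search(), region_is_memory()) are hypothetical and only mirror the shape of the kernel functions; regions are assumed sorted and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mem_region {
	uint64_t base;
	uint64_t size;
};

/* Binary search: index of the region containing 'addr', or -1. */
static int region_search(const struct mem_region *r, int cnt, uint64_t addr)
{
	int left = 0, right = cnt;

	while (left < right) {
		int mid = left + (right - left) / 2;

		if (addr < r[mid].base)
			right = mid;
		else if (addr >= r[mid].base + r[mid].size)
			left = mid + 1;
		else
			return mid;
	}
	return -1;
}

/* True if [base, base + size) lies entirely inside one region. */
static bool region_is_memory(const struct mem_region *r, int cnt,
			     uint64_t base, uint64_t size)
{
	int idx = region_search(r, cnt, base);

	if (idx == -1)
		return false;
	/*
	 * A successful search already guarantees r[idx].base <= base,
	 * so only the end of the range still needs to be checked.
	 */
	return r[idx].base + r[idx].size >= base + size;
}
```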

Link: http://lkml.kernel.org/r/1482363033-24754-2-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Wei Yang
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wei Yang
2017-02-23 08:41:29 +0800
b92df1de5 mm: page_alloc: skip over regions of invalid pfns where possible

When using a sparse memory model, memmap_init_zone(), when invoked with
the MEMMAP_EARLY context, will skip over pages which aren't valid, i.e.
which aren't in a populated region of the sparse memory map. However, if
the memory map is extremely sparse then it can spend a long time
linearly checking each PFN in a large non-populated region of the memory
map & skipping it in turn.

When CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, we have sufficient
information to quickly discover the next valid PFN given an invalid one
by searching through the list of memory regions & skipping forwards to
the first PFN covered by the memory region to the right of the
non-populated region. Implement this in order to speed up
memmap_init_zone() for systems with extremely sparse memory maps.
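
The skipping idea can be sketched as a binary search over a sorted list of populated PFN ranges. This is a simplified userspace model under stated assumptions: struct pfn_range and next_valid_pfn() are hypothetical, ranges are half-open, sorted and non-overlapping, and the sketch works purely on PFNs. It does not attempt to model the cases that motivated the later revert listed at the top of this log.

```c
#include <assert.h>

struct pfn_range {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

/*
 * Given an invalid 'pfn', return the next PFN worth looking at:
 * pfn + 1 if that is populated, otherwise the start of the first
 * range to the right, clamped to 'max_pfn'.
 */
static unsigned long next_valid_pfn(const struct pfn_range *r,
				    unsigned int cnt,
				    unsigned long pfn, unsigned long max_pfn)
{
	unsigned int left = 0, right = cnt;
	unsigned long next = pfn + 1;

	while (left < right) {
		unsigned int mid = left + (right - left) / 2;

		if (next < r[mid].start_pfn)
			right = mid;
		else if (next >= r[mid].end_pfn)
			left = mid + 1;
		else
			return next < max_pfn ? next : max_pfn;
	}

	/* 'right' may equal cnt: there is no range to the right at all. */
	if (right == cnt)
		return max_pfn;
	return r[right].start_pfn < max_pfn ? r[right].start_pfn : max_pfn;
}
```

The explicit `right == cnt` test is the kind of bounds check that the follow-up fix to memblock_next_valid_pfn() in this log is about.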

James said "I have tested this patch on a virtual model of a Samurai CPU
with a sparse memory map. The kernel boot time drops from 109 to
62 seconds."

Link: http://lkml.kernel.org/r/20161125185518.29885-1-paul.burton@imgtec.com
Signed-off-by: Paul Burton
Tested-by: James Hartley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Burton
2017-02-23 08:41:29 +0800

12 Oct, 2016

1 commit

9099daed9 mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping

Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
physical address to a virtual one using __va(). However, such physical
addresses may sometimes be located in highmem and using __va() is
incorrect, leading to inconsistent object tracking in kmemleak.

The following functions have been added to the kmemleak API and they take
a physical address as the object pointer. They only perform the
corresponding action if the address has a lowmem mapping:

kmemleak_alloc_phys
kmemleak_free_part_phys
kmemleak_not_leak_phys
kmemleak_ignore_phys

The affected calling places have been updated to use the new kmemleak
API.

Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas
Reported-by: Vignesh R
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Catalin Marinas
2016-10-12 06:06:33 +0800

08 Oct, 2016

1 commit

8907de5dc mm/memblock.c: expose total reserved memory

The total reserved memory in a system is accounted but not available for
use outside mm/memblock.c. By exposing the total reserved memory,
systems can better calculate the size of large hashes.

Link: http://lkml.kernel.org/r/1472476010-4709-3-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju
Suggested-by: Mel Gorman
Cc: Vlastimil Babka
Cc: Michal Hocko
Cc: Michael Ellerman
Cc: Mahesh Salgaonkar
Cc: Hari Bathini
Cc: Dave Hansen
Cc: Balbir Singh
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Srikar Dronamraju
2016-10-08 09:46:28 +0800

05 Aug, 2016

2 commits

e47608ab6 mm/memblock.c: fix NULL dereference error

A NULL dereference error occurs, and the type_a->regions[0] info cannot
be obtained, if parameter type_b of __next_mem_range_rev() == NULL.

Fix this by checking before dereferencing and initializing idx_b to 0.

The approach was tested by dumping all types of region via
__memblock_dump_all() and the fixed __next_mem_range_rev() to UART
separately; the result is okay after checking the logs.

Link: http://lkml.kernel.org/r/57A0320D.6070102@zoho.com
Signed-off-by: zijun_hu
Tested-by: zijun_hu
Acked-by: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

zijun_hu
2016-08-05 08:02:09 +0800
412d0008d mm/memblock: fix a typo in a comment

s/accomodate/accommodate/

Link: http://lkml.kernel.org/r/20160804121824.18100-1-kuleshovmail@gmail.com
Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2016-08-05 08:02:09 +0800

29 Jul, 2016

3 commits

fb399b485 mm/memblock.c: fix index adjustment error in __next_mem_range_rev()

Fix region index adjustment error when parameter type_b of
__next_mem_range_rev() == NULL.

Signed-off-by: zijun_hu
Cc: Alexander Kuleshov
Cc: Ard Biesheuvel
Cc: Tang Chen
Cc: Wei Yang
Cc: Tang Chen
Cc: Richard Leitner
Cc: David Gibson
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

zijun_hu
2016-07-29 07:07:41 +0800
a571d4eb5 mm/memblock.c: add new infrastructure to address the mem limit issue

In some cases, memblock is queried by kernel to determine whether a
specified address is RAM or not. For example, the ACPI core needs this
information to determine which attributes to use when mapping ACPI
regions (acpi_os_ioremap()). Use of incorrect memory types can result in
faults, data corruption, or other issues.

Removing memory with memblock_enforce_memory_limit() throws away this
information, and so a kernel booted with 'mem=' may suffer from the
issues described above. To avoid this, we need to keep those NOMAP
regions instead of removing all above the limit, which preserves the
information we need while preventing other use of those regions.

This patch adds new infrastructure to retain all NOMAP memblock regions
while removing others, to cater for this.

Link: http://lkml.kernel.org/r/1468475036-5852-2-git-send-email-dennis.chen@arm.com
Signed-off-by: Dennis Chen
Acked-by: Steve Capper
Cc: Catalin Marinas
Cc: Ard Biesheuvel
Cc: Pekka Enberg
Cc: Mel Gorman
Cc: Tang Chen
Cc: Tony Luck
Cc: Ingo Molnar
Cc: Rafael J. Wysocki
Cc: Will Deacon
Cc: Mark Rutland
Cc: Matt Fleming
Cc: Kaly Xin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dennis Chen
2016-07-29 07:07:41 +0800
c4c5ad6b3 memblock: include <asm/sections.h> instead of <asm-generic/sections.h>

asm-generic headers are generic implementations for architecture
specific code and should not be included by common code. Thus use the
asm/ version of sections.h to get at the linker sections.

Link: http://lkml.kernel.org/r/1468285103-7470-1-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2016-07-29 07:07:41 +0800

27 Jul, 2016

1 commit

ef3cc4db4 mm/memblock.c:memblock_add_range(): if nr_new is 0 just return

If nr_new is 0, no region would be added, so just
return to the caller.

Signed-off-by: nimisolo
Cc: Alexander Kuleshov
Cc: Pekka Enberg
Cc: Tony Luck
Cc: Mel Gorman
Cc: Tang Chen
Cc: Wei Yang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

nimisolo
2016-07-27 07:19:19 +0800

21 May, 2016

2 commits

cd33a76b0 mm/memblock.c: remove unnecessary always-true comparison

Comparing a u64 variable to >= 0 always returns true and can therefore
be removed. This issue was detected using the -Wtype-limits gcc flag.

This patch fixes the following type-limits warning:

mm/memblock.c: In function `__next_reserved_mem_region':
mm/memblock.c:843:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
if (*idx >= 0 && *idx < type->cnt) {
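
A minimal sketch of the cleaned-up check, assuming a hypothetical helper idx_in_range() that is not part of the kernel source: because the index is unsigned, only the upper bound carries information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * 'idx' is unsigned, so 'idx >= 0' can never be false; the upper
 * bound is the only meaningful test.
 */
static bool idx_in_range(uint64_t idx, unsigned long cnt)
{
	return idx < cnt;
}
```

Note that a "negative" value stored in an unsigned index wraps to a very large number, which the upper-bound test already rejects.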

Link: http://lkml.kernel.org/r/20160510103625.3a7f8f32@g0hl1n.net
Signed-off-by: Richard Leitner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Richard Leitner
2016-05-21 08:58:30 +0800
f705ac4b3 mm/memblock.c: move memblock_{add,reserve}_region into memblock_{add,reserve}

memblock_add_region() and memblock_reserve_region() do nothing specific
before the call of memblock_add_range(); they only print debug output.

We can do the same in memblock_add() and memblock_reserve() since both
memblock_add_region() and memblock_reserve_region() are not used by
anybody outside of memblock.c and memblock_{add,reserve}() have the same
set of flags and nids.

Since memblock_add_region() and memblock_reserve_region() will be
inlined, there will be no functional changes, but code
readability will improve a little.

Signed-off-by: Alexander Kuleshov
Acked-by: Ard Biesheuvel
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Tony Luck
Cc: Tang Chen
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2016-05-21 08:58:30 +0800

18 Mar, 2016

1 commit

756a025f0 mm: coalesce split strings

Kernel style prefers a single string over split strings when the string is
'user-visible'.

Miscellanea:

- Add a missing newline
- Realign arguments

Signed-off-by: Joe Perches
Acked-by: Tejun Heo [percpu]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2016-03-18 06:09:34 +0800

16 Mar, 2016

1 commit

5aa174801 mm/memblock.c: remove unnecessary memblock_type variable

We define struct memblock_type *type in the memblock_add_region() and
memblock_reserve_region() functions only for passing it to the
memblock_add_range() and memblock_reserve_range() functions. Let's
remove these variables and pass the type directly.

Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2016-03-16 07:55:16 +0800

06 Feb, 2016

1 commit

1f1ffb8a1 memblock: don't mark memblock_phys_mem_size() as __init

At the moment memblock_phys_mem_size() is marked as __init, and so is
discarded after boot. This is different from most of the memblock
functions which are marked __init_memblock, and are only discarded after
boot if memory hotplug is not configured.

To allow for upcoming code which will need memblock_phys_mem_size() in
the hotplug path, change it from __init to __init_memblock.

Signed-off-by: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Gibson
2016-02-06 10:10:40 +0800

15 Jan, 2016

3 commits

8c9c1701c mm/memblock: introduce for_each_memblock_type()

We already have the for_each_memblock() macro in <linux/memblock.h>
which provides the ability to iterate over memblock regions of a known type.
The for_each_memblock() macro does not allow us to pass the pointer to the
struct memblock_type; instead we need to pass the name of the type.

This patch introduces a new macro for_each_memblock_type() which allows
us to iterate over memblock regions with the given type when the type is
unknown.
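
An iteration macro that takes a pointer to the type, rather than its name, might look like the following userspace sketch. The macro for_each_region_of() and the structs are hypothetical and simplified; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct region {
	unsigned long base;
	unsigned long size;
};

struct region_type {
	unsigned long cnt;
	struct region *regions;
};

/* Walk every region of '*type'; 'idx' and 'rgn' are caller-provided. */
#define for_each_region_of(type, idx, rgn)			\
	for ((idx) = 0, (rgn) = (type)->regions;		\
	     (idx) < (type)->cnt;				\
	     (idx)++, (rgn)++)

/* Works for any region_type, since only a pointer is needed. */
static unsigned long total_size(const struct region_type *type)
{
	unsigned long idx, sum = 0;
	const struct region *rgn;

	for_each_region_of(type, idx, rgn)
		sum += rgn->size;
	return sum;
}
```

Because the macro only needs a pointer, a single helper such as total_size() can serve several region lists without knowing which one it was given.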

Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2016-01-15 08:00:49 +0800
f14516fbf mm/memblock: remove rgnbase and rgnsize variables

Remove rgnbase and rgnsize variables from memblock_overlaps_region().
We use these variables only for passing to the memblock_addrs_overlap()
function and that's all. Let's remove them.

Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2016-01-15 08:00:49 +0800
b4ad0c7e0 mm/memblock.c: memblock_is_memory()/reserved() can be boolean

Make memblock_is_memory() and memblock_is_reserved() return bool to
improve readability, since these particular functions only use either
one or zero as their return value.

No functional change.

Signed-off-by: Yaowei Bai
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yaowei Bai
2016-01-15 08:00:49 +0800

10 Dec, 2015

1 commit

bf3d3cc58 mm/memblock: add MEMBLOCK_NOMAP attribute to memblock memory table

This introduces the MEMBLOCK_NOMAP attribute and the required plumbing
to make it usable as an indicator that some parts of normal memory
should not be covered by the kernel direct mapping. It is up to the
arch to actually honor the attribute when laying out this mapping,
but the memblock code itself is modified to disregard these regions
for allocations and other general use.
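
How an allocator can disregard such regions is sketched below in userspace C. Everything here is hypothetical and heavily simplified: the flag value REGION_NOMAP, struct region and first_allocatable() are illustrative names, not the kernel's, and real allocation also has to account for reservations and alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: region is memory but must not be direct-mapped. */
#define REGION_NOMAP 0x4u

struct region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/*
 * Return the index of the first region that may satisfy an
 * allocation of 'size' bytes, skipping NOMAP regions, or -1.
 */
static int first_allocatable(const struct region *r, int cnt, uint64_t size)
{
	for (int i = 0; i < cnt; i++) {
		if (r[i].flags & REGION_NOMAP)
			continue;	/* still listed as memory, never handed out */
		if (r[i].size >= size)
			return i;
	}
	return -1;
}
```

The NOMAP region stays in the table, so queries about whether an address is RAM can still succeed, while allocation simply passes over it.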

Cc: linux-mm@kvack.org
Cc: Alexander Kuleshov
Cc: Andrew Morton
Reviewed-by: Matt Fleming
Signed-off-by: Ard Biesheuvel
Signed-off-by: Will Deacon

Ard Biesheuvel
2015-12-10 00:56:58 +0800

06 Nov, 2015

1 commit

35bd16a22 mm/memblock: make memblock_remove_range() static

memblock_remove_range() is only used in mm/memblock.c, so we can make
it static.

Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2015-11-06 11:34:48 +0800

09 Sep, 2015

5 commits

ad5ea8cd5 mm/memblock.c: fix comment in __next_mem_range()

Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2015-09-09 06:35:28 +0800
c11539315 mm/memblock.c: fiy typos in comments

s/succees/success/

Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2015-09-09 06:35:28 +0800
567d117b8 mm/memblock.c: rename local variable of memblock_type to 'type'

Since commit e3239ff92a17 ("memblock: Rename memblock_region to
memblock_type and memblock_property to memblock_region"), all local
variables of the memblock_type type were renamed to 'type'. This commit
renames all remaining local variables with the memblock_type type in the
same way.

Signed-off-by: Alexander Kuleshov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Kuleshov
2015-09-09 06:35:28 +0800
95cf82ecc mem-hotplug: handle node hole when initializing numa_meminfo.

When parsing SRAT, all memory ranges are added into numa_meminfo. In
numa_init(), before entering numa_cleanup_meminfo(), all possible memory
ranges are in numa_meminfo. And numa_cleanup_meminfo() removes all
ranges over max_pfn or empty.

But, this only works if the nodes are continuous. Let's have a look at
the following example:

We have an SRAT like this:
SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug

On boot, only node 0,1,2,3 exist.

And the numa_meminfo will look like this:
numa_meminfo.nr_blks = 9
1. on node 0: [0, 60000000]
2. on node 0: [100000000, 20000000000]
3. on node 1: [20000000000, 40000000000]
4. on node 4: [40000000000, 60000000000]
5. on node 5: [60000000000, 80000000000]
6. on node 2: [80000000000, a0000000000]
7. on node 3: [a0000000000, a0800000000]
8. on node 6: [c0000000000, a0800000000]
9. on node 7: [e0000000000, a0800000000]

And numa_cleanup_meminfo() will merge 1 and 2, and remove 8 and 9 because
the end address is over max_pfn, which is a0800000000. But 4 and 5 are not
removed because their end addresses are less than max_pfn. But in fact,
nodes 4 and 5 don't exist.

In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.

Since memory ranges in node 4 and 5 are in numa_meminfo, in
numa_register_memblks(), node 4 and 5 will be mistakenly set to online.

If you run lscpu, it will show:
NUMA node0 CPU(s): 0-14,128-142
NUMA node1 CPU(s): 15-29,143-157
NUMA node2 CPU(s):
NUMA node3 CPU(s):
NUMA node4 CPU(s): 62-76,190-204
NUMA node5 CPU(s): 78-92,206-220

In this patch, we use memblock_overlaps_region() to check if ranges in
numa_meminfo overlap with ranges in memory_block. Since memory_block
contains all available memory at boot time, if they overlap, it means the
ranges exist. If not, then remove them from numa_meminfo.

After this patch, lscpu will show:
NUMA node0 CPU(s): 0-14,128-142
NUMA node1 CPU(s): 15-29,143-157
NUMA node4 CPU(s): 62-76,190-204
NUMA node5 CPU(s): 78-92,206-220

Signed-off-by: Tang Chen
Reviewed-by: Yasuaki Ishimatsu
Cc: Thomas Gleixner
Cc: Tejun Heo
Cc: Luiz Capitulino
Cc: Xishi Qiu
Cc: Will Deacon
Cc: Vladimir Murzin
Cc: Fabian Frederick
Cc: Alexander Kuleshov
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tang Chen
2015-09-09 06:35:28 +0800
c5c5c9d10 mm/memblock.c: make memblock_overlaps_region() return bool.

memblock_overlaps_region() checks if the given memblock region
intersects a region in memblock. If so, it returns the index of the
intersected region.

But its only caller is memblock_is_region_reserved(), and it returns 0
if false, non-zero if true.

Both of these should return bool.
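
A boolean overlap test of this kind can be sketched as follows. This is a userspace illustration with hypothetical names (addrs_overlap(), overlaps_any(), struct region); ranges are treated as half-open intervals [base, base + size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool addrs_overlap(uint64_t base1, uint64_t size1,
			  uint64_t base2, uint64_t size2)
{
	return base1 < base2 + size2 && base2 < base1 + size1;
}

/* True if [base, base + size) intersects any region in the array. */
static bool overlaps_any(const struct region *r, size_t cnt,
			 uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < cnt; i++)
		if (addrs_overlap(base, size, r[i].base, r[i].size))
			return true;
	return false;
}
```

Returning bool keeps the question ("does it overlap?") separate from the index of the matching region, which callers in this description do not use.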

Signed-off-by: Tang Chen
Cc: Thomas Gleixner
Cc: Tejun Heo
Cc: Yasuaki Ishimatsu
Cc: Luiz Capitulino
Cc: Xishi Qiu
Cc: Will Deacon
Cc: Vladimir Murzin
Cc: Fabian Frederick
Cc: Alexander Kuleshov
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tang Chen
2015-09-09 06:35:28 +0800