16 Apr, 2015

1 commit

  • memblock_reserve() calls memblock_reserve_region(), which prints debugging
    information if 'memblock=debug' was passed on the command line. This
    patch adds the same behaviour, but for the memblock_add() function.
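
    As a rough sketch (not the exact patch; the message format and argument
    list here are approximations), memblock_add() simply gains the same
    memblock_dbg() call that the reserve path already has:

    int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
    {
            memblock_dbg("memblock_add: [%#016llx-%#016llx] %pF\n",
                         (unsigned long long)base,
                         (unsigned long long)base + size - 1,
                         (void *)_RET_IP_);

            return memblock_add_range(&memblock.memory, base, size,
                                      MAX_NUMNODES, 0);
    }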

    [akpm@linux-foundation.org: s/memblock_memory/memblock_add/ in message]
    Signed-off-by: Alexander Kuleshov
    Cc: Martin Schwidefsky
    Cc: Philipp Hachtmann
    Cc: Fabian Frederick
    Cc: Catalin Marinas
    Cc: Emil Medve
    Cc: Akinobu Mita
    Cc: Tang Chen
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     

15 Apr, 2015

1 commit


14 Dec, 2014

1 commit

  • There is a lot of duplication in the rubric around actually setting or
    clearing a mem region flag. Create a new helper function to do this and
    reduce each of memblock_mark_hotplug() and memblock_clear_hotplug() to a
    single line.

    This will be useful if someone were to add a new mem region flag - which
    I hope to be doing some day soon. But it looks like a plausible cleanup
    even without that - so I'd like to get it out of the way now.
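
    A sketch of what the helper and one of the resulting one-line wrappers
    could look like, reusing the existing memblock_isolate_range() and
    region-flag helpers (an approximation, not the exact patch):

    static int __init_memblock memblock_setclr_flag(phys_addr_t base,
                            phys_addr_t size, int set, int flag)
    {
            struct memblock_type *type = &memblock.memory;
            int i, ret, start_rgn, end_rgn;

            ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
            if (ret)
                    return ret;

            for (i = start_rgn; i < end_rgn; i++)
                    if (set)
                            memblock_set_region_flags(&type->regions[i], flag);
                    else
                            memblock_clear_region_flags(&type->regions[i], flag);

            memblock_merge_regions(type);
            return 0;
    }

    int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
    {
            return memblock_setclr_flag(base, size, 1, MEMBLOCK_HOTPLUG);
    }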

    Signed-off-by: Tony Luck
    Cc: Santosh Shilimkar
    Cc: Tang Chen
    Cc: Grygorii Strashko
    Cc: Zhang Yanfei
    Cc: Philipp Hachtmann
    Cc: Yinghai Lu
    Cc: Emil Medve
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
     

11 Sep, 2014

1 commit

  • Let memblock skip hotpluggable memory regions in __next_mem_range(); this
    is used to prevent memblock from allocating hotpluggable memory for the
    kernel early during boot. The code is the same as in __next_mem_range_rev().

    Clear the hotpluggable flag before releasing free pages to the buddy
    allocator. If we don't clear it in free_low_memory_core_early(), memory
    marked with the hotpluggable flag will never be freed to the buddy
    allocator, because __next_mem_range() will skip it.

    free_low_memory_core_early
        for_each_free_mem_range
            for_each_mem_range
                __next_mem_range
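
    A minimal sketch of the two pieces described above (the helper names
    follow the existing memblock/memory-hotplug code):

    /* In __next_mem_range(): skip hotpluggable regions, as the reverse
     * iterator __next_mem_range_rev() already does. */
    if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
            continue;

    /* In free_low_memory_core_early(): drop the flag before the free loop,
     * otherwise the iterator above would skip those ranges and they would
     * never reach the buddy allocator. */
    memblock_clear_hotplug(0, -1);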

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Xishi Qiu
    Cc: Tejun Heo
    Cc: Tang Chen
    Cc: Zhang Yanfei
    Cc: Wen Congyang
    Cc: "Rafael J. Wysocki"
    Cc: "H. Peter Anvin"
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

30 Aug, 2014

1 commit

  • In memblock_find_in_range_node(), we defined ret as int. But it should
    be phys_addr_t, because it is used to store the return value from
    __memblock_find_range_bottom_up().

    The bug has not been triggered because, when allocating low memory near
    the kernel end, the "int ret" never turns out to be negative. But once we
    start to allocate memory on other nodes, the "int ret" can become
    negative, and then the kernel will panic.

    A simple way to reproduce this: comment out the following code in
    numa_init(),

    memblock_set_bottom_up(false);

    and the kernel won't boot.
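
    A minimal illustration of why the "int" is wrong once allocations land on
    other nodes (the address below is made up):

    phys_addr_t found = 0x1c0000000ULL;  /* 7GB, e.g. memory on another node */
    int ret = found;                     /* truncated to 0xc0000000: negative */
    phys_addr_t ok = found;              /* keeps the full physical address  */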

    Reported-by: Xishi Qiu
    Signed-off-by: Tang Chen
    Tested-by: Xishi Qiu
    Cc: [3.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

07 Jun, 2014

1 commit

  • Kmemleak could ignore memory blocks allocated via memblock_alloc(),
    leading to false positives during scanning. This patch adds the
    corresponding callbacks and removes the kmemleak_free_* calls in
    mm/nobootmem.c to avoid duplication.

    The kmemleak_alloc() in mm/nobootmem.c is kept since
    __alloc_memory_core_early() does not use memblock_alloc() directly.

    Signed-off-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

05 Jun, 2014

2 commits

  • Replace open-coded ((x) >> PAGE_SHIFT) with the PFN_DOWN() macro.
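
    For reference, PFN_DOWN() from <linux/pfn.h> is exactly that shift, so
    the open-coded expression and the macro are interchangeable:

    #define PFN_DOWN(x)   ((x) >> PAGE_SHIFT)

    unsigned long pfn  = (unsigned long)(addr >> PAGE_SHIFT);  /* before */
    unsigned long pfn2 = PFN_DOWN(addr);                       /* after  */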

    Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • This introduces memblock_alloc_range(), which allocates memory from a
    specified range of physical addresses. I would like to use this function
    to specify the location of CMA.
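
    A sketch of the new interface and how CMA might use it (the prototype is
    an approximation):

    phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
                                            phys_addr_t start, phys_addr_t end);

    /* e.g. pin a CMA area to a caller-specified physical window: */
    base = memblock_alloc_range(cma_size, alignment, window_start, window_end);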

    Signed-off-by: Akinobu Mita
    Cc: Marek Szyprowski
    Cc: Konrad Rzeszutek Wilk
    Cc: David Woodhouse
    Cc: Don Dutile
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Andi Kleen
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

20 May, 2014

2 commits

  • Add the physmem list to the memblock structure. This list only exists
    if HAVE_MEMBLOCK_PHYS_MAP is selected and contains the unmodified
    list of physically available memory. It differs from the memblock
    memory list as it always contains all memory ranges even if the
    memory has been restricted, e.g. by use of the mem= kernel parameter.
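
    Roughly, struct memblock grows one more list next to the existing ones
    (sketch; surrounding members are abbreviated):

    struct memblock {
            /* ... */
            struct memblock_type memory;    /* may be trimmed, e.g. by mem= */
            struct memblock_type reserved;
    #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
            struct memblock_type physmem;   /* always the full physical map */
    #endif
    };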

    Signed-off-by: Philipp Hachtmann
    Signed-off-by: Martin Schwidefsky

    Philipp Hachtmann
     
  • Refactor the memblock code and extend the memblock API to make it
    more flexible. With the extended API it is simple to define and
    work with additional memory lists.

    The static functions memblock_add_region and __memblock_remove are
    renamed to memblock_add_range and memblock_remove_range and added to
    the memblock API.

    The __next_free_mem_range and __next_free_mem_range_rev functions
    are replaced with calls to the more generic list walkers
    __next_mem_range and __next_mem_range_rev.

    To walk an arbitrary memory list two new macros for_each_mem_range
    and for_each_mem_range_rev are added. These new macros are used
    to define for_each_free_mem_range and for_each_free_mem_range_reverse.
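
    A sketch of walking an arbitrary pair of lists with the new generic
    walker (the argument order here is an approximation of the extended API):

    phys_addr_t start, end;
    u64 i;

    for_each_mem_range(i, &memblock.memory, &memblock.reserved,
                       NUMA_NO_NODE, &start, &end, NULL) {
            /* [start, end) lies in the first list and outside the second */
    }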

    Signed-off-by: Philipp Hachtmann
    Signed-off-by: Martin Schwidefsky

    Philipp Hachtmann
     

08 Apr, 2014

2 commits


12 Mar, 2014

1 commit

  • Apart from setting the limit of memblock, it's also useful to be able
    to get the limit to avoid recalculating it every time. Add the function
    to do so.
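
    Presumably a one-line accessor next to the existing setter, along these
    lines:

    phys_addr_t __init_memblock memblock_get_current_limit(void)
    {
            return memblock.current_limit;
    }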

    Acked-by: Catalin Marinas
    Acked-by: Santosh Shilimkar
    Acked-by: Andrew Morton
    Acked-by: Nicolas Pitre
    Signed-off-by: Laura Abbott
    Signed-off-by: Russell King

    Laura Abbott
     

30 Jan, 2014

1 commit

  • In the original bootmem wrapper for memblock, we have limit checking.

    Add it to memblock_virt_alloc, to address the ARM and x86 boot crashes.
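
    A sketch of the check, mirroring what the bootmem wrapper used to do
    (placed somewhere in the memblock_virt_alloc() allocation path):

    if (max_addr > memblock.current_limit)
            max_addr = memblock.current_limit;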

    Signed-off-by: Yinghai Lu
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Reported-by: Kevin Hilman
    Tested-by: Kevin Hilman
    Reported-by: Olof Johansson
    Tested-by: Olof Johansson
    Reported-by: Konrad Rzeszutek Wilk
    Tested-by: Konrad Rzeszutek Wilk
    Cc: Dave Hansen
    Cc: Santosh Shilimkar
    Cc: "Strashko, Grygorii"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

28 Jan, 2014

1 commit

  • The original __alloc_memory_core_early() bootmem wrapper does not silently
    align the size, and we should not do so here either: a later free with
    the old size would leave part of the range not freed.

    It's obvious that the code was copied from memblock_alloc_base_nid(), and
    that code is wrong for the same reason.

    Also remove the silent size alignment in memblock_alloc_base.

    Signed-off-by: Yinghai Lu
    Acked-by: Santosh Shilimkar
    Cc: Dave Hansen
    Cc: Russell King
    Cc: Konrad Rzeszutek Wilk
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

24 Jan, 2014

2 commits

  • get_allocated_memblock_reserved_regions_info() should work if it is
    compiled in. Extend the ifdef around
    get_allocated_memblock_memory_regions_info() to include
    get_allocated_memblock_reserved_regions_info() as well. Make similar
    changes in nobootmem.c's free_low_memory_core_early(), where the two
    functions are called.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Philipp Hachtmann
    Cc: qiuxishi
    Cc: David Howells
    Cc: Daeseok Youn
    Cc: Jiang Liu
    Acked-by: Yinghai Lu
    Cc: Zhang Yanfei
    Cc: Santosh Shilimkar
    Cc: Grygorii Strashko
    Cc: Tang Chen
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
     
  • When calling free_all_bootmem() the free areas under memblock's control
    are released to the buddy allocator. Additionally the reserved list is
    freed if it was reallocated by memblock. The same should apply for the
    memory list.

    Signed-off-by: Philipp Hachtmann
    Reviewed-by: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Tang Chen
    Cc: Toshi Kani
    Cc: Jianguo Wu
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
     

22 Jan, 2014

11 commits

  • Check the nid parameter and produce a warning if it has the deprecated
    MAX_NUMNODES value; also re-assign NUMA_NO_NODE to the nid parameter in
    this case.

    This helps identify wrong API usage (the caller) and makes the code
    simpler.
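
    The check is essentially a one-liner along these lines (the exact warning
    text is an assumption):

    if (WARN_ONCE(nid == MAX_NUMNODES,
                  "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
            nid = NUMA_NO_NODE;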

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     
  • Introduce memblock memory allocation APIs which allow supporting the PAE
    or LPAE extension on 32-bit architectures, where the physical memory start
    address can be beyond 4GB. In such cases, the existing bootmem APIs, which
    operate on 32-bit addresses, won't work, and we need the memblock layer,
    which operates on 64-bit addresses.

    So we add equivalent APIs so that we can replace usage of bootmem with
    memblock interfaces. Architectures already converted to NO_BOOTMEM use
    these new memblock interfaces. The architectures which are still not
    converted to NO_BOOTMEM continue to function as is, because we still
    maintain the fallback option of the bootmem back-end supporting these new
    interfaces. So there is no functional change as such.

    In the long run, once all the architectures move to NO_BOOTMEM, we can
    get rid of the bootmem layer completely. This is one step towards removing
    the core code dependency on bootmem, and it also gives architectures a
    path to move away from bootmem.

    The proposed interface becomes active if both CONFIG_HAVE_MEMBLOCK
    and CONFIG_NO_BOOTMEM are specified by the arch. In the
    !CONFIG_NO_BOOTMEM case, the memblock() wrappers fall back to the
    existing bootmem APIs so that architectures not converted to NO_BOOTMEM
    continue to work as is.

    The meaning of MEMBLOCK_ALLOC_ACCESSIBLE and MEMBLOCK_ALLOC_ANYWHERE
    is kept the same.
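
    A few of the bootmem-style wrappers this series is about, as a sketch
    (the list is abbreviated and the exact prototypes may differ):

    void *memblock_virt_alloc(phys_addr_t size, phys_addr_t align);
    void *memblock_virt_alloc_node_nopanic(phys_addr_t size, int nid);
    void __memblock_free_early(phys_addr_t base, phys_addr_t size);
    void __memblock_free_late(phys_addr_t base, phys_addr_t size);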

    [akpm@linux-foundation.org: s/depricated/deprecated/]
    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Santosh Shilimkar
     
  • It's recommended to use NUMA_NO_NODE everywhere to select "process any
    node" behavior, or to indicate that no node id was specified.

    Hence, update the __next_free_mem_range*() APIs to accept both
    NUMA_NO_NODE and MAX_NUMNODES, but emit a warning once on MAX_NUMNODES,
    and correct the corresponding APIs' documentation to describe the new
    behavior. Also update other memblock/nobootmem APIs where MAX_NUMNODES
    is used directly.

    The change was suggested by Tejun Heo.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     
  • Reorder the parameters of memblock_find_in_range_node() to be consistent
    with other memblock APIs.

    The change was suggested by Tejun Heo.
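
    The reordering, sketched as before/after prototypes (size and align move
    to the front, matching the allocation-style APIs):

    /* before */
    phys_addr_t memblock_find_in_range_node(phys_addr_t start, phys_addr_t end,
                                            phys_addr_t size, phys_addr_t align,
                                            int nid);
    /* after */
    phys_addr_t memblock_find_in_range_node(phys_addr_t size, phys_addr_t align,
                                            phys_addr_t start, phys_addr_t end,
                                            int nid);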

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     
  • Don't produce a warning; instead, interpret align == 0 as "default align"
    equal to SMP_CACHE_BYTES when the caller of memblock_alloc_base_nid()
    doesn't specify an alignment for the block.

    This is done in preparation for introducing a common memblock alloc
    interface, to make code behavior consistent. More details are in the
    thread below:

    https://lkml.org/lkml/2013/10/13/117.
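
    The resulting behaviour is simply:

    if (!align)
            align = SMP_CACHE_BYTES;        /* silent default, no warning */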

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     
  • Currently the nobootmem allocator will always try to free the memory
    allocated for the reserved memory regions array (in
    free_low_memory_core_early()) without taking into account the current
    memblock debugging configuration (the CONFIG_ARCH_DISCARD_MEMBLOCK and
    CONFIG_DEBUG_FS state).

    As a result, if:

    - CONFIG_DEBUG_FS is defined;
    - CONFIG_ARCH_DISCARD_MEMBLOCK is not defined;
    - the reserved memory regions array has been resized during boot

    then:

    - the memory allocated for the reserved memory regions array will be
      freed to the buddy allocator;
    - the debugfs entry "sys/kernel/debug/memblock/reserved" will show
      garbage instead of the state of the memory reservations, like:
          0: 0x98393bc0..0x9a393bbf
          1: 0xff120000..0xff11ffff
          2: 0x00000000..0xffffffff

    Hence, do not free the memory allocated for the reserved memory regions
    array if defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK).
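
    A sketch of the guard on the path that hands the array back for freeing
    (the exact placement is approximated):

    #if defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK)
            return 0;       /* keep the array: debugfs may still read it */
    #endif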

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Reviewed-by: Tejun Heo
    Cc: Yinghai Lu
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     
  • The Linux kernel cannot migrate pages used by the kernel. As a result,
    hotpluggable memory used by the kernel cannot be hot-removed. To solve
    this problem, the basic idea is to prevent memblock from allocating
    hotpluggable memory for the kernel early during boot, and to arrange all
    hotpluggable memory in ACPI SRAT (System Resource Affinity Table) as
    ZONE_MOVABLE when initializing zones.

    In the previous patches, we have marked hotpluggable memory regions with
    the MEMBLOCK_HOTPLUG flag in memblock.memory.

    In this patch, we make memblock skip these hotpluggable memory regions
    in the default top-down allocation function if the movable_node boot
    option is specified.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Tang Chen
    Signed-off-by: Zhang Yanfei
    Cc: "H. Peter Anvin"
    Cc: "Rafael J . Wysocki"
    Cc: Chen Tang
    Cc: Gong Chen
    Cc: Ingo Molnar
    Cc: Jiang Liu
    Cc: Johannes Weiner
    Cc: Lai Jiangshan
    Cc: Larry Woodman
    Cc: Len Brown
    Cc: Liu Jiang
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Taku Izumi
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Thomas Renninger
    Cc: Toshi Kani
    Cc: Vasilis Liaskovitis
    Cc: Wanpeng Li
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • [sfr@canb.auug.org.au: fix powerpc build]
    Signed-off-by: Tang Chen
    Reviewed-by: Zhang Yanfei
    Cc: "H. Peter Anvin"
    Cc: "Rafael J . Wysocki"
    Cc: Chen Tang
    Cc: Gong Chen
    Cc: Ingo Molnar
    Cc: Jiang Liu
    Cc: Johannes Weiner
    Cc: Lai Jiangshan
    Cc: Larry Woodman
    Cc: Len Brown
    Cc: Liu Jiang
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Taku Izumi
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Thomas Renninger
    Cc: Toshi Kani
    Cc: Vasilis Liaskovitis
    Cc: Wanpeng Li
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • In find_hotpluggable_memory(), once we find a memory region which is
    hotpluggable, we want to mark it in memblock.memory, so that we can later
    prevent the memblock allocator from allocating hotpluggable memory for
    the kernel.

    To achieve this goal, we introduce the MEMBLOCK_HOTPLUG flag to indicate
    hotpluggable memory regions in memblock, and a function
    memblock_mark_hotplug() to mark hotpluggable memory when we find it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Tang Chen
    Reviewed-by: Zhang Yanfei
    Cc: "H. Peter Anvin"
    Cc: "Rafael J . Wysocki"
    Cc: Chen Tang
    Cc: Gong Chen
    Cc: Ingo Molnar
    Cc: Jiang Liu
    Cc: Johannes Weiner
    Cc: Lai Jiangshan
    Cc: Larry Woodman
    Cc: Len Brown
    Cc: Liu Jiang
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Taku Izumi
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Thomas Renninger
    Cc: Toshi Kani
    Cc: Vasilis Liaskovitis
    Cc: Wanpeng Li
    Cc: Wen Congyang
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • There is no flag in memblock to describe what type the memory is.
    Sometimes, we may use memblock to reserve some memory for special usage,
    and we want to know what kind of memory it is. So we need a way to mark
    out such special memory.

    In a hotplug environment, we want to reserve hotpluggable memory so that
    the kernel won't be able to use it. And when the system is up, we have to
    free this hotpluggable memory to the buddy allocator. So we need to mark
    this memory first.

    In order to do so, we need to mark out these special memory ranges in
    memblock. In this patch, we introduce a new "flags" member into
    struct memblock_region:

    struct memblock_region {
            phys_addr_t base;
            phys_addr_t size;
            unsigned long flags;            /* This is new. */
    #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
            int nid;
    #endif
    };

    This patch does the following things:

    1) Add the "flags" member to struct memblock_region.
    2) Modify the prototypes of the following APIs:
           memblock_add_region()
           memblock_insert_region()
    3) Add memblock_reserve_region() to support reserving memory with flags,
       and keep memblock_reserve()'s prototype unmodified.
    4) Modify other APIs to support flags, but keep their prototypes
       unmodified.

    The idea is from Wen Congyang and Liu Jiang.

    Suggested-by: Wen Congyang
    Suggested-by: Liu Jiang
    Signed-off-by: Tang Chen
    Reviewed-by: Zhang Yanfei
    Cc: "H. Peter Anvin"
    Cc: "Rafael J . Wysocki"
    Cc: Chen Tang
    Cc: Gong Chen
    Cc: Ingo Molnar
    Cc: Jiang Liu
    Cc: Johannes Weiner
    Cc: Lai Jiangshan
    Cc: Larry Woodman
    Cc: Len Brown
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Taku Izumi
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Thomas Renninger
    Cc: Toshi Kani
    Cc: Vasilis Liaskovitis
    Cc: Wanpeng Li
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • The current memblock APIs don't work on 32-bit PAE or LPAE extension
    architectures where the physical memory start address is beyond 4GB. The
    problem was discussed here [3], where Tejun and Yinghai (thanks) proposed
    a way forward with memblock interfaces. Based on the proposal, this
    series adds the necessary memblock interfaces and converts the core
    kernel code to use them. Architectures already converted to NO_BOOTMEM
    use these new interfaces; for the others which still use bootmem, the
    new interfaces simply fall back to the existing bootmem APIs.

    So there is no functional change in behavior. In the long run, once all
    the architectures move to NO_BOOTMEM, we can get rid of the bootmem
    layer completely. This is one step towards removing the core code
    dependency on bootmem, and it also gives architectures a path to move
    away from bootmem.

    Testing was done on the ARM architecture, with 32-bit ARM LPAE machines,
    with the normal as well as the sparse (faked) memory model.

    This patch (of 23):

    When debugging is enabled (the cmdline has "memblock=debug"), memblock
    will display the upper memory boundary of each allocated/freed memory
    range incorrectly. For example:

    memblock_reserve: [0x0000009e7e8000-0x0000009e7ed000] _memblock_early_alloc_try_nid_nopanic+0xfc/0x12c

    Here 0x0000009e7ed000 is displayed instead of 0x0000009e7ecfff.

    Hence, correct this by changing the formula used to calculate the upper
    memory boundary to (u64)base + size - 1 instead of (u64)base + size
    everywhere in the debug messages.
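
    In other words, the upper boundary printed in the debug output changes
    like this:

    (unsigned long long)base + size        /* before: one past the end     */
    (unsigned long long)base + size - 1    /* after:  inclusive last byte  */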

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Acked-by: Tejun Heo
    Cc: H. Peter Anvin
    Cc: Russell King
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     

13 Nov, 2013

2 commits

  • The Linux kernel cannot migrate pages used by the kernel. As a result,
    kernel pages cannot be hot-removed. So we cannot allocate hotpluggable
    memory for the kernel.

    ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
    info. But before SRAT is parsed, memblock has already started to allocate
    memory for the kernel. So we need to prevent memblock from doing this.

    In a memory hotplug system, any NUMA node the kernel resides in should be
    unhotpluggable. And for a modern server, each node could have at least
    16GB of memory. So memory around the kernel image is very likely
    unhotpluggable.

    So the basic idea is: allocate memory upwards, starting from the end of
    the kernel image. Since memory allocated before SRAT is parsed won't
    amount to much, it will very likely be in the same node as the kernel
    image.

    The current memblock can only allocate memory top-down. So this patch
    introduces a new bottom-up allocation mode. Later, when we use this
    allocation direction to allocate memory, we will limit the start address
    to above the kernel.
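
    The direction switch is carried by a flag in struct memblock with trivial
    accessors, roughly:

    static inline void memblock_set_bottom_up(bool enable)
    {
            memblock.bottom_up = enable;
    }

    static inline bool memblock_bottom_up(void)
    {
            return memblock.bottom_up;
    }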

    Signed-off-by: Tang Chen
    Signed-off-by: Zhang Yanfei
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Wanpeng Li
    Cc: Thomas Renninger
    Cc: Yinghai Lu
    Cc: Jiang Liu
    Cc: Wen Congyang
    Cc: Lai Jiangshan
    Cc: Yasuaki Ishimatsu
    Cc: Taku Izumi
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Kamezawa Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • [Problem]

    The current Linux kernel cannot migrate pages used by the kernel because
    of the kernel direct mapping. In Linux kernel space, va = pa + PAGE_OFFSET.
    When the pa is changed, we cannot simply update the page table and keep
    the va unmodified. So the kernel pages are not migratable.

    There are also some other issues that make kernel pages non-migratable.
    For example, the physical address may be cached somewhere and used later.
    It is not easy to update all the caches.

    When doing memory hotplug in Linux, we first migrate all the pages in one
    memory device somewhere else, and then remove the device. But if pages
    are used by the kernel, they are not migratable. As a result, memory used
    by the kernel cannot be hot-removed.

    Modifying the kernel direct mapping mechanism is too difficult to do, and
    it may degrade kernel performance and make the kernel unstable. So we use
    the following way to do memory hotplug.

    [What we are doing]

    In Linux, memory in one numa node is divided into several zones. One of
    the zones is ZONE_MOVABLE, which the kernel won't use.

    In order to implement memory hotplug in Linux, we are going to arrange all
    hotpluggable memory in ZONE_MOVABLE so that the kernel won't use this
    memory. To do this, we need ACPI's help.

    In ACPI, SRAT (System Resource Affinity Table) contains the NUMA info.
    The memory affinities in SRAT record every memory range in the system,
    and also flags specifying whether the memory range is hotpluggable.
    (Please refer to ACPI spec 5.0, section 5.2.16.)

    With the help of SRAT, we have to do the following two things to achieve our
    goal:

    1. When doing memory hot-add, allow the users to arrange hotpluggable
       memory as ZONE_MOVABLE.
       (This has been done by the MOVABLE_NODE functionality in Linux.)

    2. When the system is booting, prevent the bootmem allocator from
       allocating hotpluggable memory for the kernel before memory
       initialization finishes.

    Problem 2 is the key problem we are going to solve. But before solving
    it, we need some preparation. Please see below.

    [Preparation]

    The bootloader has to load the kernel image into memory, and this memory
    must be unhotpluggable. We cannot prevent this anyway. So in a memory
    hotplug system, we can assume that any node the kernel resides in is not
    hotpluggable.

    Before SRAT is parsed, we don't know which memory ranges are hotpluggable.
    But memblock has already started to work. In the current kernel,
    memblock allocates the following memory before SRAT is parsed:

    setup_arch()
     |-> memblock_x86_fill()             /* memblock is ready */
     |......
     |-> early_reserve_e820_mpc_new()    /* allocate memory under 1MB */
     |-> reserve_real_mode()             /* allocate memory under 1MB */
     |-> init_mem_mapping()              /* allocate page tables, about 2MB to map 1GB memory */
     |-> dma_contiguous_reserve()        /* specified by user, should be low */
     |-> setup_log_buf()                 /* specified by user, several megabytes */
     |-> relocate_initrd()               /* could be large, but will be freed after boot, should reorder */
     |-> acpi_initrd_override()          /* several megabytes */
     |-> reserve_crashkernel()           /* could be large, should reorder */
     |......
     |-> initmem_init()                  /* Parse SRAT */

    According to Tejun's advice, before SRAT is parsed, we should try our best
    to allocate memory near the kernel image. Since the whole node the kernel
    resides in won't be hotpluggable, and for a modern server a node may have
    at least 16GB of memory, allocating several megabytes of memory around
    the kernel image won't cross into hotpluggable memory.

    [About this patchset]

    So this patchset is the preparation for problem 2, which we want to
    solve. It does the following:

    1. Make memblock able to allocate memory bottom-up:
       1) Keep all the memblock APIs' prototypes unmodified.
       2) When the direction is bottom-up, keep the start address above the
          end of the kernel image.

    2. Improve init_mem_mapping() to support allocating page tables in the
       bottom-up direction.

    3. Introduce the "movable_node" boot option to enable and disable this
       functionality.

    This patch (of 6):

    Create a new function __memblock_find_range_top_down() to factor the
    top-down allocation out of memblock_find_in_range_node(). This is
    preparation for the new bottom-up allocation mode introduced in the
    following patch.
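
    A sketch of the factored-out helper (close to, but not necessarily
    identical to, the upstream function):

    static phys_addr_t __init_memblock
    __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
                                   phys_addr_t size, phys_addr_t align, int nid)
    {
            phys_addr_t this_start, this_end, cand;
            u64 i;

            for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
                    this_start = clamp(this_start, start, end);
                    this_end = clamp(this_end, start, end);

                    if (this_end < size)
                            continue;

                    cand = round_down(this_end - size, align);
                    if (cand >= this_start)
                            return cand;
            }

            return 0;
    }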

    Signed-off-by: Tang Chen
    Signed-off-by: Zhang Yanfei
    Acked-by: Tejun Heo
    Acked-by: Toshi Kani
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Wanpeng Li
    Cc: Thomas Renninger
    Cc: Yinghai Lu
    Cc: Jiang Liu
    Cc: Wen Congyang
    Cc: Lai Jiangshan
    Cc: Yasuaki Ishimatsu
    Cc: Taku Izumi
    Cc: Mel Gorman
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Kamezawa Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

12 Sep, 2013

1 commit

  • The current early_pfn_to_nid() on architectures that support memblock
    goes over memblock.memory one entry at a time, so it takes too many tries
    near the end.

    We can use the existing memblock_search() to find the node id for a given
    pfn; that saves some time on bigger systems that have many entries in the
    memblock.memory array.

    Here are the timing differences for several machines. In each case, with
    the patch, less time was spent in __early_pfn_to_nid().

                              3.11-rc5    with patch    difference (%)
                              --------    ----------    --------------
    UV1: 256 nodes  9TB:        411.66        402.47     -9.19 (2.23%)
    UV2: 255 nodes 16TB:       1141.02       1138.12     -2.90 (0.25%)
    UV2:  64 nodes  2TB:        128.15        126.53     -1.62 (1.26%)
    UV2:  32 nodes  2TB:        121.87        121.07     -0.80 (0.66%)

    Time in seconds.
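
    A sketch of the binary-search based lookup (simplified; the helper names
    follow the existing memblock code):

    int __meminit memblock_search_pfn_nid(unsigned long pfn,
                            unsigned long *start_pfn, unsigned long *end_pfn)
    {
            struct memblock_type *type = &memblock.memory;
            int mid = memblock_search(type, PFN_PHYS(pfn));

            if (mid == -1)
                    return -1;

            *start_pfn = PFN_DOWN(type->regions[mid].base);
            *end_pfn = PFN_DOWN(type->regions[mid].base +
                                type->regions[mid].size);

            return memblock_get_region_node(&type->regions[mid]);
    }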

    Signed-off-by: Yinghai Lu
    Cc: Tejun Heo
    Acked-by: Russ Anderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

10 Jul, 2013

1 commit


30 Apr, 2013

2 commits

  • There is no comment for the parameter nid of memblock_insert_region().
    This patch adds a comment for it.

    Signed-off-by: Tang Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • This came to light when calling the memblock allocator from the arc port
    (for copying the flattened DT). If a "0" alignment is passed, the
    allocator's round_up() call incorrectly rounds up the size to 0.

    round_up(num, alignto) => ((num - 1) | (alignto - 1)) + 1

    While the obvious allocation failure causes the kernel to panic, it is
    better to warn the caller to fix the code.

    Tejun suggested that instead of BUG_ON(!align) - which might be
    ineffective due to pending console init and such - it is better to
    WARN_ON, and continue the boot with a reasonable default align.

    A caller passing a zero @size need not be handled similarly, as the
    subsequent panic will indicate that anyhow.

    Signed-off-by: Vineet Gupta
    Cc: Yinghai Lu
    Cc: Wanpeng Li
    Cc: Ingo Molnar
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineet Gupta
     

03 Mar, 2013

1 commit

  • Tim found:

    WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
    Hardware name: S2600CP
    sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
    smpboot: Booting Node 1, Processors #1
    Modules linked in:
    Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
    Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

    Don Morris reproduced on a HP z620 workstation, and bisected it to
    commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
    is ready")

    It turns out movable_map has some problems, and it breaks several things:

    1. numa_init is called several times, NOT just for srat, so the
           nodes_clear(numa_nodes_parsed)
           memset(&numa_meminfo, 0, sizeof(numa_meminfo))
       can not just be removed. The sequence numaq, srat, amd, dummy needs to
       be considered, and the fallback path must keep working.

    2. It simply splits acpi_numa_init into early_parse_srat.
       a. early_parse_srat is NOT called for ia64, so you break ia64.
       b. The loop
              for (i = 0; i < MAX_LOCAL_APIC; i++)
                      set_apicid_to_node(i, NUMA_NO_NODE)
          is still left in numa_init, so it will just clear the result from
          early_parse_srat. It should be moved before that....
       c. It breaks ACPI_TABLE_OVERIDE... as the ACPI table scan is moved
          early, before the override from the INITRD is settled.

    3. That patch's TITLE is totally misleading: there is NO x86 in the
       title, but it changes critical x86 code. That caused the x86 guys not
       to pay attention and find the problem early. Those patches really
       should have been routed via tip/x86/mm.

    4. After that commit, the following ranges can not use movable RAM:
       a. real_mode code.... well.. funny, could legacy Node0 [0,1M) be
          hot-removed?
       b. initrd... it will be freed after booting, so it could be on
          movable...
       c. crashkernel for kdump: looks like we can not put the kdump kernel
          above 4G anymore.
       d. init_mem_mapping: can not put page tables high anymore.
       e. initmem_init: vmemmap can not be on the high local node anymore.
          That is not good.

    If a node is hotpluggable, memory-related ranges like the page tables and
    vmemmap could be on that node without problem, and should be on that
    node.

    We have workaround patches that could fix some of the problems, but some
    can not be fixed.

    So just remove that offending commit and related ones including:

    f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

    01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

    27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

    e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

    fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")

    42f47e27e761 ("page_alloc: make movablemem_map have higher priority")

    6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

    34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")

    4d59a75125d5 ("x86: get pg_data_t's memory from other node")

    Later we should have patches that make sure the kernel puts page tables
    and vmemmap on local node RAM instead of pushing them down to node0. We
    also need to find a way to put other kernel-used RAM on local node RAM.

    Reported-by: Tim Gardner
    Reported-by: Don Morris
    Bisected-by: Don Morris
    Tested-by: Don Morris
    Signed-off-by: Yinghai Lu
    Cc: Tony Luck
    Cc: Thomas Renninger
    Cc: Tejun Heo
    Cc: Tang Chen
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

24 Feb, 2013

2 commits

  • mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect
    movablecore_map in memblock_overlaps_region().

    The definition of struct movablecore_map is protected by
    CONFIG_HAVE_MEMBLOCK_NODE_MAP but its use in memblock_overlaps_region()
    is not. So add CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect the use of
    movablecore_map in memblock_overlaps_region().

    Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Tang Chen
     
  • Ensure that bootmem will not allocate memory from areas that may be
    ZONE_MOVABLE. The map info comes from the movablecore_map boot option.

    Signed-off-by: Tang Chen
    Reviewed-by: Wen Congyang
    Reviewed-by: Lai Jiangshan
    Tested-by: Lin Feng
    Cc: Wu Jianguo
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     

30 Jan, 2013

1 commit

  • Use it to get the memory size under limit_pfn, to replace the local
    version in x86's reserve_initrd().

    -v2: remove a not-needed cast, as pointed out by HPA.
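
    A sketch of the helper (the name and loop style follow the memblock
    conventions of that time; not necessarily the exact upstream code):

    phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
    {
            unsigned long pages = 0;
            unsigned long start_pfn, end_pfn;
            struct memblock_region *r;

            for_each_memblock(memory, r) {
                    start_pfn = memblock_region_memory_base_pfn(r);
                    end_pfn = memblock_region_memory_end_pfn(r);
                    start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
                    end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
                    pages += end_pfn - start_pfn;
            }

            return (phys_addr_t)pages << PAGE_SHIFT;
    }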

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1359058816-7615-29-git-send-email-yinghai@kernel.org
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

12 Jan, 2013

1 commit

  • The memmove span covers from (next+1) to the end of the array, and the
    index of next is (i+1), so the index of (next+1) is (i+2). So the size
    of the remaining array elements is (type->cnt - (i + 2)).

    Since the remaining elements of the memblock array are moved forward by
    one element, there is only one additional element affected by this bug.
    So there won't be any write overflow here, only a read overflow: it may
    read one element past the end of the array if the array happens to be
    full. Commonly it doesn't matter at all, but if the array happens to be
    located at the end of a memblock, it may cause an invalid read of a
    physical address that doesn't exist.

    There are two "happens to be"s here, so I think the probability is quite
    low; I don't know if anyone has been haunted by this bug before.

    Mostly I think it's user-invisible.
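
    The fix itself, roughly, is a one-token change in memblock_merge_regions()
    (diff-style sketch):

    -       memmove(next, next + 1, (type->cnt - (i + 1)) * sizeof(*next));
    +       memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next));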

    Signed-off-by: Lin Feng
    Acked-by: Tejun Heo
    Reviewed-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lin Feng
     

25 Oct, 2012

1 commit

  • We will not map partial pages, so we need to make sure that memblock
    allocation will not hand those bytes out.

    Also, we will use for_each_mem_pfn_range() to loop over and map memory
    ranges, to keep them consistent.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
    Acked-by: Jacob Shin
    Signed-off-by: H. Peter Anvin
    Cc:

    Yinghai Lu