17 Oct, 2020

40 commits

  • Fix kernel-doc notation to use the documented Returns: syntax and place
    the function description for acct_process() on the first line where it
    should be.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Cc: Alexander Viro
    Link: https://lkml.kernel.org/r/b4c33e5d-98e8-0c47-77b6-ac1859f94d7f@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fix multiple occurrences of duplicated words in kernel/.

    Fix one typo/spello on the same line as a duplicate word. Change one
    instance of "the the" to "that the". Otherwise just drop one of the
    repeated words.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Replace do_brk with do_brk_flags in the comment of prctl_set_mm_map(),
    since do_brk was removed by the commit referenced in the Fixes: tag below.

    Fixes: bb177a732c4369 ("mm: do not bug_on on incorrect length in __mm_populate()")
    Signed-off-by: Liao Pingfang
    Signed-off-by: Yi Wang
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/1600650751-43127-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Linus Torvalds

    Liao Pingfang
     
  • kernel.h has been used as a dumping ground for all kinds of stuff for a
    long time. Here is an attempt to start cleaning it up by splitting out
    the min()/max() et al. helpers.

    At the same time, convert users in the header and lib folders to use the
    new header. For the time being, though, include the new header back into
    kernel.h to avoid twisted indirect includes for other existing users.
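
    As a rough sketch of what this buys users (the header name
    linux/minmax.h is the one introduced by this series; the call site is
    illustrative):

      /* Before: min()/max() arrived via the kitchen-sink header. */
      #include <linux/kernel.h>

      /* After: they can be pulled in directly, with far fewer dependencies,
       * while kernel.h keeps including the new header for existing users. */
      #include <linux/minmax.h>

      static inline unsigned long cap_len(unsigned long len)
      {
              return min(len, 4096UL);        /* type-checked, as before */
      }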

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Cc: "Rafael J. Wysocki"
    Cc: Steven Rostedt
    Cc: Rasmus Villemoes
    Cc: Joe Perches
    Cc: Linus Torvalds
    Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Drop duplicated words {the, that} in comments.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Cc: Joel Becker
    Cc: Christoph Hellwig
    Link: https://lkml.kernel.org/r/20200811021826.25032-1-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • The current page_order() can only be called on pages in the buddy
    allocator. For compound pages, you have to use compound_order(). This is
    confusing and led to a bug, so rename page_order() to buddy_order().
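
    To illustrate the distinction driving the rename (helper bodies
    simplified; buddy_order() is the new name, compound_order() is
    pre-existing):

      /* Only valid for free pages sitting in the buddy allocator, where
       * the order is stashed in page->private: */
      static inline unsigned int buddy_order(struct page *page)
      {
              return page_private(page);      /* formerly page_order() */
      }

      /* For compound pages (e.g., THP), compound_order() must be used. */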

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • The early_pfn_valid() macro is defined but it is never used. Remove it.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Acked-by: David Hildenbrand
    Link: https://lkml.kernel.org/r/20200923162915.26935-1-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • In commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), the helper put_write_access()
    came with the atomic_dec operation on the i_writecount field, but it was
    never used in __vma_link_file() and dup_mmap(), which open-code the
    decrement. Use the helper there.
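
    For reference, the helper is tiny - roughly (its real home is
    include/linux/fs.h):

      static inline void put_write_access(struct inode *inode)
      {
              /* pairs with the atomic_inc in get_write_access() */
              atomic_dec(&inode->i_writecount);
      }

    The fix replaces the open-coded atomic_dec(&inode->i_writecount) at the
    two call sites with this helper.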

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200924115235.5111-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     
  • Fix the following warnings, caused by a mismatch between function
    parameters and comments.

    mm/workingset.c:228: warning: Function parameter or member 'lruvec' not described in 'workingset_age_nonresident'
    mm/workingset.c:228: warning: Excess function parameter 'memcg' description in 'workingset_age_nonresident'

    Signed-off-by: Xiaofei Tan
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/1600485913-11192-1-git-send-email-tanxiaofei@huawei.com
    Signed-off-by: Linus Torvalds

    Xiaofei Tan
     
  • Correct the function name "get_partials" to "get_partial", and update
    the old struct name list3 to kmem_cache_node.

    Signed-off-by: Chen Tao
    Signed-off-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Linus Torvalds

    Chen Tao
     
  • Fix some broken comments including typo, grammar error and wrong function
    name.

    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200913095456.54873-1-linmiaohe@huawei.com
    Signed-off-by: Linus Torvalds

    Miaohe Lin
     
  • Signed-off-by: Yu Zhao
    Signed-off-by: Andrew Morton
    Cc: Alex Shi
    Link: http://lkml.kernel.org/r/20200831175042.3527153-2-yuzhao@google.com
    Signed-off-by: Linus Torvalds

    Yu Zhao
     
  • The #endif at the end of the file matches up with the '#if
    defined(HASHED_PAGE_VIRTUAL)' on line 374, not with the CONFIG_HIGHMEM
    #if earlier.

    Fix the comments on both #endifs to indicate the correct end of block
    for each.

    Signed-off-by: Ira Weiny
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Mike Rapoport
    Link: https://lkml.kernel.org/r/20200819184635.112579-1-ira.weiny@intel.com
    Signed-off-by: Linus Torvalds

    Ira Weiny
     
  • list_for_each_entry_safe() guarantees that we will never stumble over the
    list head; "&page->lru != list" will always evaluate to true. Let's
    simplify.

    [david@redhat.com: Changelog refinements]
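
    A minimal sketch of the pattern being simplified (variable names
    illustrative):

      struct page *page, *next;

      list_for_each_entry_safe(page, next, list, lru) {
              /* Inside the loop body the cursor is never the list head,
               * so the old guard "if (&page->lru != list)" was dead code
               * and can simply be dropped. */
              list_del(&page->lru);
      }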

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Reviewed-by: Alexander Duyck
    Link: http://lkml.kernel.org/r/20200818084448.33969-1-richard.weiyang@linux.alibaba.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • Remove duplicate header which is included twice.

    Signed-off-by: YueHaibing
    Signed-off-by: Andrew Morton
    Reviewed-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/20200818114323.58156-1-yuehaibing@huawei.com
    Signed-off-by: Linus Torvalds

    YueHaibing
     
  • If we fail to decompress in zram it's a pretty serious problem. We were
    entrusted to be able to decompress the old data but we failed. Either
    we've got some crazy bug in the compression code or we've got memory
    corruption.

    At the moment, when this happens the log looks like this:

    ERR kernel: [ 1833.099861] zram: Decompression failed! err=-22, page=336112
    ERR kernel: [ 1833.099881] zram: Decompression failed! err=-22, page=336112
    ALERT kernel: [ 1833.099886] Read-error on swap-device (253:0:2688896)

    It is true that we have an "ALERT" level log in there, but (at least to
    me) it feels like even this isn't enough to impart the seriousness of this
    error. Let's convert to a WARN_ON. Note that WARN_ON is automatically
    "unlikely" so we can simply replace the old annotation with the new one.

    Signed-off-by: Douglas Anderson
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Sonny Rao
    Cc: Jens Axboe
    Link: https://lkml.kernel.org/r/20200917174059.1.If09c882545dbe432268f7a67a4d4cfcb6caace4f@changeid
    Signed-off-by: Linus Torvalds

    Douglas Anderson
     
  • As we no longer shuffle via generic_online_page() and when undoing
    isolation, we can simplify the comment.

    We now effectively shuffle only once (properly) when onlining new memory.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Wei Yang
    Acked-by: Michal Hocko
    Cc: Alexander Duyck
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Cc: Oscar Salvador
    Cc: Mike Rapoport
    Cc: Pankaj Gupta
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Matthew Wilcox
    Cc: Michael Ellerman
    Cc: Scott Cheloha
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20201005121534.15649-6-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • __free_pages_core() is used when exposing fresh memory to the buddy during
    system boot and when onlining memory in generic_online_page().

    generic_online_page() is used in two cases:

    1. Direct memory onlining in online_pages().
    2. Deferred memory onlining in memory-ballooning-like mechanisms (HyperV
    balloon and virtio-mem), when parts of a section are kept
    fake-offline to be fake-onlined later on.

    In 1, we already place pages to the tail of the freelist. Pages will be
    freed to MIGRATE_ISOLATE lists first and moved to the tail of the
    freelists via undo_isolate_page_range().

    In 2, we currently don't implement a proper rule. In case of virtio-mem,
    where we currently always online MAX_ORDER - 1 pages, the pages will be
    placed to the HEAD of the freelist - undesirable. While the Hyper-V
    balloon calls generic_online_page() with single pages, usually it will
    call it on successive single pages in a larger block.

    The pages are fresh, so place them to the tail of the freelist and avoid
    the PCP. In __free_pages_core(), remove the now superfluous call to
    set_page_refcounted() and add a comment regarding page initialization and
    the refcount.

    Note: In 2. we currently don't shuffle. If ever relevant (page shuffling
    is usually of limited use in virtualized environments), we might want to
    shuffle after a sequence of generic_online_page() calls in the relevant
    callers.
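
    Sketched in terms of the fpi_t flags introduced earlier in this series
    (FPI_TO_TAIL is the flag name used by the series; surrounding code
    simplified):

      void __free_pages_core(struct page *page, unsigned int order)
      {
              /* ... page init: tail page refcounts are already zero, only
               * the head page's refcount needs dropping ... */

              /* Fresh pages: bypass the PCP lists, queue at the tail. */
              __free_pages_ok(page, order, FPI_TO_TAIL);
      }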

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Acked-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Alexander Duyck
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Mike Rapoport
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Matthew Wilcox
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Scott Cheloha
    Link: https://lkml.kernel.org/r/20201005121534.15649-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Whenever we move pages between freelists via move_to_free_list()/
    move_freepages_block(), we don't actually touch the pages:
    1. Page isolation doesn't actually touch the pages, it simply isolates
    pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
    When undoing isolation, we move the pages back to the target list.
    2. Page stealing (steal_suitable_fallback()) moves free pages directly
    between lists without touching them.
    3. reserve_highatomic_pageblock()/unreserve_highatomic_pageblock() moves
    free pages directly between freelists without touching them.

    We already place pages to the tail of the freelists when undoing isolation
    via __putback_isolated_page(); let's do it in any case.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Acked-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Alexander Duyck
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Mike Rapoport
    Cc: Scott Cheloha
    Cc: Michael Ellerman
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20201005121534.15649-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • __putback_isolated_page() already documents that pages will be placed to
    the tail of the freelist - this is, however, not the case for "order >=
    MAX_ORDER - 2" (see buddy_merge_likely()) - which should be the case for
    all existing users.
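
    Roughly, the function becomes (flag names from this series; body
    simplified):

      void __putback_isolated_page(struct page *page, unsigned int order,
                                   int mt)
      {
              /* Return the isolated page directly to the buddy, at the
               * tail of the freelist, and skip free-page reporting. */
              __free_one_page(page, page_to_pfn(page), page_zone(page),
                              order, mt, FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL);
      }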

    This change affects two users:
    - free page reporting
    - page isolation, when undoing the isolation (including memory onlining).

    This behavior is desirable for pages that haven't really been touched
    lately, so exactly the two users that don't actually read/write page
    content, but rather move untouched pages.

    The new behavior is especially desirable for memory onlining, where we
    allow allocation of newly onlined pages via undo_isolate_page_range() in
    online_pages(). Right now, we always place them to the head of the
    freelist, resulting in undesirable behavior: Assume we add individual
    memory chunks via add_memory() and online them right away to the NORMAL
    zone. We create a dependency chain of unmovable allocations e.g., via the
    memmap. The memmap of the next chunk will be placed onto previous chunks
    - if the last block cannot get offlined+removed, all dependent ones cannot
    get offlined+removed. While this can already be observed with individual
    DIMMs, it's more of an issue for virtio-mem (and I suspect also ppc
    DLPAR).

    Document that this should only be used for optimizations, and that no code
    should rely on this behavior for correctness (in case the order of the
    freelists ever changes).

    We won't care about page shuffling: memory onlining already properly
    shuffles after onlining. free page reporting doesn't care about
    physically contiguous ranges, and there are already cases where page
    isolation will simply move (physically close) free pages to (currently)
    the head of the freelists via move_freepages_block() instead of shuffling.
    If this becomes ever relevant, we should shuffle the whole zone when
    undoing isolation of larger ranges, and after free_contig_range().

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Alexander Duyck
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Reviewed-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Vlastimil Babka
    Cc: Mike Rapoport
    Cc: Scott Cheloha
    Cc: Michael Ellerman
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20201005121534.15649-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Patch series "mm: place pages to the freelist tail when onlining and undoing isolation", v2.

    When adding separate memory blocks via add_memory*() and onlining them
    immediately, the metadata (especially the memmap) of the next block will
    be placed onto one of the just added+onlined blocks. This creates a chain
    of unmovable allocations: if the last memory block cannot get
    offlined+removed, neither can any of the blocks that depend on it. We
    directly have unmovable allocations all over the place.

    This can be observed quite easily using virtio-mem, however, it can also
    be observed when using DIMMs. The freshly onlined pages will usually be
    placed to the head of the freelists, meaning they will be allocated next,
    usually turning the just-added memory immediately un-removable. The fresh
    pages are cold, and preferring to allocate others (that might be hot) also
    feels like the natural thing to do.

    It also applies to the Hyper-V balloon, the Xen balloon, and ppc64 dlpar:
    when adding separate, successive memory blocks, each memory block will
    have unmovable allocations on it - for example, gigantic pages will fail
    to allocate.

    While the ZONE_NORMAL doesn't provide any guarantees that memory can get
    offlined+removed again (any kind of fragmentation with unmovable
    allocations is possible), there are many scenarios (hotplugging a lot of
    memory, running a workload, then hotunplugging as much memory as possible)
    where
    we can offline+remove quite a lot with this patchset.

    a) To visualize the problem, a very simple example:

    Start a VM with 4GB and 8GB of virtio-mem memory:

    [root@localhost ~]# lsmem
    RANGE SIZE STATE REMOVABLE BLOCK
    0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
    0x0000000100000000-0x000000033fffffff 9G online yes 32-103

    Memory block size: 128M
    Total online memory: 12G
    Total offline memory: 0B

    Then try to unplug as much as possible using virtio-mem. Observe which
    memory blocks are still around. Without this patch set:

    [root@localhost ~]# lsmem
    RANGE SIZE STATE REMOVABLE BLOCK
    0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
    0x0000000100000000-0x000000013fffffff 1G online yes 32-39
    0x0000000148000000-0x000000014fffffff 128M online yes 41
    0x0000000158000000-0x000000015fffffff 128M online yes 43
    0x0000000168000000-0x000000016fffffff 128M online yes 45
    0x0000000178000000-0x000000017fffffff 128M online yes 47
    0x0000000188000000-0x0000000197ffffff 256M online yes 49-50
    0x00000001a0000000-0x00000001a7ffffff 128M online yes 52
    0x00000001b0000000-0x00000001b7ffffff 128M online yes 54
    0x00000001c0000000-0x00000001c7ffffff 128M online yes 56
    0x00000001d0000000-0x00000001d7ffffff 128M online yes 58
    0x00000001e0000000-0x00000001e7ffffff 128M online yes 60
    0x00000001f0000000-0x00000001f7ffffff 128M online yes 62
    0x0000000200000000-0x0000000207ffffff 128M online yes 64
    0x0000000210000000-0x0000000217ffffff 128M online yes 66
    0x0000000220000000-0x0000000227ffffff 128M online yes 68
    0x0000000230000000-0x0000000237ffffff 128M online yes 70
    0x0000000240000000-0x0000000247ffffff 128M online yes 72
    0x0000000250000000-0x0000000257ffffff 128M online yes 74
    0x0000000260000000-0x0000000267ffffff 128M online yes 76
    0x0000000270000000-0x0000000277ffffff 128M online yes 78
    0x0000000280000000-0x0000000287ffffff 128M online yes 80
    0x0000000290000000-0x0000000297ffffff 128M online yes 82
    0x00000002a0000000-0x00000002a7ffffff 128M online yes 84
    0x00000002b0000000-0x00000002b7ffffff 128M online yes 86
    0x00000002c0000000-0x00000002c7ffffff 128M online yes 88
    0x00000002d0000000-0x00000002d7ffffff 128M online yes 90
    0x00000002e0000000-0x00000002e7ffffff 128M online yes 92
    0x00000002f0000000-0x00000002f7ffffff 128M online yes 94
    0x0000000300000000-0x0000000307ffffff 128M online yes 96
    0x0000000310000000-0x0000000317ffffff 128M online yes 98
    0x0000000320000000-0x0000000327ffffff 128M online yes 100
    0x0000000330000000-0x000000033fffffff 256M online yes 102-103

    Memory block size: 128M
    Total online memory: 8.1G
    Total offline memory: 0B

    With this patch set:

    [root@localhost ~]# lsmem
    RANGE SIZE STATE REMOVABLE BLOCK
    0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
    0x0000000100000000-0x000000013fffffff 1G online yes 32-39

    Memory block size: 128M
    Total online memory: 4G
    Total offline memory: 0B

    All memory can get unplugged and all memory blocks can get removed. Of
    course, no workload ran and the system was basically idle, but it
    highlights the issue - the fairly deterministic chain of unmovable
    allocations. When a huge page for the 2MB memmap is needed, a
    just-onlined 4MB page will be split. The remaining 2MB page will be used
    for the memmap of the next memory block. So one memory block will hold
    the memmap of the two following memory blocks. Finally the pages of the
    last-onlined memory block will get used for the next bigger allocations -
    if any allocation is unmovable, all dependent memory blocks cannot get
    unplugged and removed until that allocation is gone.

    Note that with bigger memory blocks (e.g., 256MB), *all* memory
    blocks are dependent and none can get unplugged again!

    b) Experiment with memory intensive workload

    I performed an experiment with an older version of this patch set (before
    we used undo_isolate_page_range() in online_pages()): Hotplug 56GB to a VM
    with an initial 4GB, onlining all memory to ZONE_NORMAL right from the
    kernel when adding it. I then ran various memory-intensive workloads that
    consumed most system memory for a total of 45 minutes. Once finished, I
    tried to unplug as much memory as possible.

    With this change, I am able to remove via virtio-mem (adding individual
    128MB memory blocks) 413 out of 448 added memory blocks. Via individual
    (256MB) DIMMs 380 out of 448 added memory blocks. (I don't have any
    numbers without this patchset, but looking at the above example, it's at
    most half of the 448 memory blocks for virtio-mem, and most probably none
    for DIMMs).

    Again, there are workloads that might behave very differently due to the
    nature of ZONE_NORMAL.

    This change also affects (besides memory onlining):
    - Other users of undo_isolate_page_range(): Pages are always placed to the
    tail.
    -- When memory offlining fails
    -- When memory isolation fails after having isolated some pageblocks
    -- When alloc_contig_range() either succeeds or fails
    - Other users of __putback_isolated_page(): Pages are always placed to the
    tail.
    -- Free page reporting
    - Other users of __free_pages_core()
    -- AFAIKs, any memory that is getting exposed to the buddy during boot.
    IIUC we will now usually allocate memory from lower addresses within
    a zone first (especially during boot).
    - Other users of generic_online_page()
    -- Hyper-V balloon

    This patch (of 5):

    Let's prepare for additional flags and avoid long parameter lists of
    bools. Follow-up patches will also make use of the flags in
    __free_pages_ok().
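
    A sketch of the flag type this patch introduces (FPI_NONE and
    FPI_SKIP_REPORT_NOTIFY are the names used by the series):

      /* Free Page Internal flags, private to mm/page_alloc.c: */
      typedef int __bitwise fpi_t;

      #define FPI_NONE                ((__force fpi_t)0)

      /* Skip the free-page-reporting notification for this free. */
      #define FPI_SKIP_REPORT_NOTIFY  ((__force fpi_t)BIT(0))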

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Alexander Duyck
    Reviewed-by: Vlastimil Babka
    Reviewed-by: Oscar Salvador
    Reviewed-by: Wei Yang
    Reviewed-by: Pankaj Gupta
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: Mike Rapoport
    Cc: Matthew Wilcox
    Cc: Haiyang Zhang
    Cc: "K. Y. Srinivasan"
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Scott Cheloha
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Michal Hocko
    Link: https://lkml.kernel.org/r/20201005121534.15649-1-david@redhat.com
    Link: https://lkml.kernel.org/r/20201005121534.15649-2-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • At boot time, or when doing memory hot-add operations, if the links in
    sysfs can't be created, the system is still able to run, so just report
    the error in the kernel log rather than BUG_ON and potentially make system
    unusable because the callpath can be called with locks held.

    Since the number of memory blocks managed could be high, the messages are
    rate limited.

    As a consequence, link_mem_sections() has no status to report anymore.

    Signed-off-by: Laurent Dufour
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Acked-by: David Hildenbrand
    Cc: Greg Kroah-Hartman
    Cc: Fenghua Yu
    Cc: Nathan Lynch
    Cc: "Rafael J . Wysocki"
    Cc: Scott Cheloha
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200915094143.79181-4-ldufour@linux.ibm.com
    Signed-off-by: Linus Torvalds

    Laurent Dufour
     
  • "mem" in the name already indicates the root, similar to
    release_mem_region() and devm_request_mem_region(). Make it implicit.
    The only single caller always passes iomem_resource, other parents are not
    applicable.
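
    At call sites the change looks roughly like:

      /* before: the parent had to be spelled out */
      release_mem_region_adjustable(&iomem_resource, start, size);

      /* after: iomem_resource is implied */
      release_mem_region_adjustable(start, size);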

    Suggested-by: Wei Yang
    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Wei Yang
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Link: https://lkml.kernel.org/r/20200916073041.10355-1-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Let's try to merge system ram resources we add, to minimize the number of
    resources in /proc/iomem. We don't care about the boundaries of
    individual chunks we added.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Wei Liu
    Cc: Michal Hocko
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dan Williams
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Link: https://lkml.kernel.org/r/20200911103459.10306-9-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Let's try to merge system ram resources we add, to minimize the number of
    resources in /proc/iomem. We don't care about the boundaries of
    individual chunks we added.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Juergen Gross
    Cc: Michal Hocko
    Cc: Boris Ostrovsky
    Cc: Stefano Stabellini
    Cc: Roger Pau Monné
    Cc: Julien Grall
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Christian Borntraeger
    Cc: Dan Williams
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Jason Wang
    Cc: Kees Cook
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-8-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • virtio-mem adds memory in memory block granularity, to be able to remove
    it in the same granularity again later, and to grow slowly on demand.
    This, however, results in quite a lot of resources when adding a lot of
    memory. Resources are effectively stored in a list-based tree. Having a
    lot of resources not only wastes memory, it also makes traversing that
    tree more expensive, and makes /proc/iomem explode in size (e.g.,
    requiring kexec-tools to manually merge resources later when trying
    to create a kdump header).

    Before this patch, we get (/proc/iomem) when hotplugging 2G via virtio-mem
    on x86-64:
    [...]
    100000000-13fffffff : System RAM
    140000000-33fffffff : virtio0
    140000000-147ffffff : System RAM (virtio_mem)
    148000000-14fffffff : System RAM (virtio_mem)
    150000000-157ffffff : System RAM (virtio_mem)
    158000000-15fffffff : System RAM (virtio_mem)
    160000000-167ffffff : System RAM (virtio_mem)
    168000000-16fffffff : System RAM (virtio_mem)
    170000000-177ffffff : System RAM (virtio_mem)
    178000000-17fffffff : System RAM (virtio_mem)
    180000000-187ffffff : System RAM (virtio_mem)
    188000000-18fffffff : System RAM (virtio_mem)
    190000000-197ffffff : System RAM (virtio_mem)
    198000000-19fffffff : System RAM (virtio_mem)
    1a0000000-1a7ffffff : System RAM (virtio_mem)
    1a8000000-1afffffff : System RAM (virtio_mem)
    1b0000000-1b7ffffff : System RAM (virtio_mem)
    1b8000000-1bfffffff : System RAM (virtio_mem)
    3280000000-32ffffffff : PCI Bus 0000:00

    With this patch, we get (/proc/iomem):
    [...]
    fffc0000-ffffffff : Reserved
    100000000-13fffffff : System RAM
    140000000-33fffffff : virtio0
    140000000-1bfffffff : System RAM (virtio_mem)
    3280000000-32ffffffff : PCI Bus 0000:00

    Of course, with more hotplugged memory, it gets worse. When unplugging
    memory blocks again, try_remove_memory() (via offline_and_remove_memory())
    will properly split the resource up again.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Michael S. Tsirkin
    Cc: Jason Wang
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-7-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Some add_memory*() users add memory in small, contiguous memory blocks.
    Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

    This can quickly result in a lot of memory resources, whereby the actual
    resource boundaries are not of interest (e.g., it might be relevant for
    DIMMs, exposed via /proc/iomem to user space). We really want to merge
    added resources in this scenario where possible.

    Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource
    either created within add_memory*() or passed via add_memory_resource()
    shall be marked mergeable and merged with applicable siblings.

    To implement that, we need a kernel/resource interface to mark selected
    System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger
    merging.

    Note: We really want to merge after the whole operation succeeded, not
    directly when adding a resource to the resource tree (it would break
    add_memory_resource() and require splitting resources again when the
    operation failed - e.g., due to -ENOMEM).
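
    A hedged usage sketch (MEMHP_MERGE_RESOURCE and the flags parameter are
    what this series introduces; the call below mirrors a virtio-mem-style
    user):

      rc = add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
                                     "System RAM (virtio_mem)",
                                     MEMHP_MERGE_RESOURCE);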

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Pankaj Gupta
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Thomas Gleixner
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: Roger Pau Monné
    Cc: Julien Grall
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Cc: Jason Wang
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Link: https://lkml.kernel.org/r/20200911103459.10306-6-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We soon want to pass flags, e.g., to mark added System RAM resources
    mergeable. Prepare for that.

    This patch is based on a similar patch by Oscar Salvador:

    https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de
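
    A sketch of the new type being threaded through (mhp_t and MHP_NONE are
    the names used by this series):

      /* include/linux/memory_hotplug.h */
      typedef int __bitwise mhp_t;

      #define MHP_NONE        ((__force mhp_t)0)  /* no special request */

      int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);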

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Juergen Gross # Xen related part
    Reviewed-by: Pankaj Gupta
    Acked-by: Wei Liu
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Baoquan He
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc: Greg Kroah-Hartman
    Cc: Vishal Verma
    Cc: Dave Jiang
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Wei Liu
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: David Hildenbrand
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Boris Ostrovsky
    Cc: Stefano Stabellini
    Cc: "Oliver O'Halloran"
    Cc: Pingfan Liu
    Cc: Nathan Lynch
    Cc: Libor Pechacek
    Cc: Anton Blanchard
    Cc: Leonardo Bras
    Cc: Ard Biesheuvel
    Cc: Eric Biederman
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: Roger Pau Monné
    Cc: Thomas Gleixner
    Cc: Wei Yang
    Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We soon want to pass flags via a new type to add_memory() and friends.
    That revealed that we currently don't guard some declarations by
    CONFIG_MEMORY_HOTPLUG.

    While some definitions could be moved to different places, let's keep it
    minimal for now and guard with CONFIG_MEMORY_HOTPLUG all functions that
    are only compiled with CONFIG_MEMORY_HOTPLUG.

    Wrap sparse_decode_mem_map() into CONFIG_MEMORY_HOTPLUG, it's only called
    from CONFIG_MEMORY_HOTPLUG code.

    While at it, remove allow_online_pfn_range(), which is no longer around,
    and mhp_notimplemented(), which is unused.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Ard Biesheuvel
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Gunthorpe
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: Kees Cook
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
    always set to 0 by hardware. This is far from beautiful (and confusing),
    and the bit only applies to SYSRAM. So let's move it out of the
    bus-specific (PnP) defined bits.

    We'll add another SYSRAM specific bit soon. If we ever need more bits for
    other purposes, we can steal some from "desc", or reshuffle/regroup what
    we have.
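
    Sketched against include/linux/ioport.h (bit names from this series;
    treat the exact values as illustrative):

      #define IORESOURCE_SYSRAM                0x01000000 /* System RAM */
      /* moved out of the PnP-specific bits by this patch: */
      #define IORESOURCE_SYSRAM_DRIVER_MANAGED 0x02000000
      /* added later in this series for selective merging: */
      #define IORESOURCE_SYSRAM_MERGEABLE      0x04000000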

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Eric Biederman
    Cc: Thomas Gleixner
    Cc: Greg Kroah-Hartman
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Patch series "selective merging of system ram resources", v4.

    Some add_memory*() users add memory in small, contiguous memory blocks.
    Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

    This can quickly result in a lot of memory resources, whereby the actual
    resource boundaries are not of interest (e.g., it might be relevant for
    DIMMs, exposed via /proc/iomem to user space). We really want to merge
    added resources in this scenario where possible.

    Resources are effectively stored in a list-based tree. Having a lot of
    resources not only wastes memory, it also makes traversing that tree more
    expensive, and makes /proc/iomem explode in size (e.g., requiring
    kexec-tools to manually merge resources when creating a kdump header. The
    current kexec-tools resource count limit does not allow for more than
    ~100GB of memory with a memory block size of 128MB on x86-64).

    Let's allow selective merging of system ram resources by specifying a new
    flag for add_memory*(). Patch #5 contains a /proc/iomem example. Only
    tested with virtio-mem.

    This patch (of 8):

    Let's make sure splitting a resource on memory hotunplug will never fail.
    This will become more relevant once we merge selected System RAM resources
    - then, we'll trigger that case more often on memory hotunplug.

    In general, this function is already unlikely to fail. When we remove
    memory, we free up quite a lot of metadata (memmap, page tables, memory
    block device, etc.). The only reason it could really fail would be when
    injecting allocation errors.

    All other error cases inside release_mem_region_adjustable() seem to be
    sanity checks in case the function is abused in a different context -
    let's add WARN_ON_ONCE() in these cases so we can catch them.

    [natechancellor@gmail.com: fix use of ternary condition in release_mem_region_adjustable]
    Link: https://lkml.kernel.org/r/20200922060748.2452056-1-natechancellor@gmail.com
    Link: https://github.com/ClangBuiltLinux/linux/issues/1159

    Signed-off-by: David Hildenbrand
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Eric Biederman
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Thomas Gleixner
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-2-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Currently, it can happen that pages are allocated (and freed) via the
    buddy before we finished basic memory onlining.

    For example, pages are exposed to the buddy and can be allocated before we
    actually mark the sections online. Allocated pages could suddenly fail
    pfn_to_online_page() checks. We had similar issues with pcp handling,
    when pages are allocated+freed before we reach zone_pcp_update() in
    online_pages() [1].

    Instead, mark all pageblocks MIGRATE_ISOLATE, such that allocations are
    impossible. Once done with the heavy lifting, use
    undo_isolate_page_range() to move the pages to the MIGRATE_MOVABLE
    freelist, marking them ready for allocation. Similar to offline_pages(),
    we have to manually adjust zone->nr_isolate_pageblock.

    [1] https://lkml.kernel.org/r/1597150703-19003-1-git-send-email-charante@codeaurora.org
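
    The reworked flow, condensed (function names as used in
    mm/memory_hotplug.c by this series):

      /* 1. Associate pages with the zone; pageblocks start out isolated,
       *    so nothing can be allocated while onlining is in progress. */
      move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);

      /* ... heavy lifting: mark sections online, expose to the buddy ... */

      /* 2. Un-isolate, moving free pages to the MIGRATE_MOVABLE lists
       *    and adjusting zone->nr_isolate_pageblock as we go. */
      undo_isolate_page_range(pfn, pfn + nr_pages, MIGRATE_MOVABLE);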

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-11-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • On the memory onlining path, we want to start with MIGRATE_ISOLATE, to
    un-isolate the pages after memory onlining is complete. Let's allow
    passing in the migratetype.
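
    The touched signature, roughly (per this patch):

      void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
                                  unsigned long nr_pages,
                                  struct vmem_altmap *altmap,
                                  int migratetype);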

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: Dan Williams
    Cc: Mike Rapoport
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Michel Lespinasse
    Cc: Charan Teja Reddy
    Cc: Mel Gorman
    Link: https://lkml.kernel.org/r/20200819175957.28465-10-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Commit ac5d2539b238 ("mm: meminit: reduce number of times pageblocks are
    set during struct page init") moved the actual zone range check, leaving
    only the alignment check for pageblocks.

    Let's drop the stale comment and make the pageblock check easier to read.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Mel Gorman
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-9-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We don't allow offlining memory with holes, all boot memory is online,
    and all hotplugged memory cannot have holes.

    We can now simplify onlining of pages. As we only allow onlining/offlining
    full sections and sections always span full MAX_ORDER_NR_PAGES, we can
    just process MAX_ORDER - 1 pages without further special handling.

    The number of onlined pages simply corresponds to the number of pages we
    were requested to online.

    While at it, refine the comment regarding the callback not exposing all
    pages to the buddy.
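
    The simplified loop, roughly (online_pages_range() per this patch):

      for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
              (*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);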

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-8-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Callers no longer need the number of isolated pageblocks. Let's simplify.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-7-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We make sure that we cannot have any memory holes right at the beginning
    of offline_pages() and we only support onlining/offlining full sections.
    Both sections and pageblocks are a power of two in size, and sections
    always span full pageblocks.

    We can directly calculate the number of isolated pageblocks from nr_pages.
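
    In code this boils down to simple arithmetic (sketch; full-section
    ranges always cover whole pageblocks):

      unsigned long nr_isolate_pageblock = nr_pages / pageblock_nr_pages;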

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-6-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • offline_pages() is the only user. __offline_isolated_pages() never gets
    called with ranges that contain memory holes and we no longer care about
    the return value. Drop the return value handling and all pfn_valid()
    checks.

    Update the documentation.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-5-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • We make sure that we cannot have any memory holes right at the beginning
    of offline_pages(). We no longer need walk_system_ram_range() and can
    call test_pages_isolated() and __offline_isolated_pages() directly.

    offlined_pages always corresponds to nr_pages, so we can simplify that.

    [akpm@linux-foundation.org: patch conflict resolution]

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     
  • Already two people (including me) tried to offline subsections, because
    the function looks like it can deal with it. But we really can only
    online/offline full sections that are properly aligned (e.g., we can only
    mark full sections online/offline via SECTION_IS_ONLINE).

    Add a simple safety net to document the restriction now. Current users
    (core and powernv/memtrace) respect these restrictions.
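
    The safety net, roughly as added to online_pages()/offline_pages()
    (per this patch):

      if (WARN_ON_ONCE(!nr_pages ||
                       !IS_ALIGNED(start_pfn | nr_pages, PAGES_PER_SECTION)))
              return -EINVAL;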

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Pankaj Gupta
    Cc: Charan Teja Reddy
    Cc: Dan Williams
    Cc: Fenghua Yu
    Cc: Logan Gunthorpe
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Mel Gorman
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Mike Rapoport
    Cc: Tony Luck
    Link: https://lkml.kernel.org/r/20200819175957.28465-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand