17 Oct, 2020
3 commits
-
The current page_order() can only be called on pages in the buddy
allocator. For compound pages, you have to use compound_order(). This is
confusing and led to a bug, so rename page_order() to buddy_order().
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Whenever we move pages between freelists via move_to_free_list()/
move_freepages_block(), we don't actually touch the pages:
1. Page isolation doesn't actually touch the pages, it simply isolates
pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
When undoing isolation, we move the pages back to the target list.
2. Page stealing (steal_suitable_fallback()) moves free pages directly
between lists without touching them.
3. reserve_highatomic_pageblock()/unreserve_highatomic_pageblock() moves
free pages directly between freelists without touching them.
We already place pages to the tail of the freelists when undoing isolation
via __putback_isolated_page(), let's do it in any case (e.g., if order
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Reviewed-by: Wei Yang
Acked-by: Pankaj Gupta
Acked-by: Michal Hocko
Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Dave Hansen
Cc: Vlastimil Babka
Cc: Mike Rapoport
Cc: Scott Cheloha
Cc: Michael Ellerman
Cc: Haiyang Zhang
Cc: "K. Y. Srinivasan"
Cc: Matthew Wilcox
Cc: Michal Hocko
Cc: Stephen Hemminger
Cc: Wei Liu
Link: https://lkml.kernel.org/r/20201005121534.15649-4-david@redhat.com
Signed-off-by: Linus Torvalds
-
Callers no longer need the number of isolated pageblocks. Let's simplify.
Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: Wei Yang
Cc: Baoquan He
Cc: Pankaj Gupta
Cc: Charan Teja Reddy
Cc: Dan Williams
Cc: Fenghua Yu
Cc: Logan Gunthorpe
Cc: "Matthew Wilcox (Oracle)"
Cc: Mel Gorman
Cc: Michel Lespinasse
Cc: Mike Rapoport
Cc: Tony Luck
Link: https://lkml.kernel.org/r/20200819175957.28465-7-david@redhat.com
Signed-off-by: Linus Torvalds
14 Oct, 2020
3 commits
-
Let's clean it up a bit, simplifying the exit paths.
Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Reviewed-by: Pankaj Gupta
Cc: Michal Hocko
Cc: Michael S. Tsirkin
Cc: Mike Kravetz
Cc: Jason Wang
Cc: Mike Rapoport
Cc: Qian Cai
Link: http://lkml.kernel.org/r/20200816125333.7434-5-david@redhat.com
Signed-off-by: Linus Torvalds
-
Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movablecore". Also, besides
checking if the first page in the pageblock is reserved, we don't perform
any further checks in case of ZONE_MOVABLE.
In case of memory offlining, we set REPORT_FAILURE, properly dump_page()
the page and handle the error gracefully. alloc_contig_pages() users
currently never allocate from ZONE_MOVABLE. E.g., hugetlb uses
alloc_contig_pages() for the allocation of gigantic pages only, which will
never end up on the MOVABLE zone (see htlb_alloc_mask()).
Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Cc: Michal Hocko
Cc: Michael S. Tsirkin
Cc: Mike Kravetz
Cc: Pankaj Gupta
Cc: Jason Wang
Cc: Mike Rapoport
Cc: Qian Cai
Link: http://lkml.kernel.org/r/20200816125333.7434-4-david@redhat.com
Signed-off-by: Linus Torvalds
-
Right now, if we have two isolations racing on a pageblock that's in the
MOVABLE zone, we would trigger the WARN_ON_ONCE(). Let's just return
directly, simplifying error handling.
The change was introduced in commit 3d680bdf60a5 ("mm/page_isolation: fix
potential warning from user"). As far as I can see, we currently don't
have alloc_contig_range() users that use the ZONE_MOVABLE (anymore), so
it's currently more a cleanup and a preparation for the future than a fix.
Signed-off-by: David Hildenbrand
Signed-off-by: Andrew Morton
Reviewed-by: Baoquan He
Reviewed-by: Pankaj Gupta
Acked-by: Mike Kravetz
Cc: Michal Hocko
Cc: Michael S. Tsirkin
Cc: Qian Cai
Cc: Jason Wang
Cc: Mike Rapoport
Link: http://lkml.kernel.org/r/20200816125333.7434-3-david@redhat.com
Signed-off-by: Linus Torvalds
20 Sep, 2020
1 commit
-
There is a race during page offline that can lead to infinite loop:
a page never ends up on a buddy list and __offline_pages() keeps
retrying infinitely or until a termination signal is received.
Thread#1 - a new process:
load_elf_binary
begin_new_exec
exec_mmap
mmput
exit_mmap
tlb_finish_mmu
tlb_flush_mmu
release_pages
free_unref_page_list
free_unref_page_prepare
set_pcppage_migratetype(page, migratetype);
// Set page->index migration type below MIGRATE_PCPTYPES
Thread#2 - hot-removes memory
__offline_pages
start_isolate_page_range
set_migratetype_isolate
set_pageblock_migratetype(page, MIGRATE_ISOLATE);
// Set migration type to MIGRATE_ISOLATE
drain_all_pages(zone);
// drain per-cpu page lists to buddy allocator.
Thread#1 - continue
free_unref_page_commit
migratetype = get_pcppage_migratetype(page);
// get old migration type
list_add(&page->lru, &pcp->lists[migratetype]);
// add new page to already drained pcp list
Thread#2
Never drains pcp again, and therefore gets stuck in the loop.
The fix is to try to drain per-cpu lists again after
check_pages_isolated_cb() fails.
Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list")
Signed-off-by: Pavel Tatashin
Signed-off-by: Andrew Morton
Acked-by: David Rientjes
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Acked-by: David Hildenbrand
Cc: Oscar Salvador
Cc: Wei Yang
Link: https://lkml.kernel.org/r/20200903140032.380431-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20200904151448.100489-2-pasha.tatashin@soleen.com
Link: http://lkml.kernel.org/r/20200904070235.GA15277@dhcp22.suse.cz
Signed-off-by: Linus Torvalds
13 Aug, 2020
3 commits
-
There is a well-defined standard migration target callback. Use it
directly.
Signed-off-by: Joonsoo Kim
Signed-off-by: Andrew Morton
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Christoph Hellwig
Cc: Mike Kravetz
Cc: Naoya Horiguchi
Cc: Roman Gushchin
Link: http://lkml.kernel.org/r/1594622517-20681-8-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds
-
There are some similar functions for migration target allocation. Since
there is no fundamental difference, it's better to keep just one rather
than keeping all variants. This patch implements base migration target
allocation function. In the following patches, variants will be converted
to use this function.
Changes should be mechanical, but, unfortunately, there are some
differences. First, some callers' nodemask is assigned to NULL since a NULL
nodemask will be considered as all available nodes, that is,
&node_states[N_MEMORY]. Second, for hugetlb page allocation, gfp_mask is
redefined as regular hugetlb allocation gfp_mask plus __GFP_THISNODE if
user provided gfp_mask has it. This is because future caller of this
function requires this node constraint to be set. Lastly, if the provided nodeid
is NUMA_NO_NODE, nodeid is set up to the node where migration source
lives. It helps to remove simple wrappers for setting up the nodeid.
Note that the PageHighmem() call in the previous function is changed to open-code
"is_highmem_idx()" since it provides more readability.
[akpm@linux-foundation.org: tweak patch title, per Vlastimil]
[akpm@linux-foundation.org: fix typo in comment]
Signed-off-by: Joonsoo Kim
Signed-off-by: Andrew Morton
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Cc: Christoph Hellwig
Cc: Mike Kravetz
Cc: Naoya Horiguchi
Cc: Roman Gushchin
Link: http://lkml.kernel.org/r/1594622517-20681-6-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds
-
Patch series "clean-up the migration target allocation functions", v5.
This patch (of 9):
For locality, it's better to migrate the page to the same node rather than
the node of the current caller's cpu.
Signed-off-by: Joonsoo Kim
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Acked-by: Roman Gushchin
Acked-by: Michal Hocko
Cc: Christoph Hellwig
Cc: Mike Kravetz
Cc: Naoya Horiguchi
Link: http://lkml.kernel.org/r/1594622517-20681-1-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1594622517-20681-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Linus Torvalds
05 Jun, 2020
1 commit
-
virtio-mem wants to allow offlining memory blocks of which some parts
were unplugged (allocated via alloc_contig_range()), especially, to later
offline and remove completely unplugged memory blocks. The important part
is that PageOffline() has to remain set until the section is offline, so
these pages will never get accessed (e.g., when dumping). The pages should
not be handed back to the buddy (which would require clearing PageOffline()
and result in issues if offlining fails and the pages are suddenly in the
buddy).
Let's allow this by permitting any PageOffline() page to be isolated
when offlining. This way, we can reach the memory hotplug notifier
MEM_GOING_OFFLINE, where the driver can signal that it is fine with
offlining this page by dropping its reference count. PageOffline() pages
with a reference count of 0 can then be skipped when offlining the
pages (as if they were free, even though they are not in the buddy).
Anybody who uses PageOffline() pages and does not agree to offline them
(e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
decrement the reference count and make offlining fail when trying to
migrate such an unmovable page. So there should be no observable change.
Same applies to balloon compaction users (movable PageOffline() pages), the
pages will simply be migrated.
Note 1: If offlining fails, a driver has to increment the reference
count again in MEM_CANCEL_OFFLINE.
Note 2: A driver that makes use of this has to be aware that re-onlining
the memory block has to be handled by hooking into onlining code
(online_page_callback_t), resetting the page PageOffline() and
not giving them to the buddy.
Reviewed-by: Alexander Duyck
Acked-by: Michal Hocko
Tested-by: Pankaj Gupta
Acked-by: Andrew Morton
Cc: Andrew Morton
Cc: Juergen Gross
Cc: Konrad Rzeszutek Wilk
Cc: Pavel Tatashin
Cc: Alexander Duyck
Cc: Vlastimil Babka
Cc: Johannes Weiner
Cc: Anthony Yznaga
Cc: Michal Hocko
Cc: Oscar Salvador
Cc: Mel Gorman
Cc: Mike Rapoport
Cc: Dan Williams
Cc: Anshuman Khandual
Cc: Qian Cai
Cc: Pingfan Liu
Signed-off-by: David Hildenbrand
Link: https://lore.kernel.org/r/20200507140139.17083-7-david@redhat.com
Signed-off-by: Michael S. Tsirkin
08 Apr, 2020
1 commit
-
There are cases where we would benefit from avoiding having to go through
the allocation and free cycle to return an isolated page.
Examples for this might include page poisoning, in which we isolate a page
and then put it back in the free list without ever having actually
allocated it.
This will enable us to also avoid notifiers for the future free page
reporting which will need to avoid retriggering page reporting when
returning pages that have been reported on.
Signed-off-by: Alexander Duyck
Signed-off-by: Andrew Morton
Acked-by: David Hildenbrand
Acked-by: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: Konrad Rzeszutek Wilk
Cc: Luiz Capitulino
Cc: Matthew Wilcox
Cc: Michael S. Tsirkin
Cc: Michal Hocko
Cc: Nitesh Narayan Lal
Cc: Oscar Salvador
Cc: Pankaj Gupta
Cc: Paolo Bonzini
Cc: Rik van Riel
Cc: Vlastimil Babka
Cc: Wei Wang
Cc: Yang Zhang
Cc: wei qi
Link: http://lkml.kernel.org/r/20200211224624.29318.89287.stgit@localhost.localdomain
Signed-off-by: Linus Torvalds
01 Feb, 2020
4 commits
-
It makes sense to call the WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE)
from start_isolate_page_range(), but we should avoid triggering it from
userspace, i.e., from is_mem_section_removable(), because it could crash
the system by a non-root user if panic_on_warn is set.
While at it, simplify the code a bit by removing an unnecessary jump
label.
Link: http://lkml.kernel.org/r/20200120163915.1469-1-cai@lca.pw
Signed-off-by: Qian Cai
Suggested-by: Michal Hocko
Acked-by: Michal Hocko
Reviewed-by: David Hildenbrand
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
It is not that hard to trigger lockdep splats by calling printk from
under zone->lock. Most of them are false positives caused by lock
chains introduced early in the boot process and they do not cause any
real problems (although most of the early boot lock dependencies could
happen after boot as well). There are some console drivers which do
allocate from the printk context as well and those should be fixed. In
any case, false positives are not that trivial to workaround and it is
far from optimal to lose lockdep functionality for something that is a
non-issue.
So change has_unmovable_pages() so that it no longer calls dump_page()
itself - instead it returns a "struct page *" of the unmovable page back
to the caller so that in the case of a has_unmovable_pages() failure,
the caller can call dump_page() after releasing zone->lock. Also, make
dump_page() able to report a CMA page as well, so the reason string
from has_unmovable_pages() can be removed.
Even though has_unmovable_pages doesn't hold any reference to the
returned page this should be reasonably safe for the purpose of
reporting the page (dump_page) because it cannot be hotremoved in the
context of memory unplug. The state of the page might change but that
is the case even with the existing code, as zone->lock only plays a role
for free pages.
While at it, remove a similar but unnecessary debug-only printk() as
well. A sample of one of those lockdep splats is:
WARNING: possible circular locking dependency detected
------------------------------------------------------
test.sh/8653 is trying to acquire lock:
ffffffff865a4460 (console_owner){-.-.}, at:
console_unlock+0x207/0x750
but task is already holding lock:
ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&(&zone->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock+0x2f/0x40
rmqueue_bulk.constprop.21+0xb6/0x1160
get_page_from_freelist+0x898/0x22c0
__alloc_pages_nodemask+0x2f3/0x1cd0
alloc_pages_current+0x9c/0x110
allocate_slab+0x4c6/0x19c0
new_slab+0x46/0x70
___slab_alloc+0x58b/0x960
__slab_alloc+0x43/0x70
__kmalloc+0x3ad/0x4b0
__tty_buffer_request_room+0x100/0x250
tty_insert_flip_string_fixed_flag+0x67/0x110
pty_write+0xa2/0xf0
n_tty_write+0x36b/0x7b0
tty_write+0x284/0x4c0
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
redirected_tty_write+0x6a/0xc0
do_iter_write+0x248/0x2a0
vfs_writev+0x106/0x1e0
do_writev+0xd4/0x180
__x64_sys_writev+0x45/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #2 (&(&port->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
tty_port_tty_get+0x20/0x60
tty_port_default_wakeup+0xf/0x30
tty_port_tty_wakeup+0x39/0x40
uart_write_wakeup+0x2a/0x40
serial8250_tx_chars+0x22e/0x440
serial8250_handle_irq.part.8+0x14a/0x170
serial8250_default_handle_irq+0x5c/0x90
serial8250_interrupt+0xa6/0x130
__handle_irq_event_percpu+0x78/0x4f0
handle_irq_event_percpu+0x70/0x100
handle_irq_event+0x5a/0x8b
handle_edge_irq+0x117/0x370
do_IRQ+0x9e/0x1e0
ret_from_intr+0x0/0x2a
cpuidle_enter_state+0x156/0x8e0
cpuidle_enter+0x41/0x70
call_cpuidle+0x5e/0x90
do_idle+0x333/0x370
cpu_startup_entry+0x1d/0x1f
start_secondary+0x290/0x330
secondary_startup_64+0xb6/0xc0
-> #1 (&port_lock_key){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
serial8250_console_write+0x3e4/0x450
univ8250_console_write+0x4b/0x60
console_unlock+0x501/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
-> #0 (console_owner){-.-.}:
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Chain exists of:
console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&zone->lock)->rlock);
lock(&(&port->lock)->rlock);
lock(&(&zone->lock)->rlock);
lock(console_owner);
*** DEADLOCK ***
9 locks held by test.sh/8653:
#0: ffff88839ba7d408 (sb_writers#4){.+.+}, at:
vfs_write+0x25f/0x290
#1: ffff888277618880 (&of->mutex){+.+.}, at:
kernfs_fop_write+0x128/0x240
#2: ffff8898131fc218 (kn->count#115){.+.+}, at:
kernfs_fop_write+0x138/0x240
#3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at:
lock_device_hotplug_sysfs+0x16/0x50
#4: ffff8884374f4990 (&dev->mutex){....}, at:
device_offline+0x70/0x110
#5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at:
__offline_pages+0xbf/0xa10
#6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at:
percpu_down_write+0x87/0x2f0
#7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0
#8: ffffffff865a4920 (console_lock){+.+.}, at:
vprintk_emit+0x100/0x340
stack backtrace:
Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10,
BIOS U34 05/21/2019
Call Trace:
dump_stack+0x86/0xca
print_circular_bug.cold.31+0x243/0x26e
check_noncircular+0x29e/0x2e0
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Link: http://lkml.kernel.org/r/20200117181200.20299-1-cai@lca.pw
Signed-off-by: Qian Cai
Reviewed-by: David Hildenbrand
Cc: Michal Hocko
Cc: Sergey Senozhatsky
Cc: Petr Mladek
Cc: Steven Rostedt (VMware)
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Now that the memory isolate notifier is gone, the parameter is always 0.
Drop it and clean up has_unmovable_pages().
Link: http://lkml.kernel.org/r/20191114131911.11783-3-david@redhat.com
Signed-off-by: David Hildenbrand
Acked-by: Michal Hocko
Cc: Oscar Salvador
Cc: Anshuman Khandual
Cc: Qian Cai
Cc: Pingfan Liu
Cc: Stephen Rothwell
Cc: Dan Williams
Cc: Pavel Tatashin
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Mike Rapoport
Cc: Wei Yang
Cc: Alexander Duyck
Cc: Alexander Potapenko
Cc: Arun KS
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Luckily, we have no users left, so we can get rid of it. Clean up
set_migratetype_isolate() a little bit.
Link: http://lkml.kernel.org/r/20191114131911.11783-2-david@redhat.com
Signed-off-by: David Hildenbrand
Reviewed-by: Greg Kroah-Hartman
Acked-by: Michal Hocko
Cc: "Rafael J. Wysocki"
Cc: Pavel Tatashin
Cc: Dan Williams
Cc: Oscar Salvador
Cc: Qian Cai
Cc: Anshuman Khandual
Cc: Pingfan Liu
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Dec, 2019
1 commit
-
We have two types of users of page isolation:
1. Memory offlining: Offline memory so it can be unplugged. Memory
won't be touched.
2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to
become the owner of the memory and make use of it.
For example, in case we want to offline memory, we can ignore (skip
over) PageHWPoison() pages, as the memory won't get used. We can allow
the memory to be offlined. In contrast, we don't want to allow allocating
such memory.
Let's generalize the approach so we can special-case other types of
pages we want to skip over in case we offline memory. While at it, also
pass the same flags to test_pages_isolated().
Link: http://lkml.kernel.org/r/20191021172353.3056-3-david@redhat.com
Signed-off-by: David Hildenbrand
Suggested-by: Michal Hocko
Acked-by: Michal Hocko
Cc: Oscar Salvador
Cc: Anshuman Khandual
Cc: David Hildenbrand
Cc: Pingfan Liu
Cc: Qian Cai
Cc: Pavel Tatashin
Cc: Dan Williams
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Mike Rapoport
Cc: Alexander Duyck
Cc: Wei Yang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Jul, 2019
1 commit
-
undo_isolate_page_range() never fails, so there is no need to return a value.
Link: http://lkml.kernel.org/r/1562075604-8979-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Pingfan Liu
Acked-by: Michal Hocko
Reviewed-by: Oscar Salvador
Reviewed-by: Anshuman Khandual
Cc: Qian Cai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 May, 2019
1 commit
-
pfn_valid_within() calls pfn_valid() when CONFIG_HOLES_IN_ZONE is set, making
it redundant for both definitions (with or without CONFIG_MEMORY_HOTPLUG) of
the helper pfn_to_online_page(), which either calls pfn_valid() or
pfn_valid_within(). pfn_valid_within() being 1 when !CONFIG_HOLES_IN_ZONE is
irrelevant either way. This does not change functionality.
Link: http://lkml.kernel.org/r/1553141595-26907-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual
Reviewed-by: Zi Yan
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: Mike Kravetz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Mar, 2019
2 commits
-
Due to has_unmovable_pages() taking an incorrect irqsave flag instead of
the isolation flag in set_migratetype_isolate(), there are issues with
HWPOISON and error reporting where dump_page() is not called when there
is an unmovable page.
Link: http://lkml.kernel.org/r/20190320204941.53731-1-cai@lca.pw
Fixes: d381c54760dc ("mm: only report isolation failures when offlining memory")
Acked-by: Michal Hocko
Reviewed-by: Oscar Salvador
Signed-off-by: Qian Cai
Cc: [5.0.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() during onlining a memory block.
memmap_init_zone() will reset pagetype flags and make the migrate type
MOVABLE.
However, in __offline_pages(), it also calls undo_isolate_page_range()
after offline_isolated_pages() to do the same thing. Due to commit
2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just wastes CPU cycles looping around the offlining PFN range while
doing nothing, because __first_valid_page() will return NULL as
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().
Also, after calling the "useless" undo_isolate_page_range() here, it
reaches the point of no return by notifying MEM_OFFLINE. Those pages
will be marked as MIGRATE_MOVABLE again once onlining. The only thing
left to do is to decrease the number of isolated pageblocks zone counter
which would make some paths of the page allocation slower, a cost the
above commit introduced.
Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
on ppc64, an "int" should still be enough to represent the number of
pageblocks there. Fix an incorrect comment along the way.
[cai@lca.pw: v4]
Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw
Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw
Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
Signed-off-by: Qian Cai
Acked-by: Michal Hocko
Reviewed-by: Oscar Salvador
Cc: Vlastimil Babka
Cc: [4.13+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Dec, 2018
1 commit
-
Heiko has complained that his log is swamped by warnings from
has_unmovable_pages:
[ 20.536664] page dumped because: has_unmovable_pages
[ 20.536792] page:000003d081ff4080 count:1 mapcount:0 mapping:000000008ff88600 index:0x0 compound_mapcount: 0
[ 20.536794] flags: 0x3fffe0000010200(slab|head)
[ 20.536795] raw: 03fffe0000010200 0000000000000100 0000000000000200 000000008ff88600
[ 20.536796] raw: 0000000000000000 0020004100000000 ffffffff00000001 0000000000000000
[ 20.536797] page dumped because: has_unmovable_pages
[ 20.536814] page:000003d0823b0000 count:1 mapcount:0 mapping:0000000000000000 index:0x0
[ 20.536815] flags: 0x7fffe0000000000()
[ 20.536817] raw: 07fffe0000000000 0000000000000100 0000000000000200 0000000000000000
[ 20.536818] raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000
which are not triggered by the memory hotplug but rather the CMA allocator.
The original idea behind dumping the page state for all call paths was
that these messages will be helpful debugging failures. From the above it
seems that this is not the case for the CMA path because we are lacking
much more context. E.g. the second reported page might be a CMA allocated
page. It is still interesting to see a slab page in the CMA area, but it
is hard to tell whether this is a bug from the above output alone.
Address this issue by dumping the page state only on request. Both
start_isolate_page_range and has_unmovable_pages already have an argument
to ignore hwpoison pages so make this argument more generic and turn it
into flags and allow callers to combine non-default modes into a mask.
While we are at it, the has_unmovable_pages call from
is_pageblock_removable_nolock (sysfs removable file) is a questionable
place to report the failure, so drop it from there as well.
Link: http://lkml.kernel.org/r/20181218092802.31429-1-mhocko@kernel.org
Signed-off-by: Michal Hocko
Reported-by: Heiko Carstens
Reviewed-by: Oscar Salvador
Cc: Anshuman Khandual
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Apr, 2018
1 commit
-
No allocation callback is using this argument anymore. new_page_node
used to use this parameter to convey node_id resp. migration error up
to move_pages code (do_move_page_to_node_array). The error status never
made it into the final status field and we have a better way to
communicate node id to the status field now. All other allocation
callbacks simply ignored the argument so we can drop it finally.
[mhocko@suse.com: fix migration callback]
Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz
[akpm@linux-foundation.org: fix alloc_misplaced_dst_page()]
[mhocko@kernel.org: fix build]
Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org
Signed-off-by: Michal Hocko
Reviewed-by: Zi Yan
Cc: Andrea Reale
Cc: Anshuman Khandual
Cc: Kirill A. Shutemov
Cc: Mike Kravetz
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Apr, 2018
1 commit
-
start_isolate_page_range() is used to set the migrate type of a set of
pageblocks to MIGRATE_ISOLATE while attempting to start a migration
operation. It assumes that only one thread is calling it for the
specified range. This routine is used by CMA, memory hotplug and
gigantic huge pages. Each of these users synchronize access to the
range within their subsystem. However, two subsystems (CMA and gigantic
huge pages for example) could attempt operations on the same range. If
this happens, one thread may 'undo' the work another thread is doing.
This can result in pageblocks being incorrectly left marked as
MIGRATE_ISOLATE and therefore not available for page allocation.
What is ideally needed is a way to synchronize access to a set of
pageblocks that are undergoing isolation and migration. The only thing
we know about these pageblocks is that they are all in the same zone. A
per-node mutex is too coarse as we want to allow multiple operations on
different ranges within the same zone concurrently. Instead, we will
use the migration type of the pageblocks themselves as a form of
synchronization.
start_isolate_page_range sets the migration type on a set of pageblocks,
going in order from the one associated with the smallest pfn to
the largest pfn. The zone lock is acquired to check and set the
migration type. When going through the list of pageblocks check if
MIGRATE_ISOLATE is already set. If so, this indicates another thread is
working on this pageblock. We know exactly which pageblocks we set, so
clean up by undoing those and return -EBUSY.
This allows start_isolate_page_range to serve as a synchronization
mechanism and will allow for more general use of callers making use of
these interfaces. Update comments in alloc_contig_range to reflect this
new functionality.
Each CPU holds the associated zone lock to modify or examine the
migration type of a pageblock. And, it will only examine/update a
single pageblock per lock acquire/release cycle.
Link: http://lkml.kernel.org/r/20180309224731.16978-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz
Reviewed-by: Andrew Morton
Cc: KAMEZAWA Hiroyuki
Cc: Luiz Capitulino
Cc: Michal Nazarewicz
Cc: Michal Hocko
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Nov, 2017
1 commit
-
Joonsoo has noticed that "mm: drop migrate type checks from
has_unmovable_pages" would break CMA allocator because it relies on
has_unmovable_pages returning false even for CMA pageblocks which in
fact don't have to be movable:
alloc_contig_range
start_isolate_page_range
set_migratetype_isolate
has_unmovable_pages
This is a result of the code sharing between CMA and memory hotplug
while each one has a different idea of what has_unmovable_pages should
return. This is unfortunate but fixing it properly would require a lot
of code duplication.
Fix the issue by introducing the requested migrate type argument and
special case MIGRATE_CMA case where CMA page blocks are handled
properly. This will work for memory hotplug because it requires
MIGRATE_MOVABLE.
Link: http://lkml.kernel.org/r/20171019122118.y6cndierwl2vnguj@dhcp22.suse.cz
Signed-off-by: Michal Hocko
Reported-by: Joonsoo Kim
Tested-by: Stefan Wahren
Tested-by: Ran Wang
Cc: Michael Ellerman
Cc: Vlastimil Babka
Cc: Igor Mammedov
Cc: KAMEZAWA Hiroyuki
Cc: Reza Arbab
Cc: Vitaly Kuznetsov
Cc: Xishi Qiu
Cc: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
11 Jul, 2017
1 commit
-
Commit 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest
neighbor node when mem-offline") has duplicated a large part of
alloc_migrate_target with some hotplug specific special casing.
To be more precise, it tried to enforce the allocation from a different
node than the original page. As a result the two functions diverged in
their shared logic, e.g. the hugetlb allocation strategy.
Let's unify the two and express different NUMA requirements by the given
nodemask. new_node_page will simply exclude the node it doesn't care
about and alloc_migrate_target will use all the available nodes.
alloc_migrate_target will then learn to migrate hugetlb pages more
sanely and use the preallocated pool when possible.
Please note that alloc_migrate_target used to call alloc_page resp.
alloc_pages_current and so used the memory policy of the current context, which is
quite strange when we consider that it is used in the context of
alloc_contig_range which just tries to migrate pages which stand in the
way.
Link: http://lkml.kernel.org/r/20170608074553.22152-4-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Naoya Horiguchi
Cc: Xishi Qiu
Cc: zhong jiang
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jul, 2017
1 commit
-
__first_valid_page skips over invalid pfns in the range but it might
still stumble over offline pages. At least start_isolate_page_range
will mark those set_migratetype_isolate. This doesn't represent any
immediate problem AFAICS because alloc_contig_range will fail to isolate those
pages, but it relies on a not fully initialized page which will become a
problem later when we stop associating offline pages with zones. Use
pfn_to_online_page to handle this.
This is more a preparatory patch than a fix.
Link: http://lkml.kernel.org/r/20170515085827.16474-10-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Balbir Singh
Cc: Dan Williams
Cc: Daniel Kiper
Cc: David Rientjes
Cc: Heiko Carstens
Cc: Igor Mammedov
Cc: Jerome Glisse
Cc: Joonsoo Kim
Cc: Martin Schwidefsky
Cc: Mel Gorman
Cc: Reza Arbab
Cc: Tobias Regnery
Cc: Toshi Kani
Cc: Vitaly Kuznetsov
Cc: Xishi Qiu
Cc: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2017
1 commit
-
When stealing pages from a pageblock of a different migratetype, we count
how many free pages were stolen, and change the pageblock's migratetype
if more than half of the pageblock was free. This might be too
conservative, as there might be other pages that are not free, but were
allocated with the same migratetype as our allocation requested.
While we cannot determine the migratetype of allocated pages precisely
(at least without the page_owner functionality enabled), we can count
pages that compaction would try to isolate for migration - those are
either on LRU or __PageMovable(). The rest can be assumed to be
MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily
distinguish. This counting can be done as part of free page stealing
with little additional overhead.
The page stealing code is changed so that it considers free pages plus
pages of the "good" migratetype for the decision whether to change the
pageblock's migratetype.
The result should be a more accurate migratetype of pageblocks wrt the
actual pages in the pageblocks, when stealing from semi-occupied
pageblocks. This should help the efficiency of page grouping by
mobility.
In testing based on a 4.9 kernel with stress-highalloc from mmtests
configured for order-4 GFP_KERNEL allocations, this patch has reduced
the number of unmovable allocations falling back to movable pageblocks
by 47%. The number of movable allocations falling back to other
pageblocks is increased by 55%, but these events don't cause permanent
fragmentation, so the tradeoff should be positive. Later patches also
offset the movable fallback increase to some extent.
[akpm@linux-foundation.org: merge fix]
Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 May, 2017
1 commit
-
Use is_migrate_isolate_page() to simplify the code, no functional
changes.
Link: http://lkml.kernel.org/r/58B94FB1.8020802@huawei.com
Signed-off-by: Xishi Qiu
Acked-by: Michal Hocko
Cc: Vlastimil Babka
Cc: Mel Gorman
Cc: Minchan Kim
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Feb, 2017
2 commits
-
On architectures that allow memory holes, page_is_buddy() has to perform
page_to_pfn() to check for the memory hole. After the previous patch,
we have the pfn already available in __free_one_page(), which is the
only caller of page_is_buddy(), so move the check there and avoid
page_to_pfn().
Link: http://lkml.kernel.org/r/20161216120009.20064-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Cc: Joonsoo Kim
Cc: Michal Hocko
Cc: "Kirill A. Shutemov"
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
In __free_one_page() we do the buddy merging arithmetics on a "page/buddy
index", which is just the lower MAX_ORDER bits of the pfn. The operations
we do that affect the higher bits are bitwise AND and subtraction (in
that order), where the final result will be the same with the higher
bits left unmasked, as long as these bits are equal for both buddies -
which must be true by the definition of a buddy.
We can therefore use pfns directly instead of the "index" and skip the
zeroing of the >MAX_ORDER bits. This can help a bit by itself, although the
compiler might be smart enough already. It also helps the next patch to
avoid page_to_pfn() for memory hole checks.
Link: http://lkml.kernel.org/r/20161216120009.20064-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Cc: Joonsoo Kim
Cc: Michal Hocko
Cc: "Kirill A. Shutemov"
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Oct, 2016
1 commit
-
Fix typo in comment.
Link: http://lkml.kernel.org/r/1474788764-5774-1-git-send-email-ysxie@foxmail.com
Signed-off-by: Yisheng Xie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jul, 2016
3 commits
-
When there is an isolated_page, post_alloc_hook() is called with page
but __free_pages() is called with isolated_page. Since they are the
same, there is no problem, but it's very confusing. To reduce the
confusion, this patch changes isolated_page to a boolean type and uses
the page variable consistently.
Link: http://lkml.kernel.org/r/1466150259-27727-10-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim
Acked-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch is motivated from Hugh and Vlastimil's concern [1].
There are two ways to get a freepage from the allocator. One is using the
normal memory allocation API and the other is __isolate_free_page(),
which is internally used for compaction and pageblock isolation. The
latter usage is rather tricky since it doesn't do the whole post
allocation processing done by the normal API.
One problematic thing I already know is that a poisoned page would not be
checked if it is allocated by __isolate_free_page(). Perhaps there
would be more.
We could add more debug logic for allocated pages in the future and this
separation would cause more problems. I'd like to fix this situation at
this time. The solution is simple. This patch commonizes some logic for
newly allocated pages and uses it on all sites. This will solve the
problem.
[1] http://marc.info/?i=alpine.LSU.2.11.1604270029350.7066%40eggly.anvils%3E
[iamjoonsoo.kim@lge.com: mm-page_alloc-introduce-post-allocation-processing-on-page-allocator-v3]
Link: http://lkml.kernel.org/r/1464230275-25791-7-git-send-email-iamjoonsoo.kim@lge.com
Link: http://lkml.kernel.org/r/1466150259-27727-9-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim
Acked-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Minchan Kim
Cc: Alexander Potapenko
Cc: Hugh Dickins
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
It's not necessary to initialize page_owner while holding the zone lock;
doing so causes more contention on the zone lock, although it's not a big
problem since it is just a debug feature. But it is better than before,
so do it. This is also a preparation step for using stackdepot in the
page owner feature. Stackdepot allocates new pages when there is no
reserved space, and holding the zone lock in this case will cause a
deadlock.
Link: http://lkml.kernel.org/r/1464230275-25791-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim
Acked-by: Vlastimil Babka
Cc: Mel Gorman
Cc: Minchan Kim
Cc: Alexander Potapenko
Cc: Hugh Dickins
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 May, 2016
2 commits
-
__offline_isolated_pages() and test_pages_isolated() are used by memory
hotplug. These functions require that the range is in a single zone but
there is no code to check this because memory hotplug checks it before
calling these functions. To avoid confusing future users of these
functions, this patch adds comments to them.
Signed-off-by: Joonsoo Kim
Acked-by: Vlastimil Babka
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Laura Abbott
Cc: Minchan Kim
Cc: Marek Szyprowski
Cc: Michal Nazarewicz
Cc: "Aneesh Kumar K.V"
Cc: "Rafael J. Wysocki"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Lots of code does
node = next_node(node, XXX);
if (node == MAX_NUMNODES)
node = first_node(XXX);
so create next_node_in() to do this and use it in various places.
[mhocko@suse.com: use next_node_in() helper]
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Signed-off-by: Michal Hocko
Cc: Xishi Qiu
Cc: Joonsoo Kim
Cc: David Rientjes
Cc: Naoya Horiguchi
Cc: Laura Abbott
Cc: Hui Zhu
Cc: Wang Xiaoqiang
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Apr, 2016
2 commits
-
Commit fea85cff11de ("mm/page_isolation.c: return last tested pfn rather
than failure indicator") changed the meaning of the return value. Let's
change the function comments as well.
Signed-off-by: Neil Zhang
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
It is incorrect to use next_node to find a target node; it will return
MAX_NUMNODES or an invalid node. This will lead to a crash in the buddy
system allocation.
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Xishi Qiu
Acked-by: Vlastimil Babka
Acked-by: Naoya Horiguchi
Cc: Joonsoo Kim
Cc: David Rientjes
Cc: "Laura Abbott"
Cc: Hui Zhu
Cc: Wang Xiaoqiang
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds