
15 May, 2019

1 commit

  • pfn_valid_within() calls pfn_valid() when CONFIG_HOLES_IN_ZONE is
    enabled, making it redundant in both definitions (with and without
    CONFIG_MEMORY_HOTPLUG) of the helper pfn_to_online_page(), which
    already calls either pfn_valid() or pfn_valid_within().
    pfn_valid_within() is simply 1 when !CONFIG_HOLES_IN_ZONE, so it is
    irrelevant either way. This does not change functionality.
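
    For reference, a minimal sketch of the helper in question
    (illustrative only, not the exact kernel source):

    #ifdef CONFIG_HOLES_IN_ZONE
    #define pfn_valid_within(pfn) pfn_valid(pfn)  /* real memory-hole check */
    #else
    #define pfn_valid_within(pfn) (1)             /* no holes: always valid */
    #endif

    Since pfn_to_online_page() already performs a pfn_valid()-style check
    internally, wrapping its result in pfn_valid_within() adds nothing.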

    Link: http://lkml.kernel.org/r/1553141595-26907-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: Zi Yan
    Reviewed-by: Oscar Salvador
    Acked-by: Michal Hocko
    Cc: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

30 Mar, 2019

2 commits

  • Due to has_unmovable_pages() being passed an incorrect irqsave flag
    instead of the isolation flag in set_migratetype_isolate(), there are
    issues with HWPOISON and error reporting where dump_page() is not
    called when there is an unmovable page.
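
    The bug pattern is roughly the following (a simplified sketch; the
    variable names follow the kernel code of that era):

    static int set_migratetype_isolate(struct page *page, int migratetype,
                                       int isol_flags)
    {
            unsigned long flags;  /* irqsave state, NOT an isolation flag */

            spin_lock_irqsave(&zone->lock, flags);
            /* BUG: "flags" (irqsave state) was passed where the isolation
             * flags ("isol_flags") were meant, so modes such as
             * REPORT_FAILURE were effectively garbage. */
            if (!has_unmovable_pages(zone, page, 0, migratetype, flags))
                    ...
    }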

    Link: http://lkml.kernel.org/r/20190320204941.53731-1-cai@lca.pw
    Fixes: d381c54760dc ("mm: only report isolation failures when offlining memory")
    Acked-by: Michal Hocko
    Reviewed-by: Oscar Salvador
    Signed-off-by: Qian Cai
    Cc: [5.0.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     
  • Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
    memory to zones until online") introduced move_pfn_range_to_zone(),
    which calls memmap_init_zone() while onlining a memory block.
    memmap_init_zone() resets the pagetype flags and sets the migrate
    type to MOVABLE.

    However, __offline_pages() also calls undo_isolate_page_range() after
    offline_isolated_pages() to do the same thing. Since commit
    2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
    __first_valid_page() to skip offline pages, undo_isolate_page_range()
    here just wastes CPU cycles looping over the offlined PFN range while
    doing nothing, because __first_valid_page() will return NULL:
    offline_isolated_pages() has already marked all memory sections within
    the pfn range as offline via offline_mem_sections().

    Also, after calling the "useless" undo_isolate_page_range() here, the
    code reaches the point of no return by notifying MEM_OFFLINE. Those
    pages will be marked MIGRATE_MOVABLE again once onlined. The only
    thing left to do is to decrease the zone counter of isolated
    pageblocks, which would otherwise keep some page-allocation paths,
    introduced by the above commit, slower than necessary.

    Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
    on ppc64, an "int" should still be enough to represent the number of
    pageblocks there. Fix an incorrect comment along the way.

    [cai@lca.pw: v4]
    Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw
    Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw
    Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Reviewed-by: Oscar Salvador
    Cc: Vlastimil Babka
    Cc: [4.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

29 Dec, 2018

1 commit

  • Heiko has complained that his log is swamped by warnings from
    has_unmovable_pages:

    [ 20.536664] page dumped because: has_unmovable_pages
    [ 20.536792] page:000003d081ff4080 count:1 mapcount:0 mapping:000000008ff88600 index:0x0 compound_mapcount: 0
    [ 20.536794] flags: 0x3fffe0000010200(slab|head)
    [ 20.536795] raw: 03fffe0000010200 0000000000000100 0000000000000200 000000008ff88600
    [ 20.536796] raw: 0000000000000000 0020004100000000 ffffffff00000001 0000000000000000
    [ 20.536797] page dumped because: has_unmovable_pages
    [ 20.536814] page:000003d0823b0000 count:1 mapcount:0 mapping:0000000000000000 index:0x0
    [ 20.536815] flags: 0x7fffe0000000000()
    [ 20.536817] raw: 07fffe0000000000 0000000000000100 0000000000000200 0000000000000000
    [ 20.536818] raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000

    These are triggered not by memory hotplug but by the CMA allocator.
    The original idea behind dumping the page state for all call paths was
    that these messages would be helpful in debugging failures. From the
    above it seems that this is not the case for the CMA path, because we
    are lacking much more context; e.g. the second reported page might be
    a CMA-allocated page. It is still interesting to see a slab page in
    the CMA area, but it is hard to tell from the above output alone
    whether this is a bug.

    Address this issue by dumping the page state only on request. Both
    start_isolate_page_range and has_unmovable_pages already have an
    argument to ignore hwpoison pages, so make this argument more generic:
    turn it into flags and allow callers to combine non-default modes into
    a mask. While we are at it, the has_unmovable_pages call from
    is_pageblock_removable_nolock (the sysfs "removable" file) is a
    questionable place to report failures, so drop the reporting from
    there as well.
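
    The resulting interface looks roughly like this (a sketch based on
    the page-isolation flags of this kernel era):

    /* include/linux/page-isolation.h (sketch) */
    #define SKIP_HWPOISON   0x1     /* the old bool argument, now a flag bit */
    #define REPORT_FAILURE  0x2     /* dump_page() on isolation failure */

    /* memory offlining wants both modes; CMA passes no REPORT_FAILURE */
    start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
                             SKIP_HWPOISON | REPORT_FAILURE);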

    Link: http://lkml.kernel.org/r/20181218092802.31429-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reported-by: Heiko Carstens
    Reviewed-by: Oscar Salvador
    Cc: Anshuman Khandual
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

12 Apr, 2018

1 commit

  • No allocation callback is using this argument anymore. new_page_node
    used to use this parameter to convey the node_id resp. a migration
    error up to the move_pages code (do_move_page_to_node_array). The
    error status never made it into the final status field, and we now
    have a better way to communicate the node id to the status field. All
    other allocation callbacks simply ignored the argument, so we can
    finally drop it.
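
    The change amounts to dropping the third parameter of the allocation
    callback type (a sketch; the typedef lives in include/linux/migrate.h):

    /* before */
    typedef struct page *new_page_t(struct page *page, unsigned long private,
                                    int **reason);
    /* after */
    typedef struct page *new_page_t(struct page *page, unsigned long private);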

    [mhocko@suse.com: fix migration callback]
    Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz
    [akpm@linux-foundation.org: fix alloc_misplaced_dst_page()]
    [mhocko@kernel.org: fix build]
    Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Zi Yan
    Cc: Andrea Reale
    Cc: Anshuman Khandual
    Cc: Kirill A. Shutemov
    Cc: Mike Kravetz
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

06 Apr, 2018

1 commit

  • start_isolate_page_range() is used to set the migrate type of a set of
    pageblocks to MIGRATE_ISOLATE while attempting to start a migration
    operation. It assumes that only one thread is calling it for the
    specified range. This routine is used by CMA, memory hotplug and
    gigantic huge pages. Each of these users synchronize access to the
    range within their subsystem. However, two subsystems (CMA and gigantic
    huge pages for example) could attempt operations on the same range. If
    this happens, one thread may 'undo' the work another thread is doing.
    This can result in pageblocks being incorrectly left marked as
    MIGRATE_ISOLATE and therefore not available for page allocation.

    What is ideally needed is a way to synchronize access to a set of
    pageblocks that are undergoing isolation and migration. The only thing
    we know about these pageblocks is that they are all in the same zone. A
    per-node mutex is too coarse as we want to allow multiple operations on
    different ranges within the same zone concurrently. Instead, we will
    use the migration type of the pageblocks themselves as a form of
    synchronization.

    start_isolate_page_range sets the migration type on a set of
    pageblocks, going in order from the one associated with the smallest
    pfn to the largest pfn. The zone lock is acquired to check and set
    the migration type. While going through the list of pageblocks, check
    whether MIGRATE_ISOLATE is already set. If so, this indicates that
    another thread is working on this pageblock. We know exactly which
    pageblocks we set, so clean up by undoing those and return -EBUSY.

    This allows start_isolate_page_range to serve as a synchronization
    mechanism and will allow more general use of these interfaces by
    callers. Update the comments in alloc_contig_range to reflect this
    new functionality.

    Each CPU holds the associated zone lock to modify or examine the
    migration type of a pageblock. And, it will only examine/update a
    single pageblock per lock acquire/release cycle.
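
    A sketch of the resulting check-and-undo structure (simplified from
    mm/page_isolation.c):

    int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
                                 unsigned migratetype, bool skip_hwpoisoned_pages)
    {
            unsigned long pfn, undo_pfn;
            struct page *page;

            for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
                    page = __first_valid_page(pfn, pageblock_nr_pages);
                    if (page &&
                        set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
                            /* already MIGRATE_ISOLATE: another thread
                             * owns this pageblock, so back out */
                            undo_pfn = pfn;
                            goto undo;
                    }
            }
            return 0;
    undo:
            for (pfn = start_pfn; pfn < undo_pfn; pfn += pageblock_nr_pages)
                    unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
            return -EBUSY;
    }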

    Link: http://lkml.kernel.org/r/20180309224731.16978-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Reviewed-by: Andrew Morton
    Cc: KAMEZAWA Hiroyuki
    Cc: Luiz Capitulino
    Cc: Michal Nazarewicz
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

16 Nov, 2017

1 commit

  • Joonsoo has noticed that "mm: drop migrate type checks from
    has_unmovable_pages" would break CMA allocator because it relies on
    has_unmovable_pages returning false even for CMA pageblocks which in
    fact don't have to be movable:

    alloc_contig_range
    start_isolate_page_range
    set_migratetype_isolate
    has_unmovable_pages

    This is a result of the code sharing between CMA and memory hotplug
    while each one has a different idea of what has_unmovable_pages should
    return. This is unfortunate but fixing it properly would require a lot
    of code duplication.

    Fix the issue by introducing the requested migrate type argument and
    special-casing MIGRATE_CMA, so that CMA pageblocks are handled
    properly. This works for memory hotplug because it requires
    MIGRATE_MOVABLE.
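
    The special case boils down to something like this inside
    has_unmovable_pages (a sketch, not the exact hunk):

    /* CMA allocations (alloc_contig_range) need to isolate CMA
     * pageblocks even though their pages are not all movable, so
     * consider them "movable" for the CMA caller only. */
    if (is_migrate_cma(migratetype) &&
        is_migrate_cma(get_pageblock_migratetype(page)))
            return false;   /* no unmovable pages as far as CMA cares */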

    Link: http://lkml.kernel.org/r/20171019122118.y6cndierwl2vnguj@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Reported-by: Joonsoo Kim
    Tested-by: Stefan Wahren
    Tested-by: Ran Wang
    Cc: Michael Ellerman
    Cc: Vlastimil Babka
    Cc: Igor Mammedov
    Cc: KAMEZAWA Hiroyuki
    Cc: Reza Arbab
    Cc: Vitaly Kuznetsov
    Cc: Xishi Qiu
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to a
    file was done in a spreadsheet of side-by-side results from the output
    of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files, created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

11 Jul, 2017

1 commit

  • Commit 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest
    neighbor node when mem-offline") duplicated a large part of
    alloc_migrate_target with some hotplug-specific special casing.

    To be more precise, it tried to enforce allocation from a node other
    than the original page's node. As a result the two functions diverged
    in their shared logic, e.g. the hugetlb allocation strategy.

    Let's unify the two and express the different NUMA requirements with
    the given nodemask. new_node_page will simply exclude the node it
    doesn't care about, and alloc_migrate_target will use all available
    nodes. alloc_migrate_target will then learn to migrate hugetlb pages
    more sanely and use the preallocated pool when possible.

    Please note that alloc_migrate_target used to call alloc_page resp.
    alloc_pages_current, and so obeyed the memory policy of the current
    context, which is quite strange considering that it is used in the
    context of alloc_contig_range, which just tries to migrate pages that
    stand in the way.
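
    The node-exclusion part can be sketched as follows (illustrative; the
    real helper is new_node_page in mm/memory_hotplug.c):

    /* build a nodemask that excludes the page's own node */
    nodemask_t nmask = node_states[N_MEMORY];
    int nid = page_to_nid(page);

    node_clear(nid, &nmask);        /* don't allocate on the source node */
    if (nodes_empty(nmask))         /* single-node system: allow it back */
            node_set(nid, &nmask);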

    Link: http://lkml.kernel.org/r/20170608074553.22152-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Naoya Horiguchi
    Cc: Xishi Qiu
    Cc: zhong jiang
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

07 Jul, 2017

1 commit

  • __first_valid_page skips over invalid pfns in the range, but it might
    still stumble over offline pages. At least start_isolate_page_range
    will mark those MIGRATE_ISOLATE via set_migratetype_isolate. This
    doesn't represent any immediate danger AFAICS, because
    alloc_contig_range will fail to isolate those pages, but it relies on
    a not fully initialized page, which will become a problem later when
    we stop associating offline pages with zones. Use pfn_to_online_page
    to handle this.

    This is more a preparatory patch than a fix.
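
    The resulting helper can be sketched like this (simplified from
    mm/page_isolation.c):

    static struct page *__first_valid_page(unsigned long pfn,
                                           unsigned long nr_pages)
    {
            unsigned long i;

            for (i = 0; i < nr_pages; i++) {
                    struct page *page = pfn_to_online_page(pfn + i);

                    if (page)       /* skips invalid and offline pfns alike */
                            return page;
            }
            return NULL;
    }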

    Link: http://lkml.kernel.org/r/20170515085827.16474-10-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Balbir Singh
    Cc: Dan Williams
    Cc: Daniel Kiper
    Cc: David Rientjes
    Cc: Heiko Carstens
    Cc: Igor Mammedov
    Cc: Jerome Glisse
    Cc: Joonsoo Kim
    Cc: Martin Schwidefsky
    Cc: Mel Gorman
    Cc: Reza Arbab
    Cc: Tobias Regnery
    Cc: Toshi Kani
    Cc: Vitaly Kuznetsov
    Cc: Xishi Qiu
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

09 May, 2017

1 commit

  • When stealing pages from a pageblock of a different migratetype, we
    count how many free pages were stolen, and change the pageblock's
    migratetype if more than half of the pageblock was free. This might
    be too conservative, as there might be other pages that are not free
    but were allocated with the same migratetype as our allocation
    requested.

    While we cannot determine the migratetype of allocated pages precisely
    (at least without the page_owner functionality enabled), we can count
    pages that compaction would try to isolate for migration - those are
    either on LRU or __PageMovable(). The rest can be assumed to be
    MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily
    distinguish. This counting can be done as part of free page stealing
    with little additional overhead.

    The page stealing code is changed so that it considers free pages plus
    pages of the "good" migratetype for the decision whether to change
    pageblock's migratetype.

    The result should be more accurate migratetype of pageblocks wrt the
    actual pages in the pageblocks, when stealing from semi-occupied
    pageblocks. This should help the efficiency of page grouping by
    mobility.
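
    A sketch of the counting added to the free-page move (the real code
    threads a num_movable counter through move_freepages_block;
    simplified here):

    /* while moving free pages, also count pages compaction could migrate */
    for (page = start_page; page <= end_page;) {
            if (!PageBuddy(page)) {
                    /* assume isolatable pages are movable, without
                     * actually trying to isolate them */
                    if (PageLRU(page) || __PageMovable(page))
                            (*num_movable)++;
                    page++;
                    continue;
            }
            order = page_order(page);
            list_move(&page->lru,
                      &zone->free_area[order].free_list[migratetype]);
            page += 1 << order;
            pages_moved += 1 << order;
    }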

    In testing based on 4.9 kernel with stress-highalloc from mmtests
    configured for order-4 GFP_KERNEL allocations, this patch has reduced
    the number of unmovable allocations falling back to movable pageblocks
    by 47%. The number of movable allocations falling back to other
    pageblocks are increased by 55%, but these events don't cause permanent
    fragmentation, so the tradeoff should be positive. Later patches also
    offset the movable fallback increase to some extent.

    [akpm@linux-foundation.org: merge fix]
    Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

04 May, 2017

1 commit

  • Use is_migrate_isolate_page() to simplify the code; no functional
    changes.
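
    The helper is a thin wrapper around the migratetype check (sketch of
    its definition in include/linux/page-isolation.h):

    static inline bool is_migrate_isolate_page(struct page *page)
    {
            return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
    }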

    Link: http://lkml.kernel.org/r/58B94FB1.8020802@huawei.com
    Signed-off-by: Xishi Qiu
    Acked-by: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

23 Feb, 2017

2 commits

  • On architectures that allow memory holes, page_is_buddy() has to perform
    page_to_pfn() to check for the memory hole. After the previous patch,
    we have the pfn already available in __free_one_page(), which is the
    only caller of page_is_buddy(), so move the check there and avoid
    page_to_pfn().
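
    After the change, the merging loop in __free_one_page() can check the
    hole directly on the pfn it already has (a sketch):

    buddy_pfn = __find_buddy_pfn(pfn, order);
    buddy = page + (buddy_pfn - pfn);

    if (!pfn_valid_within(buddy_pfn))   /* memory-hole check moved here, */
            goto done_merging;          /* no page_to_pfn() round trip   */
    if (!page_is_buddy(page, buddy, order))
            goto done_merging;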

    Link: http://lkml.kernel.org/r/20161216120009.20064-2-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • In __free_one_page() we do the buddy merging arithmetic on the
    "page/buddy index", which is just the lower MAX_ORDER bits of the pfn.
    The operations we do that affect the higher bits are bitwise AND and
    subtraction (in that order), where the final result will be the same
    with the higher bits left unmasked, as long as these bits are equal
    for both buddies - which must be true by the definition of a buddy.

    We can therefore use pfns directly instead of the "index" and skip the
    zeroing of the >MAX_ORDER bits. This can help a bit by itself,
    although the compiler might be smart enough already. It also helps
    the next patch avoid page_to_pfn() for memory hole checks.
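
    A short worked example of the buddy arithmetic on raw pfns
    (illustrative):

    /* buddies differ only in bit "order"; all higher bits are equal */
    unsigned long buddy_pfn = pfn ^ (1UL << order); /* __find_buddy_pfn() */
    unsigned long combined  = buddy_pfn & pfn;      /* start of merged pair */

    /* e.g. pfn = 0x12344, order = 2:
     *   buddy_pfn = 0x12344 ^ 0x4 = 0x12340
     *   combined  = 0x12340 & 0x12344 = 0x12340
     * the high bits (0x1234x) pass through both operations unchanged,
     * so masking them off first is unnecessary. */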

    Link: http://lkml.kernel.org/r/20161216120009.20064-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: "Kirill A. Shutemov"
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     


27 Jul, 2016

3 commits

  • When there is an isolated_page, post_alloc_hook() is called with page
    but __free_pages() is called with isolated_page. Since they are the
    same page there is no problem, but it is very confusing. To reduce
    the confusion, this patch changes isolated_page to a boolean type and
    uses the page variable consistently.

    Link: http://lkml.kernel.org/r/1466150259-27727-10-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • This patch is motivated by Hugh's and Vlastimil's concern [1].

    There are two ways to get a freepage from the allocator. One is using
    the normal memory allocation API and the other is __isolate_free_page(),
    which is used internally for compaction and pageblock isolation. The
    latter usage is rather tricky since it doesn't do the whole
    post-allocation processing done by the normal API.

    One problematic thing I already know is that a poisoned page would not
    be checked if it is allocated by __isolate_free_page(). Perhaps there
    are more.

    We could add more debug logic for allocated pages in the future, and
    this separation would cause more problems. I'd like to fix this
    situation at this time. The solution is simple: this patch commonizes
    the logic for newly allocated pages and uses it on all sites. This
    will solve the problem.

    [1] http://marc.info/?i=alpine.LSU.2.11.1604270029350.7066%40eggly.anvils%3E
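
    The commonized logic became post_alloc_hook(); its shape is roughly
    (a sketch, not the exact body):

    /* shared post-allocation processing for both allocation paths */
    void post_alloc_hook(struct page *page, unsigned int order,
                         gfp_t gfp_flags)
    {
            set_page_private(page, 0);
            set_page_refcounted(page);

            arch_alloc_page(page, order);
            kernel_map_pages(page, 1 << order, 1);
            kernel_poison_pages(page, 1 << order, 1);
            kasan_alloc_pages(page, order);
            set_page_owner(page, order, gfp_flags);
    }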

    [iamjoonsoo.kim@lge.com: mm-page_alloc-introduce-post-allocation-processing-on-page-allocator-v3]
    Link: http://lkml.kernel.org/r/1464230275-25791-7-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1466150259-27727-9-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1464230275-25791-7-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • It's not necessary to initialize page_owner while holding the zone
    lock; doing so just causes more contention on the zone lock, although
    that is not a big problem since page_owner is just a debug feature.
    Still, it is better than before, so do it. This is also a preparation
    step for using stackdepot in the page owner feature: stackdepot
    allocates new pages when there is no reserved space, and holding the
    zone lock in that case would cause a deadlock.

    Link: http://lkml.kernel.org/r/1464230275-25791-2-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

20 May, 2016

2 commits

  • __offline_isolated_pages() and test_pages_isolated() are used by
    memory hotplug. These functions require that the range be within a
    single zone, but there is no code enforcing this, because memory
    hotplug checks it before calling them. To avoid confusing future
    users of these functions, this patch adds comments to them.

    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Laura Abbott
    Cc: Minchan Kim
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: "Aneesh Kumar K.V"
    Cc: "Rafael J. Wysocki"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Lots of code does

    node = next_node(node, XXX);
    if (node == MAX_NUMNODES)
            node = first_node(XXX);

    so create next_node_in() to do this and use it in various places.
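
    A sketch of the helper (the real one lives in
    include/linux/nodemask.h; simplified here):

    /* wrap-around variant of next_node(): for a non-empty mask this
     * never returns MAX_NUMNODES */
    static inline int next_node_in(int node, const nodemask_t *srcp)
    {
            int ret = next_node(node, *srcp);

            if (ret == MAX_NUMNODES)
                    ret = first_node(*srcp);
            return ret;
    }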

    [mhocko@suse.com: use next_node_in() helper]
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Signed-off-by: Michal Hocko
    Cc: Xishi Qiu
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Naoya Horiguchi
    Cc: Laura Abbott
    Cc: Hui Zhu
    Cc: Wang Xiaoqiang
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

02 Apr, 2016

2 commits

  • Commit fea85cff11de ("mm/page_isolation.c: return last tested pfn rather
    than failure indicator") changed the meaning of the return value. Let's
    change the function comments as well.

    Signed-off-by: Neil Zhang
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Zhang
     
  • It is incorrect to use next_node to find a target node; it may return
    MAX_NUMNODES or an otherwise invalid node. This can lead to a crash
    in buddy system allocation.

    Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
    Signed-off-by: Xishi Qiu
    Acked-by: Vlastimil Babka
    Acked-by: Naoya Horiguchi
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: "Laura Abbott"
    Cc: Hui Zhu
    Cc: Wang Xiaoqiang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     



09 Sep, 2015

2 commits

  • Nowadays, set/unset_migratetype_isolate() are defined and used only in
    mm/page_isolation.c, so let's limit their scope to that file.

    Signed-off-by: Naoya Horiguchi
    Acked-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • __test_page_isolated_in_pageblock() is used to verify whether all
    pages in a pageblock are either successfully isolated or hwpoisoned.
    Two of the page states that are tested are, however, bogus and
    misleading.

    Both tests rely on get_freepage_migratetype(page), which however makes
    no guarantees about pages on freelists. Specifically, it doesn't
    guarantee that the migratetype returned by the function actually
    matches the migratetype of the freelist that the page is on. Such a
    guarantee is not its purpose and would have a negative impact on
    allocator performance.

    The first test checks whether the freepage_migratetype equals
    MIGRATE_ISOLATE, supposedly to catch races between page isolation and
    allocator activity. These races should be fixed nowadays with
    51bb1a4093 ("mm/page_alloc: add freepage on isolate pageblock to correct
    buddy list") and related patches. As explained above, the check
    wouldn't be able to catch them reliably anyway. For the same reason
    false positives can happen, although they are harmless, as the
    move_freepages() call would just move the page to the same freelist it's
    already on. So removing the test is not a bug fix, just cleanup. After
    this patch, we assume that all PageBuddy pages are on the correct
    freelist and that the races were really fixed. A truly reliable
    verification in the form of e.g. VM_BUG_ON() would be complicated and
    is arguably not needed.

    The second test (page_count(page) == 0 && get_freepage_migratetype(page)
    == MIGRATE_ISOLATE) is probably supposed (the code comes from a big
    memory isolation patch from 2007) to catch pages on MIGRATE_ISOLATE
    pcplists. However, pcplists don't contain MIGRATE_ISOLATE freepages
    nowadays, those are freed directly to free lists, so the check is
    obsolete. Remove it as well.

    Signed-off-by: Vlastimil Babka
    Acked-by: Joonsoo Kim
    Cc: Minchan Kim
    Acked-by: Michal Nazarewicz
    Cc: Laura Abbott
    Reviewed-by: Naoya Horiguchi
    Cc: Seungho Park
    Cc: Johannes Weiner
    Cc: "Kirill A. Shutemov"
    Acked-by: Mel Gorman
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

15 May, 2015

1 commit

  • I had an issue:

    Unable to handle kernel NULL pointer dereference at virtual address 0000082a
    pgd = cc970000
    [0000082a] *pgd=00000000
    Internal error: Oops: 5 [#1] PREEMPT SMP ARM
    PC is at get_pageblock_flags_group+0x5c/0xb0
    LR is at unset_migratetype_isolate+0x148/0x1b0
    pc : [] lr : [] psr: 80000093
    sp : c7029d00 ip : 00000105 fp : c7029d1c
    r10: 00000001 r9 : 0000000a r8 : 00000004
    r7 : 60000013 r6 : 000000a4 r5 : c0a357e4 r4 : 00000000
    r3 : 00000826 r2 : 00000002 r1 : 00000000 r0 : 0000003f
    Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
    Control: 10c5387d Table: 2cb7006a DAC: 00000015
    Backtrace:
    get_pageblock_flags_group+0x0/0xb0
    unset_migratetype_isolate+0x0/0x1b0
    undo_isolate_page_range+0x0/0xdc
    __alloc_contig_range+0x0/0x34c
    alloc_contig_range+0x0/0x18

    This issue occurs because, when unset_migratetype_isolate() is called
    to unset a part of CMA memory, it tries to access the buddy page to
    get its status:

    if (order >= pageblock_order) {
            page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
            buddy_idx = __find_buddy_index(page_idx, order);
            buddy = page + (buddy_idx - page_idx);

            if (!is_migrate_isolate_page(buddy)) {

    But the beginning address of this part of CMA memory is very close to
    a region of memory that is reserved at boot time (and thus not in the
    buddy system), so add a check before accessing it.
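
    The added check validates the buddy pfn before the buddy page is
    touched (a sketch of the fix):

    if (order >= pageblock_order) {
            page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
            buddy_idx = __find_buddy_index(page_idx, order);
            buddy = page + (buddy_idx - page_idx);

            /* the buddy may lie in a boot-time reserved region that is
             * not backed by the buddy allocator, so validate it first */
            if (pfn_valid_within(page_to_pfn(buddy)) &&
                !is_migrate_isolate_page(buddy)) {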

    [akpm@linux-foundation.org: use conventional code layout]
    Signed-off-by: Hui Zhu
    Suggested-by: Laura Abbott
    Suggested-by: Joonsoo Kim
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     

26 Mar, 2015

1 commit

  • Commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on
    isolated pageblock") changed the logic of unset_migratetype_isolate to
    check the buddy allocator and explicitly call __free_pages to merge.

    The page that is being freed in this path never had prep_new_page
    called, so set_page_refcounted is called explicitly, but there is no
    call to kernel_map_pages. With the default kernel_map_pages this is
    mostly harmless, but if kernel_map_pages does any manipulation of the
    page tables (unmapping or setting pages read-only) this may trigger a
    fault:

    alloc_contig_range test_pages_isolated(ceb00, ced00) failed
    Unable to handle kernel paging request at virtual address ffffffc0cec00000
    pgd = ffffffc045fc4000
    [ffffffc0cec00000] *pgd=0000000000000000
    Internal error: Oops: 9600004f [#1] PREEMPT SMP
    Modules linked in: exfatfs
    CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1
    task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000
    PC is at memset+0xc8/0x1c0
    LR is at kernel_map_pages+0x1ec/0x244

    Fix this by calling kernel_map_pages to ensure the page is set up
    properly in the page table.
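
    The fix adds the mapping call next to the explicit refcount setup
    (a sketch):

    /* unset_migratetype_isolate(): freeing the isolated buddy page */
    kernel_map_pages(page, 1 << order, 1); /* re-establish the mapping */
    set_page_refcounted(page);             /* page skipped prep_new_page() */
    __free_pages(page, order);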

    Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
    Signed-off-by: Laura Abbott
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Acked-by: Joonsoo Kim
    Cc: Gioh Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Cc: Vlastimil Babka
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laura Abbott
     

11 Dec, 2014

2 commits

  • When setting MIGRATE_ISOLATE on a pageblock, pcplists are drained to
    improve the chance that all pages will be successfully isolated and
    not left in the per-cpu caches. Since isolation is always concerned
    with a single zone, we can restrict the pcplists drain to that single
    zone, which is now possible.

    The change should make memory isolation faster and no longer disturb
    unrelated pcplists.

    Signed-off-by: Vlastimil Babka
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Cc: Joonsoo Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The functions for draining per-cpu pages back to the buddy allocator
    currently always operate on all zones. There are however several
    cases where the drain is only needed in the context of a single zone,
    and spilling other pcplists is a waste of time, both due to the extra
    spilling and the later refilling.

    This patch introduces a new zone pointer parameter to drain_all_pages()
    and changes the dummy parameter of drain_local_pages() to also be a
    zone pointer. When NULL is passed, the functions operate on all zones
    as usual. Passing a specific zone pointer reduces the work to that
    single zone.

    All callers are updated to pass the NULL pointer in this patch.
    Conversion to single zone (where appropriate) is done in further
    patches.
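
    The new calling convention (the post-patch signature is
    void drain_all_pages(struct zone *zone)):

    drain_all_pages(NULL);  /* previous behaviour: drain every zone */
    drain_all_pages(zone);  /* drain only pages belonging to this zone */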

    Signed-off-by: Vlastimil Babka
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Cc: Joonsoo Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

14 Nov, 2014

2 commits

  • The current pageblock isolation logic isolates each pageblock
    individually. This causes a freepage accounting problem when a
    freepage of pageblock order on an isolated pageblock is merged with
    another freepage on a normal pageblock. We can prevent such merging
    by restricting the max order of merging to pageblock order when the
    freepage is on an isolated pageblock.

    A side-effect of this change is that there can be a non-merged buddy
    freepage even after pageblock isolation finishes, because undoing
    pageblock isolation just moves freepages from the isolate buddy list
    to the normal buddy list rather than considering merging. So the
    patch also makes undoing pageblock isolation consider freepage
    merging: on un-isolation, a freepage of more than pageblock order and
    its buddy are checked, and if they are on a normal pageblock, instead
    of just moving them we isolate the freepage and free it in order to
    get it merged.
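
    The merge restriction amounts to clamping the merge order in
    __free_one_page() (a sketch):

    /* never merge an isolated freepage past its own pageblock, so that
     * freepages on isolate and normal pageblocks cannot merge */
    if (is_migrate_isolate(migratetype))
            max_order = min_t(unsigned int, MAX_ORDER - 1,
                              pageblock_order + 1);

    while (order < max_order - 1) {
            /* ... usual buddy merging ... */
    }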

    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Tang Chen
    Cc: Naoya Horiguchi
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Wen Congyang
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: Laura Abbott
    Cc: Heesub Shin
    Cc: "Aneesh Kumar K.V"
    Cc: Ritesh Harjani
    Cc: Gioh Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Before describing the bugs themselves, I first explain the definition
    of a freepage.

    1. pages on a buddy list are counted as freepages.
    2. pages on an isolate-migratetype buddy list are *not* counted as
       freepages.
    3. pages on a cma buddy list are counted as CMA freepages, too.

    Now, I describe the problems and the related patches.

    Patch 1: There are race conditions on getting the pageblock
    migratetype that result in misplacement of freepages on the buddy
    list, an incorrect freepage count and unavailability of freepages.

    Patch 2: Freepages on the pcp list can have stale cached information
    used to determine which migratetype buddy list to go to. This causes
    misplacement of freepages on the buddy list and an incorrect freepage
    count.

    Patch 4: Merging between freepages on different migratetypes of
    pageblocks causes a freepage accounting problem. This patch fixes it.

    Without patchset [3], the above problem doesn't happen in my CMA
    allocation test, because CMA reserved pages aren't used at all, so
    there is no chance for the above race.

    With patchset [3], I did a simple CMA allocation test and got the
    result below:

    - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
    - run kernel build (make -j16) in the background
    - 30 CMA allocation attempts (8MB * 30 = 240MB) at 5 sec intervals
    - Result: more than 5000 pages missing from the freepage count

    With patchset [3] and this patchset, I found that no freepage count
    was missed, so I conclude that the problems are solved.

    In my simple memory offlining test, these problems occur in that
    environment, too.

    This patch (of 4):

    There are two paths to reach the core free function of the buddy
    allocator, __free_one_page(): one is free_one_page()->
    __free_one_page() and the other is free_hot_cold_page()->
    free_pcppages_bulk()->__free_one_page(). Each path has a race
    condition causing serious problems. This patch focuses on the first
    type of freepath; a following patch will solve the problem in the
    second type of freepath.

    In the first type of freepath, we get the migratetype of the freeing
    page without holding the zone lock, so it can be racy. There are two
    cases of this race.

    1. pages are added to the isolate buddy list after restoring the
    original migratetype

    CPU1                                    CPU2

    get migratetype => MIGRATE_ISOLATE
    call free_one_page() with MIGRATE_ISOLATE

                                            grab the zone lock
                                            unisolate pageblock
                                            release the zone lock

    grab the zone lock
    call __free_one_page() with MIGRATE_ISOLATE
    freepage goes into the isolate buddy list,
    although the pageblock is already unisolated

    This may cause two problems. One is that we can't use this page again
    until the next isolation attempt on this pageblock, because the
    freepage is on the isolate buddy list. The other is that the freepage
    accounting can become wrong due to merging between different buddy
    lists: freepages on the isolate buddy list aren't counted as
    freepages, but ones on the normal buddy list are. If a merge happens,
    a buddy freepage on the normal buddy list is inevitably moved to the
    isolate buddy list without any freepage accounting adjustment, so the
    count becomes incorrect.

    2. pages are added to the normal buddy list while the pageblock is
    isolated. This is similar to the above case.

    This may also cause two problems. One is that we can't keep these
    freepages from being allocated: although this pageblock is isolated,
    the freepage is added to the normal buddy list, so it can be allocated
    without any restriction. The other problem is the same as in case 1,
    that is, incorrect freepage accounting.

    This race condition can be prevented by checking the migratetype again
    while holding the zone lock. Because that is a somewhat heavy
    operation and isn't needed in the common case, we want to avoid
    rechecking as much as possible. So this patch introduces a new
    variable, nr_isolate_pageblock, in struct zone to check whether the
    zone contains any isolated pageblock. With this, we can avoid
    re-checking the migratetype in the common case and do it only if there
    is an isolated pageblock or the migratetype is MIGRATE_ISOLATE. This
    solves the above-mentioned problems.
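
    The recheck in free_one_page() looks roughly like this (a sketch):

    /* recheck under the zone lock only when isolation may be involved */
    spin_lock(&zone->lock);
    if (unlikely(has_isolate_pageblock(zone) ||
                 is_migrate_isolate(migratetype)))
            migratetype = get_pfnblock_migratetype(page, pfn);
    __free_one_page(page, pfn, zone, order, migratetype);
    spin_unlock(&zone->lock);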

    Changes from v3:
    Add one more check in free_one_page() that checks whether the
    migratetype is MIGRATE_ISOLATE or not. Without this, the
    above-mentioned case 1 could happen.

    Signed-off-by: Joonsoo Kim
    Acked-by: Minchan Kim
    Acked-by: Michal Nazarewicz
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Tang Chen
    Cc: Naoya Horiguchi
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Wen Congyang
    Cc: Marek Szyprowski
    Cc: Laura Abbott
    Cc: Heesub Shin
    Cc: "Aneesh Kumar K.V"
    Cc: Ritesh Harjani
    Cc: Gioh Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

12 Sep, 2013

1 commit

  • Until now we couldn't offline memory blocks which contain hugepages,
    because a hugepage was considered an unmovable page. With this patch
    series a hugepage becomes movable, so by using hugepage migration we
    can offline such memory blocks.

    What's different from other users of hugepage migration is that we
    need to decompose all the hugepages inside the target memory block
    into free buddy pages after hugepage migration, because otherwise free
    hugepages remaining in the memory block interfere with memory
    offlining. For this reason we introduce the new functions
    dissolve_free_huge_page() and dissolve_free_huge_pages().

    Other than that, this patch straightforwardly adds the hugepage
    migration code: hugepage handling in the functions that scan over pfns
    and collect pages to be migrated, and a hugepage allocation path in
    alloc_migrate_target().

    As for larger hugepages (1GB on x86_64), hot removing them is not easy
    because they are larger than a memory block, so for now we simply let
    that case fail as it is.
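
    The hugepage branch added to alloc_migrate_target() is essentially
    the following (a sketch of its later, corrected form - see the 02
    Apr, 2016 entry above for the node-selection fix):

    struct page *alloc_migrate_target(struct page *page,
                                      unsigned long private, int **resultp)
    {
            /* hugepages come from the preallocated pool on another node */
            if (PageHuge(page))
                    return alloc_huge_page_node(
                            page_hstate(compound_head(page)),
                            next_node_in(page_to_nid(page), node_online_map));
            ...
    }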

    [yongjun_wei@trendmicro.com.cn: remove duplicated include]
    Signed-off-by: Naoya Horiguchi
    Acked-by: Andi Kleen
    Cc: Hillf Danton
    Cc: Wanpeng Li
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: KOSAKI Motohiro
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: "Aneesh Kumar K.V"
    Signed-off-by: Wei Yongjun
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     


05 Jan, 2013

1 commit

  • Commit 702d1a6e0766 ("memory-hotplug: fix kswapd looping forever
    problem") added an isolated pageblocks counter (nr_pageblock_isolate
    in struct zone) and used it to adjust the free pages counter in
    zone_watermark_ok_safe() to prevent the kswapd-looping-forever
    problem.

    Then later, commit 2139cbe627b8 ("cma: fix counting of isolated
    pages") fixed the accounting of isolated pages in the global free
    pages counter. This made the previous zone_watermark_ok_safe() fix
    unnecessary and potentially harmful (because isolated pages may now be
    accounted twice, making the free pages counter incorrect).

    This patch removes the special isolated pageblocks counter altogether,
    which fixes the zone_watermark_ok_safe() free pages check.

    Reported-by: Tomasz Stanislawski
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Kyungmin Park
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Aaditya Kumar
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     

12 Dec, 2012

1 commit

  • The hwpoisoned flag may be set when we offline a page via the sysfs
    interface /sys/devices/system/memory/soft_offline_page or
    /sys/devices/system/memory/hard_offline_page. If we don't clear this
    flag when onlining pages, such a page can never be freed and will
    never appear on a free list, so we can't offline these pages again.
    Therefore we should skip such pages when offlining pages.

    Signed-off-by: Wen Congyang
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Len Brown
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Christoph Lameter
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Yasuaki Ishimatsu
    Cc: Andi Kleen
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     

09 Oct, 2012

1 commit

  • __alloc_contig_migrate_alloc() can also be used by memory hotplug, so
    refactor it out (move it and rename it to a common name) into
    page_isolation.c.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Reviewed-by: Yasuaki Ishimatsu
    Acked-by: Michal Nazarewicz
    Cc: Marek Szyprowski
    Cc: Wen Congyang
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim