27 Jul, 2016

3 commits

  • When there is an isolated_page, post_alloc_hook() is called with page
    but __free_pages() is called with isolated_page. Since they are the
    same, there is no actual problem, but it is very confusing. To reduce
    the confusion, this patch changes isolated_page to a boolean type and
    uses the page variable consistently.

    Link: http://lkml.kernel.org/r/1466150259-27727-10-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • This patch is motivated by Hugh and Vlastimil's concern [1].

    There are two ways to get a freepage from the allocator. One is using
    the normal memory allocation API and the other is __isolate_free_page(),
    which is used internally for compaction and pageblock isolation. The
    latter usage is rather tricky since it doesn't do the whole
    post-allocation processing done by the normal API.

    One problematic thing I already know of is that a poisoned page would
    not be checked if it is allocated by __isolate_free_page(). Perhaps
    there are more.

    We could add more debug logic for allocated pages in the future, and
    this separation would cause more problems. I'd like to fix this
    situation now. The solution is simple: this patch commonizes the logic
    for newly allocated pages and uses it at all sites. This will solve the
    problem.

    [1] http://marc.info/?i=alpine.LSU.2.11.1604270029350.7066%40eggly.anvils%3E
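
    The commonized hook ends up looking roughly like the following (a
    sketch based on the description above; the exact body and the set of
    debug hooks are assumptions, not the literal patch):

        void post_alloc_hook(struct page *page, unsigned int order,
                             gfp_t gfp_flags)
        {
                /* Post-allocation processing shared by the normal
                 * allocation path and __isolate_free_page() users. */
                set_page_private(page, 0);
                set_page_refcounted(page);

                arch_alloc_page(page, order);
                kernel_map_pages(page, 1 << order, 1);
                kernel_poison_pages(page, 1 << order, 1);
                kasan_alloc_pages(page, order);
                set_page_owner(page, order, gfp_flags);
        }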

    [iamjoonsoo.kim@lge.com: mm-page_alloc-introduce-post-allocation-processing-on-page-allocator-v3]
    Link: http://lkml.kernel.org/r/1464230275-25791-7-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1466150259-27727-9-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • It's not necessary to initialize page_owner while holding the zone
    lock. Doing so causes more contention on the zone lock, although that
    is not a big problem since page_owner is just a debug feature. Still,
    it is better than before, so do it. This is also a preparation step
    for using stackdepot in the page owner feature: stackdepot allocates
    new pages when there is no reserved space, and holding the zone lock
    in this case would cause a deadlock.

    Link: http://lkml.kernel.org/r/1464230275-25791-2-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

20 May, 2016

2 commits

  • __offline_isolated_pages() and test_pages_isolated() are used by memory
    hotplug. These functions require that the range lie within a single
    zone, but there is no code verifying this, because memory hotplug
    checks it before calling them. To avoid confusing future users of
    these functions, this patch adds comments to them.

    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Laura Abbott
    Cc: Minchan Kim
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: "Aneesh Kumar K.V"
    Cc: "Rafael J. Wysocki"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Lots of code does

        node = next_node(node, XXX);
        if (node == MAX_NUMNODES)
                node = first_node(XXX);

    so create next_node_in() to do this and use it in various places.
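
    A minimal sketch of the helper (the in-tree definition may differ in
    detail, e.g. by taking the nodemask by pointer):

        /* Return the next node after @node in @mask, wrapping around
         * to the first node when the end of the mask is reached. */
        static inline int next_node_in(int node, nodemask_t mask)
        {
                int ret = next_node(node, mask);

                if (ret == MAX_NUMNODES)
                        ret = first_node(mask);
                return ret;
        }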

    [mhocko@suse.com: use next_node_in() helper]
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Signed-off-by: Michal Hocko
    Cc: Xishi Qiu
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Naoya Horiguchi
    Cc: Laura Abbott
    Cc: Hui Zhu
    Cc: Wang Xiaoqiang
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

02 Apr, 2016

2 commits

  • Commit fea85cff11de ("mm/page_isolation.c: return last tested pfn rather
    than failure indicator") changed the meaning of the return value. Let's
    change the function comments as well.

    Signed-off-by: Neil Zhang
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Zhang
     
  • It is incorrect to use next_node() to find a target node; it may
    return MAX_NUMNODES or an invalid node. This can lead to a crash in
    buddy system allocation.

    Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
    Signed-off-by: Xishi Qiu
    Acked-by: Vlastimil Babka
    Acked-by: Naoya Horiguchi
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: "Laura Abbott"
    Cc: Hui Zhu
    Cc: Wang Xiaoqiang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

09 Sep, 2015

2 commits

  • Nowadays, set/unset_migratetype_isolate() are defined and used only in
    mm/page_isolation.c, so let's limit their scope to that file.

    Signed-off-by: Naoya Horiguchi
    Acked-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • __test_page_isolated_in_pageblock() is used to verify whether all
    pages in a pageblock were either successfully isolated or are
    hwpoisoned. Two of the possible page states that it tests for are,
    however, bogus and misleading.

    Both tests rely on get_freepage_migratetype(page), which makes no
    guarantees about pages on freelists. Specifically, it doesn't
    guarantee that the migratetype it returns actually matches the
    migratetype of the freelist that the page is on. Such a guarantee is
    not its purpose, and providing one would hurt allocator performance.

    The first test checks whether the freepage_migratetype equals
    MIGRATE_ISOLATE, supposedly to catch races between page isolation and
    allocator activity. These races should be fixed nowadays with
    51bb1a4093 ("mm/page_alloc: add freepage on isolate pageblock to correct
    buddy list") and related patches. As explained above, the check
    wouldn't be able to catch them reliably anyway. For the same reason
    false positives can happen, although they are harmless, as the
    move_freepages() call would just move the page to the same freelist it's
    already on. So removing the test is not a bug fix, just cleanup. After
    this patch, we assume that all PageBuddy pages are on the correct
    freelist and that the races were really fixed. A truly reliable
    verification in the form of e.g. VM_BUG_ON() would be complicated and
    is arguably not needed.

    The second test (page_count(page) == 0 && get_freepage_migratetype(page)
    == MIGRATE_ISOLATE) is probably supposed (the code comes from a big
    memory isolation patch from 2007) to catch pages on MIGRATE_ISOLATE
    pcplists. However, pcplists don't contain MIGRATE_ISOLATE freepages
    nowadays, those are freed directly to free lists, so the check is
    obsolete. Remove it as well.
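
    After removing both tests, the per-page check reduces to roughly the
    following shape (a sketch, not the literal patch; the
    skip_hwpoisoned_pages parameter name is an assumption):

        while (pfn < end_pfn) {
                page = pfn_to_page(pfn);
                if (PageBuddy(page))
                        /* Assume buddy pages sit on the correct
                         * (MIGRATE_ISOLATE) freelist. */
                        pfn += 1 << page_order(page);
                else if (skip_hwpoisoned_pages && PageHWPoison(page))
                        /* A hwpoisoned page cannot also be PageBuddy. */
                        pfn++;
                else
                        break;
        }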

    Signed-off-by: Vlastimil Babka
    Acked-by: Joonsoo Kim
    Cc: Minchan Kim
    Acked-by: Michal Nazarewicz
    Cc: Laura Abbott
    Reviewed-by: Naoya Horiguchi
    Cc: Seungho Park
    Cc: Johannes Weiner
    Cc: "Kirill A. Shutemov"
    Acked-by: Mel Gorman
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

15 May, 2015

1 commit

  • I had an issue:

    Unable to handle kernel NULL pointer dereference at virtual address 0000082a
    pgd = cc970000
    [0000082a] *pgd=00000000
    Internal error: Oops: 5 [#1] PREEMPT SMP ARM
    PC is at get_pageblock_flags_group+0x5c/0xb0
    LR is at unset_migratetype_isolate+0x148/0x1b0
    pc : [] lr : [] psr: 80000093
    sp : c7029d00 ip : 00000105 fp : c7029d1c
    r10: 00000001 r9 : 0000000a r8 : 00000004
    r7 : 60000013 r6 : 000000a4 r5 : c0a357e4 r4 : 00000000
    r3 : 00000826 r2 : 00000002 r1 : 00000000 r0 : 0000003f
    Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
    Control: 10c5387d Table: 2cb7006a DAC: 00000015
    Backtrace:
    get_pageblock_flags_group+0x0/0xb0
    unset_migratetype_isolate+0x0/0x1b0
    undo_isolate_page_range+0x0/0xdc
    __alloc_contig_range+0x0/0x34c
    alloc_contig_range+0x0/0x18

    This issue arises because, when unset_migratetype_isolate() is called
    to unset a part of CMA memory, it tries to access the buddy page to
    get its status:

        if (order >= pageblock_order) {
                page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
                buddy_idx = __find_buddy_index(page_idx, order);
                buddy = page + (buddy_idx - page_idx);

                if (!is_migrate_isolate_page(buddy)) {

    But the beginning address of this part of CMA memory is very close to
    memory that was reserved at boot time (and is therefore not in the
    buddy system). So add a check before accessing the buddy page.
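
    The guarded access would look roughly like this (a sketch; the exact
    validity helper, here assumed to be pfn_valid_within(), may differ):

        if (order >= pageblock_order) {
                page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
                buddy_idx = __find_buddy_index(page_idx, order);
                buddy = page + (buddy_idx - page_idx);

                /* Only dereference the buddy if its pfn is backed
                 * by a valid struct page. */
                if (pfn_valid_within(page_to_pfn(buddy)) &&
                    !is_migrate_isolate_page(buddy)) {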

    [akpm@linux-foundation.org: use conventional code layout]
    Signed-off-by: Hui Zhu
    Suggested-by: Laura Abbott
    Suggested-by: Joonsoo Kim
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     

26 Mar, 2015

1 commit

  • Commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on
    isolated pageblock") changed the logic of unset_migratetype_isolate to
    check the buddy allocator and explicitly call __free_pages to merge.

    The page being freed in this path never had prep_new_page() called,
    so set_page_refcounted() is called explicitly, but there is no call
    to kernel_map_pages(). With the default kernel_map_pages() this is
    mostly harmless, but if kernel_map_pages() manipulates the page
    tables (unmapping pages or setting them read-only), this may trigger
    a fault:

    alloc_contig_range test_pages_isolated(ceb00, ced00) failed
    Unable to handle kernel paging request at virtual address ffffffc0cec00000
    pgd = ffffffc045fc4000
    [ffffffc0cec00000] *pgd=0000000000000000
    Internal error: Oops: 9600004f [#1] PREEMPT SMP
    Modules linked in: exfatfs
    CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1
    task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000
    PC is at memset+0xc8/0x1c0
    LR is at kernel_map_pages+0x1ec/0x244

    Fix this by calling kernel_map_pages() to ensure the page is mapped
    in the page table properly.
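
    In unset_migratetype_isolate(), the fix amounts to mapping the page
    back before it is freed (a sketch of the idea, not the literal hunk):

        if (isolated_page) {
                /* This page never went through prep_new_page(), so undo
                 * any debug unmapping before handing it to the buddy. */
                kernel_map_pages(page, 1 << order, 1);
                set_page_refcounted(page);
                __free_pages(page, order);
        }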

    Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
    Signed-off-by: Laura Abbott
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Acked-by: Joonsoo Kim
    Cc: Gioh Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Cc: Vlastimil Babka
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laura Abbott
     

11 Dec, 2014

2 commits

  • When setting MIGRATE_ISOLATE on a pageblock, pcplists are drained to
    improve the chance that all pages will be successfully isolated and
    not left in the per-cpu caches. Since isolation is always concerned
    with a single zone, we can reduce the pcplists drain to that single
    zone, which is now possible.

    The change should make memory isolation faster and no longer disturb
    unrelated pcplists.

    Signed-off-by: Vlastimil Babka
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Cc: Joonsoo Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The functions for draining per-cpu pages back to the buddy allocator
    currently always operate on all zones. There are, however, several
    cases where the drain is only needed in the context of a single zone,
    and spilling other pcplists is a waste of time, both for the extra
    spilling and the later refilling.

    This patch introduces a new zone pointer parameter to
    drain_all_pages() and changes the dummy parameter of
    drain_local_pages() to also be a zone pointer. When NULL is passed,
    the functions operate on all zones as usual. Passing a specific zone
    pointer reduces the work to that single zone.

    All callers are updated to pass the NULL pointer in this patch.
    Conversion to single zone (where appropriate) is done in further
    patches.
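
    The resulting interface, per the description above (a sketch of the
    signatures, not the full implementation):

        /* Drain pcplists of all CPUs; @zone == NULL means all zones. */
        void drain_all_pages(struct zone *zone);

        /* Drain pcplists of the local CPU, same NULL convention. */
        void drain_local_pages(struct zone *zone);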

    Signed-off-by: Vlastimil Babka
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Xishi Qiu
    Cc: Vladimir Davydov
    Cc: Joonsoo Kim
    Cc: Michal Nazarewicz
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

14 Nov, 2014

2 commits

  • The current pageblock isolation logic isolates each pageblock
    individually. This causes a freepage accounting problem if a freepage
    of pageblock order on an isolate pageblock is merged with another
    freepage on a normal pageblock. We can prevent such merging by
    restricting the max order of merging to pageblock order when the
    freepage is on an isolate pageblock.

    A side-effect of this change is that there can be a non-merged buddy
    freepage even after pageblock isolation finishes, because undoing
    pageblock isolation just moves freepages from the isolate buddy list
    to the normal buddy list rather than considering merging. So the
    patch also makes undoing pageblock isolation consider freepage
    merging: when un-isolating, a freepage of more than pageblock order
    and its buddy are checked. If they are on a normal pageblock, instead
    of just moving the freepage, we isolate it and free it in order to
    get it merged.
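
    The merging restriction in __free_one_page() is roughly (a sketch
    based on the description above):

        if (is_migrate_isolate(migratetype)) {
                /* Stop merging at pageblock order so a freepage on an
                 * isolate pageblock never merges with one on a normal
                 * pageblock, which would break freepage accounting. */
                max_order = min(MAX_ORDER, pageblock_order + 1);
        }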

    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Tang Chen
    Cc: Naoya Horiguchi
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Wen Congyang
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: Laura Abbott
    Cc: Heesub Shin
    Cc: "Aneesh Kumar K.V"
    Cc: Ritesh Harjani
    Cc: Gioh Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Before describing the bugs themselves, I first explain the definition
    of a freepage.

    1. Pages on a buddy list are counted as freepages.
    2. Pages on the isolate migratetype buddy list are *not* counted as
       freepages.
    3. Pages on the CMA buddy list are counted as CMA freepages, too.

    Now, I describe the problems and the related patches.

    Patch 1: There are race conditions in getting the pageblock
    migratetype that result in misplacement of freepages on the buddy
    list, an incorrect freepage count, and unavailability of freepages.

    Patch 2: Freepages on the pcp list could carry stale cached
    information used to determine the migratetype of the buddy list to go
    to. This causes misplacement of freepages on the buddy list and an
    incorrect freepage count.

    Patch 4: Merging between freepages on pageblocks of different
    migratetypes will cause a freepage accounting problem. This patch
    fixes it.

    Without patchset [3], the above problems don't happen in my CMA
    allocation test, because CMA reserved pages aren't used at all, so
    there is no chance for the above races.

    With patchset [3], I ran a simple CMA allocation test and got the
    result below:

    - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
    - run kernel build (make -j16) in the background
    - 30 CMA allocation attempts (8MB * 30 = 240MB) at 5 sec intervals
    - Result: more than 5000 freepages are missing from the count

    With patchset [3] and this patchset, no freepages are missing from
    the count, so I conclude that the problems are solved.

    These problems also occur in my simple memory offlining test
    environment.

    This patch (of 4):

    There are two paths that reach the core free function of the buddy
    allocator, __free_one_page(): one is
    free_one_page() -> __free_one_page() and the other is
    free_hot_cold_page() -> free_pcppages_bulk() -> __free_one_page().
    Each path has a race condition causing serious problems. This patch
    focuses on the first type of freepath; the following patch will solve
    the problem in the second type.

    In the first type of freepath, we get the migratetype of the page
    being freed without holding the zone lock, so it can be racy. There
    are two cases of this race.

    1. Pages are added to the isolate buddy list after restoring the
    original migratetype.

    CPU1                                    CPU2

    get migratetype => MIGRATE_ISOLATE
    call free_one_page() with MIGRATE_ISOLATE

                                            grab the zone lock
                                            unisolate pageblock
                                            release the zone lock

    grab the zone lock
    call __free_one_page() with MIGRATE_ISOLATE
    freepage goes onto the isolate buddy list,
    although the pageblock is already unisolated

    This may cause two problems. One is that we can't use this page
    anymore until the next isolation attempt on this pageblock, because
    the freepage is on the isolate buddy list. The other is that the
    freepage accounting could be wrong, due to merging between different
    buddy lists. Freepages on the isolate buddy list aren't counted as
    freepages, but ones on normal buddy lists are. If a merge happens, a
    buddy freepage on a normal buddy list is inevitably moved to the
    isolate buddy list without any consideration of freepage accounting,
    so the count can become incorrect.

    2. Pages are added to the normal buddy list while the pageblock is
    isolated. This is similar to the case above.

    It may also cause two problems. One is that we can't keep these
    freepages from being allocated: although the pageblock is isolated,
    the freepage is added to a normal buddy list, so it can be allocated
    without any restriction. The other problem is the same as in case 1,
    that is, incorrect freepage accounting.

    This race condition can be prevented by checking the migratetype
    again while holding the zone lock. Because that is a somewhat heavy
    operation and isn't needed in the common case, we want to avoid the
    recheck as much as possible. So this patch introduces a new field,
    nr_isolate_pageblock, in struct zone to track whether the zone has
    any isolated pageblocks. With this, we can skip re-checking the
    migratetype in the common case and do it only if there is an isolated
    pageblock or the migratetype is MIGRATE_ISOLATE. This solves the
    problems mentioned above (see the sketch below).

    Changes from v3:
    Add one more check in free_one_page() for whether the migratetype is
    MIGRATE_ISOLATE. Without this, the above-mentioned case 1 could still
    happen.
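
    The resulting recheck in free_one_page() is sketched below (assuming
    a has_isolate_pageblock() helper that reads nr_isolate_pageblock; the
    names follow the description above, not the literal patch):

        spin_lock(&zone->lock);
        if (unlikely(has_isolate_pageblock(zone) ||
                     is_migrate_isolate(migratetype))) {
                /* Rare path: re-read the migratetype under zone->lock. */
                migratetype = get_pfnblock_migratetype(page, pfn);
        }
        __free_one_page(page, pfn, zone, order, migratetype);
        spin_unlock(&zone->lock);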

    Signed-off-by: Joonsoo Kim
    Acked-by: Minchan Kim
    Acked-by: Michal Nazarewicz
    Acked-by: Vlastimil Babka
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Tang Chen
    Cc: Naoya Horiguchi
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Wen Congyang
    Cc: Marek Szyprowski
    Cc: Laura Abbott
    Cc: Heesub Shin
    Cc: "Aneesh Kumar K.V"
    Cc: Ritesh Harjani
    Cc: Gioh Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

12 Sep, 2013

1 commit

  • Until now we couldn't offline memory blocks which contain hugepages,
    because a hugepage was considered an unmovable page. But with this
    patch series a hugepage becomes movable, so by using hugepage
    migration we can offline such memory blocks.

    What's different from other users of hugepage migration is that we
    need to decompose all the hugepages inside the target memory block
    into free buddy pages after hugepage migration, because otherwise
    free hugepages remaining in the memory block interfere with memory
    offlining. For this reason we introduce the new functions
    dissolve_free_huge_page() and dissolve_free_huge_pages().

    Other than that, what this patch does is straightforward: it adds
    hugepage support to the functions which scan over pfns and collect
    the pages to be migrated, and adds a hugepage allocation function to
    alloc_migrate_target().

    As for larger hugepages (1GB on x86_64), hotremove over them is not
    easy because they are larger than a memory block, so for now we
    simply let it fail.
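
    The range-walking variant can be sketched as below (assuming the
    per-page helper introduced above, and a minimum_order value for the
    smallest supported hugepage size; both details are assumptions):

        void dissolve_free_huge_pages(unsigned long start_pfn,
                                      unsigned long end_pfn)
        {
                unsigned long pfn;

                /* Step through the range at the smallest hugepage
                 * granularity, returning each free hugepage found to
                 * the buddy allocator. */
                for (pfn = start_pfn; pfn < end_pfn;
                     pfn += 1 << minimum_order)
                        dissolve_free_huge_page(pfn_to_page(pfn));
        }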

    [yongjun_wei@trendmicro.com.cn: remove duplicated include]
    Signed-off-by: Naoya Horiguchi
    Acked-by: Andi Kleen
    Cc: Hillf Danton
    Cc: Wanpeng Li
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: KOSAKI Motohiro
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: "Aneesh Kumar K.V"
    Signed-off-by: Wei Yongjun
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

05 Jan, 2013

1 commit

  • Commit 702d1a6e0766 ("memory-hotplug: fix kswapd looping forever
    problem") added an isolated pageblocks counter (nr_pageblock_isolate
    in struct zone) and used it to adjust the free pages counter in
    zone_watermark_ok_safe() to prevent the kswapd-looping-forever
    problem.

    Later, commit 2139cbe627b8 ("cma: fix counting of isolated pages")
    fixed the accounting of isolated pages in the global free pages
    counter. That made the previous zone_watermark_ok_safe() fix
    unnecessary and potentially harmful (because isolated pages may now
    be accounted twice, making the free pages counter incorrect).

    This patch removes the special isolated pageblocks counter
    altogether, which fixes the zone_watermark_ok_safe() free pages
    check.

    Reported-by: Tomasz Stanislawski
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Kyungmin Park
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Aaditya Kumar
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     

12 Dec, 2012

1 commit

  • The hwpoisoned flag may be set when we offline a page via the sysfs
    interface /sys/devices/system/memory/soft_offline_page or
    /sys/devices/system/memory/hard_offline_page. If we don't clear this
    flag when onlining pages, such a page can't be freed and will not be
    on a free list, so we can't offline these pages again. We should
    therefore skip such pages when offlining pages.
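
    When scanning the range to be offlined, the skip amounts to something
    like this (a sketch of the idea, not the literal patch):

        if (PageHWPoison(page)) {
                /* A hwpoisoned page is neither free nor movable;
                 * treat it as offlineable and step over it. */
                pfn++;
                continue;
        }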

    Signed-off-by: Wen Congyang
    Cc: David Rientjes
    Cc: Jiang Liu
    Cc: Len Brown
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Christoph Lameter
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Yasuaki Ishimatsu
    Cc: Andi Kleen
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wen Congyang
     

09 Oct, 2012

6 commits

  • __alloc_contig_migrate_alloc() can be used by memory hotplug, so
    refactor it out (move it and give it a common name) into
    page_isolation.c.

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Reviewed-by: Yasuaki Ishimatsu
    Acked-by: Michal Nazarewicz
    Cc: Marek Szyprowski
    Cc: Wen Congyang
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • If a race between allocation and isolation happens during
    memory-hotplug offline, some pages could be on the MIGRATE_MOVABLE
    free_list although the pageblock's migratetype is MIGRATE_ISOLATE.

    The race can be detected by get_freepage_migratetype() in
    __test_page_isolated_in_pageblock(). Currently, when it is detected,
    EBUSY gets bubbled all the way up and the hotplug operation fails.

    A better idea is, instead of returning and failing memory hot-remove,
    to move the free page to the correct list at the time the race is
    detected. This should improve the memory hot-remove success ratio,
    although the race is really rare.

    Suggested by Mel Gorman.

    [akpm@linux-foundation.org: small cleanup]
    Signed-off-by: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Reviewed-by: Yasuaki Ishimatsu
    Acked-by: Mel Gorman
    Cc: Xishi Qiu
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • As shown below, memory hotplug creates a race between page isolation
    and page allocation, so it can hit the BUG_ON in
    __offline_isolated_pages().

    CPU A                                   CPU B

    start_isolate_page_range
    set_migratetype_isolate
    spin_lock_irqsave(zone->lock)

                                            free_hot_cold_page(page A)
                                            /* without zone->lock */
                                            migratetype =
                                              get_pageblock_migratetype(page A);
                                            /*
                                             * page could be moved into
                                             * MIGRATE_MOVABLE of per_cpu_pages
                                             */
                                            list_add_tail(&page->lru,
                                                &pcp->lists[migratetype]);

    set_pageblock_isolate
    move_freepages_block
    drain_all_pages

    /* page A could be on the MIGRATE_MOVABLE free_list. */

    check_pages_isolated
    __test_page_isolated_in_pageblock
    /*
     * We can't catch a freed page which
     * is on free_list[MIGRATE_MOVABLE]
     */
    if (PageBuddy(page A))
            pfn += 1 << page_order(page A);

    /* So page A could be allocated */

    __offline_isolated_pages
    /*
     * BUG_ON is hit, or we offline a page
     * which is in use by someone
     */
    BUG_ON(!PageBuddy(page A));

    This patch checks the page's migratetype in the freelist in
    __test_page_isolated_in_pageblock(). Now
    __test_page_isolated_in_pageblock() can catch a page affected by the
    above race and fail the memory offlining.

    Signed-off-by: Minchan Kim
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Yasuaki Ishimatsu
    Acked-by: Mel Gorman
    Cc: Xishi Qiu
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • The page allocator uses set_page_private() and page_private() for
    handling the migratetype when it frees a page. Let's replace them
    with [set|get]_freepage_migratetype() to make the intent clearer.
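
    The new helpers are thin wrappers over the page_private field (a
    sketch per the description above):

        static inline void set_freepage_migratetype(struct page *page,
                                                    int migratetype)
        {
                set_page_private(page, migratetype);
        }

        static inline int get_freepage_migratetype(struct page *page)
        {
                return page_private(page);
        }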

    Signed-off-by: Minchan Kim
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Yasuaki Ishimatsu
    Acked-by: Mel Gorman
    Cc: Xishi Qiu
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Add a NR_FREE_CMA_PAGES counter to be used later for checking the
    watermark in __zone_watermark_ok(). For simplicity, and to avoid
    #ifdef hell, make this counter always available (not only when
    CONFIG_CMA=y).

    [akpm@linux-foundation.org: use conventional migratetype naming]
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Kyungmin Park
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     
  • Isolated free pages shouldn't be accounted in the NR_FREE_PAGES
    counter. Fix this by properly decreasing/increasing NR_FREE_PAGES in
    set_migratetype_isolate()/unset_migratetype_isolate() and removing
    the counter adjustment for isolated pages from free_one_page() and
    split_free_page().
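
    In set_migratetype_isolate(), the adjustment looks roughly like this
    (a sketch; unset_migratetype_isolate() does the mirror-image
    increase):

        nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
        /* Isolated free pages no longer count toward NR_FREE_PAGES. */
        __mod_zone_page_state(zone, NR_FREE_PAGES, -nr_pages);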

    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Kyungmin Park
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bartlomiej Zolnierkiewicz
     

01 Aug, 2012

2 commits

  • When hotplug offlining happens on zone A, it starts to mark freed
    pages as MIGRATE_ISOLATE in the buddy allocator to prevent further
    allocation. (MIGRATE_ISOLATE is a rather ironic type: pages are
    apparently in the buddy allocator, but we can't allocate them.)

    When a memory shortage happens during hotplug offlining, the current
    task starts to reclaim and then wakes up kswapd. Kswapd checks the
    watermark and then goes to sleep, because the current
    zone_watermark_ok_safe() doesn't consider the MIGRATE_ISOLATE freed
    page count. The current task continues to reclaim in the direct
    reclaim path without kswapd's help. The problem is that
    zone->all_unreclaimable is set only by kswapd, so the current task
    can loop forever like below.

    __alloc_pages_slowpath
    restart:
        wake_all_kswapd
    rebalance:
        __alloc_pages_direct_reclaim
            do_try_to_free_pages
                if global_reclaim && !all_unreclaimable
                    return 1; /* It means we did did_some_progress */
        skip __alloc_pages_may_oom
        should_alloc_retry
            goto rebalance;

    If we apply KOSAKI's patch [1], which doesn't depend on kswapd for
    setting zone->all_unreclaimable, we can solve this problem by killing
    some task in the direct reclaim path. But it still doesn't wake up
    kswapd, which could remain a problem if some other subsystem needs a
    GFP_ATOMIC request. So kswapd should consider MIGRATE_ISOLATE when it
    calculates free pages BEFORE going to sleep.

    This patch counts the number of MIGRATE_ISOLATE pageblocks, and
    zone_watermark_ok_safe() will take them into account if the system
    has such blocks (fortunately this is very rare, so the overhead is
    not a problem, and kswapd is never a hotpath).

    Copy/modify from Mel's quote
    "
    Ideal solution would be "allocating" the pageblock.
    It would keep the free space accounting as it is but historically,
    memory hotplug didn't allocate pages because it would be difficult to
    detect if a pageblock was isolated or if part of some balloon.
    Allocating just full pageblocks would work around this, However,
    it would play very badly with CMA.
    "

    [1] http://lkml.org/lkml/2012/6/14/74
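
    zone_watermark_ok_safe() then subtracts the isolated free pages
    before the check, roughly as follows (a sketch using the counter and
    helper named in this series; the exact code is an assumption):

        free_pages = zone_page_state(z, NR_FREE_PAGES);
        if (z->nr_pageblock_isolate)
                /* Don't let unallocatable isolated pages satisfy
                 * the watermark. */
                free_pages -= nr_zone_isolate_freepages(z);
        return __zone_watermark_ok(z, order, mark, classzone_idx,
                                   0, free_pages);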

    [akpm@linux-foundation.org: simplify nr_zone_isolate_freepages(), rework zone_watermark_ok_safe() comment, simplify set_pageblock_isolate() and restore_pageblock_isolate()]
    [akpm@linux-foundation.org: fix CONFIG_MEMORY_ISOLATION=n build]
    Signed-off-by: Minchan Kim
    Suggested-by: KOSAKI Motohiro
    Tested-by: Aaditya Kumar
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • mm/page_alloc.c has some memory isolation functions, but they are
    used only when CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE} is enabled.
    So let's gate them behind a new CONFIG_MEMORY_ISOLATION option. This
    reduces binary size, and we can then simply check
    CONFIG_MEMORY_ISOLATION instead of
    defined(CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}).

    Signed-off-by: Minchan Kim
    Cc: Andi Kleen
    Cc: Marek Szyprowski
    Acked-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

21 May, 2012

1 commit

  • This commit changes various functions that change the migrate type
    of pages and pageblocks between MIGRATE_ISOLATE and MIGRATE_MOVABLE
    so that they can also work with the MIGRATE_CMA migrate type.

    Signed-off-by: Michal Nazarewicz
    Signed-off-by: Marek Szyprowski
    Reviewed-by: KAMEZAWA Hiroyuki
    Tested-by: Rob Clark
    Tested-by: Ohad Ben-Cohen
    Tested-by: Benjamin Gaignard
    Tested-by: Robert Nelson
    Tested-by: Barry Song

    Michal Nazarewicz
     

27 Oct, 2010

1 commit

  • __test_page_isolated_in_pageblock() returns 1 if all pages in the range
    are isolated, so fix the comment. Variable `pfn' will be initialised in
    the following loop so remove it.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     

07 Nov, 2008

1 commit

  • My last bugfix here (adding zone->lock) introduced a new problem: Using
    page_zone(pfn_to_page(pfn)) to get the zone after the for() loop is wrong.
    pfn will then be >= end_pfn, which may be in a different zone or not
    present at all. This may lead to an addressing exception in page_zone()
    or spin_lock_irqsave().

    Now I use __first_valid_page() again after the loop to find a valid page
    for page_zone().

    Signed-off-by: Gerald Schaefer
    Acked-by: Nathan Fontenot
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     

03 Oct, 2008

1 commit

  • __test_page_isolated_in_pageblock() in mm/page_isolation.c has a comment
    saying that the caller must hold zone->lock. But the only caller of that
    function, test_pages_isolated(), does not hold zone->lock and the lock is
    also not acquired anywhere before. This patch adds the missing zone->lock
    to test_pages_isolated().

    We reproducibly run into BUG_ON(!PageBuddy(page)) in __offline_isolated_pages()
    during memory hotplug stress test, see trace below. This patch fixes that
    problem, it would be good if we could have it in 2.6.27.

    kernel BUG at /home/autobuild/BUILD/linux-2.6.26-20080909/mm/page_alloc.c:4561!
    illegal operation: 0001 [#1] PREEMPT SMP
    Modules linked in: dm_multipath sunrpc bonding qeth_l3 dm_mod qeth ccwgroup vmur
    CPU: 1 Not tainted 2.6.26-29.x.20080909-s390default #1
    Process memory_loop_all (pid: 10025, task: 2f444028, ksp: 2b10dd28)
    Krnl PSW : 040c0000 801727ea (__offline_isolated_pages+0x18e/0x1c4)
    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0
    Krnl GPRS: 00000000 7e27fc00 00000000 7e27fc00
    00000000 00000400 00014000 7e27fc01
    00606f00 7e27fc00 00013fe0 2b10dd28
    00000005 80172662 801727b2 2b10dd28
    Krnl Code: 801727de: 5810900c l %r1,12(%r9)
    801727e2: a7f4ffb3 brc 15,80172748
    801727e6: a7f40001 brc 15,801727e8
    >801727ea: a7f4ffbc brc 15,80172762
    801727ee: a7f40001 brc 15,801727f0
    801727f2: a7f4ffaf brc 15,80172750
    801727f6: 0707 bcr 0,%r7
    801727f8: 0017 unknown
    Call Trace:
    ([] __offline_isolated_pages+0x116/0x1c4)
    [] offline_isolated_pages_cb+0x22/0x34
    [] walk_memory_resource+0xcc/0x11c
    [] offline_pages+0x36a/0x498
    [] remove_memory+0x36/0x44
    [] memory_block_change_state+0x112/0x150
    [] store_mem_state+0x90/0xe4
    [] sysdev_store+0x34/0x40
    [] sysfs_write_file+0xd0/0x178
    [] vfs_write+0x74/0x118
    [] sys_write+0x46/0x7c
    [] sysc_do_restart+0x12/0x16
    [] 0x77f3e8ca

    Signed-off-by: Gerald Schaefer
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     

15 Nov, 2007

1 commit

  • We should unset the "ISOLATE" migrate type when we have successfully
    removed memory. But the current code has a bug and does not work
    correctly.

    This patch also includes a bugfix: changing get_pageblock_flags to
    get_pageblock_migratetype().

    Thanks to Badari Pulavarty for finding this.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

17 Oct, 2007

1 commit

  • Implement a generic chunk-of-pages isolation method by using the page
    grouping ops.

    This patch adds MIGRATE_ISOLATE to MIGRATE_TYPES. As a result:
    - MIGRATE_TYPES increases.
    - The bitmap for migratetype is enlarged.

    Pages of the MIGRATE_ISOLATE migratetype will not be allocated even
    if they are free. With this, you can isolate *freed* pages from
    users. How to free the pages is not the purpose of this patch; you
    may use the reclaim and migration code for that.

    If start_isolate_page_range(start, end) is called:
    - The migratetype of the range becomes MIGRATE_ISOLATE if its type is
      MIGRATE_MOVABLE. (*) This check can be updated as other memory
      reclaiming work makes progress.
    - MIGRATE_ISOLATE is not on the migratetype fallback list.
    - All free pages and will-be-freed pages are isolated.

    To check whether all pages in the range are isolated, use
    test_pages_isolated(); to cancel isolation, use
    undo_isolate_page_range(). A usage sketch follows.
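
    Typical usage of this API, per the description above (a sketch; later
    kernels add a migratetype argument to these calls):

        int ret;

        /* Turn the range's MIGRATE_MOVABLE pageblocks into
         * MIGRATE_ISOLATE so freed pages stay unallocatable. */
        ret = start_isolate_page_range(start_pfn, end_pfn);
        if (ret)
                return ret;

        /* ... free the pages, e.g. via reclaim or migration ... */

        /* Verify that every page in the range is now isolated. */
        ret = test_pages_isolated(start_pfn, end_pfn);

        /* When done (or on failure), restore the original type. */
        undo_isolate_page_range(start_pfn, end_pfn);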

    Changes V6 -> V7:
    - removed an unnecessary #ifdef

    There is still HOLES_IN_ZONE handling code... I'd be glad if we could
    remove it.

    Signed-off-by: Yasunori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki