15 Jun, 2018

1 commit

  • mm/*.c files use symbolic and octal styles for permissions.

    Using octal and not symbolic permissions is preferred by many as more
    readable.

    https://lkml.org/lkml/2016/8/2/1945

    Prefer the direct use of octal for permissions.

    Done using
    $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
    and some typing.
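
    For illustration, a typical conversion looks like this (a hypothetical
    debugfs call site; the actual call sites vary across mm/*.c):

    /* before: symbolic permission macros */
    debugfs_create_file("foo", S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH,
                        root, data, &foo_fops);

    /* after: the equivalent octal mode (0644 == rw-r--r--) */
    debugfs_create_file("foo", 0644, root, data, &foo_fops);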

    Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    44
    After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    86

    Miscellanea:

    o Whitespace neatening around these conversions.

    Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

25 May, 2018

1 commit

  • This reverts the following commits that change CMA design in MM.

    3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")

    1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA")

    bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")

    Ville reported the following error on i386.

    Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
    microcode: microcode updated early to revision 0x4, date = 2013-06-28
    Initializing CPU#0
    Initializing HighMem for node 0 (000377fe:00118000)
    Initializing Movable for node 0 (00000001:00118000)
    BUG: Bad page state in process swapper pfn:377fe
    page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
    flags: 0x80000000()
    raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
    page dumped because: nonzero mapcount
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
    Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
    Call Trace:
    dump_stack+0x60/0x96
    bad_page+0x9a/0x100
    free_pages_check_bad+0x3f/0x60
    free_pcppages_bulk+0x29d/0x5b0
    free_unref_page_commit+0x84/0xb0
    free_unref_page+0x3e/0x70
    __free_pages+0x1d/0x20
    free_highmem_page+0x19/0x40
    add_highpages_with_active_regions+0xab/0xeb
    set_highmem_pages_init+0x66/0x73
    mem_init+0x1b/0x1d7
    start_kernel+0x17a/0x363
    i386_start_kernel+0x95/0x99
    startup_32_smp+0x164/0x168

    The reason for this error is that the span of ZONE_MOVABLE is extended
    to the whole node span for future CMA initialization, and normal memory
    is wrongly freed here. I submitted a fix and it seems to work, but then
    another problem happened.

    It is too late in the cycle to fix that follow-up problem, so I decided
    to revert the series.

    Reported-by: Ville Syrjälä
    Acked-by: Laura Abbott
    Acked-by: Michal Hocko
    Cc: Andrew Morton
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

12 Apr, 2018

2 commits

    Now, all pages reserved for the CMA region belong to ZONE_MOVABLE,
    and it only serves requests with GFP_HIGHMEM && GFP_MOVABLE.

    Therefore, we don't need to maintain ALLOC_CMA at all.

    Link: http://lkml.kernel.org/r/1512114786-5085-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Reviewed-by: Aneesh Kumar K.V
    Tested-by: Tony Lindgren
    Acked-by: Vlastimil Babka
    Cc: Johannes Weiner
    Cc: Laura Abbott
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Michal Nazarewicz
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • No allocation callback is using this argument anymore. new_page_node
    used to use this parameter to convey node_id resp. migration error up
    to move_pages code (do_move_page_to_node_array). The error status never
    made it into the final status field and we have a better way to
    communicate node id to the status field now. All other allocation
    callbacks simply ignored the argument so we can drop it finally.
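
    A rough sketch of the resulting interface change (the callback typedef
    lives in include/linux/migrate.h; treat the exact parameter name of the
    dropped argument as illustrative):

    /* before: every allocation callback had to accept a third argument */
    typedef struct page *new_page_t(struct page *page, unsigned long private,
                                    int **reason);

    /* after: the unused argument is gone */
    typedef struct page *new_page_t(struct page *page, unsigned long private);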

    [mhocko@suse.com: fix migration callback]
    Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz
    [akpm@linux-foundation.org: fix alloc_misplaced_dst_page()]
    [mhocko@kernel.org: fix build]
    Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Reviewed-by: Zi Yan
    Cc: Andrea Reale
    Cc: Anshuman Khandual
    Cc: Kirill A. Shutemov
    Cc: Mike Kravetz
    Cc: Naoya Horiguchi
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

06 Apr, 2018

2 commits

  • Link: http://lkml.kernel.org/r/1519585191-10180-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • It's possible for free pages to become stranded on per-cpu pagesets
    (pcps) that, if drained, could be merged with buddy pages on the zone's
    free area to form large order pages, including up to MAX_ORDER.

    Consider a verbose example using the tools/vm/page-types tool at the
    beginning of a ZONE_NORMAL ('B' indicates a buddy page and 'S' indicates
    a slab page). Pages on pcps do not have any page flags set.

    109954 1 _______S________________________________________________________
    109955 2 __________B_____________________________________________________
    109957 1 ________________________________________________________________
    109958 1 __________B_____________________________________________________
    109959 7 ________________________________________________________________
    109960 1 __________B_____________________________________________________
    109961 9 ________________________________________________________________
    10996a 1 __________B_____________________________________________________
    10996b 3 ________________________________________________________________
    10996e 1 __________B_____________________________________________________
    10996f 1 ________________________________________________________________
    ...
    109f8c 1 __________B_____________________________________________________
    109f8d 2 ________________________________________________________________
    109f8f 2 __________B_____________________________________________________
    109f91 f ________________________________________________________________
    109fa0 1 __________B_____________________________________________________
    109fa1 7 ________________________________________________________________
    109fa8 1 __________B_____________________________________________________
    109fa9 1 ________________________________________________________________
    109faa 1 __________B_____________________________________________________
    109fab 1 _______S________________________________________________________

    The compaction migration scanner is attempting to defragment this memory
    since it is at the beginning of the zone. It has done so quite well,
    all movable pages have been migrated. From pfn [0x109955, 0x109fab),
    there are only buddy pages and pages without flags set.

    These pages may be stranded on pcps that could otherwise allow this
    memory to be coalesced if freed back to the zone free area. It is
    possible that some of these pages may not be on pcps and that something
    has called alloc_pages() and used the memory directly, but we rely on
    the absence of __GFP_MOVABLE in these cases to allocate from
    MIGRATE_UNMOVABLE pageblocks to try to keep these MIGRATE_MOVABLE
    pageblocks as free as possible.

    These buddy and pcp pages, spanning 1,621 pages, could be coalesced and
    allow for three transparent hugepages to be dynamically allocated.
    Running the numbers for all such spans on the system, it was found that
    there were over 400 such spans of only buddy pages and pages without
    flags set at the time this /proc/kpageflags sample was collected.
    Without this support, there were _no_ order-9 or order-10 pages free.

    When kcompactd fails to defragment memory such that a cc.order page can
    be allocated, drain all pcps for the zone back to the buddy allocator so
    this stranding cannot occur. Compaction for that order will
    subsequently be deferred, which acts as a ratelimit on this drain.
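
    A minimal sketch of the idea in kcompactd_do_work(), using the existing
    drain_all_pages()/defer_compaction() helpers (exact placement and the
    status checks may differ from the final code):

    if (status == COMPACT_COMPLETE) {
            /*
             * Defragmentation failed: flush per-cpu pagesets back to the
             * buddy allocator so stranded pages can coalesce.
             */
            drain_all_pages(zone);
            /* deferring compaction for cc.order ratelimits this drain */
            defer_compaction(zone, cc.order);
    }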

    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803010340100.88270@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

01 Feb, 2018

1 commit

  • "mode" argument is not used by try_to_compact_pages() and sub functions
    anymore, it has been replaced by "prio". Fix the comment to explain the
    use of "prio" argument.

    Link: http://lkml.kernel.org/r/1515801336-20611-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Yang Shi
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

18 Nov, 2017

5 commits

  • Commit f3c931633a59 ("mm, compaction: persistently skip hugetlbfs
    pageblocks") has introduced pageblock_skip_persistent() checks into
    migration and free scanners, to make sure pageblocks that should be
    persistently skipped are marked as such, regardless of the
    ignore_skip_hint flag.

    Since the previous patch introduced a new no_set_skip_hint flag, the
    ignore flag no longer prevents marking pageblocks as skipped. Therefore
    we can remove the special cases. The relevant pageblocks will be marked
    as skipped by the common logic which marks each pageblock where no page
    could be isolated. This makes the code simpler.

    Link: http://lkml.kernel.org/r/20171102121706.21504-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Pageblock skip hints were added as a heuristic for compaction, which
    shares core code with CMA. Since CMA reliability would suffer from the
    heuristics, compact_control flag ignore_skip_hint was added for the CMA
    use case. Since 6815bf3f233e ("mm/compaction: respect ignore_skip_hint
    in update_pageblock_skip") the flag also means that CMA won't *update*
    the skip hints in addition to ignoring them.

    Today, direct compaction can also ignore the skip hints in the last
    resort attempt, but there's no reason not to set them when isolation
    fails in such case. Thus, this patch splits off a new no_set_skip_hint
    flag to avoid the updating, which only CMA sets. This should improve
    the heuristics a bit, and allow us to simplify the persistent skip bit
    handling as the next step.
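
    A minimal sketch of the split, at the top of the existing
    update_pageblock_skip() path (field names as used in this changelog):

    /* sketch: only CMA sets no_set_skip_hint */
    if (cc->no_set_skip_hint)
            return;

    set_pageblock_skip(page);       /* compaction keeps updating hints even
                                       when ignore_skip_hint is set */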

    Link: http://lkml.kernel.org/r/20171102121706.21504-2-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • pageblock_skip_persistent() checks for HugeTLB pages of pageblock order.
    When clearing pageblock skip bits for compaction, the bits are not
    cleared for such pageblocks, because they cannot contain base pages
    suitable for migration, nor free pages to use as migration targets.

    This optimization can be simply extended to all compound pages of order
    equal or larger than pageblock order, because migrating such pages (if
    they support it) cannot help sub-pageblock fragmentation. This includes
    THP's and also gigantic HugeTLB pages, which the current implementation
    doesn't persistently skip due to a strict pageblock_order equality check
    and not recognizing tail pages.

    While THP pages are generally less "persistent" than HugeTLB, we can
    still expect that if a THP exists at the point of
    __reset_isolation_suitable(), it will exist also during the subsequent
    compaction run. The time difference here could be actually smaller than
    between a compaction run that sets a (non-persistent) skip bit on a THP,
    and the next compaction run that observes it.
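
    The resulting check can be sketched roughly as follows
    (compound_order()/pageblock_order are the existing helpers; the exact
    body may differ):

    static bool pageblock_skip_persistent(struct page *page)
    {
            if (!PageCompound(page))
                    return false;

            page = compound_head(page);

            /* covers THP and both normal and gigantic hugetlb pages */
            if (compound_order(page) >= pageblock_order)
                    return true;

            return false;
    }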

    Link: http://lkml.kernel.org/r/20171102121706.21504-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • It is pointless to migrate hugetlb memory as part of memory compaction
    if the hugetlb size is equal to the pageblock order. No defragmentation
    is occurring in this condition.

    It is also pointless for the freeing scanner to scan a pageblock
    where a hugetlb page is pinned. Unconditionally skip these pageblocks,
    and do so persistently so that they are not rescanned until it is
    observed that these hugepages are no longer pinned.

    It would also be possible to do this by involving the hugetlb subsystem
    in marking pageblocks to no longer be skipped when their hugetlb pages
    are freed. This is a simple solution that doesn't involve any
    additional subsystems in pageblock skip manipulation.

    [rientjes@google.com: fix build]
    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708201734390.117182@chino.kir.corp.google.com
    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708151639130.106658@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Tested-by: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Kcompactd is needlessly ignoring pageblock skip information. It is
    doing MIGRATE_SYNC_LIGHT compaction, which is no more powerful than
    MIGRATE_SYNC compaction.

    If compaction recently failed to isolate memory from a set of
    pageblocks, there is nothing to indicate that kcompactd will be able to
    do so, or that it is beneficial to attempt to isolate memory.

    Use the pageblock skip hint to avoid rescanning pageblocks needlessly
    until that information is reset.
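
    A minimal sketch of the change in kcompactd_do_work()'s compact_control
    setup (other fields omitted):

    struct compact_control cc = {
            .order = pgdat->kcompactd_max_order,
            .mode = MIGRATE_SYNC_LIGHT,
            .ignore_skip_hint = false,      /* was true: now respect skip bits */
    };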

    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708151638550.106658@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
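
    For a default-licensed C source file this amounts to adding a single
    first line, e.g.:

    // SPDX-License-Identifier: GPL-2.0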

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side-by-side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

04 Oct, 2017

1 commit

  • Andrea brought to my attention that the L->{L,S} guarantees are
    completely bogus for this case. I was looking at the diagram from the
    offending commit; when that _is_ the race, we had the load reordered
    already.

    What we need is at least S->L semantics, thus simply use
    wq_has_sleeper() to serialize the call for good.
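
    A minimal sketch of the fixed wakeup side, using the existing
    wq_has_sleeper() helper (the exact surrounding code in wakeup_kcompactd()
    may differ):

    pgdat->kcompactd_max_order = order;            /* [S] store the condition */
    /* wq_has_sleeper() inserts the full barrier that gives S->L ordering */
    if (wq_has_sleeper(&pgdat->kcompactd_wait))
            wake_up_interruptible(&pgdat->kcompactd_wait);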

    Link: http://lkml.kernel.org/r/20170914175313.GB811@linux-80c1.suse
    Fixes: 46acef048a6 ("mm,compaction: serialize waitqueue_active() checks")
    Signed-off-by: Davidlohr Bueso
    Reported-by: Andrea Parri
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

07 Jul, 2017

1 commit

    __reset_isolation_suitable() walks the whole zone pfn range and tries
    to jump over holes by checking the zone for each page. It might still
    stumble over offline pages, though. Skip those by checking
    pfn_to_online_page().
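
    The walk then looks roughly like this (a sketch; the real loop also
    resets the cached scanner positions):

    for (pfn = zone->zone_start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
            struct page *page = pfn_to_online_page(pfn);

            cond_resched();
            if (!page)                      /* hole or offline section */
                    continue;
            if (page_zone(page) != zone)
                    continue;

            clear_pageblock_skip(page);
    }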

    Link: http://lkml.kernel.org/r/20170515085827.16474-9-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Balbir Singh
    Cc: Dan Williams
    Cc: Daniel Kiper
    Cc: David Rientjes
    Cc: Heiko Carstens
    Cc: Igor Mammedov
    Cc: Jerome Glisse
    Cc: Joonsoo Kim
    Cc: Martin Schwidefsky
    Cc: Mel Gorman
    Cc: Reza Arbab
    Cc: Tobias Regnery
    Cc: Toshi Kani
    Cc: Vitaly Kuznetsov
    Cc: Xishi Qiu
    Cc: Yasuaki Ishimatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

09 May, 2017

5 commits

  • The main goal of direct compaction is to form a high-order page for
    allocation, but it should also help against long-term fragmentation when
    possible.

    Most lower-than-pageblock-order compactions are for non-movable
    allocations, which means that if we compact in a movable pageblock and
    terminate as soon as we create the high-order page, it's unlikely that
    the fallback heuristics will claim the whole block. Instead there might
    be a single unmovable page in a pageblock full of movable pages, and the
    next unmovable allocation might pick another pageblock and increase
    long-term fragmentation.

    To help against such scenarios, this patch changes the termination
    criteria for compaction so that the current pageblock is finished even
    though the high-order page already exists. Note that it might be
    possible that the high-order page formed elsewhere in the zone due to
    parallel activity, but this patch doesn't try to detect that.
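
    A sketch of the new termination logic in __compact_finished(), assuming a
    finishing_block-style flag tracked in compact_control (names and details
    may differ from the final code):

    /* a suitable page of cc->order already exists on the free lists ... */
    if (cc->mode == MIGRATE_ASYNC ||
        IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages)) {
            /* async, or sync that reached a pageblock boundary: stop */
            return COMPACT_SUCCESS;
    }

    /* otherwise keep migrating until the current pageblock is finished */
    cc->finishing_block = true;
    return COMPACT_CONTINUE;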

    This is only done with sync compaction, because async compaction is
    limited to pageblock of the same migratetype, where it cannot result in
    a migratetype fallback. (Async compaction also eagerly skips
    order-aligned blocks where isolation fails, which is against the goal of
    migrating away as much of the pageblock as possible.)

    As a result of this patch, long-term memory fragmentation should be
    reduced.

    In testing based on a 4.9 kernel with stress-highalloc from mmtests
    configured for order-4 GFP_KERNEL allocations, this patch has reduced
    the number of unmovable allocations falling back to movable pageblocks
    by 20%.

    Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The migrate scanner in async compaction is currently limited to
    MIGRATE_MOVABLE pageblocks. This is a heuristic intended to reduce
    latency, based on the assumption that non-MOVABLE pageblocks are
    unlikely to contain movable pages.

    However, with the exception of THP's, most high-order allocations are
    not movable. Should the async compaction succeed, this increases the
    chance that the non-MOVABLE allocations will fallback to a MOVABLE
    pageblock, making the long-term fragmentation worse.

    This patch attempts to help the situation by changing async direct
    compaction so that the migrate scanner only scans the pageblocks of the
    requested migratetype. If it's a non-MOVABLE type and there are such
    pageblocks that do contain movable pages, chances are that the
    allocation can succeed within one of such pageblocks, removing the need
    for a fallback. If that fails, the subsequent sync attempt will ignore
    this restriction.
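
    A rough sketch of the migrate scanner restriction (cc->migratetype comes
    from the preparation patch in the same series; the helper shape may
    differ):

    static bool suitable_migration_source(struct compact_control *cc,
                                          struct page *page)
    {
            int block_mt;

            /* only async direct compaction is restricted */
            if (cc->mode != MIGRATE_ASYNC || !cc->direct_compaction)
                    return true;

            block_mt = get_pageblock_migratetype(page);

            if (cc->migratetype == MIGRATE_MOVABLE)
                    return is_migrate_movable(block_mt);
            else
                    return block_mt == cc->migratetype;
    }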

    In testing based on 4.9 kernel with stress-highalloc from mmtests
    configured for order-4 GFP_KERNEL allocations, this patch has reduced
    the number of unmovable allocations falling back to movable pageblocks
    by 30%. The number of movable allocations falling back is reduced by
    12%.

    Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Preparation patch. We are going to need migratetype at lower layers
    than compact_zone() and compact_finished().

    Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Preparation for making the decisions more complex and depending on
    compact_control flags. No functional change.

    Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
    When detecting whether compaction has succeeded in forming a high-order
    page, __compact_finished() employs a watermark check, followed by its own
    search for a suitable page in the freelists. This is not ideal for two
    reasons:

    - The watermark check also searches high-order freelists, but has
    less strict criteria wrt fallback. It's therefore redundant and a waste
    of cycles. This was different in the past when the high-order watermark
    check attempted to apply reserves to high-order pages.

    - The watermark check might actually fail due to lack of order-0 pages.
    Compaction can't help with that, so there's no point in continuing
    because of it. It's possible that a high-order page still exists, in
    which case the freelist search terminates compaction anyway.

    This patch therefore removes the watermark check. This should save some
    cycles and terminate compaction sooner in some cases.

    Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

04 May, 2017

1 commit

    By reviewing the code, I find that if the migrate target is a large free
    page and we ignore block suitability, we may split the large target free
    page into smaller blocks, which is not good for defragmentation. So move
    the ignore-block-suitable check after the large free page check.

    As Vlastimil pointed out in the RFC version, this patch is based on
    logical analysis, which might be good for future-proofing the function,
    but it most likely won't have any visible effect right now, since direct
    compaction shouldn't have to be called at all if a >=pageblock_order page
    is already available.
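
    A sketch of the reordered checks in the free scanner's
    suitable_migration_target(), assuming the existing
    cc->ignore_block_suitable flag (details may differ):

    /* a large buddy page must never be split for migration targets */
    if (PageBuddy(page)) {
            /*
             * page_order is checked without zone->lock; worst case we
             * merely skip a potentially suitable pageblock.
             */
            if (page_order_unsafe(page) >= pageblock_order)
                    return false;
    }

    /* only after that may the "ignore suitability" shortcut apply */
    if (cc->ignore_block_suitable)
            return true;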

    Link: http://lkml.kernel.org/r/1489490743-5364-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Minchan Kim
    Cc: Hanjun Guo
    Cc: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     

25 Feb, 2017

1 commit

  • Patch series "HWPOISON: soft offlining for non-lru movable page", v6.

    After Minchan's commit bda807d44454 ("mm: migrate: support non-lru
    movable page migration"), some type of non-lru page like zsmalloc and
    virtio-balloon page also support migration.

    Therefore, we can:

    1) soft offline non-lru movable pages, which means that when corrected
    memory errors occur on a non-lru movable page, we can stop using it
    by migrating the data onto another page and disabling the original
    (maybe half-broken) one.

    2) enable memory hotplug for non-lru movable pages, i.e. we may offline
    blocks, which include such pages, by using non-lru page migration.

    This patchset is heavily dependent on non-lru movable page migration.

    This patch (of 4):

    Change the return type of isolate_movable_page() from bool to int. It
    will return 0 when it isolates a movable page successfully, and -EBUSY
    when isolation fails.

    There is no functional change within this patch, but it prepares for a
    later patch.
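
    Sketch of the new calling convention, with the caller shown roughly as it
    might look in isolate_migratepages_block() (simplified):

    int isolate_movable_page(struct page *page, isolate_mode_t mode);

    /* caller: 0 now means the page was successfully isolated */
    if (unlikely(__PageMovable(page)) &&
        !isolate_movable_page(page, isolate_mode))
            goto isolate_success;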

    [xieyisheng1@huawei.com: v6]
    Link: http://lkml.kernel.org/r/1486108770-630-2-git-send-email-xieyisheng1@huawei.com
    Link: http://lkml.kernel.org/r/1485867981-16037-2-git-send-email-ysxie@foxmail.com
    Signed-off-by: Yisheng Xie
    Suggested-by: Michal Hocko
    Acked-by: Minchan Kim
    Cc: Andi Kleen
    Cc: Hanjun Guo
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Naoya Horiguchi
    Cc: Reza Arbab
    Cc: Taku Izumi
    Cc: Vitaly Kuznetsov
    Cc: Vlastimil Babka
    Cc: Xishi Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     

23 Feb, 2017

2 commits

  • Without a memory barrier, the following race can occur with a high-order
    allocation:

    wakeup_kcompactd(order == 1)               kcompactd()
      [L] waitqueue_active(kcompactd_wait)
                                                 [S] prepare_to_wait_event(kcompactd_wait)
                                                 [L] (kcompactd_max_order == 0)
      [S] kcompactd_max_order = order;               schedule()

    Here the waitqueue_active() check is speculatively re-ordered to before
    setting the actual condition (max_order), not seeing the thread that is
    going to block, making us miss a wakeup. There are a couple of options
    to fix this, including calling wq_has_sleeper(), which adds a full
    barrier, or unconditionally doing the wake_up_interruptible() and
    serializing on the q->lock. However, to make use of the control
    dependency, we just need to add L->L guarantees.

    While this bug is theoretical, there have been other offenders of the
    lockless waitqueue_active() in the past -- this is also documented in
    the call itself.

    Link: http://lkml.kernel.org/r/1483975528-24342-1-git-send-email-dave@stgolabs.net
    Signed-off-by: Davidlohr Bueso
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • A "compact_daemon_wake" vmstat exists that represents the number of
    times kcompactd has woken up. This doesn't represent how much work it
    actually did, though.

    It's useful to understand how much compaction work is being done by
    kcompactd versus other methods such as direct compaction and explicitly
    triggered per-node (or system) compaction.

    This adds two new vmstats: "compact_daemon_migrate_scanned" and
    "compact_daemon_free_scanned" to represent the number of pages kcompactd
    has scanned as part of its migration scanner and freeing scanner,
    respectively.

    These values are still accounted for in the general
    "compact_migrate_scanned" and "compact_free_scanned" for compatibility.

    It could be argued that explicitly triggered compaction could also be
    tracked separately, and that could be added if others find it useful.

    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1612071749390.69852@chino.kir.corp.google.com
    Signed-off-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

15 Dec, 2016

1 commit

  • compaction has been disabled for GFP_NOFS and GFP_NOIO requests since
    the direct compaction was introduced by commit 56de7263fcf3 ("mm:
    compaction: direct compact when a high-order allocation fails"). The
    main reason is that the migration of page cache pages might recurse back
    to fs/io layer and we could potentially deadlock. This is overly
    conservative because all the anonymous memory is migrateable in the
    GFP_NOFS context just fine. This might be a large portion of the memory
    in many/most workloads.

    Remove the GFP_NOFS restriction and make sure that we skip all fs pages
    (those with a mapping) while isolating pages to be migrated. We cannot
    consider clean fs pages because they might need a metadata update so
    only isolate pages without any mapping for nofs requests.
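
    The skip itself is essentially a one-liner in the migration scanner
    (a sketch; cc->gfp_mask carries the compacting caller's flags):

    /*
     * Only allow migration of anonymous pages in GFP_NOFS context,
     * because those do not depend on fs locks.
     */
    if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
            goto isolate_fail;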

    The effect of this patch will be probably very limited in many/most
    workloads because higher order GFP_NOFS requests are quite rare,
    although different configurations might lead to very different results.
    David Chinner has mentioned a heavy metadata workload with a 64kB block
    size which, to quote him:

    : Unfortunately, there was an era of cargo cult configuration tweaks in the
    : Ceph community that has resulted in a large number of production machines
    : with XFS filesystems configured this way. And a lot of them store large
    : numbers of small files and run under significant sustained memory
    : pressure.
    :
    : I'm slowly working towards getting rid of these high order allocations and
    : replacing them with the equivalent number of single page allocations, but
    : I haven't got that (complex) change working yet.

    We can do the following to simulate that workload:
    $ mkfs.xfs -f -n size=64k
    $ mount /mnt/scratch
    $ time ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 \
    -d /mnt/scratch/0 -d /mnt/scratch/1 \
    -d /mnt/scratch/2 -d /mnt/scratch/3 \
    -d /mnt/scratch/4 -d /mnt/scratch/5 \
    -d /mnt/scratch/6 -d /mnt/scratch/7 \
    -d /mnt/scratch/8 -d /mnt/scratch/9 \
    -d /mnt/scratch/10 -d /mnt/scratch/11 \
    -d /mnt/scratch/12 -d /mnt/scratch/13 \
    -d /mnt/scratch/14 -d /mnt/scratch/15

    and indeed it hammers the system with many high-order GFP_NOFS requests, as
    per a simple tracepoint filter during the load:
    $ echo '!(gfp_flags & 0x80) && (gfp_flags &0x400000)' > $TRACE_MNT/events/kmem/mm_page_alloc/filter
    I am getting
    5287609 order=0
    37 order=1
    1594905 order=2
    3048439 order=3
    6699207 order=4
    66645 order=5

    My testing was done in a kvm guest so performance numbers should be
    taken with a grain of salt but there seems to be a difference when the
    patch is applied:

    * Original kernel
    FSUse% Count Size Files/sec App Overhead
    1 1600000 0 4300.1 20745838
    3 3200000 0 4239.9 23849857
    5 4800000 0 4243.4 25939543
    6 6400000 0 4248.4 19514050
    8 8000000 0 4262.1 20796169
    9 9600000 0 4257.6 21288675
    11 11200000 0 4259.7 19375120
    13 12800000 0 4220.7 22734141
    14 14400000 0 4238.5 31936458
    16 16000000 0 4231.5 23409901
    18 17600000 0 4045.3 23577700
    19 19200000 0 2783.4 58299526
    21 20800000 0 2678.2 40616302
    23 22400000 0 2693.5 83973996

    and xfs complaining about memory allocation not making progress
    [ 2304.372647] XFS: fs_mark(3289) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240)
    [ 2304.443323] XFS: fs_mark(3285) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240)
    [ 4796.772477] XFS: fs_mark(3424) possible memory allocation deadlock size 46936 in kmem_alloc (mode:0x2408240)
    [ 4796.775329] XFS: fs_mark(3423) possible memory allocation deadlock size 51416 in kmem_alloc (mode:0x2408240)
    [ 4797.388808] XFS: fs_mark(3424) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240)

    * Patched kernel
    FSUse% Count Size Files/sec App Overhead
    1 1600000 0 4289.1 19243934
    3 3200000 0 4241.6 32828865
    5 4800000 0 4248.7 32884693
    6 6400000 0 4314.4 19608921
    8 8000000 0 4269.9 24953292
    9 9600000 0 4270.7 33235572
    11 11200000 0 4346.4 40817101
    13 12800000 0 4285.3 29972397
    14 14400000 0 4297.2 20539765
    16 16000000 0 4219.6 18596767
    18 17600000 0 4273.8 49611187
    19 19200000 0 4300.4 27944451
    21 20800000 0 4270.6 22324585
    22 22400000 0 4317.6 22650382
    24 24000000 0 4065.2 22297964

    So the drop-off at Count 19200000 didn't happen, and there was only a
    single warning about an allocation not making progress:
    [ 3063.815003] XFS: fs_mark(3272) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240)

    This suggests that the patch has helped even though there is not all that
    much anonymous memory, as the workload mostly generates fs metadata. I
    assume the success rate would be higher with more anonymous memory, which
    should be the case in many workloads.

    [akpm@linux-foundation.org: fix comment]
    Link: http://lkml.kernel.org/r/20161012114721.31853-1-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

13 Dec, 2016

2 commits

  • Merge updates from Andrew Morton:

    - various misc bits

    - most of MM (quite a lot of MM material is awaiting the merge of
    linux-next dependencies)

    - kasan

    - printk updates

    - procfs updates

    - MAINTAINERS

    - /lib updates

    - checkpatch updates

    * emailed patches from Andrew Morton : (123 commits)
    init: reduce rootwait polling interval time to 5ms
    binfmt_elf: use vmalloc() for allocation of vma_filesz
    checkpatch: don't emit unified-diff error for rename-only patches
    checkpatch: don't check c99 types like uint8_t under tools
    checkpatch: avoid multiple line dereferences
    checkpatch: don't check .pl files, improve absolute path commit log test
    scripts/checkpatch.pl: fix spelling
    checkpatch: don't try to get maintained status when --no-tree is given
    lib/ida: document locking requirements a bit better
    lib/rbtree.c: fix typo in comment of ____rb_erase_color
    lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
    MAINTAINERS: add drm and drm/i915 irc channels
    MAINTAINERS: add "C:" for URI for chat where developers hang out
    MAINTAINERS: add drm and drm/i915 bug filing info
    MAINTAINERS: add "B:" for URI where to file bugs
    get_maintainer: look for arbitrary letter prefixes in sections
    printk: add Kconfig option to set default console loglevel
    printk/sound: handle more message headers
    printk/btrfs: handle more message headers
    printk/kdb: handle more message headers
    ...

    Linus Torvalds
     
    Since commit bda807d44454 ("mm: migrate: support non-lru movable page
    migration"), isolate_migratepages_block() can isolate !PageLRU pages,
    which acct_isolated() would account as NR_ISOLATED_*. Accounting these
    non-lru pages as NR_ISOLATED_{ANON,FILE} doesn't make any sense, and it
    can mislead heuristics based on those counters, such as
    pgdat_reclaimable_pages resp. too_many_isolated, which would lead to
    unexpected stalls during direct reclaim without any good reason. Note
    that __alloc_contig_migrate_range can isolate a lot of pages at once.

    On mobile devices, such as an Android phone with 512MB of RAM, a big zram
    swap may be used. In some cases zram (zsmalloc) uses too many non-lru but
    migratable pages, such as:

    MemTotal: 468148 kB
    Normal free:5620kB
    Free swap:4736kB
    Total swap:409596kB
    ZRAM: 164616kB(zsmalloc non-lru pages)
    active_anon:60700kB
    inactive_anon:60744kB
    active_file:34420kB
    inactive_file:37532kB

    Fix this by only accounting lru pages to NR_ISOLATED_* in
    isolate_migratepages_block right after they were isolated and we still
    know they were on LRU. Drop acct_isolated because it is called after
    the fact and we've lost that information. Batching per-cpu counter
    doesn't make much improvement anyway. Also make sure that we uncharge
    only LRU pages when putting them back on the LRU in
    putback_movable_pages resp. when unmap_and_move migrates the page.
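
    A minimal sketch of the direct counting that replaces acct_isolated(),
    placed right where the page is taken off the LRU (helper names as of that
    kernel era):

    /* successfully isolated from the LRU: account it immediately */
    del_page_from_lru_list(page, lruvec, page_lru(page));
    inc_node_page_state(page,
                        NR_ISOLATED_ANON + page_is_file_cache(page));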

    [mhocko@suse.com: replace acct_isolated() with direct counting]
    Fixes: bda807d44454 ("mm: migrate: support non-lru movable page migration")
    Link: http://lkml.kernel.org/r/20161019080240.9682-1-mhocko@kernel.org
    Signed-off-by: Ming Ling
    Signed-off-by: Michal Hocko
    Acked-by: Minchan Kim
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ming Ling
     

02 Dec, 2016

1 commit

  • Install the callbacks via the state machine. Should the hotplug init fail then
    no threads are spawned.

    Signed-off-by: Anna-Maria Gleixner
    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: linux-mm@kvack.org
    Cc: rt@linutronix.de
    Cc: Andrew Morton
    Cc: Vlastimil Babka
    Link: http://lkml.kernel.org/r/20161126231350.10321-15-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Anna-Maria Gleixner
     

08 Oct, 2016

11 commits

    The fragmentation index and the vm.extfrag_threshold sysctl are meant as a
    heuristic to prevent excessive compaction for costly orders (i.e. THP).
    It's unlikely to make any difference for non-costly orders, especially
    with the default threshold. But we cannot afford any uncertainty for
    the non-costly orders where the only alternative to successful
    reclaim/compaction is OOM. After the recent patches we are guaranteed
    maximum effort without heuristics from compaction before deciding OOM,
    and fragindex is the last remaining heuristic. Therefore skip fragindex
    altogether for non-costly orders.
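
    Sketched as a guard in compaction_suitable(), using the existing
    fragmentation_index() and sysctl_extfrag_threshold (exact surrounding
    code may differ):

    /* only consult the fragmentation index for costly orders */
    if (ret == COMPACT_CONTINUE && order > PAGE_ALLOC_COSTLY_ORDER) {
            int fragindex = fragmentation_index(zone, order);

            if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
                    ret = COMPACT_NOT_SUITABLE_ZONE;
    }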

    Suggested-by: Michal Hocko
    Link: http://lkml.kernel.org/r/20160926162025.21555-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The compaction_zonelist_suitable() function tries to determine if
    compaction will be able to proceed after sufficient reclaim, i.e.
    whether there are enough reclaimable pages to provide enough order-0
    freepages for compaction.

    This addition of reclaimable pages to the free pages works well for the
    order-0 watermark check, but in the fragmentation index check we only
    consider truly free pages. Thus we can get a fragindex value close to 0,
    which indicates failure due to lack of memory, and wrongly decide that
    compaction won't be suitable even after reclaim.

    Instead of trying to somehow adjust fragindex for reclaimable pages,
    let's just skip it from compaction_zonelist_suitable().

    Link: http://lkml.kernel.org/r/20160926162025.21555-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Several people have reported premature OOMs for order-2 allocations
    (stack) due to OOM rework in 4.7. In the scenario (parallel kernel
    build and dd writing to two drives) many pageblocks get marked as
    Unmovable and compaction free scanner struggles to isolate free pages.
    Joonsoo Kim pointed out that the free scanner skips pageblocks that are
    not movable to prevent filling them and forcing non-movable allocations
    to fallback to other pageblocks. Such heuristic makes sense to help
    prevent long-term fragmentation, but premature OOMs are relatively more
    urgent problem. As a compromise, this patch disables the heuristic only
    for the ultimate compaction priority.

    Link: http://lkml.kernel.org/r/20160906135258.18335-5-vbabka@suse.cz
    Reported-by: Ralf-Peter Rohbeck
    Reported-by: Arkadiusz Miskiewicz
    Reported-by: Olaf Hering
    Suggested-by: Joonsoo Kim
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The __compaction_suitable() function checks the low watermark plus a
    compact_gap() gap to decide if there's enough free memory to perform
    compaction. Then __isolate_free_page uses low watermark check to decide
    if particular free page can be isolated. In the latter case, using low
    watermark is needlessly pessimistic, as the free page isolations are
    only temporary. For __compaction_suitable() the higher watermark makes
    sense for high-order allocations where more freepages increase the
    chance of success, and we can typically fail with some order-0 fallback
    when the system is struggling to reach that watermark. But for
    low-order allocation, forming the page should not be that hard. So
    using low watermark here might just prevent compaction from even trying,
    and eventually lead to OOM killer even if we are above min watermarks.

    So after this patch, we use min watermark for non-costly orders in
    __compaction_suitable(), and for all orders in __isolate_free_page().
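
    In __compaction_suitable() the choice can be sketched as follows
    (low_wmark_pages()/min_wmark_pages() are the standard helpers):

    /* costly orders still get the more cautious low watermark */
    watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
                            low_wmark_pages(zone) : min_wmark_pages(zone);
    watermark += compact_gap(order);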

    [vbabka@suse.cz: clarify __isolate_free_page() comment]
    Link: http://lkml.kernel.org/r/7ae4baec-4eca-e70b-2a69-94bea4fb19fa@suse.cz
    Link: http://lkml.kernel.org/r/20160810091226.6709-11-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The __compaction_suitable() function checks the low watermark plus a
    compact_gap() gap to decide if there's enough free memory to perform
    compaction. This check uses direct compactor's alloc_flags, but that's
    wrong, since these flags are not applicable for freepage isolation.

    For example, alloc_flags may indicate access to memory reserves, making
    compaction proceed, and then fail watermark check during the isolation.

    A similar problem exists for ALLOC_CMA, which may be part of
    alloc_flags, but not during freepage isolation. In this case however it
    makes sense to use ALLOC_CMA both in __compaction_suitable() and
    __isolate_free_page(), since there's actually nothing preventing the
    freepage scanner to isolate from CMA pageblocks, with the assumption
    that a page that could be migrated once by compaction can be migrated
    also later by CMA allocation. Thus we should count pages in CMA
    pageblocks when considering compaction suitability and when isolating
    freepages.

    To sum up, this patch should remove some false positives from
    __compaction_suitable(), and allow compaction to proceed when free pages
    required for compaction reside in the CMA pageblocks.

    Link: http://lkml.kernel.org/r/20160810091226.6709-10-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Compaction uses a watermark gap of (2UL << order) pages at various
    places and it's not immediately obvious why. Abstract it through a
    compact_gap() wrapper to create a single place with a thorough
    explanation.
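
    The wrapper itself is tiny; roughly:

    /*
     * Compaction needs enough free pages above the watermark to make
     * progress: room for both the pages being migrated and their copies.
     */
    static inline unsigned long compact_gap(unsigned int order)
    {
            return 2UL << order;
    }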

    [vbabka@suse.cz: clarify the comment of compact_gap()]
    Link: http://lkml.kernel.org/r/7b6aed1f-fdf8-2063-9ff4-bbe4de712d37@suse.cz
    Link: http://lkml.kernel.org/r/20160810091226.6709-9-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The __compact_finished() function uses low watermark in a check that has
    to pass if the direct compaction is to finish and allocation should
    succeed. This is too pessimistic, as the allocation will typically use
    min watermark. It may happen that during compaction, we drop below the
    low watermark (due to parallel activity), but still form the target
    high-order page. By checking against low watermark, we might needlessly
    continue compaction.

    Similarly, __compaction_suitable() uses low watermark in a check whether
    allocation can succeed without compaction. Again, this is unnecessarily
    pessimistic.

    After this patch, these checks will use the direct compactor's alloc_flags to
    determine the watermark, which is effectively the min watermark.

    Link: http://lkml.kernel.org/r/20160810091226.6709-8-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • During reclaim/compaction loop, it's desirable to get a final answer
    from unsuccessful compaction so we can either fail the allocation or
    invoke the OOM killer. However, heuristics such as deferred compaction
    or pageblock skip bits can cause compaction to skip parts or whole zones
    and lead to premature OOM's, failures or excessive reclaim/compaction
    retries.

    To remedy this, we introduce a new direct compaction priority called
    COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to:

    - ignore deferred compaction status for a zone
    - ignore pageblock skip hints
    - ignore cached scanner positions and scan the whole zone

    The new priority should get eventually picked up by
    should_compact_retry() and this should improve success rates for costly
    allocations using __GFP_REPEAT, such as hugetlbfs allocations, and
    reduce some corner-case OOM's for non-costly allocations.
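
    A sketch of how the new priority maps onto compact_control flags in the
    try_to_compact_pages() zone loop (field names per this and earlier
    changelogs; details may differ):

    /* MIN_COMPACT_PRIORITY aliases COMPACT_PRIO_SYNC_FULL */
    struct compact_control cc = {
            .order = order,
            .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
            .whole_zone = (prio == MIN_COMPACT_PRIORITY),
    };

    /* deferred-compaction status is likewise bypassed at this priority */
    if (prio > MIN_COMPACT_PRIORITY && compaction_deferred(zone, order))
            continue;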

    Link: http://lkml.kernel.org/r/20160810091226.6709-6-vbabka@suse.cz
    [vbabka@suse.cz: use the MIN_COMPACT_PRIORITY alias]
    Link: http://lkml.kernel.org/r/d443b884-87e7-1c93-8684-3a3a35759fb1@suse.cz
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Joonsoo has reminded me that in a later patch changing watermark checks
    throughout compaction I forgot to update checks in
    try_to_compact_pages() and kcompactd_do_work(). Closer inspection
    however shows that they are redundant now in the success case, because
    compact_zone() now reliably reports this with COMPACT_SUCCESS. So
    effectively the checks just repeat (a subset) of checks that have just
    passed. So instead of checking watermarks again, just test the return
    value.

    Note it's also possible that compaction would declare failure e.g.
    because its find_suitable_fallback() is more strict than simple
    watermark check, and then the watermark check we are removing would then
    still succeed. After this patch this is not possible and it's arguably
    better, because for long-term fragmentation avoidance we should rather
    try a different zone than allocate with the unsuitable fallback. If
    compaction of all zones fail and the allocation is important enough, it
    will retry and succeed anyway.

    Also remove the stray "bool success" variable from kcompactd_do_work().

    Link: http://lkml.kernel.org/r/20160810091226.6709-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reported-by: Joonsoo Kim
    Tested-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • COMPACT_PARTIAL has historically meant that compaction returned after
    doing some work without fully compacting a zone. It however didn't
    distinguish if compaction terminated because it succeeded in creating
    the requested high-order page. This has changed recently and now we
    only return COMPACT_PARTIAL when compaction thinks it succeeded, or the
    high-order watermark check in compaction_suitable() passes and no
    compaction needs to be done.

    So at this point we can make the return value clearer by renaming it to
    COMPACT_SUCCESS. The next patch will remove some redundant tests for
    success where compaction just returned COMPACT_SUCCESS.

    Link: http://lkml.kernel.org/r/20160810091226.6709-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Since kswapd compaction moved to kcompactd, compact_pgdat() is not
    called anymore, so we remove it. The only caller of __compact_pgdat()
    is compact_node(), so we merge them and remove code that was only
    reachable from kswapd.

    Link: http://lkml.kernel.org/r/20160810091226.6709-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Tested-by: Lorenzo Stoakes
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka