29 Jul, 2016

40 commits

  • dequeue_hwpoisoned_huge_page() can be called without the page lock
    held, so remove the incorrect comment.

    The reason the page lock is not really needed is that
    dequeue_hwpoisoned_huge_page() checks page_huge_active() inside
    hugetlb_lock, which lets us avoid trying to dequeue a hugepage that
    has just been allocated but is not yet linked to the active list, even
    without taking the page lock.

    Link: http://lkml.kernel.org/r/20160720092901.GA15995@www9186uo.sakura.ne.jp
    Signed-off-by: Naoya Horiguchi
    Reported-by: Zhan Chen
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr() can get the
    section number with a direct subtraction.

    Link: http://lkml.kernel.org/r/1468988310-11560-1-git-send-email-zhouchengming1@huawei.com
    Signed-off-by: Zhou Chengming
    Cc: Dave Hansen
    Cc: Tejun Heo
    Cc: Hanjun Guo
    Cc: Li Bin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhou Chengming
     
  • If the user tries to disable automatic scanning early in the boot
    process using e.g.:

    echo scan=off > /sys/kernel/debug/kmemleak

    then this command will hang until SECS_FIRST_SCAN (= 60) seconds have
    elapsed, even though the system is fully initialised.

    We can fix this by using an interruptible sleep and checking whether
    we're supposed to stop whenever we wake up (as the rest of the code
    does).

    Link: http://lkml.kernel.org/r/1468835005-2873-1-git-send-email-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • When booting an ACPI enabled kernel with 'mem=x', there is the
    possibility that ACPI data regions from the firmware will lie above
    the memory limit. Ordinarily these will be removed by
    memblock_enforce_memory_limit().

    Unfortunately, this means that these regions will then be mapped by
    acpi_os_ioremap() as device memory (instead of normal memory), so
    unaligned accesses will provoke alignment faults.

    This patch adopts memblock_mem_limit_remove_map() instead, which
    preserves these ACPI data regions (marked NOMAP) and thus ensures that
    they are not mapped as device memory.

    For example, below is an alignment exception observed on an ARM
    platform when booting the kernel with 'acpi=on mem=8G':

    ...
    Unable to handle kernel paging request at virtual address ffff0000080521e7
    pgd = ffff000008aa0000
    [ffff0000080521e7] *pgd=000000801fffe003, *pud=000000801fffd003, *pmd=000000801fffc003, *pte=00e80083ff1c1707
    Internal error: Oops: 96000021 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc3-next-20160616+ #172
    Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1001A 02/09/2016
    task: ffff800001ef0000 ti: ffff800001ef8000 task.ti: ffff800001ef8000
    PC is at acpi_ns_lookup+0x520/0x734
    LR is at acpi_ns_lookup+0x4a4/0x734
    pc : [] lr : [] pstate: 60000045
    sp : ffff800001efb8b0
    x29: ffff800001efb8c0 x28: 000000000000001b
    x27: 0000000000000001 x26: 0000000000000000
    x25: ffff800001efb9e8 x24: ffff000008a10000
    x23: 0000000000000001 x22: 0000000000000001
    x21: ffff000008724000 x20: 000000000000001b
    x19: ffff0000080521e7 x18: 000000000000000d
    x17: 00000000000038ff x16: 0000000000000002
    x15: 0000000000000007 x14: 0000000000007fff
    x13: ffffff0000000000 x12: 0000000000000018
    x11: 000000001fffd200 x10: 00000000ffffff76
    x9 : 000000000000005f x8 : ffff000008725fa8
    x7 : ffff000008a8df70 x6 : ffff000008a8df70
    x5 : ffff000008a8d000 x4 : 0000000000000010
    x3 : 0000000000000010 x2 : 000000000000000c
    x1 : 0000000000000006 x0 : 0000000000000000
    ...
    acpi_ns_lookup+0x520/0x734
    acpi_ds_load1_begin_op+0x174/0x4fc
    acpi_ps_build_named_op+0xf8/0x220
    acpi_ps_create_op+0x208/0x33c
    acpi_ps_parse_loop+0x204/0x838
    acpi_ps_parse_aml+0x1bc/0x42c
    acpi_ns_one_complete_parse+0x1e8/0x22c
    acpi_ns_parse_table+0x8c/0x128
    acpi_ns_load_table+0xc0/0x1e8
    acpi_tb_load_namespace+0xf8/0x2e8
    acpi_load_tables+0x7c/0x110
    acpi_init+0x90/0x2c0
    do_one_initcall+0x38/0x12c
    kernel_init_freeable+0x148/0x1ec
    kernel_init+0x10/0xec
    ret_from_fork+0x10/0x40
    Code: b9009fbc 2a00037b 36380057 3219037b (b9400260)
    ---[ end trace 03381e5eb0a24de4 ]---
    Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

    With 'efi=debug', we can see those ACPI regions loaded by firmware on
    that board as:

    efi: 0x0083ff185000-0x0083ff1b4fff [Reserved | | | | | | | | |WB|WT|WC|UC]*
    efi: 0x0083ff1b5000-0x0083ff1c2fff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]*
    efi: 0x0083ff223000-0x0083ff224fff [ACPI Memory NVS | | | | | | | | |WB|WT|WC|UC]*

    Link: http://lkml.kernel.org/r/1468475036-5852-3-git-send-email-dennis.chen@arm.com
    Acked-by: Steve Capper
    Signed-off-by: Dennis Chen
    Cc: Catalin Marinas
    Cc: Ard Biesheuvel
    Cc: Pekka Enberg
    Cc: Mel Gorman
    Cc: Tang Chen
    Cc: Tony Luck
    Cc: Ingo Molnar
    Cc: Rafael J. Wysocki
    Cc: Will Deacon
    Cc: Mark Rutland
    Cc: Matt Fleming
    Cc: Kaly Xin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dennis Chen
     
  • In some cases, the kernel queries memblock to determine whether a
    specified address is RAM or not. For example, the ACPI core needs this
    information to determine which attributes to use when mapping ACPI
    regions (acpi_os_ioremap()). Use of incorrect memory types can result
    in faults, data corruption, or other issues.

    Removing memory with memblock_enforce_memory_limit() throws away this
    information, so a kernel booted with 'mem=' may suffer from the issues
    described above. To avoid this, we need to keep the NOMAP regions
    instead of removing everything above the limit, which preserves the
    information we need while preventing other uses of those regions.

    This patch adds new infrastructure to retain all NOMAP memblock
    regions while removing the others, to cater for this.

    Link: http://lkml.kernel.org/r/1468475036-5852-2-git-send-email-dennis.chen@arm.com
    Signed-off-by: Dennis Chen
    Acked-by: Steve Capper
    Cc: Catalin Marinas
    Cc: Ard Biesheuvel
    Cc: Pekka Enberg
    Cc: Mel Gorman
    Cc: Tang Chen
    Cc: Tony Luck
    Cc: Ingo Molnar
    Cc: Rafael J. Wysocki
    Cc: Will Deacon
    Cc: Mark Rutland
    Cc: Matt Fleming
    Cc: Kaly Xin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dennis Chen
     
  • We currently show:

    task: <ptr> ti: <ptr> task.ti: <ptr>

    "ti" and "task.ti" are redundant, and neither is actually what we want
    to show, which is the base of the thread stack. Change the display to
    show the stack pointer explicitly.

    Link: http://lkml.kernel.org/r/543ac5bd66ff94000a57a02e11af7239571a3055.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • We'll need this cleanup to make the cpu field in thread_info be
    optional.

    Link: http://lkml.kernel.org/r/da298328dc77ea494576c2f20a934218e758a6fa.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • We should account for stacks regardless of stack size, and we need to
    account in sub-page units if THREAD_SIZE < PAGE_SIZE. Change the units
    to kilobytes and move the accounting into account_kernel_stack().

    Fixes: 12580e4b54ba8 ("mm: memcontrol: report kernel stack usage in cgroup2 memory.stat")
    Link: http://lkml.kernel.org/r/9b5314e3ee5eda61b0317ec1563768602c1ef438.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Cc: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Reviewed-by: Josh Poimboeuf
    Reviewed-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone.
    This only makes sense if each kernel stack exists entirely in one zone,
    and allowing vmapped stacks could break this assumption.

    Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack
    allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all
    architectures. Keep it simple and use KiB.

    Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.org
    Signed-off-by: Andy Lutomirski
    Cc: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Reviewed-by: Josh Poimboeuf
    Reviewed-by: Vladimir Davydov
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     
  • Now that ZONE_DEVICE depends on SPARSEMEM_VMEMMAP we can simplify some
    ifdef guards to just ZONE_DEVICE.

    Link: http://lkml.kernel.org/r/146687646788.39261.8020536391978771940.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Vlastimil Babka
    Cc: Eric Sandeen
    Cc: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • When it was first introduced CONFIG_ZONE_DEVICE depended on disabling
    CONFIG_ZONE_DMA, a configuration choice reserved for "experts".
    However, now that the ZONE_DMA conflict has been eliminated it no longer
    makes sense to require CONFIG_EXPERT.

    Link: http://lkml.kernel.org/r/146687646274.39261.14267596518720371009.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Eric Sandeen
    Reported-by: Jeff Moyer
    Acked-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • asm-generic headers are generic implementations of
    architecture-specific code and should not be included by common code.
    Thus use the asm/ version of sections.h to get at the linker sections.

    Link: http://lkml.kernel.org/r/1468285103-7470-1-git-send-email-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The return value of madvise_free_huge_pmd() was not clearly defined
    before. Following Minchan Kim's suggestion, change the return type to
    bool: return true if MADV_FREE was applied successfully to the entire
    pmd page, otherwise return false. Comments are added too.

    Link: http://lkml.kernel.org/r/1467135452-16688-2-git-send-email-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Acked-by: Minchan Kim
    Cc: "Kirill A. Shutemov"
    Cc: Jerome Marchand
    Cc: Vlastimil Babka
    Cc: Dan Williams
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Ebru Akagunduz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     
  • Use the ClearPagePrivate/ClearPagePrivate2 helpers to clear
    PG_private/PG_private_2 in page->flags.

    Link: http://lkml.kernel.org/r/1467882338-4300-7-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Add __init/__exit attributes to functions that are only called at
    module init/exit, to save memory.

    Link: http://lkml.kernel.org/r/1467882338-4300-6-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Some minor comment changes:

    1) Update the zs_malloc() and zs_create_pool() function headers.
    2) Update "Usage of struct page fields".

    Link: http://lkml.kernel.org/r/1467882338-4300-5-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Currently, if a class cannot be merged, the max objects per zspage in
    that class may be calculated twice.

    This patch calculates the max objects per zspage up front and passes
    the value to can_merge() to decide whether the class can be merged.

    It also removes get_maxobj_per_zspage(), as there are no other callers
    of that function.

    Link: http://lkml.kernel.org/r/1467882338-4300-4-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The max number of objects per zspage is now stored in each size_class,
    so there is no need to re-calculate it.

    Link: http://lkml.kernel.org/r/1467882338-4300-3-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The obj index value should be updated after returning from
    find_alloced_obj() to avoid CPU burn caused by unnecessary object
    scanning.

    Link: http://lkml.kernel.org/r/1467882338-4300-2-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • This is a cleanup patch. Change "index" to "obj_index" to keep
    consistent with others in zsmalloc.

    Link: http://lkml.kernel.org/r/1467882338-4300-1-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • With node-lru, if there are enough reclaimable pages in highmem but
    nothing in lowmem, the VM can try to shrink the inactive list even
    though the requested zone is lowmem.

    The problem is that if the inactive list is full of highmem pages, a
    direct reclaimer searching for a lowmem page wastes CPU scanning
    uselessly. It just burns CPU. Worse, many direct reclaimers are
    stalled by too_many_isolated() if lots of parallel reclaimers are
    running, even though there is no reclaimable memory in the inactive
    list.

    I ran the experiment 4 times on a 32-bit, 2GB, 8-CPU KVM machine to
    measure elapsed time.

    hackbench 500 process 2

    = Old =

    1st: 289s 2nd: 310s 3rd: 112s 4th: 272s

    = Now =

    1st: 31s 2nd: 132s 3rd: 162s 4th: 50s.

    [akpm@linux-foundation.org: fixes per Mel]
    Link: http://lkml.kernel.org/r/1469433119-1543-1-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Acked-by: Mel Gorman
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Page reclaim determines whether a pgdat is unreclaimable by examining
    how many pages have been scanned since a page was freed and comparing
    that to the LRU sizes. Skipped pages are not reclaim candidates but
    contribute to scanned. This can prematurely mark a pgdat as
    unreclaimable and trigger an OOM kill.

    This patch accounts for skipped pages as a partial scan, so that an
    unreclaimable pgdat will still be marked as such, but by scaling the
    cost of a skip it avoids the pgdat being marked prematurely.

    Link: http://lkml.kernel.org/r/1469110261-7365-6-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Minchan Kim reported that with per-zone lru stats it was possible to
    identify that a normal zone with ~86MB of anonymous pages could
    trigger OOM with non-atomic order-0 allocations, as all pages in the
    zone were on the active list.

    gfp_mask=0x26004c0(GFP_KERNEL|__GFP_REPEAT|__GFP_NOTRACK), order=0
    Call Trace:
    __alloc_pages_nodemask+0xe52/0xe60
    ? new_slab+0x39c/0x3b0
    new_slab+0x39c/0x3b0
    ___slab_alloc.constprop.87+0x6da/0x840
    ? __alloc_skb+0x3c/0x260
    ? enqueue_task_fair+0x73/0xbf0
    ? poll_select_copy_remaining+0x140/0x140
    __slab_alloc.isra.81.constprop.86+0x40/0x6d
    ? __alloc_skb+0x3c/0x260
    kmem_cache_alloc+0x22c/0x260
    ? __alloc_skb+0x3c/0x260
    __alloc_skb+0x3c/0x260
    alloc_skb_with_frags+0x4e/0x1a0
    sock_alloc_send_pskb+0x16a/0x1b0
    ? wait_for_unix_gc+0x31/0x90
    unix_stream_sendmsg+0x28d/0x340
    sock_sendmsg+0x2d/0x40
    sock_write_iter+0x6c/0xc0
    __vfs_write+0xc0/0x120
    vfs_write+0x9b/0x1a0
    ? __might_fault+0x49/0xa0
    SyS_write+0x44/0x90
    do_fast_syscall_32+0xa6/0x1e0

    Mem-Info:
    active_anon:101103 inactive_anon:102219 isolated_anon:0
    active_file:503 inactive_file:544 isolated_file:0
    unevictable:0 dirty:0 writeback:34 unstable:0
    slab_reclaimable:6298 slab_unreclaimable:74669
    mapped:863 shmem:0 pagetables:100998 bounce:0
    free:23573 free_pcp:1861 free_cma:0
    Node 0 active_anon:404412kB inactive_anon:409040kB active_file:2012kB inactive_file:2176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3452kB dirty:0kB writeback:136kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1320845 all_unreclaimable? yes
    DMA free:3296kB min:68kB low:84kB high:100kB active_anon:5540kB inactive_anon:0kB active_file:0kB inactive_file:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:248kB slab_unreclaimable:2628kB kernel_stack:792kB pagetables:2316kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
    lowmem_reserve[]: 0 809 1965 1965
    Normal free:3600kB min:3604kB low:4504kB high:5404kB active_anon:86304kB inactive_anon:0kB active_file:160kB inactive_file:376kB present:897016kB managed:858524kB mlocked:0kB slab_reclaimable:24944kB slab_unreclaimable:296048kB kernel_stack:163832kB pagetables:35892kB bounce:0kB free_pcp:3076kB local_pcp:656kB free_cma:0kB
    lowmem_reserve[]: 0 0 9247 9247
    HighMem free:86156kB min:512kB low:1796kB high:3080kB active_anon:312852kB inactive_anon:410024kB active_file:1924kB inactive_file:2012kB present:1183736kB managed:1183736kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:365784kB bounce:0kB free_pcp:3868kB local_pcp:720kB free_cma:0kB
    lowmem_reserve[]: 0 0 0 0
    DMA: 8*4kB (UM) 8*8kB (UM) 4*16kB (M) 2*32kB (UM) 2*64kB (UM) 1*128kB (M) 3*256kB (UME) 2*512kB (UE) 1*1024kB (E) 0*2048kB 0*4096kB = 3296kB
    Normal: 240*4kB (UME) 160*8kB (UME) 23*16kB (ME) 3*32kB (UE) 3*64kB (UME) 2*128kB (ME) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3408kB
    HighMem: 10942*4kB (UM) 3102*8kB (UM) 866*16kB (UM) 76*32kB (UM) 11*64kB (UM) 4*128kB (UM) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 86344kB
    Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    54409 total pagecache pages
    53215 pages in swap cache
    Swap cache stats: add 300982, delete 247765, find 157978/226539
    Free swap = 3803244kB
    Total swap = 4192252kB
    524186 pages RAM
    295934 pages HighMem/MovableOnly
    9642 pages reserved
    0 pages cma reserved

    The problem is due to the active deactivation logic in
    inactive_list_is_low():

    Node 0 active_anon:404412kB inactive_anon:409040kB

    IOW, (inactive_anon of node * inactive_ratio > active_anon of node)
    due to the highmem anonymous stats, so the VM never deactivates the
    normal zone's anonymous pages.

    This patch is a modified version of Minchan's original solution, but
    based upon it. The problem with Minchan's patch is that any low zone
    with an imbalanced list could force a rotation.

    In this patch, a zone-constrained global reclaim will rotate the list if
    the inactive/active ratio of all eligible zones needs to be corrected.
    It is possible that higher zone pages will be initially rotated
    prematurely but this is the safer choice to maintain overall LRU age.

    Link: http://lkml.kernel.org/r/20160722090929.GJ10438@techsingularity.net
    Signed-off-by: Minchan Kim
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • If per-zone LRU accounting is available then there is no point
    approximating whether reclaim and compaction should retry based on
    pgdat statistics. This is effectively a revert of "mm, vmstat: remove
    zone and node double accounting by approximating retries", with the
    difference that inactive/active stats are still available. This
    preserves the history of why the approximation was tried and why it
    had to be reverted to handle OOM kills on 32-bit systems.

    Link: http://lkml.kernel.org/r/1469110261-7365-4-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Acked-by: Minchan Kim
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • With the reintroduction of per-zone LRU stats, highmem_file_pages is
    redundant so remove it.

    [mgorman@techsingularity.net: wrong stat is being accumulated in highmem_dirtyable_memory]
    Link: http://lkml.kernel.org/r/20160725092324.GM10438@techsingularity.net
    Link: http://lkml.kernel.org/r/1469110261-7365-3-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • When I ran a stress test with hackbench, I frequently got OOM
    messages, which never happened with zone-lru.

    gfp_mask=0x26004c0(GFP_KERNEL|__GFP_REPEAT|__GFP_NOTRACK), order=0
    ..
    ..
    __alloc_pages_nodemask+0xe52/0xe60
    ? new_slab+0x39c/0x3b0
    new_slab+0x39c/0x3b0
    ___slab_alloc.constprop.87+0x6da/0x840
    ? __alloc_skb+0x3c/0x260
    ? _raw_spin_unlock_irq+0x27/0x60
    ? trace_hardirqs_on_caller+0xec/0x1b0
    ? finish_task_switch+0xa6/0x220
    ? poll_select_copy_remaining+0x140/0x140
    __slab_alloc.isra.81.constprop.86+0x40/0x6d
    ? __alloc_skb+0x3c/0x260
    kmem_cache_alloc+0x22c/0x260
    ? __alloc_skb+0x3c/0x260
    __alloc_skb+0x3c/0x260
    alloc_skb_with_frags+0x4e/0x1a0
    sock_alloc_send_pskb+0x16a/0x1b0
    ? wait_for_unix_gc+0x31/0x90
    ? alloc_set_pte+0x2ad/0x310
    unix_stream_sendmsg+0x28d/0x340
    sock_sendmsg+0x2d/0x40
    sock_write_iter+0x6c/0xc0
    __vfs_write+0xc0/0x120
    vfs_write+0x9b/0x1a0
    ? __might_fault+0x49/0xa0
    SyS_write+0x44/0x90
    do_fast_syscall_32+0xa6/0x1e0
    sysenter_past_esp+0x45/0x74

    Mem-Info:
    active_anon:104698 inactive_anon:105791 isolated_anon:192
    active_file:433 inactive_file:283 isolated_file:22
    unevictable:0 dirty:0 writeback:296 unstable:0
    slab_reclaimable:6389 slab_unreclaimable:78927
    mapped:474 shmem:0 pagetables:101426 bounce:0
    free:10518 free_pcp:334 free_cma:0
    Node 0 active_anon:418792kB inactive_anon:423164kB active_file:1732kB inactive_file:1132kB unevictable:0kB isolated(anon):768kB isolated(file):88kB mapped:1896kB dirty:0kB writeback:1184kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1478632 all_unreclaimable? yes
    DMA free:3304kB min:68kB low:84kB high:100kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:4088kB kernel_stack:0kB pagetables:2480kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
    lowmem_reserve[]: 0 809 1965 1965
    Normal free:3436kB min:3604kB low:4504kB high:5404kB present:897016kB managed:858460kB mlocked:0kB slab_reclaimable:25556kB slab_unreclaimable:311712kB kernel_stack:164608kB pagetables:30844kB bounce:0kB free_pcp:620kB local_pcp:104kB free_cma:0kB
    lowmem_reserve[]: 0 0 9247 9247
    HighMem free:33808kB min:512kB low:1796kB high:3080kB present:1183736kB managed:1183736kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:372252kB bounce:0kB free_pcp:428kB local_pcp:72kB free_cma:0kB
    lowmem_reserve[]: 0 0 0 0
    DMA: 2*4kB (UM) 2*8kB (UM) 0*16kB 1*32kB (U) 1*64kB (U) 2*128kB (UM) 1*256kB (U) 1*512kB (M) 0*1024kB 1*2048kB (U) 0*4096kB = 3192kB
    Normal: 33*4kB (MH) 79*8kB (ME) 11*16kB (M) 4*32kB (M) 2*64kB (ME) 2*128kB (EH) 7*256kB (EH) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3244kB
    HighMem: 2590*4kB (UM) 1568*8kB (UM) 491*16kB (UM) 60*32kB (UM) 6*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33064kB
    Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    25121 total pagecache pages
    24160 pages in swap cache
    Swap cache stats: add 86371, delete 62211, find 42865/60187
    Free swap = 4015560kB
    Total swap = 4192252kB
    524186 pages RAM
    295934 pages HighMem/MovableOnly
    9658 pages reserved
    0 pages cma reserved

    The order-0 allocation for the normal zone failed while there was a
    lot of reclaimable memory (i.e., anonymous memory with free swap). I
    wanted to analyze the problem, but it was hard because we had removed
    the per-zone lru stats, so I couldn't tell how much anonymous memory
    there was in the normal/DMA zones.

    When investigating an OOM problem, the reclaimable memory count is a
    crucial statistic. Without it, it's hard to parse the OOM message, so
    I believe we should keep it.

    With per-zone lru stat,

    gfp_mask=0x26004c0(GFP_KERNEL|__GFP_REPEAT|__GFP_NOTRACK), order=0
    Mem-Info:
    active_anon:101103 inactive_anon:102219 isolated_anon:0
    active_file:503 inactive_file:544 isolated_file:0
    unevictable:0 dirty:0 writeback:34 unstable:0
    slab_reclaimable:6298 slab_unreclaimable:74669
    mapped:863 shmem:0 pagetables:100998 bounce:0
    free:23573 free_pcp:1861 free_cma:0
    Node 0 active_anon:404412kB inactive_anon:409040kB active_file:2012kB inactive_file:2176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3452kB dirty:0kB writeback:136kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1320845 all_unreclaimable? yes
    DMA free:3296kB min:68kB low:84kB high:100kB active_anon:5540kB inactive_anon:0kB active_file:0kB inactive_file:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:248kB slab_unreclaimable:2628kB kernel_stack:792kB pagetables:2316kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
    lowmem_reserve[]: 0 809 1965 1965
    Normal free:3600kB min:3604kB low:4504kB high:5404kB active_anon:86304kB inactive_anon:0kB active_file:160kB inactive_file:376kB present:897016kB managed:858524kB mlocked:0kB slab_reclaimable:24944kB slab_unreclaimable:296048kB kernel_stack:163832kB pagetables:35892kB bounce:0kB free_pcp:3076kB local_pcp:656kB free_cma:0kB
    lowmem_reserve[]: 0 0 9247 9247
    HighMem free:86156kB min:512kB low:1796kB high:3080kB active_anon:312852kB inactive_anon:410024kB active_file:1924kB inactive_file:2012kB present:1183736kB managed:1183736kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:365784kB bounce:0kB free_pcp:3868kB local_pcp:720kB free_cma:0kB
    lowmem_reserve[]: 0 0 0 0
    DMA: 8*4kB (UM) 8*8kB (UM) 4*16kB (M) 2*32kB (UM) 2*64kB (UM) 1*128kB (M) 3*256kB (UME) 2*512kB (UE) 1*1024kB (E) 0*2048kB 0*4096kB = 3296kB
    Normal: 240*4kB (UME) 160*8kB (UME) 23*16kB (ME) 3*32kB (UE) 3*64kB (UME) 2*128kB (ME) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3408kB
    HighMem: 10942*4kB (UM) 3102*8kB (UM) 866*16kB (UM) 76*32kB (UM) 11*64kB (UM) 4*128kB (UM) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 86344kB
    Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    54409 total pagecache pages
    53215 pages in swap cache
    Swap cache stats: add 300982, delete 247765, find 157978/226539
    Free swap = 3803244kB
    Total swap = 4192252kB
    524186 pages RAM
    295934 pages HighMem/MovableOnly
    9642 pages reserved
    0 pages cma reserved

    With that, we can see the normal zone has ~86MB of reclaimable memory,
    so we know something is going wrong in reclaim (I will fix the problem
    in the next patch).

    [mgorman@techsingularity.net: rename zone LRU stats in /proc/vmstat]
    Link: http://lkml.kernel.org/r/20160725072300.GK10438@techsingularity.net
    Link: http://lkml.kernel.org/r/1469110261-7365-2-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Minchan Kim
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • With node-lru, the locking is based on the pgdat. As Minchan pointed
    out, there is an opportunity to reduce LRU lock releases/acquires in
    check_move_unevictable_pages() by only changing the lock on a pgdat
    change.

    [mgorman@techsingularity.net: remove double initialisation]
    Link: http://lkml.kernel.org/r/20160719074835.GC10438@techsingularity.net
    Link: http://lkml.kernel.org/r/1468853426-12858-3-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Minchan Kim
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • As pointed out by Minchan Kim, shrink_zones() checks for populated zones
    in a zonelist but a zonelist can never contain unpopulated zones. While
    it's not related to the node-lru series, it can be cleaned up now.

    Link: http://lkml.kernel.org/r/1468853426-12858-2-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Suggested-by: Minchan Kim
    Acked-by: Minchan Kim
    Acked-by: Johannes Weiner
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Minchan Kim reported seeing the following warning on a 32-bit system,
    although it can affect 64-bit systems too.

    WARNING: CPU: 4 PID: 1322 at mm/memcontrol.c:998 mem_cgroup_update_lru_size+0x103/0x110
    mem_cgroup_update_lru_size(f44b4000, 1, -7): zid 1 lru_size 1 but empty
    Modules linked in:
    CPU: 4 PID: 1322 Comm: cp Not tainted 4.7.0-rc4-mm1+ #143
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
    dump_stack+0x76/0xaf
    __warn+0xea/0x110
    ? mem_cgroup_update_lru_size+0x103/0x110
    warn_slowpath_fmt+0x3b/0x40
    mem_cgroup_update_lru_size+0x103/0x110
    isolate_lru_pages.isra.61+0x2e2/0x360
    shrink_active_list+0xac/0x2a0
    ? __delay+0xe/0x10
    shrink_node_memcg+0x53c/0x7a0
    shrink_node+0xab/0x2a0
    do_try_to_free_pages+0xc6/0x390
    try_to_free_pages+0x245/0x590

    LRU list contents and counts are updated separately. Counts are
    updated before pages are added to the LRU and after pages are removed.
    The warning above is from a check in mem_cgroup_update_lru_size() that
    ensures a list with a size of zero is actually empty.

    The problem is that node-lru needs to account for highmem pages if
    CONFIG_HIGHMEM is set. One consequence of the implementation is that
    the counts are updated in multiple passes when pages from multiple
    zones are isolated. This happens whether HIGHMEM is set or not. When
    multiple zones are isolated, it's possible for a debugging check in
    memcg to be tripped.

    This patch forces all the zone counts to be updated before the memcg
    function is called.
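    The ordering can be illustrated with a small stand-alone sketch; this
    is not the kernel code, and names such as account_isolated(),
    zone_count[] and the memcg stub are invented for the example:

```c
#define MAX_NR_ZONES 4

static long zone_count[MAX_NR_ZONES];
static long memcg_lru_size[MAX_NR_ZONES];

/* Simplified stand-in for mem_cgroup_update_lru_size(). */
static void memcg_update_lru_size(int zid, long delta)
{
	memcg_lru_size[zid] += delta;
}

/* Update every per-zone count for the whole isolated batch first, and
 * only then notify memcg once per zone, so the memcg debugging check
 * never sees a zero lru_size for a list that still has pages. */
static void account_isolated(const long isolated[MAX_NR_ZONES])
{
	int zid;

	for (zid = 0; zid < MAX_NR_ZONES; zid++)
		zone_count[zid] -= isolated[zid];

	for (zid = 0; zid < MAX_NR_ZONES; zid++)
		if (isolated[zid])
			memcg_update_lru_size(zid, -isolated[zid]);
}
```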

    Link: http://lkml.kernel.org/r/1468588165-12461-6-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Tested-by: Minchan Kim
    Reported-by: Minchan Kim
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • node_pages_scanned represents the number of pages scanned on a node
    for reclaim, so it is pointless to show it in kilobytes.

    Furthermore, node_pages_scanned is a per-node value, not a per-zone
    one.

    This patch changes node_pages_scanned from per-zone kilobytes to a
    per-node count.

    [minchan@kernel.org: fix node_pages_scanned]
    Link: http://lkml.kernel.org/r/20160716101431.GA10305@bbox
    Link: http://lkml.kernel.org/r/1468588165-12461-5-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Minchan Kim
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • With node-lru, the locking is based on the pgdat. Previously it was
    required that a pagevec drain released one zone lru_lock and acquired
    another zone lru_lock on every zone change. Now, it's only necessary if
    the node changes. The end-result is fewer lock release/acquires if the
    pages are all on the same node but in different zones.
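    The batching pattern can be sketched as follows; this is a simplified
    stand-alone model (struct pgdat, drain_pages() and the lock helpers
    here are invented), not the actual pagevec code:

```c
#include <stddef.h>

struct pgdat { int lock_held; };

static void lock_pgdat(struct pgdat *p)   { p->lock_held = 1; }
static void unlock_pgdat(struct pgdat *p) { p->lock_held = 0; }

/* Hold one node lock while consecutive pages share the same node and
 * only cycle the lock on a node change; previously a zone change was
 * enough to force a release/acquire pair. */
static void drain_pages(struct pgdat *page_nodes[], int nr, int *lock_cycles)
{
	struct pgdat *locked = NULL;
	int i;

	for (i = 0; i < nr; i++) {
		struct pgdat *pgdat = page_nodes[i];

		if (pgdat != locked) {
			if (locked)
				unlock_pgdat(locked);
			lock_pgdat(pgdat);
			locked = pgdat;
			(*lock_cycles)++;
		}
		/* ... remove the page from its LRU under the node lock ... */
	}
	if (locked)
		unlock_pgdat(locked);
}
```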

    Link: http://lkml.kernel.org/r/1468588165-12461-4-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Minchan Kim
    Acked-by: Johannes Weiner
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • When I tested vmscale from mmtests on 32-bit, I found the benchmark
    slowed down significantly.

                   base        node
                   1           global-1
    User           12.98       16.04
    System        147.61      166.42
    Elapsed        26.48       38.08

    With vmstat, I found the average IO wait was much higher than in the
    base run.

    The reason was that highmem_dirtyable_memory() accumulated free pages
    and highmem_file_pages from the HIGHMEM zone through the MOVABLE
    zone, which was wrong. As a result, dirty_thresh in
    throttle_vm_writeout() was always 0, so it called congestion_wait()
    frequently once writeback started.

    With this patch, the regression is largely recovered.

                   base        node        node
                   1           global-1    fix
    User           12.98       16.04       13.78
    System        147.61      166.42      143.92
    Elapsed        26.48       38.08       29.64
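    A minimal sketch of the corrected accounting, assuming a simplified
    zone model; the enum, struct and highmem_dirtyable() helper here are
    invented for illustration (the kernel tests zones with is_highmem()):

```c
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE, NR_ZONES };

struct zone_stat {
	long free_pages;
	long file_pages;
};

/* Only zones that are genuinely highmem contribute free pages and
 * file pages to the dirtyable-highmem total; the buggy version also
 * accumulated ZONE_MOVABLE, inflating nothing and zeroing
 * dirty_thresh downstream. */
static long highmem_dirtyable(const struct zone_stat zones[NR_ZONES])
{
	long x = 0;
	int z;

	for (z = 0; z < NR_ZONES; z++)
		if (z == ZONE_HIGHMEM)	/* is_highmem() in the kernel */
			x += zones[z].free_pages + zones[z].file_pages;
	return x;
}
```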

    Link: http://lkml.kernel.org/r/1468404004-5085-4-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Minchan Kim
    Signed-off-by: Mel Gorman
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • The number of LRU pages, dirty pages and writeback pages must be
    accounted for on both zones and nodes because the reclaim retry
    logic, compaction retry logic and highmem calculations all depend on
    per-zone stats.

    Many lowmem allocations are immune from OOM kill due to a check in
    __alloc_pages_may_oom for (ac->high_zoneidx < ZONE_NORMAL) since commit
    03668b3ceb0c ("oom: avoid oom killer for lowmem allocations"). The
    exception is costly high-order allocations or allocations that cannot
    fail. If the __alloc_pages_may_oom avoids OOM-kill for low-order lowmem
    allocations then it would fall through to __alloc_pages_direct_compact.

    This patch blindly retries reclaim for zone-constrained allocations
    in should_reclaim_retry up to MAX_RECLAIM_RETRIES. This is not ideal
    but without per-zone stats there are not many alternatives. The
    impact is that zone-constrained allocations may be delayed before
    the OOM killer is considered.

    As there is no guarantee enough memory can ever be freed to satisfy
    compaction, this patch avoids retrying compaction for
    zone-constrained allocations.

    In combination, that means that the per-node stats can be used when
    deciding whether to continue reclaim using a rough approximation. While
    it is possible this will make the wrong decision on occasion, it will
    not infinite loop as the number of reclaim attempts is capped by
    MAX_RECLAIM_RETRIES.
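    The capped-retry behaviour can be sketched as follows; should_retry()
    here is an invented simplification of the kernel's
    should_reclaim_retry(), with MAX_RECLAIM_RETRIES set to the kernel's
    value of 16:

```c
#define MAX_RECLAIM_RETRIES 16

/* A zone-constrained request retries reclaim blindly, but the
 * no-progress counter bounds the loop so it cannot spin forever. */
static int should_retry(int zone_constrained, int *no_progress_loops)
{
	if (++(*no_progress_loops) > MAX_RECLAIM_RETRIES)
		return 0;	/* give up; the OOM killer may be considered */
	if (zone_constrained)
		return 1;	/* blind retry: no per-zone reclaim stats */
	/* ... otherwise consult the per-node reclaimable counters ... */
	return 0;
}
```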

    The final step is calculating the number of dirtyable highmem pages.
    As those calculations only care about the global count of file pages
    in highmem, this patch uses a global counter instead of per-zone
    stats, which is sufficient.

    In combination, this allows the per-zone LRU and dirty state counters to
    be removed.

    [mgorman@techsingularity.net: fix acct_highmem_file_pages()]
    Link: http://lkml.kernel.org/r/1468853426-12858-4-git-send-email-mgorman@techsingularity.net
    Link: http://lkml.kernel.org/r/1467970510-21195-35-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Suggested-by: Michal Hocko
    Acked-by: Hillf Danton
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • There are a number of stats that were previously accessible via zoneinfo
    that are now invisible. While it is possible to create a new file for
    the node stats, this may be missed by users. Instead this patch prints
    the stats under the first populated zone in /proc/zoneinfo.

    Link: http://lkml.kernel.org/r/1467970510-21195-34-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Hillf Danton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The vmstat allocstall was fairly useful in the general sense but
    node-based LRUs change that. It's important to know if a stall was for
    an address-limited allocation request as this will require skipping
    pages from other zones. This patch adds pgstall_* counters to replace
    allocstall. The sum of the counters will equal the old allocstall so it
    can be trivially recalculated. A high number of address-limited
    allocation requests may result in a lot of useless LRU scanning for
    suitable pages.

    As address-limited allocations require pages to be skipped, it's
    important to know how much useless LRU scanning took place so this patch
    adds pgskip* counters. This yields the following model

    1. The number of address-space limited stalls can be accounted for (pgstall)
    2. The amount of useless work required to reclaim the data is accounted (pgskip)
    3. The total number of scans is available from pgscan_kswapd and pgscan_direct
    so from that the ratio of useful to useless scans can be calculated.
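    Point 3 can be made concrete with a trivial calculation;
    useful_scan_pct() and the counter values are invented for
    illustration, with pgscan_total standing for pgscan_kswapd plus
    pgscan_direct:

```c
/* Share of scanning that was useful, given total scanned pages and
 * total skipped (ineligible-zone) pages from the new counters. */
static long useful_scan_pct(long pgscan_total, long pgskip_total)
{
	if (pgscan_total == 0)
		return 100;
	return 100 * (pgscan_total - pgskip_total) / pgscan_total;
}
```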

    [mgorman@techsingularity.net: s/pgstall/allocstall/]
    Link: http://lkml.kernel.org/r/1468404004-5085-3-git-send-email-mgorman@techsingularity.net
    Link: http://lkml.kernel.org/r/1467970510-21195-33-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This is partially a preparation patch for more vmstat work but it
    also has the slight advantage that __count_zid_vm_events() is
    cheaper to calculate than __count_zone_vm_events().

    Link: http://lkml.kernel.org/r/1467970510-21195-32-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • If a page is about to be dirtied then the page allocator attempts to
    limit the total number of dirty pages that exist in any given zone.
    The call to node_dirty_ok is expensive so this patch records if the last
    pgdat examined hit the dirty limits. In some cases, this reduces the
    number of calls to node_dirty_ok().
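    The caching idea can be sketched as follows; the types and helpers
    here are invented simplifications, not the page allocator's code:

```c
#include <stdbool.h>
#include <stddef.h>

struct pgdat_s { int id; };

static int node_dirty_ok_calls;

/* Stand-in for the expensive node_dirty_ok() check. */
static bool node_dirty_ok(struct pgdat_s *pgdat)
{
	node_dirty_ok_calls++;
	return pgdat->id != 1;	/* pretend node 1 is over its limit */
}

/* Remember the last pgdat found over its dirty limit and skip the
 * expensive check for further zones on that same node. */
static int usable_zones(struct pgdat_s *zone_nodes[], int nr)
{
	struct pgdat_s *last_dirty_pgdat = NULL;
	int i, usable = 0;

	for (i = 0; i < nr; i++) {
		struct pgdat_s *pgdat = zone_nodes[i];

		if (pgdat == last_dirty_pgdat)
			continue;	/* already known to be over the limit */
		if (!node_dirty_ok(pgdat)) {
			last_dirty_pgdat = pgdat;
			continue;
		}
		usable++;
	}
	return usable;
}
```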

    Link: http://lkml.kernel.org/r/1467970510-21195-31-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The fair zone allocation policy interleaves allocation requests between
    zones to avoid an age inversion problem whereby new pages are reclaimed
    to balance a zone. Reclaim is now node-based so this should no longer
    be an issue and the fair zone allocation policy is not free. This patch
    removes it.

    Link: http://lkml.kernel.org/r/1467970510-21195-30-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This is convenient when tracking down why the skip count is high because
    it'll show what classzone kswapd woke up at and what zones are being
    isolated.

    Link: http://lkml.kernel.org/r/1467970510-21195-29-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The buffer_heads_over_limit check in kswapd is inconsistent with
    direct reclaim behaviour. It may force an attempt to reclaim from
    all zones and then not reclaim at all, because zones higher than the
    original request required were already balanced.

    This patch causes kswapd to consider reclaiming from all zones if
    buffer_heads_over_limit is set. However, if there are eligible zones
    for the allocation request that woke kswapd, then no reclaim will
    occur even if buffer_heads_over_limit is set. This avoids kswapd
    over-reclaiming just because of buffer_heads_over_limit.

    [mgorman@techsingularity.net: fix comment about buffer_heads_over_limit]
    Link: http://lkml.kernel.org/r/1468404004-5085-2-git-send-email-mgorman@techsingularity.net
    Link: http://lkml.kernel.org/r/1467970510-21195-28-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman