07 Jan, 2021

2 commits

  • Collect the time for each allocation recorded in page owner so that
    allocation "surges" can be measured.

    Record the pid for each allocation recorded in page owner so that the
    source of allocation "surges" can be better identified.

    The above is very useful when doing memory analysis. On a crash for
    example, we can get this information from kdump (or ramdump) and parse it
    to figure out memory allocation problems.

    Please note that on x86_64 this increases the size of struct page_owner
    from 16 bytes to 32.

    Vlastimil: it's not functionality intended for production, so unless
    somebody says they need to enable page_owner for debugging and this
    increase prevents them from fitting into available memory, let's not
    complicate things by making this optional.
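
    For reference, a sketch of how struct page_owner in mm/page_owner.c looks
    after this change (based on the upstream commit; the exact layout can vary
    between kernel versions):

    struct page_owner {
            unsigned short order;
            short last_migrate_reason;
            gfp_t gfp_mask;
            depot_stack_handle_t handle;
            depot_stack_handle_t free_handle;
            u64 ts_nsec;            /* allocation timestamp, added here */
            pid_t pid;              /* allocating task's pid, added here */
    };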

    [lmark@codeaurora.org: v3]
    Link: https://lkml.kernel.org/r/20201210160357.27779-1-georgi.djakov@linaro.org

    Link: https://lkml.kernel.org/r/20201209125153.10533-1-georgi.djakov@linaro.org
    Signed-off-by: Liam Mark
    Signed-off-by: Georgi Djakov
    Acked-by: Vlastimil Babka
    Acked-by: Joonsoo Kim
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    (cherry picked from commit 9cc7e96aa846f9086431d6c2d33ff9ab42d72b2d)

    Bug: 175129313
    Signed-off-by: Suren Baghdasaryan
    Change-Id: I5e246ea009c7e9e34c1cc608bcd3196fc0e623b4

    Liam Mark
     
  • Export get_page_owner symbol for loadable vendor
    modules.

    Bug: 176277889
    Change-Id: Iea0a8022e542d1223caf4a742a888647828ca7cc
    Signed-off-by: Vijayanand Jitta

    Vijayanand Jitta
     

17 Oct, 2020

2 commits

  • The current page_order() can only be called on pages in the buddy
    allocator. For compound pages, you have to use compound_order(). This is
    confusing and led to a bug, so rename page_order() to buddy_order().
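
    The renamed helper is tiny; a sketch of its shape in mm/internal.h (it is
    only valid while the page actually sits in the buddy allocator):

    static inline unsigned int buddy_order(struct page *page)
    {
            /* PageBuddy() must be checked by the caller */
            return page_private(page);
    }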

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20201001152259.14932-2-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • The implementation of split_page_owner() prefers a count rather than the
    old order of the page. When we support a variable size THP, we won't
    have the order at this point, but we will have the number of pages.
    So change the interface to what the caller and callee would prefer.
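
    In other words, the interface changes roughly like this (a sketch; the
    exact parameter name is immaterial):

    /* before: the caller had to pass the original page order */
    void split_page_owner(struct page *page, unsigned int order);

    /* after: the caller passes the number of pages directly */
    void split_page_owner(struct page *page, unsigned int nr);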

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: SeongJae Park
    Acked-by: Kirill A. Shutemov
    Cc: Huang Ying
    Link: https://lkml.kernel.org/r/20200908195539.25896-4-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

04 Jun, 2020

1 commit

  • The pageblock migrate type is encoded in the GFP flags, just as the zone
    type and zonelist are.

    Currently we use gfp_zone() and gfp_zonelist() to extract that
    information, so it is consistent to use the same naming convention for
    the migrate type.
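
    A simplified sketch of the renamed helper, gfp_migratetype(), mirroring
    the gfp_zone()/gfp_zonelist() naming (debug assertions omitted here):

    static inline int gfp_migratetype(const gfp_t gfp_flags)
    {
            if (unlikely(page_group_by_mobility_disabled))
                    return MIGRATE_UNMOVABLE;

            /* group based on mobility encoded in the GFP flags */
            return (gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
    }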

    Signed-off-by: Wei Yang
    Signed-off-by: Andrew Morton
    Reviewed-by: Pankaj Gupta
    Link: http://lkml.kernel.org/r/20200329080823.7735-1-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds

    Wei Yang
     

19 Oct, 2019

1 commit

  • Uninitialized memmaps contain garbage and in the worst case trigger
    kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
    touched.

    For example, when not onlining a memory block that is spanned by a zone
    and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
    CONFIG_PAGE_POISONING, we can trigger a kernel BUG:

    :/# echo 1 > /sys/devices/system/memory/memory40/online
    :/# echo 1 > /sys/devices/system/memory/memory42/online
    :/# cat /proc/pagetypeinfo > test.file
    page:fffff2c585200000 is uninitialized and poisoned
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    There is not page extension available.
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:1107!
    invalid opcode: 0000 [#1] SMP NOPTI

    Please note that this change does not affect ZONE_DEVICE, because
    pagetypeinfo_showmixedcount_print() is called from
    mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
    ZONE_DEVICE is never populated (zone->present_pages always 0).
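
    The fix boils down to touching only online (and therefore initialized)
    memmaps in the pfn walk; a sketch of the per-pfn guard, assuming
    pfn_to_online_page() semantics:

    page = pfn_to_online_page(pfn);
    if (!page)
            continue;       /* offline section: memmap may be uninitialized */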

    [david@redhat.com: move check to outer loop, add comment, rephrase description]
    Link: http://lkml.kernel.org/r/20191011140638.8160-1-david@redhat.com
    Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319
    Signed-off-by: Qian Cai
    Signed-off-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Thomas Gleixner
    Cc: "Peter Zijlstra (Intel)"
    Cc: Miles Chen
    Cc: Mike Rapoport
    Cc: Qian Cai
    Cc: Greg Kroah-Hartman
    Cc: [4.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

15 Oct, 2019

3 commits

  • Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the
    page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page
    is tracked as being allocated. Kirill suggested naming it
    PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat
    loaded term for a page".

    Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Suggested-by: Kirill A. Shutemov
    Cc: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Kirill A. Shutemov
    Cc: Walter Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Commit 8974558f49a6 ("mm, page_owner, debug_pagealloc: save and dump
    freeing stack trace") enhanced page_owner to also store freeing stack
    trace, when debug_pagealloc is also enabled. KASAN would also like to
    do this [1] to improve error reports to debug e.g. UAF issues.

    Kirill has suggested that the freeing stack trace saving should be also
    possible to be enabled separately from KASAN or debug_pagealloc, i.e.
    with an extra boot option. Qian argued that we have enough options
    already, and avoiding the extra overhead is not worth the complications
    in the case of a debugging option. Kirill noted that the extra stack
    handle in struct page_owner requires 0.1% of memory.

    This patch therefore enables free stack saving whenever page_owner is
    enabled, regardless of whether debug_pagealloc or KASAN is also enabled.
    KASAN kernels booted with page_owner=on will thus benefit from the
    improved error reports.

    [1] https://bugzilla.kernel.org/show_bug.cgi?id=203967

    [vbabka@suse.cz: v3]
    Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz
    Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Reviewed-by: Qian Cai
    Suggested-by: Dmitry Vyukov
    Suggested-by: Walter Wu
    Suggested-by: Andrey Ryabinin
    Suggested-by: Kirill A. Shutemov
    Suggested-by: Qian Cai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Patch series "followups to debug_pagealloc improvements through
    page_owner", v3.

    These are followups to [1] which made it to Linus meanwhile. Patches 1
    and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It
    would be nice if all of this made it to 5.4 with [1] already there (or
    at least Patch 1).

    This patch (of 3):

    As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page
    owner for each subpage") has introduced an off-by-one error in
    __set_page_owner_handle() when looking up page_ext for subpages. As a
    result, the head page page_owner info is set twice, while for the last
    tail page, it's not set at all.

    Fix this and also make the code more efficient by advancing the page_ext
    pointer we already have, instead of calling lookup_page_ext() for each
    subpage. Since the full size of struct page_ext is not known at compile
    time, we can't use a simple page_ext++ statement, so introduce a
    page_ext_next() inline function for that.
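
    A sketch of the new helper; because the real entry size is only known at
    runtime, it advances by page_ext_size rather than sizeof(struct page_ext):

    static inline struct page_ext *page_ext_next(struct page_ext *curr)
    {
            void *next = curr;

            next += page_ext_size;  /* runtime entry size, not the struct size */
            return next;
    }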

    Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
    Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage")
    Signed-off-by: Vlastimil Babka
    Reported-by: Kirill A. Shutemov
    Reported-by: Miles Chen
    Acked-by: Kirill A. Shutemov
    Cc: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Walter Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

25 Sep, 2019

3 commits

  • The debug_pagealloc functionality is useful to catch buggy page allocator
    users that cause e.g. use after free or double free. When page
    inconsistency is detected, debugging is often simpler when we know the
    call stack of the process that last allocated and freed the page. When
    page_owner is also enabled, we record the allocation stack trace, but not
    the freeing one.

    This patch therefore adds recording of freeing process stack trace to page
    owner info, if both page_owner and debug_pagealloc are configured and
    enabled. With only page_owner enabled, this info is not useful for the
    memory leak debugging use case. dump_page() is adjusted to print the
    info. An example result of calling __free_pages() twice may look like
    this (note the page last free stack trace):

    BUG: Bad page state in process bash pfn:13d8f8
    page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
    flags: 0x1affff800000000()
    raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
    page dumped because: nonzero _refcount
    page_owner tracks the page as freed
    page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
    prep_new_page+0x143/0x150
    get_page_from_freelist+0x289/0x380
    __alloc_pages_nodemask+0x13c/0x2d0
    khugepaged+0x6e/0xc10
    kthread+0xf9/0x130
    ret_from_fork+0x3a/0x50
    page last free stack trace:
    free_pcp_prepare+0x134/0x1e0
    free_unref_page+0x18/0x90
    khugepaged+0x7b/0xc10
    kthread+0xf9/0x130
    ret_from_fork+0x3a/0x50
    Modules linked in:
    CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
    Call Trace:
    dump_stack+0x85/0xc0
    bad_page.cold+0xba/0xbf
    rmqueue_pcplist.isra.0+0x6c5/0x6d0
    rmqueue+0x2d/0x810
    get_page_from_freelist+0x191/0x380
    __alloc_pages_nodemask+0x13c/0x2d0
    __get_free_pages+0xd/0x30
    __pud_alloc+0x2c/0x110
    copy_page_range+0x4f9/0x630
    dup_mmap+0x362/0x480
    dup_mm+0x68/0x110
    copy_process+0x19e1/0x1b40
    _do_fork+0x73/0x310
    __x64_sys_clone+0x75/0x80
    do_syscall_64+0x6e/0x1e0
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f10af854a10
    ...
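
    On the recording side, the change essentially makes the free path capture
    a second stack depot handle; a rough sketch of __reset_page_owner(),
    simplified from the commit:

    void __reset_page_owner(struct page *page, unsigned int order)
    {
            depot_stack_handle_t handle = 0;
            struct page_ext *page_ext;
            int i;

            /* capture the freeing stack once, only with debug_pagealloc on */
            if (debug_pagealloc_enabled())
                    handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);

            for (i = 0; i < (1 << order); i++) {
                    page_ext = lookup_page_ext(page + i);
                    if (unlikely(!page_ext))
                            continue;
                    __clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);
                    if (debug_pagealloc_enabled())
                            get_page_owner(page_ext)->free_handle = handle;
            }
    }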

    Link: http://lkml.kernel.org/r/20190820131828.22684-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Kirill A. Shutemov
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • For debugging purposes it might be useful to keep the owner info even
    after page has been freed, and include it in e.g. dump_page() when
    detecting a bad page state. For that, change the PAGE_EXT_OWNER flag
    meaning to "page owner info has been set at least once" and add a new
    PAGE_EXT_OWNER_ACTIVE flag to track whether the page is currently supposed
    to be tracked as allocated or free. Adjust dump_page() accordingly,
    distinguishing free and allocated pages. In the page_owner debugfs file,
    keep printing only allocated pages so that existing scripts are not
    confused, and also because free pages are irrelevant to the memory
    statistics or leak detection that are the typical use cases of the file
    anyway.
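
    Conceptually, the two flags are then managed like this (a simplified
    sketch):

    /* on allocation: owner info is set (sticky) and the page is "live" */
    __set_bit(PAGE_EXT_OWNER, &page_ext->flags);
    __set_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);

    /* on free: drop only the "live" bit, keep the stored owner info */
    __clear_bit(PAGE_EXT_OWNER_ACTIVE, &page_ext->flags);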

    Link: http://lkml.kernel.org/r/20190820131828.22684-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Kirill A. Shutemov
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Patch series "debug_pagealloc improvements through page_owner", v2.

    The debug_pagealloc functionality serves a purpose on the page allocator
    level similar to what slub_debug does on the kmalloc level: detecting bad
    users. One notable feature that slub_debug has is storing
    stack traces of who last allocated and freed the object. On page level we
    track allocations via page_owner, but that info is discarded when freeing,
    and we don't track freeing at all. This series improves those aspects.
    With both debug_pagealloc and page_owner enabled, we can then get bug
    reports such as the example in Patch 4.

    SLUB debug tracking additionally stores cpu, pid and timestamp. This could
    be added later, if deemed useful enough to justify the additional page_ext
    structure size.

    This patch (of 3):

    Currently, page owner info is only recorded for the first page of a
    high-order allocation, and copied to tail pages in the event of a split
    page. With the plan to keep previous owner info after freeing the page,
    it would be beneficial to record page owner for each subpage upon
    allocation. This increases the overhead for high orders, but that should
    be acceptable for a debugging option.

    The order stored for each subpage is the order of the whole allocation.
    This makes it possible to calculate the "head" pfn and to recognize "tail"
    pages (quoted because not all high-order allocations are compound pages
    with true head and tail pages). When reading the page_owner debugfs file,
    keep skipping the "tail" pages so that stats gathered by existing scripts
    don't get inflated.
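
    The debugfs reader can recognize such "tail" subpages because every
    subpage stores the order of the whole allocation; a sketch of the skip:

    /* the "head" pfn of an order-N allocation is aligned to 1 << N; treat
     * everything else as a "tail" subpage and skip it to avoid double
     * counting */
    if (!IS_ALIGNED(pfn, 1 << page_owner->order))
            continue;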

    Link: http://lkml.kernel.org/r/20190820131828.22684-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Kirill A. Shutemov
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

29 Apr, 2019

1 commit

  • Replace the indirection through struct stack_trace by using the storage
    array based interfaces.

    The original code in all printing functions is really wrong. It allocates a
    storage array on stack which is unused because depot_fetch_stack() does not
    store anything in it. It overwrites the entries pointer in the stack_trace
    struct so it points to the depot storage.
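
    With the storage-array based interfaces, printing a depot-stored trace is
    a direct fetch plus print; a minimal sketch using the generic helpers:

    unsigned long *entries;
    unsigned int nr_entries;

    /* the depot returns a pointer into its own storage; no on-stack array */
    nr_entries = stack_depot_fetch(handle, &entries);
    stack_trace_print(entries, nr_entries, 0);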

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: linux-mm@kvack.org
    Cc: Mike Rapoport
    Cc: David Rientjes
    Cc: Andrew Morton
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: kasan-dev@googlegroups.com
    Cc: Akinobu Mita
    Cc: Christoph Hellwig
    Cc: iommu@lists.linux-foundation.org
    Cc: Robin Murphy
    Cc: Marek Szyprowski
    Cc: Johannes Thumshirn
    Cc: David Sterba
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Cc: dm-devel@redhat.com
    Cc: Mike Snitzer
    Cc: Alasdair Kergon
    Cc: Daniel Vetter
    Cc: intel-gfx@lists.freedesktop.org
    Cc: Joonas Lahtinen
    Cc: Maarten Lankhorst
    Cc: dri-devel@lists.freedesktop.org
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Tom Zanussi
    Cc: Miroslav Benes
    Cc: linux-arch@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190425094802.067210525@linutronix.de

    Thomas Gleixner
     

15 Apr, 2019

1 commit

  • No architecture terminates the stack trace with ULONG_MAX anymore. Remove
    the cruft.

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Michal Hocko
    Cc: linux-mm@kvack.org
    Cc: Mike Rapoport
    Cc: Andrew Morton
    Link: https://lkml.kernel.org/r/20190410103644.661974663@linutronix.de

    Thomas Gleixner
     

06 Mar, 2019

1 commit

  • When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Link: http://lkml.kernel.org/r/20190122152151.16139-14-gregkh@linuxfoundation.org
    Signed-off-by: Greg Kroah-Hartman
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: David Rientjes
    Cc: Laura Abbott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Kroah-Hartman
     

29 Dec, 2018

1 commit

  • The (root-only) page owner read might allocate a large amount of memory
    when given a large read count. Allocation failures can easily occur when
    doing high-order allocations.

    Clamp the buffer size to PAGE_SIZE to avoid arbitrarily sized allocations
    and the resulting high-order allocation failures.
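
    The clamp itself is a one-liner in the debugfs read handler; a sketch:

    /* never allocate more than a page for the temporary output buffer */
    count = min_t(size_t, count, PAGE_SIZE);
    kbuf = kmalloc(count, GFP_KERNEL);
    if (!kbuf)
            return -ENOMEM;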

    [akpm@linux-foundation.org: use min_t()]
    Link: http://lkml.kernel.org/r/1541091607-27402-1-git-send-email-miles.chen@mediatek.com
    Signed-off-by: Miles Chen
    Acked-by: Michal Hocko
    Cc: Joe Perches
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miles Chen
     

31 Oct, 2018

1 commit

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include <linux/memblock.h>':

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

15 Jun, 2018

1 commit

  • mm/*.c files use symbolic and octal styles for permissions.

    Using octal and not symbolic permissions is preferred by many as more
    readable.

    https://lkml.org/lkml/2016/8/2/1945

    Prefer the direct use of octal for permissions.

    Done using
    $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
    and some typing.

    Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    44
    After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    86

    Miscellanea:

    o Whitespace neatening around these conversions.

    Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

06 Apr, 2018

1 commit

  • early_param() handlers are only called during kernel initialization, so
    Linux marks them with the __init macro to save memory.

    early_page_owner_param() was missed, so mark it __init as well.
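
    A sketch of the annotated parser (simplified; it assumes the
    page_owner_disabled flag used at the time):

    static int __init early_page_owner_param(char *buf)
    {
            if (!buf)
                    return -EINVAL;

            /* "page_owner=on" on the kernel command line enables the feature */
            if (strcmp(buf, "on") == 0)
                    page_owner_disabled = false;

            return 0;
    }
    early_param("page_owner", early_page_owner_param);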

    Link: http://lkml.kernel.org/r/20180117034736.26963-1-douly.fnst@cn.fujitsu.com
    Signed-off-by: Dou Liyang
    Reviewed-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dou Liyang
     

29 Mar, 2018

1 commit

  • This patch fixes commit 5f48f0bd4e36 ("mm, page_owner: skip unnecessary
    stack_trace entries").

    If we skip the first two entries, the logic that treats a count value of 2
    as recursion is broken and the code recurses one level deep. So we need to
    check for only one occurrence of _RET_IP_ (__set_page_owner) when checking
    for recursion.

    Backtrace while checking for recursion, before the fix:

    (save_stack) from (__set_page_owner) // (But recursion returns true here)
    (__set_page_owner) from (get_page_from_freelist)
    (get_page_from_freelist) from (__alloc_pages_nodemask)
    (__alloc_pages_nodemask) from (depot_save_stack)
    (depot_save_stack) from (save_stack) // recursion should return true here
    (save_stack) from (__set_page_owner)
    (__set_page_owner) from (get_page_from_freelist)
    (get_page_from_freelist) from (__alloc_pages_nodemask+)
    (__alloc_pages_nodemask) from (depot_save_stack)
    (depot_save_stack) from (save_stack)
    (save_stack) from (__set_page_owner)
    (__set_page_owner) from (get_page_from_freelist)

    Correct Backtrace with fix:

    (save_stack) from (__set_page_owner) // recursion returned true here
    (__set_page_owner) from (get_page_from_freelist)
    (get_page_from_freelist) from (__alloc_pages_nodemask+)
    (__alloc_pages_nodemask) from (depot_save_stack)
    (depot_save_stack) from (save_stack)
    (save_stack) from (__set_page_owner)
    (__set_page_owner) from (get_page_from_freelist)
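
    With the fix, the recursion check only needs to find a single earlier
    occurrence of the caller; a rough sketch of check_recursive_alloc()
    against the struct stack_trace API of that era:

    static bool check_recursive_alloc(struct stack_trace *trace,
                                      unsigned long ip)
    {
            int i;

            /* one hit of __set_page_owner already on the stack is recursion */
            for (i = 0; i < trace->nr_entries; i++) {
                    if (trace->entries[i] == ip)
                            return true;
            }
            return false;
    }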

    Link: http://lkml.kernel.org/r/1521607043-34670-1-git-send-email-maninder1.s@samsung.com
    Fixes: 5f48f0bd4e36 ("mm, page_owner: skip unnecessary stack_trace entries")
    Signed-off-by: Maninder Singh
    Signed-off-by: Vaneet Narang
    Acked-by: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Oscar Salvador
    Cc: Greg Kroah-Hartman
    Cc: Ayush Mittal
    Cc: Prakash Gupta
    Cc: Vinayak Menon
    Cc: Vasyl Gomonovych
    Cc: Amit Sahrawat
    Cc:
    Cc: Vaneet Narang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maninder Singh
     

01 Feb, 2018

2 commits

  • Remove two redundant assignments in init_pages_in_zone().

    [osalvador@techadventures.net: v3]
    Link: http://lkml.kernel.org/r/20180117124513.GA876@techadventures.net
    [akpm@linux-foundation.org: coding style tweaks]
    Link: http://lkml.kernel.org/r/20180110084355.GA22822@techadventures.net
    Signed-off-by: Oscar Salvador
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     
  • Fix ptr_ret.cocci warnings:

    mm/page_owner.c:639:1-3: WARNING: PTR_ERR_OR_ZERO can be used

    Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

    Generated by: scripts/coccinelle/api/ptr_ret.cocci
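
    The transformation the script suggests, sketched on a hypothetical dentry
    variable:

    /* before */
    if (IS_ERR(dentry))
            return PTR_ERR(dentry);
    return 0;

    /* after */
    return PTR_ERR_OR_ZERO(dentry);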

    Link: http://lkml.kernel.org/r/1511824101-9597-1-git-send-email-gomonovych@gmail.com
    Signed-off-by: Vasyl Gomonovych
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasyl Gomonovych
     

20 Jan, 2018

1 commit

  • When setting page_owner = on, the following warning can be seen in the
    boot log:

    WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:2537 drain_all_pages+0x171/0x1a0
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc7-next-20180109-1-default+ #7
    Hardware name: Dell Inc. Latitude E7470/0T6HHJ, BIOS 1.11.3 11/09/2016
    RIP: 0010:drain_all_pages+0x171/0x1a0
    Call Trace:
    init_page_owner+0x4e/0x260
    start_kernel+0x3e6/0x4a6
    ? set_init_arg+0x55/0x55
    secondary_startup_64+0xa5/0xb0
    Code: c5 ed ff 89 df 48 c7 c6 20 3b 71 82 e8 f9 4b 52 00 3b 05 d7 0b f8 00 89 c3 72 d5 5b 5d 41 5

    This warning is shown because we are calling drain_all_pages() in
    init_early_allocated_pages(), but mm_percpu_wq is not up yet; it is set up
    later in kernel_init_freeable() -> init_mm_internals().

    Link: http://lkml.kernel.org/r/20180109153921.GA13070@techadventures.net
    Signed-off-by: Oscar Salvador
    Acked-by: Joonsoo Kim
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Ayush Mittal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     

16 Nov, 2017

1 commit

  • The maximum page order is at most 10, which fits in a short data type
    (2 bytes). last_migrate_reason is defined as an enum whose values also fit
    in a short data type (2 bytes).

    The total structure size is currently 16 bytes; after the change it
    shrinks to 12 bytes.

    Vlastimil said:
    "Looks like it works, so why not.
    Before:
    [ 0.001000] allocated 50331648 bytes of page_ext
    After:
    [ 0.001000] allocated 41943040 bytes of page_ext"

    Link: http://lkml.kernel.org/r/1507623917-37991-1-git-send-email-ayush.m@samsung.com
    Signed-off-by: Ayush Mittal
    Acked-by: Vlastimil Babka
    Cc: Vinayak Menon
    Cc: Amit Sahrawat
    Cc: Vaneet Narang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ayush Mittal
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

14 Sep, 2017

1 commit

  • The page_owner stacktrace always begins as follows:

    [] save_stack+0x40/0xc8
    [] __set_page_owner+0x3c/0x6c

    These two entries do not provide any useful information and only limit
    the available stacktrace depth. The page_owner code used to skip these
    caller functions from the stack entries, but this was lost with commit
    f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace").

    Example page_owner entry after the patch:

    Page allocated via order 0, mask 0x8(ffffff80085fb714)
    PFN 654411 type Movable Block 639 type CMA Flags 0x0(ffffffbe5c7f12c0)
    [] post_alloc_hook+0x70/0x80
    ...
    [] msm_comm_try_state+0x5f8/0x14f4
    [] msm_vidc_open+0x5e4/0x7d0
    [] msm_v4l2_open+0xa8/0x224
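
    The fix itself is effectively a one-field change in save_stack(); a sketch
    against the struct stack_trace API used at the time:

    unsigned long entries[PAGE_OWNER_STACK_DEPTH];
    struct stack_trace trace = {
            .nr_entries  = 0,
            .entries     = entries,
            .max_entries = PAGE_OWNER_STACK_DEPTH,
            .skip        = 2  /* drop save_stack() and __set_page_owner() */
    };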

    Link: http://lkml.kernel.org/r/1504078343-28754-2-git-send-email-guptap@codeaurora.org
    Fixes: f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
    Signed-off-by: Prakash Gupta
    Acked-by: Vlastimil Babka
    Cc: Catalin Marinas
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prakash Gupta
     

07 Sep, 2017

2 commits

  • init_pages_in_zone() is run under zone->lock, which means a long lock
    time and disabled interrupts on large machines. This is currently not
    an issue since it runs early in boot, but a later patch will change
    that.

    However, like other pfn scanners, we don't actually need zone->lock even
    when other cpus are running. The only potentially dangerous operation
    here is reading bogus buddy page owner due to race, and we already know
    how to handle that. The worst that can happen is that we skip some
    early allocated pages, which should not affect the debugging power of
    page_owner noticeably.

    Link: http://lkml.kernel.org/r/20170720134029.25268-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Laura Abbott
    Cc: Vinayak Menon
    Cc: zhong jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • In init_pages_in_zone() we currently use the generic set_page_owner()
    function to initialize page_owner info for early allocated pages. This
    means we needlessly do lookup_page_ext() twice for each page, and more
    importantly save_stack(), which has to unwind the stack and find the
    corresponding stack depot handle. Because the stack is always the same
    for the initialization, unwind it once in init_pages_in_zone() and reuse
    the handle. Also avoid the repeated lookup_page_ext().

    This can significantly reduce boot times with page_owner=on on large
    machines, especially for kernels built without frame pointer, where the
    stack unwinding is noticeably slower.

    [vbabka@suse.cz: don't duplicate code of __set_page_owner(), per Michal Hocko]
    [akpm@linux-foundation.org: coding-style fixes]
    [vbabka@suse.cz: create statically allocated fake stack trace for early allocated pages, per Michal]
    Link: http://lkml.kernel.org/r/45813564-2342-fc8d-d31a-f4b68a724325@suse.cz
    Link: http://lkml.kernel.org/r/20170720134029.25268-2-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Laura Abbott
    Cc: Vinayak Menon
    Cc: zhong jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

11 Jul, 2017

1 commit

  • pagetypeinfo_showmixedcount_print() is found to take a long time to
    complete, and it does so while holding the zone lock and with interrupts
    disabled. In some cases it takes more than a second (on a 2.4GHz, 8GB RAM
    arm64 system).

    Avoid taking the zone lock, similar to what read_page_owner() does, at the
    cost of possibly inaccurate results.

    Link: http://lkml.kernel.org/r/1498045643-12257-1-git-send-email-vinmenon@codeaurora.org
    Signed-off-by: Vinayak Menon
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: zhongjiang
    Cc: Sergey Senozhatsky
    Cc: Sudip Mukherjee
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Sebastian Andrzej Siewior
    Cc: David Rientjes
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vinayak Menon
     

08 Oct, 2016

2 commits

  • Defining fields on struct page_ext by hard-coding wastes memory: the entry
    size of struct page_ext includes the size of those fields even when the
    corresponding feature is disabled at runtime. Now that requesting extra
    memory at runtime is possible, page_owner no longer needs to hard-code its
    own fields.

    This patch removes the hard-coded definition and stores the page_owner
    information in that extra memory instead. Most of the changes are purely
    mechanical.
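
    With the runtime-sized scheme, page_owner only declares how much extra
    space it needs per entry and reaches its data via an offset from the
    page_ext base; a sketch:

    struct page_ext_operations page_owner_ops = {
            .size = sizeof(struct page_owner),  /* extra bytes per entry */
            .need = need_page_owner,
            .init = init_page_owner,
    };

    static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
    {
            /* .offset is filled in by the page_ext core at boot */
            return (void *)page_ext + page_owner_ops.offset;
    }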

    Link: http://lkml.kernel.org/r/1471315879-32294-7-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • There is no reason that page_owner specific function resides on
    vmstat.c.

    Link: http://lkml.kernel.org/r/1471315879-32294-4-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

27 Jul, 2016

3 commits

  • Currently, we store each page's allocation stacktrace in the corresponding
    page_ext structure, and that requires a lot of memory. As a result,
    memory-tight systems don't work well when page_owner is enabled. Moreover,
    even with this large memory consumption, we cannot store the full
    stacktrace, because we allocate the memory at boot time and keep only
    8 stacktrace slots to balance memory consumption. We could increase that,
    but it would make the system unusable or change system behaviour.

    To solve the problem, this patch uses stackdepot to store the stacktrace.
    It obviously saves memory, but there is a drawback: stackdepot can fail.

    stackdepot allocates memory at runtime, so it can fail if the system does
    not have enough memory. But most allocation stacks are generated very
    early, when there is still plenty of memory, so failures should be rare.
    And one failure only means we miss a single page's allocation stacktrace,
    which is not a big problem. When a memory allocation failure does happen,
    we store a special stacktrace handle in the page whose stacktrace could
    not be saved, so the user can still interpret memory usage properly.

    Memory saving looks as following. (4GB memory system with page_owner)
    (before the patch -> after the patch)

    static allocation:
    92274688 bytes -> 25165824 bytes

    dynamic allocation after boot + kernel build:
    0 bytes -> 327680 bytes

    total:
    92274688 bytes -> 25493504 bytes

    72% reduction in total.

    Note that the implementation is more complex than one might imagine,
    because of a recursion issue: stackdepot uses the page allocator, and
    page_owner is called on every page allocation, so using stackdepot from
    page_owner can re-enter the page allocator and then page_owner again. To
    detect and avoid this, recursion is checked whenever we obtain a
    stacktrace, and if it is found the page_owner is set to dummy information.
    The dummy information means that the page was allocated for the page_owner
    feature itself (e.g. by stackdepot), which is understandable behaviour for
    the user.
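
    Putting the pieces together, save_stack() roughly becomes the following
    (a simplified sketch; dummy_handle and failure_handle are stack depot
    handles pre-saved at init time):

    static noinline depot_stack_handle_t save_stack(gfp_t flags)
    {
            unsigned long entries[PAGE_OWNER_STACK_DEPTH];
            struct stack_trace trace = {
                    .entries     = entries,
                    .max_entries = PAGE_OWNER_STACK_DEPTH,
            };
            depot_stack_handle_t handle;

            save_stack_trace(&trace);

            /* allocating from within page_owner itself: record dummy info */
            if (check_recursive_alloc(&trace, _RET_IP_))
                    return dummy_handle;

            handle = depot_save_stack(&trace, flags);
            if (!handle)
                    handle = failure_handle;    /* depot allocation failed */

            return handle;
    }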

    [iamjoonsoo.kim@lge.com: mm-page_owner-use-stackdepot-to-store-stacktrace-v3]
    Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1466150259-27727-7-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1464230275-25791-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • split_page() calls set_page_owner() to set up page_owner for each page.
    The drawback is that the head page and the others end up with different
    stacktraces, because the callsite of set_page_owner() is slightly
    different. To avoid this, this patch copies the head page's page_owner to
    the others. It introduces a new function, split_page_owner(), but it also
    removes get_page_owner_gfp(), so the trade-off looks reasonable.

    Link: http://lkml.kernel.org/r/1464230275-25791-4-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Currently, copy_page_owner() doesn't copy all the owner information. It
    skips last_migrate_reason, because copy_page_owner() is used for migration
    and the field will be set properly soon after. However, a following patch
    will use copy_page_owner() elsewhere, and this skip would leave the
    allocated page with an uninitialized last_migrate_reason. To prevent that,
    this patch also copies last_migrate_reason in copy_page_owner().

    Link: http://lkml.kernel.org/r/1464230275-25791-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Alexander Potapenko
    Cc: Hugh Dickins
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

25 Jun, 2016

1 commit

  • We dereferenced page_ext before checking it. Let's check it first and only
    then use it.
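
    The pattern the fix restores, as a short sketch:

    struct page_ext *page_ext = lookup_page_ext(page);

    if (unlikely(!page_ext))
            return;                         /* check first ... */

    __set_bit(PAGE_EXT_OWNER, &page_ext->flags);  /* ... dereference after */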

    Fixes: f86e4271978b ("mm: check the return value of lookup_page_ext for all call sites")
    Link: http://lkml.kernel.org/r/1465249059-7883-1-git-send-email-sudipm.mukherjee@gmail.com
    Signed-off-by: Sudip Mukherjee
    Acked-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sudip Mukherjee
     

04 Jun, 2016

1 commit

  • Per the discussion with Joonsoo Kim [1], we need to check the return value
    of lookup_page_ext() at all call sites, since it might return NULL in some
    cases, unlikely though they are, e.g. during memory hotplug.

    Tested with ltp with "page_owner=0".

    [1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE

    [akpm@linux-foundation.org: fix build-breaking typos]
    [arnd@arndb.de: fix build problems from lookup_page_ext]
    Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Signed-off-by: Arnd Bergmann
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

20 May, 2016

2 commits

  • The function call overhead of get_pfnblock_flags_mask() is measurable in
    the page free paths. This patch uses an inlined version that is faster.

    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Jesper Dangaard Brouer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • There are systems whose nodes' pfn ranges overlap as follows:

    -----pfn-------->
    N0 N1 N2 N0 N1 N2

    Therefore, we need to account for this overlap when iterating a pfn range.

    There is one place in page_owner.c that iterates a pfn range without
    considering this overlap; add the check there.

    Without this patch, such a system could over-count the number of early
    allocated pages before page_owner is activated.
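
    The added check in the pfn walk is a single guard (sketched below): a page
    whose zone does not match the zone being iterated belongs to an
    overlapping node and is skipped:

    page = pfn_to_page(pfn);

    /* node pfn ranges can interleave; only count pages of this zone */
    if (page_zone(page) != zone)
            continue;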

    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Laura Abbott
    Cc: Minchan Kim
    Cc: Marek Szyprowski
    Cc: Michal Nazarewicz
    Cc: "Aneesh Kumar K.V"
    Cc: "Rafael J. Wysocki"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

18 Mar, 2016

1 commit

  • Kernel style prefers a single string over split strings when the string is
    'user-visible'.

    Miscellanea:

    - Add a missing newline
    - Realign arguments

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

16 Mar, 2016

1 commit

  • The page_owner mechanism is useful for dealing with memory leaks. By
    reading /sys/kernel/debug/page_owner one can determine the stack traces
    leading to allocations of all pages, and find e.g. a buggy driver.

    This information might be also potentially useful for debugging, such as
    the VM_BUG_ON_PAGE() calls to dump_page(). So let's print the stored
    info from dump_page().

    Example output:

    page:ffffea000292f1c0 count:1 mapcount:0 mapping:ffff8800b2f6cc18 index:0x91d
    flags: 0x1fffff8001002c(referenced|uptodate|lru|mappedtodisk)
    page dumped because: VM_BUG_ON_PAGE(1)
    page->mem_cgroup:ffff8801392c5000
    page allocated via order 0, migratetype Movable, gfp_mask 0x24213ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY)
    [] __alloc_pages_nodemask+0x134/0x230
    [] alloc_pages_current+0x88/0x120
    [] __page_cache_alloc+0xe6/0x120
    [] __do_page_cache_readahead+0xdc/0x240
    [] ondemand_readahead+0x135/0x260
    [] page_cache_async_readahead+0x6c/0x70
    [] generic_file_read_iter+0x3f2/0x760
    [] __vfs_read+0xa7/0xd0
    page has been migrated, last migrate reason: compaction

    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka