24 Nov, 2017

1 commit

  • commit e492080e640c2d1235ddf3441cae634cfffef7e1 upstream.

    online_page_ext() and page_ext_init() allocate page_ext for each
    section, but they do not allocate if the first PFN is !pfn_present(pfn)
    or !pfn_valid(pfn). Then section->page_ext remains as NULL.
    lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled. For a
    valid PFN, __set_page_owner will try to get page_ext through
    lookup_page_ext. Without CONFIG_DEBUG_VM lookup_page_ext will misuse
    NULL pointer as value 0. This incurrs invalid address access.

    This is the panic example when PFN 0x100000 is not valid but PFN
    0x13FC00 is being used for page_ext. section->page_ext is NULL,
    get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
    0x13FC00.

    To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
    will be checked at all times.

    Unable to handle kernel paging request at virtual address 01dfa014
    ------------[ cut here ]------------
    Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
    Internal error: Oops: 96000045 [#1] PREEMPT SMP
    Modules linked in:
    PC is at __set_page_owner+0x48/0x78
    LR is at __set_page_owner+0x44/0x78
    __set_page_owner+0x48/0x78
    get_page_from_freelist+0x880/0x8e8
    __alloc_pages_nodemask+0x14c/0xc48
    __do_page_cache_readahead+0xdc/0x264
    filemap_fault+0x2ac/0x550
    ext4_filemap_fault+0x3c/0x58
    __do_fault+0x80/0x120
    handle_mm_fault+0x704/0xbb0
    do_page_fault+0x2e8/0x394
    do_mem_abort+0x88/0x124

    Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return
    value of lookup_page_ext for all call sites").

    Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
    Fixes: eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging")
    Signed-off-by: Jaewon Kim
    Acked-by: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jaewon Kim
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

07 Sep, 2017

2 commits

  • page_ext_init() can take long on large machines, so add a cond_resched()
    point after each section is processed. This will allow moving the init
    to a later point at boot without triggering lockup reports.

    Link: http://lkml.kernel.org/r/20170720134029.25268-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Laura Abbott
    Cc: Vinayak Menon
    Cc: zhong jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Commit f52407ce2dea ("memory hotplug: alloc page from other node in
    memory online") has introduced N_HIGH_MEMORY checks to only use NUMA
    aware allocations when there is some memory present because the
    respective node might not have any memory yet at the time and so it
    could fail or even OOM.

    Things have changed since then though. Zonelists are now always
    initialized before we do any allocations even for hotplug (see
    959ecc48fc75 ("mm/memory_hotplug.c: fix building of node hotplug
    zonelist")).

    Therefore these checks are not really needed. In fact caller of the
    allocator should never care about whether the node is populated because
    that might change at any time.

    Link: http://lkml.kernel.org/r/20170721143915.14161-10-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Shaohua Li
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

04 May, 2017

1 commit

  • On SPARSEMEM systems page poisoning is enabled after buddy is up,
    because of the dependency on page extension init. This causes the pages
    released by free_all_bootmem not to be poisoned. This either delays or
    misses the identification of some issues because the pages have to
    undergo another cycle of alloc-free-alloc for any corruption to be
    detected.

    Enable page poisoning early by getting rid of the PAGE_EXT_DEBUG_POISON
    flag. Since all the free pages will now be poisoned, the flag need not
    be verified before checking the poison during an alloc.

    [vinmenon@codeaurora.org: fix Kconfig]
    Link: http://lkml.kernel.org/r/1490878002-14423-1-git-send-email-vinmenon@codeaurora.org
    Link: http://lkml.kernel.org/r/1490358246-11001-1-git-send-email-vinmenon@codeaurora.org
    Signed-off-by: Vinayak Menon
    Acked-by: Laura Abbott
    Tested-by: Laura Abbott
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vinayak Menon
     

08 Oct, 2016

2 commits

  • Until now, if some page_ext users want to use it's own field on
    page_ext, it should be defined in struct page_ext by hard-coding. It
    has a problem that wastes memory in following situation.

    struct page_ext {
    #ifdef CONFIG_A
    int a;
    #endif
    #ifdef CONFIG_B
    int b;
    #endif
    };

    Assume that kernel is built with both CONFIG_A and CONFIG_B. Even if we
    enable feature A and doesn't enable feature B at runtime, each entry of
    struct page_ext takes two int rather than one int. It's undesirable
    result so this patch tries to fix it.

    To solve above problem, this patch implements to support extra space
    allocation at runtime. When need() callback returns true, it's extra
    memory requirement is summed to entry size of page_ext. Also, offset
    for each user's extra memory space is returned. With this offset, user
    can use this extra space and there is no need to define needed field on
    page_ext by hard-coding.

    This patch only implements an infrastructure. Following patch will use
    it for page_owner which is only user having it's own fields on page_ext.

    Link: http://lkml.kernel.org/r/1471315879-32294-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Here, 'offset' means entry index in page_ext array. Following patch
    will use 'offset' for field offset in each entry so rename current
    'offset' to prevent confusion.

    Link: http://lkml.kernel.org/r/1471315879-32294-5-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

28 May, 2016

1 commit

  • page_ext_init() checks suitable pages with pfn_to_nid(), but
    pfn_to_nid() depends on memmap which will not be setup fully until
    page_alloc_init_late() is done. Use early_pfn_to_nid() instead of
    pfn_to_nid() so that page extension could be still used early even
    though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page
    allocation call sites.

    Suggested by Joonsoo Kim [1], this fix basically undoes the change
    introduced by commit b8f1a75d61d840 ("mm: call page_ext_init() after all
    struct pages are initialized") and fixes the same problem with a better
    approach.

    [1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com

    Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

16 Mar, 2016

1 commit

  • By default, page poisoning uses a poison value (0xaa) on free. If this
    is changed to 0, the page is not only sanitized but zeroing on alloc
    with __GFP_ZERO can be skipped as well. The tradeoff is that detecting
    corruption from the poisoning is harder to detect. This feature also
    cannot be used with hibernation since pages are not guaranteed to be
    zeroed after hibernation.

    Credit to Grsecurity/PaX team for inspiring this work

    Signed-off-by: Laura Abbott
    Acked-by: Rafael J. Wysocki
    Cc: "Kirill A. Shutemov"
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Kees Cook
    Cc: Mathias Krause
    Cc: Dave Hansen
    Cc: Jianyu Zhan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laura Abbott
     

11 Sep, 2015

1 commit

  • Knowing the portion of memory that is not used by a certain application or
    memory cgroup (idle memory) can be useful for partitioning the system
    efficiently, e.g. by setting memory cgroup limits appropriately.
    Currently, the only means to estimate the amount of idle memory provided
    by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
    access bit for all pages mapped to a particular process by writing 1 to
    clear_refs, wait for some time, and then count smaps:Referenced. However,
    this method has two serious shortcomings:

    - it does not count unmapped file pages
    - it affects the reclaimer logic

    To overcome these drawbacks, this patch introduces two new page flags,
    Idle and Young, and a new sysfs file, /sys/kernel/mm/page_idle/bitmap.
    A page's Idle flag can only be set from userspace by setting bit in
    /sys/kernel/mm/page_idle/bitmap at the offset corresponding to the page,
    and it is cleared whenever the page is accessed either through page tables
    (it is cleared in page_referenced() in this case) or using the read(2)
    system call (mark_page_accessed()). Thus by setting the Idle flag for
    pages of a particular workload, which can be found e.g. by reading
    /proc/PID/pagemap, waiting for some time to let the workload access its
    working set, and then reading the bitmap file, one can estimate the amount
    of pages that are not used by the workload.

    The Young page flag is used to avoid interference with the memory
    reclaimer. A page's Young flag is set whenever the Access bit of a page
    table entry pointing to the page is cleared by writing to the bitmap file.
    If page_referenced() is called on a Young page, it will add 1 to its
    return value, therefore concealing the fact that the Access bit was
    cleared.

    Note, since there is no room for extra page flags on 32 bit, this feature
    uses extended page flags when compiled on 32 bit.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: kpageidle requires an MMU]
    [akpm@linux-foundation.org: decouple from page-flags rework]
    Signed-off-by: Vladimir Davydov
    Reviewed-by: Andres Lagar-Cavilla
    Cc: Minchan Kim
    Cc: Raghavendra K T
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: David Rientjes
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

14 Dec, 2014

3 commits

  • This is the page owner tracking code which is introduced so far ago. It
    is resident on Andrew's tree, though, nobody tried to upstream so it
    remain as is. Our company uses this feature actively to debug memory leak
    or to find a memory hogger so I decide to upstream this feature.

    This functionality help us to know who allocates the page. When
    allocating a page, we store some information about allocation in extra
    memory. Later, if we need to know status of all pages, we can get and
    analyze it from this stored information.

    In previous version of this feature, extra memory is statically defined in
    struct page, but, in this version, extra memory is allocated outside of
    struct page. It enables us to turn on/off this feature at boottime
    without considerable memory waste.

    Although we already have tracepoint for tracing page allocation/free,
    using it to analyze page owner is rather complex. We need to enlarge the
    trace buffer for preventing overlapping until userspace program launched.
    And, launched program continually dump out the trace buffer for later
    analysis and it would change system behaviour with more possibility rather
    than just keeping it in memory, so bad for debug.

    Moreover, we can use page_owner feature further for various purposes. For
    example, we can use it for fragmentation statistics implemented in this
    patch. And, I also plan to implement some CMA failure debugging feature
    using this interface.

    I'd like to give the credit for all developers contributed this feature,
    but, it's not easy because I don't know exact history. Sorry about that.
    Below is people who has "Signed-off-by" in the patches in Andrew's tree.

    Contributor:
    Alexander Nyberg
    Mel Gorman
    Dave Hansen
    Minchan Kim
    Michal Nazarewicz
    Andrew Morton
    Jungsoo Son

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Until now, debug-pagealloc needs extra flags in struct page, so we need to
    recompile whole source code when we decide to use it. This is really
    painful, because it takes some time to recompile and sometimes rebuild is
    not possible due to third party module depending on struct page. So, we
    can't use this good feature in many cases.

    Now, we have the page extension feature that allows us to insert extra
    flags to outside of struct page. This gets rid of third party module
    issue mentioned above. And, this allows us to determine if we need extra
    memory for this page extension in boottime. With these property, we can
    avoid using debug-pagealloc in boottime with low computational overhead in
    the kernel built with CONFIG_DEBUG_PAGEALLOC. This will help our
    development process greatly.

    This patch is the preparation step to achive above goal. debug-pagealloc
    originally uses extra field of struct page, but, after this patch, it will
    use field of struct page_ext. Because memory for page_ext is allocated
    later than initialization of page allocator in CONFIG_SPARSEMEM, we should
    disable debug-pagealloc feature temporarily until initialization of
    page_ext. This patch implements this.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • When we debug something, we'd like to insert some information to every
    page. For this purpose, we sometimes modify struct page itself. But,
    this has drawbacks. First, it requires re-compile. This makes us
    hesitate to use the powerful debug feature so development process is
    slowed down. And, second, sometimes it is impossible to rebuild the
    kernel due to third party module dependency. At third, system behaviour
    would be largely different after re-compile, because it changes size of
    struct page greatly and this structure is accessed by every part of
    kernel. Keeping this as it is would be better to reproduce errornous
    situation.

    This feature is intended to overcome above mentioned problems. This
    feature allocates memory for extended data per page in certain place
    rather than the struct page itself. This memory can be accessed by the
    accessor functions provided by this code. During the boot process, it
    checks whether allocation of huge chunk of memory is needed or not. If
    not, it avoids allocating memory at all. With this advantage, we can
    include this feature into the kernel in default and can avoid rebuild and
    solve related problems.

    Until now, memcg uses this technique. But, now, memcg decides to embed
    their variable to struct page itself and it's code to extend struct page
    has been removed. I'd like to use this code to develop debug feature, so
    this patch resurrect it.

    To help these things to work well, this patch introduces two callbacks for
    clients. One is the need callback which is mandatory if user wants to
    avoid useless memory allocation at boot-time. The other is optional, init
    callback, which is used to do proper initialization after memory is
    allocated. Detailed explanation about purpose of these functions is in
    code comment. Please refer it.

    Others are completely same with previous extension code in memcg.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim