07 Jan, 2021

1 commit


08 Apr, 2020

1 commit

  • Since commit c5e79ef561b0 ("mm/memory_hotplug.c: don't allow to
    online/offline memory blocks with holes") we disallow to offline any
    memory with holes. As all boot memory is online and hotplugged memory
    cannot contain holes, we never online memory with holes.

    This present check can be dropped.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Anshuman Khandual
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Cc: Pavel Tatashin
    Cc: "Rafael J. Wysocki"
    Link: http://lkml.kernel.org/r/20200127110424.5757-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

03 Apr, 2020

1 commit

  • After introducing mem sub section concept, pfn_present() loses its literal
    meaning, and will not be necessary a truth on partial populated mem
    section.

    Since all of the callers use it to judge an absent section, it is better
    to rename pfn_present() as pfn_in_present_section().

    Signed-off-by: Pingfan Liu
    Signed-off-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Acked-by: Michael Ellerman [powerpc]
    Cc: Dan Williams
    Cc: Michal Hocko
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Greg Kroah-Hartman
    Cc: "Rafael J. Wysocki"
    Cc: Leonardo Bras
    Cc: Nathan Fontenot
    Cc: Nathan Lynch
    Link: http://lkml.kernel.org/r/1581919110-29575-1-git-send-email-kernelfans@gmail.com
    Signed-off-by: Linus Torvalds

    Pingfan Liu
     

15 Oct, 2019

1 commit

  • Patch series "followups to debug_pagealloc improvements through
    page_owner", v3.

    These are followups to [1] which made it to Linus meanwhile. Patches 1
    and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It
    would be nice if all of this made it to 5.4 with [1] already there (or
    at least Patch 1).

    This patch (of 3):

    As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page
    owner for each subpage") has introduced an off-by-one error in
    __set_page_owner_handle() when looking up page_ext for subpages. As a
    result, the head page page_owner info is set twice, while for the last
    tail page, it's not set at all.

    Fix this and also make the code more efficient by advancing the page_ext
    pointer we already have, instead of calling lookup_page_ext() for each
    subpage. Since the full size of struct page_ext is not known at compile
    time, we can't use a simple page_ext++ statement, so introduce a
    page_ext_next() inline function for that.

    Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
    Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage")
    Signed-off-by: Vlastimil Babka
    Reported-by: Kirill A. Shutemov
    Reported-by: Miles Chen
    Acked-by: Kirill A. Shutemov
    Cc: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Walter Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

13 Jul, 2019

1 commit

  • When debug_pagealloc is enabled, we currently allocate the page_ext
    array to mark guard pages with the PAGE_EXT_DEBUG_GUARD flag. Now that
    we have the page_type field in struct page, we can use that instead, as
    guard pages are neither PageSlab nor mapped to userspace. This reduces
    memory overhead when debug_pagealloc is enabled and there are no other
    features requiring the page_ext array.

    Link: http://lkml.kernel.org/r/20190603143451.27353-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Joonsoo Kim
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

13 Mar, 2019

1 commit

  • As all the memblock allocation functions return NULL in case of error
    rather than panic(), the duplicates with _nopanic suffix can be removed.

    Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Petr Mladek [printk]
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

06 Mar, 2019

2 commits

  • After offlining a memory block, kmemleak scan will trigger a crash, as
    it encounters a page ext address that has already been freed during
    memory offlining. At the beginning in alloc_page_ext(), it calls
    kmemleak_alloc(), but it does not call kmemleak_free() in
    free_page_ext().

    BUG: unable to handle kernel paging request at ffff888453d00000
    PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
    Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
    RIP: 0010:scan_block+0xb5/0x290
    Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
    RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
    RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
    RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
    R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
    R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
    FS: 00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
    Call Trace:
    scan_gray_list+0x269/0x430
    kmemleak_scan+0x5a8/0x10f0
    kmemleak_write+0x541/0x6ca
    full_proxy_write+0xf8/0x190
    __vfs_write+0xeb/0x980
    vfs_write+0x15a/0x4f0
    ksys_write+0xd2/0x1b0
    __x64_sys_write+0x73/0xb0
    do_syscall_64+0xeb/0xaaa
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f6c0dad73b8
    Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
    RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
    RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
    RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
    R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
    R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
    Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
    CR2: ffff888453d00000
    ---[ end trace ccf646c7456717c5 ]---
    Kernel panic - not syncing: Fatal exception
    Shutting down cpus with NMI
    Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
    0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---

    Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
    Signed-off-by: Qian Cai
    Reviewed-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     
  • Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

    All these places for replacement were found by running the following
    grep patterns on the entire kernel code. Please let me know if this
    might have missed some instances. This might also have replaced some
    false positives. I will appreciate suggestions, inputs and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

    This patch (of 2):

    At present there are multiple places where invalid node number is
    encoded as -1. Even though implicitly understood it is always better to
    have macros in there. Replace these open encodings for an invalid node
    number with the global macro NUMA_NO_NODE. This helps remove NUMA
    related assumptions like 'invalid node' from various places redirecting
    them to a common definition.

    Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: David Hildenbrand
    Acked-by: Jeff Kirsher [ixgbe]
    Acked-by: Jens Axboe [mtip32xx]
    Acked-by: Vinod Koul [dmaengine.c]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Doug Ledford [drivers/infiniband]
    Cc: Joseph Qi
    Cc: Hans Verkuil
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

13 Feb, 2019

1 commit

  • This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in
    page_ext_init").

    When booting a system with "page_owner=on",

    start_kernel
    page_ext_init
    invoke_init_callbacks
    init_section_page_ext
    init_page_owner
    init_early_allocated_pages
    init_zones_in_node
    init_pages_in_zone
    lookup_page_ext
    page_to_nid

    The issue here is that page_to_nid() will not work since some page flags
    have no node information until later in page_alloc_init_late() due to
    DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
    access with an invalid nid.

    UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
    index 7 is out of range for type 'zone [5]'

    Also, kernel will panic since flags were poisoned earlier with,

    CONFIG_DEBUG_VM_PGFLAGS=y
    CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

    start_kernel
    setup_arch
    pagetable_init
    paging_init
    sparse_init
    sparse_init_nid
    memblock_alloc_try_nid_raw

    It did not handle it well in init_pages_in_zone() which ends up calling
    page_to_nid().

    page:ffffea0004200000 is uninitialized and poisoned
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    page_owner info is not active (free page?)
    kernel BUG at include/linux/mm.h:990!
    RIP: 0010:init_page_owner+0x486/0x520

    This means that assumptions behind commit fe53ca54270a ("mm: use
    early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
    the commit for now. A proper way to move the page_owner initialization
    to sooner is to hook into memmap initialization.

    Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Cc: Pasha Tatashin
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

31 Oct, 2018

3 commits

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include

    @@
    @@
    - #include
    + #include

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of
    identical MEMBLOCK definitions.

    Link: http://lkml.kernel.org/r/1536927045-23536-29-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The conversion is done using

    sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
    $(git grep -l memblock_virt_alloc)

    Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

18 Aug, 2018

1 commit

  • lookup_page_ext() finds 'struct page_ext' for a given page. It requires
    only read access to the given struct page.

    Current implemnentation takes 'struct page *' as an argument. It makes
    compiler complain when 'const struct page *' passed.

    Change the argument to 'const struct page *'.

    Link: http://lkml.kernel.org/r/20180531135457.20167-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Andrew Morton
    Cc: Vinayak Menon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

01 Feb, 2018

1 commit

  • static struct page_ext_operations *page_ext_ops[] always contains debug_guardpage_ops,

    static struct page_ext_operations *page_ext_ops[] = {
    &debug_guardpage_ops,
    #ifdef CONFIG_PAGE_OWNER
    &page_owner_ops,
    #endif
    ...
    }

    but for it to work, CONFIG_DEBUG_PAGEALLOC must be enabled first. If
    someone has CONFIG_PAGE_EXTENSION, but has none of its users, eg:
    (CONFIG_PAGE_OWNER, CONFIG_DEBUG_PAGEALLOC, CONFIG_IDLE_PAGE_TRACKING),
    we can shrink page_ext_init() to a simple retq.

    $ size vmlinux (before patch)
    text data bss dec hex filename
    14356698 5681582 1687748 21726028 14b834c vmlinux

    $ size vmlinux (after patch)
    text data bss dec hex filename
    14356008 5681538 1687748 21725294 14b806e vmlinux

    On the other hand, it might does not even make sense, since if someone
    enables CONFIG_PAGE_EXTENSION, I would expect him to enable also at
    least one of its users.

    Link: http://lkml.kernel.org/r/20180105130235.GA21241@techadventures.net
    Signed-off-by: Oscar Salvador
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Jaewon Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     

16 Nov, 2017

1 commit

  • online_page_ext() and page_ext_init() allocate page_ext for each
    section, but they do not allocate if the first PFN is !pfn_present(pfn)
    or !pfn_valid(pfn). Then section->page_ext remains as NULL.
    lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled. For a
    valid PFN, __set_page_owner will try to get page_ext through
    lookup_page_ext. Without CONFIG_DEBUG_VM lookup_page_ext will misuse
    NULL pointer as value 0. This incurrs invalid address access.

    This is the panic example when PFN 0x100000 is not valid but PFN
    0x13FC00 is being used for page_ext. section->page_ext is NULL,
    get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
    0x13FC00.

    To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
    will be checked at all times.

    Unable to handle kernel paging request at virtual address 01dfa014
    ------------[ cut here ]------------
    Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
    Internal error: Oops: 96000045 [#1] PREEMPT SMP
    Modules linked in:
    PC is at __set_page_owner+0x48/0x78
    LR is at __set_page_owner+0x44/0x78
    __set_page_owner+0x48/0x78
    get_page_from_freelist+0x880/0x8e8
    __alloc_pages_nodemask+0x14c/0xc48
    __do_page_cache_readahead+0xdc/0x264
    filemap_fault+0x2ac/0x550
    ext4_filemap_fault+0x3c/0x58
    __do_fault+0x80/0x120
    handle_mm_fault+0x704/0xbb0
    do_page_fault+0x2e8/0x394
    do_mem_abort+0x88/0x124

    Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return
    value of lookup_page_ext for all call sites").

    Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
    Fixes: eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging")
    Signed-off-by: Jaewon Kim
    Acked-by: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Joonsoo Kim
    Cc: [depends on f86e427197, see above]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaewon Kim
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

07 Sep, 2017

2 commits

  • page_ext_init() can take long on large machines, so add a cond_resched()
    point after each section is processed. This will allow moving the init
    to a later point at boot without triggering lockup reports.

    Link: http://lkml.kernel.org/r/20170720134029.25268-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Laura Abbott
    Cc: Vinayak Menon
    Cc: zhong jiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Commit f52407ce2dea ("memory hotplug: alloc page from other node in
    memory online") has introduced N_HIGH_MEMORY checks to only use NUMA
    aware allocations when there is some memory present because the
    respective node might not have any memory yet at the time and so it
    could fail or even OOM.

    Things have changed since then though. Zonelists are now always
    initialized before we do any allocations even for hotplug (see
    959ecc48fc75 ("mm/memory_hotplug.c: fix building of node hotplug
    zonelist")).

    Therefore these checks are not really needed. In fact caller of the
    allocator should never care about whether the node is populated because
    that might change at any time.

    Link: http://lkml.kernel.org/r/20170721143915.14161-10-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Shaohua Li
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

04 May, 2017

1 commit

  • On SPARSEMEM systems page poisoning is enabled after buddy is up,
    because of the dependency on page extension init. This causes the pages
    released by free_all_bootmem not to be poisoned. This either delays or
    misses the identification of some issues because the pages have to
    undergo another cycle of alloc-free-alloc for any corruption to be
    detected.

    Enable page poisoning early by getting rid of the PAGE_EXT_DEBUG_POISON
    flag. Since all the free pages will now be poisoned, the flag need not
    be verified before checking the poison during an alloc.

    [vinmenon@codeaurora.org: fix Kconfig]
    Link: http://lkml.kernel.org/r/1490878002-14423-1-git-send-email-vinmenon@codeaurora.org
    Link: http://lkml.kernel.org/r/1490358246-11001-1-git-send-email-vinmenon@codeaurora.org
    Signed-off-by: Vinayak Menon
    Acked-by: Laura Abbott
    Tested-by: Laura Abbott
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vinayak Menon
     

08 Oct, 2016

2 commits

  • Until now, if some page_ext users want to use it's own field on
    page_ext, it should be defined in struct page_ext by hard-coding. It
    has a problem that wastes memory in following situation.

    struct page_ext {
    #ifdef CONFIG_A
    int a;
    #endif
    #ifdef CONFIG_B
    int b;
    #endif
    };

    Assume that kernel is built with both CONFIG_A and CONFIG_B. Even if we
    enable feature A and doesn't enable feature B at runtime, each entry of
    struct page_ext takes two int rather than one int. It's undesirable
    result so this patch tries to fix it.

    To solve above problem, this patch implements to support extra space
    allocation at runtime. When need() callback returns true, it's extra
    memory requirement is summed to entry size of page_ext. Also, offset
    for each user's extra memory space is returned. With this offset, user
    can use this extra space and there is no need to define needed field on
    page_ext by hard-coding.

    This patch only implements an infrastructure. Following patch will use
    it for page_owner which is only user having it's own fields on page_ext.

    Link: http://lkml.kernel.org/r/1471315879-32294-6-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Here, 'offset' means entry index in page_ext array. Following patch
    will use 'offset' for field offset in each entry so rename current
    'offset' to prevent confusion.

    Link: http://lkml.kernel.org/r/1471315879-32294-5-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

28 May, 2016

1 commit

  • page_ext_init() checks suitable pages with pfn_to_nid(), but
    pfn_to_nid() depends on memmap which will not be setup fully until
    page_alloc_init_late() is done. Use early_pfn_to_nid() instead of
    pfn_to_nid() so that page extension could be still used early even
    though CONFIG_ DEFERRED_STRUCT_PAGE_INIT is enabled and catch early page
    allocation call sites.

    Suggested by Joonsoo Kim [1], this fix basically undoes the change
    introduced by commit b8f1a75d61d840 ("mm: call page_ext_init() after all
    struct pages are initialized") and fixes the same problem with a better
    approach.

    [1] http://lkml.kernel.org/r/CAAmzW4OUmyPwQjvd7QUfc6W1Aic__TyAuH80MLRZNMxKy0-wPQ@mail.gmail.com

    Link: http://lkml.kernel.org/r/1464198689-23458-1-git-send-email-yang.shi@linaro.org
    Signed-off-by: Yang Shi
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

16 Mar, 2016

1 commit

  • By default, page poisoning uses a poison value (0xaa) on free. If this
    is changed to 0, the page is not only sanitized but zeroing on alloc
    with __GFP_ZERO can be skipped as well. The tradeoff is that detecting
    corruption from the poisoning is harder to detect. This feature also
    cannot be used with hibernation since pages are not guaranteed to be
    zeroed after hibernation.

    Credit to Grsecurity/PaX team for inspiring this work

    Signed-off-by: Laura Abbott
    Acked-by: Rafael J. Wysocki
    Cc: "Kirill A. Shutemov"
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Kees Cook
    Cc: Mathias Krause
    Cc: Dave Hansen
    Cc: Jianyu Zhan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laura Abbott
     

11 Sep, 2015

1 commit

  • Knowing the portion of memory that is not used by a certain application or
    memory cgroup (idle memory) can be useful for partitioning the system
    efficiently, e.g. by setting memory cgroup limits appropriately.
    Currently, the only means to estimate the amount of idle memory provided
    by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
    access bit for all pages mapped to a particular process by writing 1 to
    clear_refs, wait for some time, and then count smaps:Referenced. However,
    this method has two serious shortcomings:

    - it does not count unmapped file pages
    - it affects the reclaimer logic

    To overcome these drawbacks, this patch introduces two new page flags,
    Idle and Young, and a new sysfs file, /sys/kernel/mm/page_idle/bitmap.
    A page's Idle flag can only be set from userspace by setting bit in
    /sys/kernel/mm/page_idle/bitmap at the offset corresponding to the page,
    and it is cleared whenever the page is accessed either through page tables
    (it is cleared in page_referenced() in this case) or using the read(2)
    system call (mark_page_accessed()). Thus by setting the Idle flag for
    pages of a particular workload, which can be found e.g. by reading
    /proc/PID/pagemap, waiting for some time to let the workload access its
    working set, and then reading the bitmap file, one can estimate the amount
    of pages that are not used by the workload.

    The Young page flag is used to avoid interference with the memory
    reclaimer. A page's Young flag is set whenever the Access bit of a page
    table entry pointing to the page is cleared by writing to the bitmap file.
    If page_referenced() is called on a Young page, it will add 1 to its
    return value, therefore concealing the fact that the Access bit was
    cleared.

    Note, since there is no room for extra page flags on 32 bit, this feature
    uses extended page flags when compiled on 32 bit.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: kpageidle requires an MMU]
    [akpm@linux-foundation.org: decouple from page-flags rework]
    Signed-off-by: Vladimir Davydov
    Reviewed-by: Andres Lagar-Cavilla
    Cc: Minchan Kim
    Cc: Raghavendra K T
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: David Rientjes
    Cc: Pavel Emelyanov
    Cc: Cyrill Gorcunov
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

14 Dec, 2014

3 commits

  • This is the page owner tracking code which is introduced so far ago. It
    is resident on Andrew's tree, though, nobody tried to upstream so it
    remain as is. Our company uses this feature actively to debug memory leak
    or to find a memory hogger so I decide to upstream this feature.

    This functionality help us to know who allocates the page. When
    allocating a page, we store some information about allocation in extra
    memory. Later, if we need to know status of all pages, we can get and
    analyze it from this stored information.

    In previous version of this feature, extra memory is statically defined in
    struct page, but, in this version, extra memory is allocated outside of
    struct page. It enables us to turn on/off this feature at boottime
    without considerable memory waste.

    Although we already have tracepoint for tracing page allocation/free,
    using it to analyze page owner is rather complex. We need to enlarge the
    trace buffer for preventing overlapping until userspace program launched.
    And, launched program continually dump out the trace buffer for later
    analysis and it would change system behaviour with more possibility rather
    than just keeping it in memory, so bad for debug.

    Moreover, we can use page_owner feature further for various purposes. For
    example, we can use it for fragmentation statistics implemented in this
    patch. And, I also plan to implement some CMA failure debugging feature
    using this interface.

    I'd like to give the credit for all developers contributed this feature,
    but, it's not easy because I don't know exact history. Sorry about that.
    Below is people who has "Signed-off-by" in the patches in Andrew's tree.

    Contributor:
    Alexander Nyberg
    Mel Gorman
    Dave Hansen
    Minchan Kim
    Michal Nazarewicz
    Andrew Morton
    Jungsoo Son

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Until now, debug-pagealloc needs extra flags in struct page, so we need to
    recompile whole source code when we decide to use it. This is really
    painful, because it takes some time to recompile and sometimes rebuild is
    not possible due to third party module depending on struct page. So, we
    can't use this good feature in many cases.

    Now, we have the page extension feature that allows us to insert extra
    flags to outside of struct page. This gets rid of third party module
    issue mentioned above. And, this allows us to determine if we need extra
    memory for this page extension in boottime. With these property, we can
    avoid using debug-pagealloc in boottime with low computational overhead in
    the kernel built with CONFIG_DEBUG_PAGEALLOC. This will help our
    development process greatly.

    This patch is the preparation step to achive above goal. debug-pagealloc
    originally uses extra field of struct page, but, after this patch, it will
    use field of struct page_ext. Because memory for page_ext is allocated
    later than initialization of page allocator in CONFIG_SPARSEMEM, we should
    disable debug-pagealloc feature temporarily until initialization of
    page_ext. This patch implements this.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • When we debug something, we'd like to insert some information to every
    page. For this purpose, we sometimes modify struct page itself. But,
    this has drawbacks. First, it requires re-compile. This makes us
    hesitate to use the powerful debug feature so development process is
    slowed down. And, second, sometimes it is impossible to rebuild the
    kernel due to third party module dependency. At third, system behaviour
    would be largely different after re-compile, because it changes size of
    struct page greatly and this structure is accessed by every part of
    kernel. Keeping this as it is would be better to reproduce errornous
    situation.

    This feature is intended to overcome above mentioned problems. This
    feature allocates memory for extended data per page in certain place
    rather than the struct page itself. This memory can be accessed by the
    accessor functions provided by this code. During the boot process, it
    checks whether allocation of huge chunk of memory is needed or not. If
    not, it avoids allocating memory at all. With this advantage, we can
    include this feature into the kernel in default and can avoid rebuild and
    solve related problems.

    Until now, memcg uses this technique. But, now, memcg decides to embed
    their variable to struct page itself and it's code to extend struct page
    has been removed. I'd like to use this code to develop debug feature, so
    this patch resurrect it.

    To help these things to work well, this patch introduces two callbacks for
    clients. One is the need callback which is mandatory if user wants to
    avoid useless memory allocation at boot-time. The other is optional, init
    callback, which is used to do proper initialization after memory is
    allocated. Detailed explanation about purpose of these functions is in
    code comment. Please refer it.

    Others are completely same with previous extension code in memcg.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim