15 May, 2019

1 commit

  • The code comment above sparse_add_one_section() is obsolete and incorrect.
    Clean it up and write a new one.

    Link: http://lkml.kernel.org/r/20190329144250.14315-1-bhe@redhat.com
    Signed-off-by: Baoquan He
    Acked-by: Michal Hocko
    Reviewed-by: Oscar Salvador
    Reviewed-by: Mukesh Ojha
    Reviewed-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     

30 Mar, 2019

1 commit

  • Commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
    memory to zones until online") introduced move_pfn_range_to_zone(), which
    calls memmap_init_zone() when onlining a memory block.
    memmap_init_zone() resets the pagetype flags and makes the migrate type
    MOVABLE.

    However, __offline_pages() also calls undo_isolate_page_range() after
    offline_isolated_pages() to do the same thing. Because commit
    2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
    __first_valid_page() to skip offline pages, undo_isolate_page_range()
    here just wastes CPU cycles looping over the offlined PFN range while
    doing nothing: __first_valid_page() returns NULL because
    offline_isolated_pages() has already marked all memory sections within
    the pfn range as offline via offline_mem_sections().

    Also, after calling the "useless" undo_isolate_page_range(), we pass the
    point of no return by notifying MEM_OFFLINE. Those pages will be marked
    MIGRATE_MOVABLE again once onlined. The only thing left to do is to
    decrease the zone's counter of isolated pageblocks, which would otherwise
    keep some page allocation paths on the slower code that the above commit
    introduced.

    Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
    on ppc64, an "int" should still be enough to represent the number of
    pageblocks there. Fix an incorrect comment along the way.

    [cai@lca.pw: v4]
    Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw
    Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw
    Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Reviewed-by: Oscar Salvador
    Cc: Vlastimil Babka
    Cc: [4.13+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

13 Mar, 2019

2 commits

  • As all the memblock allocation functions return NULL in case of error
    rather than panic(), the duplicates with _nopanic suffix can be removed.

    Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Greg Kroah-Hartman
    Reviewed-by: Petr Mladek [printk]
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Add checks for the return value of memblock_alloc*() functions and call
    panic() in case of error. The panic message repeats the one used by the
    panicking memblock allocators, with the parameters adjusted to include
    only the relevant ones.

    The replacement was mostly automated with semantic patches like the one
    below with manual massaging of format strings.

    @@
    expression ptr, size, align;
    @@
    ptr = memblock_alloc(size, align);
    + if (!ptr)
    + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);

    [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
    Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
    [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
    Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
    [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
    Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
    [akpm@linux-foundation.org: fix xtensa printk warning]
    Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Anders Roxell
    Reviewed-by: Guo Ren [c-sky]
    Acked-by: Paul Burton [MIPS]
    Acked-by: Heiko Carstens [s390]
    Reviewed-by: Juergen Gross [Xen]
    Reviewed-by: Geert Uytterhoeven [m68k]
    Acked-by: Max Filippov [xtensa]
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

06 Mar, 2019

1 commit

  • next_present_section_nr() returns an unsigned value, so it can never be
    negative; its "not found" result is -1 converted to the unsigned type.
    Check for that sentinel explicitly (the compiler converts -1 to unsigned
    as needed); a small standalone illustration follows the warning listing
    below.

    mm/sparse.c: In function 'sparse_init_nid':
    mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
      ((section_nr >= 0) && \
                      ^~
    mm/sparse.c:478:2: note: in expansion of macro 'for_each_present_section_nr'
      for_each_present_section_nr(pnum_begin, pnum) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
      ((section_nr >= 0) && \
                      ^~
    mm/sparse.c:497:2: note: in expansion of macro 'for_each_present_section_nr'
      for_each_present_section_nr(pnum_begin, pnum) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    mm/sparse.c: In function 'sparse_init':
    mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
      ((section_nr >= 0) && \
                      ^~
    mm/sparse.c:520:2: note: in expansion of macro 'for_each_present_section_nr'
      for_each_present_section_nr(pnum_begin + 1, pnum_end) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
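
    The underlying C rule can be shown with a tiny standalone program; this
    is only an illustration, unrelated to the kernel sources, and the
    variable name is made up:

    #include <stdio.h>

    int main(void)
    {
            unsigned long section_nr = -1UL;        /* "not found" sentinel */

            /* Always 1: an unsigned value is never negative, which is what
             * -Wtype-limits warns about in the macro above. */
            printf("section_nr >= 0    : %d\n", section_nr >= 0);
            /* The explicit sentinel comparison behaves as intended, because
             * -1 converts to ULONG_MAX before the comparison. */
            printf("section_nr != -1UL : %d\n", section_nr != -1UL);
            return 0;
    }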

    Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
    Fixes: c4e1be9ec113 ("mm, sparsemem: break out of loops early")
    Signed-off-by: Qian Cai
    Reviewed-by: Andrew Morton
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

29 Dec, 2018

3 commits

  • Since the only information sparse_add_one_section() needs is the node id,
    so that it can allocate memory on the proper node, there is no need to
    pass it the whole pgdat.

    This patch changes the prototype of sparse_add_one_section() to take the
    node id directly, to avoid the misleading impression that
    sparse_add_one_section() touches the pgdat.
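
    A rough before/after sketch of the signature change (the altmap parameter
    reflects the contemporary prototype and may differ across kernel
    versions):

    /* old: the caller passes the node's pgdat even though only nid is used */
    int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn,
                               struct vmem_altmap *altmap);

    /* new: pass the node id directly */
    int sparse_add_one_section(int nid, unsigned long start_pfn,
                               struct vmem_altmap *altmap);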

    Link: http://lkml.kernel.org/r/20181204085657.20472-2-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Cc: Dave Hansen
    Cc: Oscar Salvador
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • pgdat_resize_lock is used to protect pgdat's memory region information,
    such as node_start_pfn, node_present_pages, etc., while in
    sparse_add/remove_one_section() it is used to protect the
    initialization/release of one mem_section. That is not appropriate.

    These code paths are already protected by mem_hotplug_lock; should there
    ever be a reason for locking at the sparse layer, a dedicated lock should
    be introduced.

    Following are the current call traces of sparse_add/remove_one_section():

    mem_hotplug_begin()
      arch_add_memory()
        add_pages()
          __add_pages()
            __add_section()
              sparse_add_one_section()
    mem_hotplug_done()

    mem_hotplug_begin()
      arch_remove_memory()
        __remove_pages()
          __remove_section()
            sparse_remove_one_section()
    mem_hotplug_done()

    The comment above pgdat_resize_lock also says "Holding this will also
    guarantee that any pfn_valid() stays that way.", but the current
    implementation does not actually live up to that: there aren't any pfn
    walkers that take the lock, so the claim looks like a relic from the
    past. This patch removes that comment as well.

    [richard.weiyang@gmail.com: v4]
    Link: http://lkml.kernel.org/r/20181204085657.20472-1-richard.weiyang@gmail.com
    [mhocko@suse.com: changelog suggestion]
    Link: http://lkml.kernel.org/r/20181128091243.19249-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Cc: Dave Hansen
    Cc: Oscar Salvador
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • In hot remove, we try to clear poisoned pages, but a small optimization
    that checks whether num_poisoned_pages is 0 lets us skip the iteration
    over nr_pages entirely.
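
    A sketch of how this early-out looks in the mm/sparse.c helper that
    clears HWPoison state on hot remove, assuming the existing
    clear_hwpoisoned_pages() helper and the global num_poisoned_pages
    counter; details may differ from the actual patch:

    static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
    {
            int i;

            /* Cheap global check: nothing is poisoned, so skip the walk. */
            if (atomic_long_read(&num_poisoned_pages) == 0)
                    return;

            for (i = 0; i < nr_pages; i++) {
                    if (PageHWPoison(&memmap[i])) {
                            atomic_long_sub(1, &num_poisoned_pages);
                            ClearPageHWPoison(&memmap[i]);
                    }
            }
    }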

    [akpm@linux-foundation.org: tweak comment text]
    Link: http://lkml.kernel.org/r/20181102120001.4526-1-bsingharora@gmail.com
    Signed-off-by: Balbir Singh
    Acked-by: Michal Hocko
    Acked-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     

15 Dec, 2018

1 commit

  • Presently the arches arm64, arm and sh each have a function which loops
    through every memblock and calls memory_present(). riscv will require a
    similar function.

    Introduce a common memblocks_present() function that can be used by all
    the arches. Subsequent patches will clean up the arches that make use of
    this.
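
    A sketch of what the common helper looks like, using the memblock
    iteration API of that era (for_each_memblock and the region accessors);
    each architecture's open-coded loop then collapses into a single
    memblocks_present() call:

    void __init memblocks_present(void)
    {
            struct memblock_region *reg;

            /* Mark every memory region's sections as present, per node. */
            for_each_memblock(memory, reg)
                    memory_present(memblock_get_region_node(reg),
                                   memblock_region_memory_base_pfn(reg),
                                   memblock_region_memory_end_pfn(reg));
    }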

    Link: http://lkml.kernel.org/r/20181107205433.3875-3-logang@deltatee.com
    Signed-off-by: Logan Gunthorpe
    Acked-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Oscar Salvador
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Logan Gunthorpe
     

31 Oct, 2018

5 commits

  • When memblock allocation APIs are called with align = 0, the alignment is
    implicitly set to SMP_CACHE_BYTES.

    Implicit alignment is done deep in the memblock allocator and it can come
    as a surprise. Not that such an alignment would be wrong even when used
    incorrectly, but it is better to be explicit for the sake of clarity and
    the principle of least surprise.

    Replace all such uses of memblock APIs with the 'align' parameter
    explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
    in the memblock internal allocation functions.

    For the case when memblock APIs are used via helper functions, e.g. like
    iommu_arena_new_node() in Alpha, the helper functions were detected with
    Coccinelle's help and then manually examined and updated where
    appropriate.

    The direct memblock APIs users were updated using the semantic patch below:

    @@
    expression size, min_addr, max_addr, nid;
    @@
    (
    |
    - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc(size, 0)
    + memblock_alloc(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_raw(size, 0)
    + memblock_alloc_raw(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from(size, 0, min_addr)
    + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_nopanic(size, 0)
    + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low(size, 0)
    + memblock_alloc_low(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low_nopanic(size, 0)
    + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from_nopanic(size, 0, min_addr)
    + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_node(size, 0, nid)
    + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
    )

    [mhocko@suse.com: changelog update]
    [akpm@linux-foundation.org: coding-style fixes]
    [rppt@linux.ibm.com: fix missed uses of implicit alignment]
    Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
    Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Suggested-by: Michal Hocko
    Acked-by: Paul Burton [MIPS]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: Matt Turner
    Cc: Michal Simek
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below, followed by
    semi-automated removal of duplicated '#include <linux/memblock.h>' lines.

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of
    identical MEMBLOCK definitions.

    Link: http://lkml.kernel.org/r/1536927045-23536-29-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • With the align parameter, memblock_alloc_node() can be used as a drop-in
    replacement for alloc_bootmem_pages_node() and __alloc_bootmem_node(),
    which is done in the following patches.

    Link: http://lkml.kernel.org/r/1536927045-23536-15-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The conversion is done using

    sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
    $(git grep -l memblock_virt_alloc)

    Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

27 Oct, 2018

1 commit

  • Patch series "Address issues slowing persistent memory initialization", v5.

    The main thing this patch set achieves is that it allows us to initialize
    each node's worth of persistent memory independently. As a result, we
    reduce page init time by about 2 minutes: instead of taking 30 to 40
    seconds per node and going through the nodes one at a time, we process
    all 4 nodes in parallel in the case of a 12TB persistent memory setup
    spread evenly over 4 nodes.

    This patch (of 3):

    On systems with a large amount of memory it can take a significant amount
    of time to initialize all of the page structs with the PAGE_POISON_PATTERN
    value. I have seen it take over 2 minutes to initialize a system with
    over 12TB of RAM.

    In order to work around the issue I had to disable CONFIG_DEBUG_VM and
    then the boot time returned to something much more reasonable as the
    arch_add_memory call completed in milliseconds versus seconds. However in
    doing that I had to disable all of the other VM debugging on the system.

    In order to work around a kernel that might have CONFIG_DEBUG_VM enabled
    on a system that has a large amount of memory I have added a new kernel
    parameter named "vm_debug" that can be set to "-" in order to disable it.

    Link: http://lkml.kernel.org/r/20180925201921.3576.84239.stgit@localhost.localdomain
    Reviewed-by: Pavel Tatashin
    Signed-off-by: Alexander Duyck
    Cc: Dave Hansen
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Duyck
     

18 Aug, 2018

10 commits

  • Rename new_sparse_init() to sparse_init(), which enables it. Delete the
    old sparse_init() and all the code that became obsolete with it.

    [pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
    Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Tested-by: Oscar Salvador
    Reviewed-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • sparse_init() requires temporarily allocating two large buffers:
    usemap_map and map_map. Baoquan He identified that these buffers are so
    large that Linux is not bootable on small-memory machines, such as a
    kdump boot. The buffers are especially large when CONFIG_X86_5LEVEL is
    set, as they are scaled to the maximum physical memory size.

    Baoquan provided a fix that reduces the sizes of these buffers, but it is
    much better to get rid of them entirely.

    Add a new way to initialize sparse memory: sparse_init_nid(), which
    operates within a single memory node and thus allocates memory either in
    one large contiguous block or section by section. This eliminates the
    need for the temporary buffers.

    For simpler bisecting and review, the new function is temporarily called
    new_sparse_init(); it will be enabled, and the old code removed, in the
    next patch.

    Link: http://lkml.kernel.org/r/20180712203730.8703-5-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Oscar Salvador
    Tested-by: Oscar Salvador
    Tested-by: Michael Ellerman [powerpc]
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • Now that both variants of sparse memory use the same buffers to populate
    memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the
    common place.

    Link: http://lkml.kernel.org/r/20180712203730.8703-4-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Tested-by: Oscar Salvador
    Reviewed-by: Andrew Morton
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • Non-vmemmap sparse also allocates a large contiguous chunk of memory and,
    if that fails, falls back to smaller allocations. Use the same
    buffer-allocation functions as vmemmap sparse.

    Link: http://lkml.kernel.org/r/20180712203730.8703-3-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Reviewed-by: Oscar Salvador
    Tested-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Abdul Haleem
    Cc: Baoquan He
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Souptick Joarder
    Cc: Steven Sistare
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • Patch series "sparse_init rewrite", v6.

    In sparse_init() we allocate two large buffers to temporarily hold the
    usemap and memmap for the whole machine. However, we can avoid that by
    changing sparse_init() to operate on a per-node basis instead of handling
    the whole machine up front.

    As Baoquan showed in
    http://lkml.kernel.org/r/20180628062857.29658-1-bhe@redhat.com
    the buffers are large enough to stop the machine from booting on
    small-memory systems.

    Another benefit of these changes is that they also obsolete
    CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER.

    This patch (of 5):

    When struct pages are allocated for the sparse-vmemmap VA layout, we
    first try to allocate one large buffer, and then, if that fails, allocate
    struct pages for each section as we go.

    The code that allocates the buffer uses global variables and is spread
    across several call sites.

    Clean up the code by introducing three functions to handle the global
    buffer:

    sparse_buffer_init()  initializes the buffer
    sparse_buffer_fini()  frees the remaining part of the buffer
    sparse_buffer_alloc() allocates from the buffer, returning NULL when the
                          buffer is empty

    Define these functions in sparse.c instead of sparse-vmemmap.c because
    later we will use them for non-vmemmap sparse allocations as well.
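
    To make the flow concrete, here is a hedged usage sketch built from the
    description above. The sparse_buffer_* names come from that description;
    section_map_size() and the per-section fallback helper are illustrative
    stand-ins, and the real code differs in detail.

    static void __init populate_node_memmaps(int nid, unsigned long pnum_begin,
                                             unsigned long pnum_end,
                                             unsigned long map_count)
    {
            unsigned long pnum;

            /* One big per-node buffer to carve section memmaps out of. */
            sparse_buffer_init(section_map_size() * map_count, nid);
            for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
                    struct page *map = sparse_buffer_alloc(section_map_size());

                    if (!map)       /* buffer exhausted */
                            map = alloc_section_memmap_fallback(pnum, nid);
                    /* ... install map into the corresponding mem_section ... */
            }
            sparse_buffer_fini();   /* release the unused tail of the buffer */
    }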

    [akpm@linux-foundation.org: use PTR_ALIGN()]
    [akpm@linux-foundation.org: s/BUG_ON/WARN_ON/]
    Link: http://lkml.kernel.org/r/20180712203730.8703-2-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Tested-by: Michael Ellerman [powerpc]
    Reviewed-by: Oscar Salvador
    Tested-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Steven Sistare
    Cc: Daniel Jordan
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jan Kara
    Cc: Jérôme Glisse
    Cc: Souptick Joarder
    Cc: Baoquan He
    Cc: Greg Kroah-Hartman
    Cc: Vlastimil Babka
    Cc: Wei Yang
    Cc: Dave Hansen
    Cc: David Rientjes
    Cc: Ingo Molnar
    Cc: Abdul Haleem
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
    are allocated with the size of NR_MEM_SECTIONS. They are used to store
    each memory section's usemap and mem map if it is marked as present.
    With the help of these two arrays, a contiguous memory chunk is allocated
    for the usemaps and memmaps of the memory sections on one node. This
    avoids excessive memory fragmentation. In the diagram below, '1'
    indicates a present memory section and '0' an absent one. The number 'n'
    can be much smaller than NR_MEM_SECTIONS on most systems.

    |1|1|1|1|0|0|0|0|1|1|0|0|...|1|0||1|0|...|1||0|1|...|0|
    -------------------------------------------------------
    0 1 2 3 4 5 i i+1 n-1 n

    If we fail to populate the page tables to map one section's memmap, its
    ->section_mem_map will be cleared finally to indicate that it's not
    present. After use, these two arrays will be released at the end of
    sparse_init().

    In 4-level paging mode each array costs 4M, which is negligible. In
    5-level paging mode they cost 256M each, 512M altogether. A kdump kernel
    usually reserves very little memory, e.g. 256M, so even though they are
    only temporarily allocated, this is still not acceptable.

    In fact, there's no need to allocate them with the size of
    NR_MEM_SECTIONS. Since the ->section_mem_map clearing has been deferred
    to the end, the number of present memory sections stays the same
    throughout sparse_init() until we finally clear a section's
    ->section_mem_map because its usemap or memmap was not correctly handled.
    Thus, whenever the for_each_present_section_nr() loop is taken in the
    middle, the i-th present memory section is always the same one.

    Here, only allocate usemap_map and map_map with the size of
    'nr_present_sections'. For the i-th present memory section, install its
    usemap and memmap into usemap_map[i] and map_map[i] during allocation.
    Then, in the last for_each_present_section_nr() loop which clears the
    failed memory sections' ->section_mem_map, fetch the usemap and memmap
    from the usemap_map[] and map_map[] arrays and set them into
    mem_section[] accordingly.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20180628062857.29658-5-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Pavel Tatashin
    Cc: Pasha Tatashin
    Cc: Dave Hansen
    Cc: Kirill A. Shutemov
    Cc: Oscar Salvador
    Cc: Pankaj Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • It's used to pass the size of the map data unit into
    alloc_usemap_and_memmap(), and is preparation for the next patch.

    Link: http://lkml.kernel.org/r/20180228032657.32385-4-bhe@redhat.com
    Signed-off-by: Baoquan He
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Kirill A. Shutemov
    Cc: Pankaj Gupta
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • In sparse_init(), if CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y, the
    system will allocate one contiguous memory chunk for the mem maps on one
    node and populate the relevant page tables to map the memory sections one
    by one. If populating a certain mem section fails, a warning is printed
    and its ->section_mem_map is cleared to cancel its marking as present.
    As a result, the number of mem sections marked as present can decrease
    during sparse_init() execution.

    Here, if populating a section's page tables fails, just defer the
    ms->section_mem_map clearing until the last for_each_present_section_nr()
    loop. This is in preparation for later optimizing the mem map allocation.

    [akpm@linux-foundation.org: remove now-unused local `ms', per Oscar]
    Link: http://lkml.kernel.org/r/20180228032657.32385-3-bhe@redhat.com
    Signed-off-by: Baoquan He
    Acked-by: Dave Hansen
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Kirill A. Shutemov
    Cc: Pankaj Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • Patch series "mm/sparse: Optimize memmap allocation during
    sparse_init()", v6.

    In sparse_init(), two temporary pointer arrays, usemap_map and map_map,
    are allocated with the size of NR_MEM_SECTIONS. They are used to store
    each memory section's usemap and mem map if it is marked as present. In
    5-level paging mode this costs 512M of memory, even though the arrays are
    released at the end of sparse_init(). Systems with little memory, like a
    kdump kernel which usually only has about 256M, will fail to boot because
    of allocation failure if CONFIG_X86_5LEVEL=y.

    In this patchset, optimize the memmap allocation code to only use
    usemap_map and map_map with the size of nr_present_sections. This makes
    kdump kernel boot up with normal crashkernel='' setting when
    CONFIG_X86_5LEVEL=y.

    This patch (of 5):

    nr_present_sections is used to record how many memory sections are
    marked as present during system boot up, and will be used in the later
    patch.

    Link: http://lkml.kernel.org/r/20180228032657.32385-2-bhe@redhat.com
    Signed-off-by: Baoquan He
    Acked-by: Dave Hansen
    Reviewed-by: Andrew Morton
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Oscar Salvador
    Cc: Pasha Tatashin
    Cc: Kirill A. Shutemov
    Cc: Pankaj Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • sparse_init_one_section() is being called from two sites: sparse_init()
    and sparse_add_one_section(). The former calls it from a
    for_each_present_section_nr() loop, and the latter marks the section as
    present before calling it. This means that when
    sparse_init_one_section() gets called, we already know that the section
    is present. So there is no point to double check that in the function.

    This removes the check and makes the function void.
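
    Roughly, the helper ends up looking like the sketch below, reconstructed
    from the description and the long-standing mm/sparse.c encoding helpers;
    the real code may differ in detail:

    static void sparse_init_one_section(struct mem_section *ms,
                                        unsigned long pnum, struct page *mem_map,
                                        unsigned long *pageblock_bitmap)
    {
            /* No present_section() check: callers already guarantee it. */
            ms->section_mem_map &= ~SECTION_MAP_MASK;
            ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
                                   SECTION_HAS_MEM_MAP;
            ms->pageblock_flags = pageblock_bitmap;
    }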

    [ross.zwisler@linux.intel.com: fix error path in sparse_add_one_section]
    Link: http://lkml.kernel.org/r/20180706190658.6873-1-ross.zwisler@linux.intel.com
    [ross.zwisler@linux.intel.com: simplification suggested by Oscar]
    Link: http://lkml.kernel.org/r/20180706223358.742-1-ross.zwisler@linux.intel.com
    Link: http://lkml.kernel.org/r/20180702154325.12196-1-osalvador@techadventures.net
    Signed-off-by: Oscar Salvador
    Acked-by: Michal Hocko
    Reviewed-by: Pavel Tatashin
    Reviewed-by: Andrew Morton
    Cc: Pasha Tatashin
    Cc: Oscar Salvador
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oscar Salvador
     

08 Jun, 2018

2 commits

  • In commit c4e1be9ec113 ("mm, sparsemem: break out of loops early"),
    __highest_present_section_nr was introduced to reduce the loop count when
    iterating present sections. This is also helpful for usemap and memmap
    allocation.

    This patch uses __highest_present_section_nr + 1 to optimize the loop.

    Link: http://lkml.kernel.org/r/20180326081956.75275-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Reviewed-by: Andrew Morton
    Cc: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • When searching a present section, there are two boundaries:

    * __highest_present_section_nr
    * NR_MEM_SECTIONS

    __highest_present_section_nr is known to be the stricter boundary of the
    two, so it is only necessary to check against
    __highest_present_section_nr.
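
    A simplified sketch of what iterating with the tighter bound looks like
    (the real helper lives in mm/sparse.c and its details differ slightly):

    static inline int next_present_section_nr(int section_nr)
    {
            /* Stop at the highest present section, not NR_MEM_SECTIONS. */
            while (++section_nr <= __highest_present_section_nr)
                    if (present_section_nr(section_nr))
                            return section_nr;
            return -1;
    }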

    Link: http://lkml.kernel.org/r/20180326081956.75275-2-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Acked-by: David Rientjes
    Reviewed-by: Andrew Morton
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     

12 May, 2018

1 commit

  • Memory hotplug and hotremove operate with per-block granularity. If the
    machine has a large amount of memory (more than 64G), the size of a
    memory block can span multiple sections. By mistake, during hotremove
    we set only the first section to offline state.

    The bug was discovered because kernel selftest started to fail:
    https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop

    After commit, "mm/memory_hotplug: optimize probe routine". But, the bug
    is older than this commit. In this optimization we also added a check
    for sections to be in a proper state during hotplug operation.

    Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
    Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
    Signed-off-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Steven Sistare
    Cc: Daniel Jordan
    Cc: "Kirill A. Shutemov"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

06 Apr, 2018

1 commit

  • During memory hotplugging we traverse struct pages three times:

    1. memset(0) in sparse_add_one_section()
    2. loop in __add_section() to call set_page_node(page, nid) and
       SetPageReserved(page)
    3. loop in memmap_init_zone() to call __init_single_pfn()

    This patch removes the first two loops, and leaves only loop 3. All
    struct pages are initialized in one place, the same as it is done during
    boot.

    The benefits:

    - We improve memory hotplug performance because we are not evicting the
    cache several times, and we also reduce loop branching overhead.

    - Remove a condition from the hot path in __init_single_pfn() that was
    added to fix the problem reported by Bharata in the email thread
    referenced above, thus also improving performance during normal boot.

    - Make memory hotplug more similar to the boot memory initialization
    path because we zero and initialize struct pages only in one
    function.

    - Simplifies memory hotplug struct page initialization code, and thus
    enables future improvements, such as multi-threading the
    initialization of struct pages in order to improve hotplug
    performance even further on larger machines.

    [pasha.tatashin@oracle.com: v5]
    Link: http://lkml.kernel.org/r/20180228030308.1116-7-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20180215165920.8570-7-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Ingo Molnar
    Cc: Michal Hocko
    Cc: Baoquan He
    Cc: Bharata B Rao
    Cc: Daniel Jordan
    Cc: Dan Williams
    Cc: Greg Kroah-Hartman
    Cc: "H. Peter Anvin"
    Cc: Kirill A. Shutemov
    Cc: Mel Gorman
    Cc: Steven Sistare
    Cc: Thomas Gleixner
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

03 Apr, 2018

1 commit

  • Pull removal of obsolete architecture ports from Arnd Bergmann:
    "This removes the entire architecture code for blackfin, cris, frv,
    m32r, metag, mn10300, score, and tile, including the associated device
    drivers.

    I have been working with the (former) maintainers for each one to
    ensure that my interpretation was right and the code is definitely
    unused in mainline kernels. Many had fond memories of working on the
    respective ports to start with and getting them included in upstream,
    but also saw no point in keeping the port alive without any users.

    In the end, it seems that while the eight architectures are extremely
    different, they all suffered the same fate: There was one company in
    charge of an SoC line, a CPU microarchitecture and a software
    ecosystem, which was more costly than licensing newer off-the-shelf
    CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
    seems that all the SoC product lines are still around, but have not
    used the custom CPU architectures for several years at this point. In
    contrast, CPU instruction sets that remain popular and have actively
    maintained kernel ports tend to all be used across multiple licensees.

    [ See the new nds32 port merged in the previous commit for the next
    generation of "one company in charge of an SoC line, a CPU
    microarchitecture and a software ecosystem" - Linus ]

    The removal came out of a discussion that is now documented at
    https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
    marking any ports as deprecated but remove them all at once after I
    made sure that they are all unused. Some architectures (notably tile,
    mn10300, and blackfin) are still being shipped in products with old
    kernels, but those products will never be updated to newer kernel
    releases.

    After this series, we still have a few architectures without mainline
    gcc support:

    - unicore32 and hexagon both have very outdated gcc releases, but the
    maintainers promised to work on providing something newer. At least
    in case of hexagon, this will only be llvm, not gcc.

    - openrisc, risc-v and nds32 are still in the process of finishing
    their support or getting it added to mainline gcc in the first
    place. They all have patched gcc-7.3 ports that work to some
    degree, but complete upstream support won't happen before gcc-8.1.
    Csky posted their first kernel patch set last week, their situation
    will be similar

    [ Palmer Dabbelt points out that RISC-V support is in mainline gcc
    since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"

    This really says it all:

    2498 files changed, 95 insertions(+), 467668 deletions(-)

    * tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
    MAINTAINERS: UNICORE32: Change email account
    staging: iio: remove iio-trig-bfin-timer driver
    tty: hvc: remove tile driver
    tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
    serial: remove tile uart driver
    serial: remove m32r_sio driver
    serial: remove blackfin drivers
    serial: remove cris/etrax uart drivers
    usb: Remove Blackfin references in USB support
    usb: isp1362: remove blackfin arch glue
    usb: musb: remove blackfin port
    usb: host: remove tilegx platform glue
    pwm: remove pwm-bfin driver
    i2c: remove bfin-twi driver
    spi: remove blackfin related host drivers
    watchdog: remove bfin_wdt driver
    can: remove bfin_can driver
    mmc: remove bfin_sdh driver
    input: misc: remove blackfin rotary driver
    input: keyboard: remove bf54x driver
    ...

    Linus Torvalds
     

27 Mar, 2018

1 commit


16 Mar, 2018

1 commit

  • Tile was the only remaining architecture to implement alloc_remap(),
    and since that is being removed, there is no point in keeping this
    function.

    Removing all callers simplifies the mem_map handling.

    Reviewed-by: Pavel Tatashin
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

07 Feb, 2018

1 commit

  • Pull libnvdimm updates from Ross Zwisler:

    - Require struct page by default for filesystem DAX to remove a number
    of surprising failure cases. This includes failures with direct I/O,
    gdb and fork(2).

    - Add support for the new Platform Capabilities Structure added to the
    NFIT in ACPI 6.2a. This new table tells us whether the platform
    supports flushing of CPU and memory controller caches on unexpected
    power loss events.

    - Revamp vmem_altmap and dev_pagemap handling to clean up code and
    better support future PCI P2P uses.

    - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
    become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
    spec, and instead rely on the generic ND_CMD_CALL approach used by
    the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.

    - Enhance nfit_test so we can test some of the new things added in
    version 1.6 of the DSM specification. This includes testing firmware
    download and simulating the Last Shutdown State (LSS) status.

    * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
    libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
    acpi, nfit: fix register dimm error handling
    libnvdimm, namespace: make min namespace size 4K
    tools/testing/nvdimm: force nfit_test to depend on instrumented modules
    libnvdimm/nfit_test: adding support for unit testing enable LSS status
    libnvdimm/nfit_test: add firmware download emulation
    nfit-test: Add platform cap support from ACPI 6.2a to test
    libnvdimm: expose platform persistence attribute for nd_region
    acpi: nfit: add persistent memory control flag for nd_region
    acpi: nfit: Add support for detect platform CPU cache flush on power loss
    device-dax: Fix trailing semicolon
    libnvdimm, btt: fix uninitialized err_lock
    dax: require 'struct page' by default for filesystem dax
    ext2: auto disable dax instead of failing mount
    ext4: auto disable dax instead of failing mount
    mm, dax: introduce pfn_t_special()
    mm: Fix devm_memremap_pages() collision handling
    mm: Fix memory size alignment in devm_memremap_pages_release()
    memremap: merge find_dev_pagemap into get_dev_pagemap
    memremap: change devm_memremap_pages interface to use struct dev_pagemap
    ...

    Linus Torvalds
     

03 Feb, 2018

1 commit


01 Feb, 2018

1 commit

  • The comment is confusing. On the one hand, it refers to 32-bit
    alignment (struct page alignment on 32-bit platforms), but this would
    only guarantee that the 2 lowest bits must be zero. On the other hand,
    it claims that at least 3 bits are available, and 3 bits are actually
    used.

    This is not broken, because there is a stronger alignment guarantee,
    just less obvious. Let's fix the comment to make it clear how many bits
    are available and why.

    Although memmap arrays are allocated in various places, the resulting
    pointer is encoded eventually, so I am adding a BUG_ON() here to enforce
    at runtime that all expected bits are indeed available.

    I have also added a BUILD_BUG_ON to check that PFN_SECTION_SHIFT is
    sufficient, because this part of the calculation can be easily checked
    at build time.

    [ptesarik@suse.com: v2]
    Link: http://lkml.kernel.org/r/20180125100516.589ea6af@ezekiel.suse.cz
    Link: http://lkml.kernel.org/r/20180119080908.3a662e6f@ezekiel.suse.cz
    Signed-off-by: Petr Tesarik
    Acked-by: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Kemi Wang
    Cc: YASUAKI ISHIMATSU
    Cc: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Tesarik
     

09 Jan, 2018

2 commits


05 Jan, 2018

1 commit

  • In commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime
    for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to
    save memory.

    It allocates the first dimension of the array with
    sizeof(struct mem_section), which wastes memory; it should use
    sizeof(struct mem_section *).

    Fix it.
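
    A sketch of the allocation in question as it looks after the fix (a
    hedged reconstruction of that era's mm/sparse.c; the placement and exact
    code may differ):

    #ifdef CONFIG_SPARSEMEM_EXTREME
            /* was: sizeof(struct mem_section) * NR_SECTION_ROOTS */
            size = sizeof(struct mem_section *) * NR_SECTION_ROOTS;
            align = 1 << (INTERNODE_CACHE_SHIFT);
            mem_section = memblock_virt_alloc(size, align);
    #endif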

    Link: http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com
    Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
    Signed-off-by: Baoquan He
    Tested-by: Dave Young
    Acked-by: Kirill A. Shutemov
    Cc: Kirill A. Shutemov
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Atsushi Kumagai
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     

16 Nov, 2017

1 commit

  • vmemmap_alloc_block() will no longer zero the block, so zero memory at
    its call sites for everything except struct pages. Struct page memory is
    zeroed by struct page initialization.

    Replace the allocators in sparse-vmemmap with the non-zeroing version.
    This way we get the performance improvement of zeroing the memory in
    parallel when struct pages are zeroed.

    Add struct page zeroing as a part of initialization of other fields in
    __init_single_page().

    These single-thread performance numbers were collected on an Intel(R)
    Xeon(R) CPU E7-8895 v3 @ 2.60GHz with 1T of memory (268400646 pages in 8
    nodes):

                         BASE             FIX
    sparse_init          11.244671836s    0.007199623s
    zone_sizes_init       4.879775891s    8.355182299s
                         ------------------------------
    Total                16.124447727s    8.362381922s

    sparse_init is where memory for struct pages is zeroed, and the zeroing
    part is moved later in this patch into __init_single_page(), which is
    called from zone_sizes_init().
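
    A sketch of the small zeroing wrapper mentioned in the note below, kept
    private to mm/sparse-vmemmap.c for the call sites that still need zeroed
    memory (a hedged reconstruction, not necessarily the exact upstream
    code):

    static void * __meminit vmemmap_alloc_block_zero(unsigned long size, int node)
    {
            void *p = vmemmap_alloc_block(size, node);

            if (!p)
                    return NULL;
            memset(p, 0, size);

            return p;
    }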

    [akpm@linux-foundation.org: make vmemmap_alloc_block_zero() private to sparse-vmemmap.c]
    Link: http://lkml.kernel.org/r/20171013173214.27300-10-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Steven Sistare
    Reviewed-by: Daniel Jordan
    Reviewed-by: Bob Picco
    Tested-by: Bob Picco
    Acked-by: Michal Hocko
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: Christian Borntraeger
    Cc: David S. Miller
    Cc: Dmitry Vyukov
    Cc: Heiko Carstens
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Sam Ravnborg
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

10 Nov, 2017

1 commit