29 Mar, 2018

1 commit

  • commit f59f1caf72ba00d519c793c3deb32cd3be32edc2 upstream.

    This reverts commit b92df1de5d28 ("mm: page_alloc: skip over regions of
    invalid pfns where possible"). The commit was meant to be a boot-time
    init speedup, skipping the loop in memmap_init_zone() for invalid pfns.

    But given some specific memory mapping on x86_64 (or, more generally,
    theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
    implementation also skips valid pfns, which is plain wrong and causes
    'kernel BUG at mm/page_alloc.c:1389!'

    crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
    kernel BUG at mm/page_alloc.c:1389!
    invalid opcode: 0000 [#1] SMP
    --
    RIP: 0010: move_freepages+0x15e/0x160
    --
    Call Trace:
    move_freepages_block+0x73/0x80
    __rmqueue+0x263/0x460
    get_page_from_freelist+0x7e1/0x9e0
    __alloc_pages_nodemask+0x176/0x420
    --

    crash> page_init_bug -v | grep RAM
    1000 - 9bfff System RAM (620.00 KiB)
    100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
    4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
    4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
    7b788000 - 7b7fffff System RAM (480.00 KiB)
    100000000 - 67fffffff System RAM ( 22.00 GiB)

    crash> page_init_bug | head -6
    7b788000 - 7b7fffff System RAM (480.00 KiB)
    1fffff00000000 0 1 DMA32 4096 1048575
    505736 505344 505855
    0 0 0 DMA 1 4095
    1fffff00000400 0 1 DMA32 4096 1048575
    BUG, zones differ!

    crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
    PAGE PHYSICAL MAPPING INDEX CNT FLAGS
    ffffea0001e00000 78000000 0 0 0 0
    ffffea0001ed7fc0 7b5ff000 0 0 0 0
    ffffea0001ed8000 7b600000 0 0 0 0 <<<<
    ffffea0001ede1c0 7b787000 0 0 0 0
    ffffea0001ede200 7b788000 0 0 1 1fffff00000000
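
    For illustration, a minimal userspace model of the failure mode (not
    kernel code; names and values are hypothetical): a pfn that is valid
    but skipped during memmap init keeps its zeroed default zone, so the
    zone-consistency check in move_freepages() trips:

    /* model.c - illustrative only, not kernel code */
    #include <assert.h>

    #define NPAGES 8

    struct page { int zone; };          /* zone 0 == never initialized  */
    static struct page memmap[NPAGES];  /* zeroed, like the early memmap */

    static void memmap_init(int buggy_skip_pfn)
    {
        for (int pfn = 0; pfn < NPAGES; pfn++) {
            if (pfn == buggy_skip_pfn)
                continue;               /* valid pfn wrongly skipped */
            memmap[pfn].zone = 1;       /* e.g. DMA32 */
        }
    }

    int main(void)
    {
        memmap_init(5);
        /* move_freepages() insists both ends of a range share a zone;
         * with pfn 5 skipped this aborts, modeling the kernel BUG. */
        assert(memmap[0].zone == memmap[5].zone);
        return 0;
    }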

    Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
    Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
    Signed-off-by: Daniel Vacek
    Acked-by: Ard Biesheuvel
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Pavel Tatashin
    Cc: Paul Burton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Daniel Vacek
     

15 Mar, 2018

1 commit

  • commit 379b03b7fa05f7db521b7732a52692448a3c34fe upstream.

    This is just a cleanup. It aids handling the special end case in the
    next commit.

    [akpm@linux-foundation.org: make it work against current -linus, not against -mm]
    [akpm@linux-foundation.org: make it work against current -linus, not against -mm some more]
    Link: http://lkml.kernel.org/r/1ca478d4269125a99bcfb1ca04d7b88ac1aee924.1520011944.git.neelx@redhat.com
    Signed-off-by: Daniel Vacek
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Pavel Tatashin
    Cc: Paul Burton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Daniel Vacek
     

26 Aug, 2017

1 commit

  • In the recently introduced memblock_discard() there is a reversed-logic
    bug: the static array is freed instead of the dynamically allocated
    one.
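
    As a hedged illustration of the pattern (a simplified userspace model,
    not the actual patch): the guard must free the regions array only when
    it no longer points at the static bootstrap array; the bug had the
    comparison reversed:

    #include <stdlib.h>

    struct region { long base, size; };

    static struct region init_regions[128];     /* static bootstrap array */
    static struct region *regions = init_regions;

    static void discard(void)
    {
        /* Correct test: free only a reallocated array. The reversed
         * logic ('==') would instead free the static array. */
        if (regions != init_regions)
            free(regions);
    }

    int main(void)
    {
        regions = malloc(256 * sizeof(*regions));   /* array was resized */
        discard();                                  /* frees heap array  */
        return 0;
    }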

    Link: http://lkml.kernel.org/r/1503511441-95478-2-git-send-email-pasha.tatashin@oracle.com
    Fixes: 3010f876500f ("mm: discard memblock data later")
    Signed-off-by: Pavel Tatashin
    Reported-by: Woody Suwalski
    Tested-by: Woody Suwalski
    Acked-by: Michal Hocko
    Cc: Vlastimil Babka
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

19 Aug, 2017

1 commit

  • There is an existing use-after-free bug when deferred struct pages are
    enabled:

    memblock_add() allocates memory for the memory array if more than
    128 entries are needed. See the comment in e820__memblock_setup():

    * The bootstrap memblock region count maximum is 128 entries
    * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
    * than that - so allow memblock resizing.

    This memblock memory is freed here:
    free_low_memory_core_early()

    We access the freed memblock.memory later in boot when deferred pages
    are initialized in this path:

    deferred_init_memmap()
    for_each_mem_pfn_range()
    __next_mem_pfn_range()
    type = &memblock.memory;

    One possible explanation for why this use-after-free hasn't been hit
    before is that the limit of INIT_MEMBLOCK_REGIONS has never been
    exceeded, at least on systems where deferred struct pages were enabled.

    Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
    and verifying in qemu that this code is getting executed and that the
    freed pages are sane.
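
    A minimal userspace model of this failure mode (illustrative only):
    the iterator dereferences an array that an earlier boot stage freed:

    #include <stdio.h>
    #include <stdlib.h>

    struct region { unsigned long base, size; };
    struct memblock_type { struct region *regions; int cnt; };

    int main(void)
    {
        struct memblock_type memory = {
            calloc(4, sizeof(struct region)), 4
        };

        free(memory.regions);   /* free_low_memory_core_early() analogue */

        /* ... later, deferred init walks the same list ... */
        struct memblock_type *type = &memory;
        printf("%lu\n", type->regions[0].base);     /* use after free */
        return 0;
    }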

    Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
    Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Steven Sistare
    Reviewed-by: Daniel Jordan
    Reviewed-by: Bob Picco
    Acked-by: Michal Hocko
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     

07 Jul, 2017

2 commits

  • movable_node_is_enabled is defined in memblock proper while it is
    initialized from the memory hotplug proper. This is quite messy and it
    creates a dependency between the two, so move movable_node along with
    the helper functions to memory_hotplug.

    To make it more entertaining, the kernel parameter is ignored unless
    CONFIG_HAVE_MEMBLOCK_NODE_MAP=y, because we do not have the node
    information for each memblock otherwise. So let's warn when the option
    is ignored.

    Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Reza Arbab
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Jerome Glisse
    Cc: Yasuaki Ishimatsu
    Cc: Xishi Qiu
    Cc: Kani Toshimitsu
    Cc: Chen Yucong
    Cc: Joonsoo Kim
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Daniel Kiper
    Cc: Igor Mammedov
    Cc: Vitaly Kuznetsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Commit 20b2f52b73fe ("numa: add CONFIG_MOVABLE_NODE for
    movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
    good explanation on why it is actually useful.

    It makes a lot of sense to make the movable node semantic opt-in, but
    we already have that because the feature has to be explicitly enabled
    on the kernel command line. A config option on top only makes the
    configuration space larger without a good reason. It also adds
    ifdefery that pollutes the code.

    Just drop the config option and make it de facto always enabled. This
    shouldn't introduce any change to the semantics.

    Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Reza Arbab
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Jerome Glisse
    Cc: Yasuaki Ishimatsu
    Cc: Xishi Qiu
    Cc: Kani Toshimitsu
    Cc: Chen Yucong
    Cc: Joonsoo Kim
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Daniel Kiper
    Cc: Igor Mammedov
    Cc: Vitaly Kuznetsov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

03 Jun, 2017

1 commit

  • We have seen an early OOM killer invocation on ppc64 systems with
    crashkernel=4096M:

    kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
    kthreadd cpuset=/ mems_allowed=7
    CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
    Call Trace:
    dump_stack+0xb0/0xf0 (unreliable)
    dump_header+0xb0/0x258
    out_of_memory+0x5f0/0x640
    __alloc_pages_nodemask+0xa8c/0xc80
    kmem_getpages+0x84/0x1a0
    fallback_alloc+0x2a4/0x320
    kmem_cache_alloc_node+0xc0/0x2e0
    copy_process.isra.25+0x260/0x1b30
    _do_fork+0x94/0x470
    kernel_thread+0x48/0x60
    kthreadd+0x264/0x330
    ret_from_kernel_thread+0x5c/0xa4

    Mem-Info:
    active_anon:0 inactive_anon:0 isolated_anon:0
    active_file:0 inactive_file:0 isolated_file:0
    unevictable:0 dirty:0 writeback:0 unstable:0
    slab_reclaimable:5 slab_unreclaimable:73
    mapped:0 shmem:0 pagetables:0 bounce:0
    free:0 free_pcp:0 free_cma:0
    Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0 0
    Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
    0 total pagecache pages
    0 pages in swap cache
    Swap cache stats: add 0, delete 0, find 0/0
    Free swap = 0kB
    Total swap = 0kB
    819200 pages RAM
    0 pages HighMem/MovableOnly
    817481 pages reserved
    0 pages cma reserved
    0 pages hwpoisoned

    The reason is that the managed memory is too low (only 110MB) while the
    rest of the 50GB is still waiting for the deferred initialization to
    be done. update_defer_init estimates the initial memory to initialize
    to be at least 2GB, but it doesn't consider any memory allocated in
    that range. In this particular case we've had

    Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

    so the low 2GB is mostly depleted.

    Fix this by considering memblock allocations in the initial static
    initialization estimation. Move the max_initialise to
    reset_deferred_meminit and implement a simple memblock_reserved_memory
    helper which iterates all reserved blocks and sums the size of all that
    start below the given address. The cumulative size is then added on top
    of the initial estimation. This is still not ideal because
    reset_deferred_meminit doesn't consider holes and so a reservation might
    be above the initial estimation, which we ignore, but let's keep the
    logic simple until we really need to handle more complicated cases.
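
    A self-contained model of the described accounting (illustrative; the
    helper name and shape here are assumptions, not the verbatim patch):
    sum the sizes of all reserved blocks that start below the estimation
    boundary and add that on top of the initial estimate:

    #include <stdio.h>

    struct region { unsigned long long base, size; };

    static unsigned long long reserved_below(const struct region *r,
                                             int cnt,
                                             unsigned long long limit)
    {
        unsigned long long total = 0;

        for (int i = 0; i < cnt; i++)
            if (r[i].base < limit)
                total += r[i].size;
        return total;
    }

    int main(void)
    {
        /* crashkernel=4096M reserved at 128M, 2 GiB initial window */
        struct region reserved[] = { { 128ULL << 20, 4096ULL << 20 } };
        unsigned long long window = 2ULL << 30;

        window += reserved_below(reserved, 1, window);
        printf("%llu MiB\n", window >> 20);     /* 2048 + 4096 = 6144 */
        return 0;
    }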

    Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
    Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
    Signed-off-by: Michal Hocko
    Acked-by: Mel Gorman
    Tested-by: Srikar Dronamraju
    Cc: [4.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

06 Apr, 2017

2 commits

  • Add memblock_cap_memory_range() which will remove all the memblock regions
    except the memory range specified in the arguments. In addition, rework is
    done on memblock_mem_limit_remove_map() to re-implement it using
    memblock_cap_memory_range().

    This function, like memblock_mem_limit_remove_map(), will not remove
    memblocks with the MEMBLOCK_NOMAP attribute as they may be mapped and
    accessed later as "device memory."
    See the commit a571d4eb55d8 ("mm/memblock.c: add new infrastructure to
    address the mem limit issue").

    This function is used, in a succeeding patch in the arm64 kdump support
    series, to limit the range of usable memory, or System RAM, on the
    crash dump kernel.
    (Please note that the "mem=" parameter is of little use for this
    purpose.)
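
    A userspace sketch of the described semantics (simplified; the real
    code also trims regions that partially overlap the window, which is
    omitted here): drop every region outside the window unless it carries
    the NOMAP attribute:

    #include <stdbool.h>
    #include <stdio.h>

    struct region { unsigned long base, size; bool nomap; };

    static int cap_memory_range(struct region *r, int cnt,
                                unsigned long base, unsigned long size)
    {
        int kept = 0;

        for (int i = 0; i < cnt; i++) {
            bool inside = r[i].base >= base &&
                          r[i].base + r[i].size <= base + size;
            if (inside || r[i].nomap)
                r[kept++] = r[i];
        }
        return kept;                    /* new region count */
    }

    int main(void)
    {
        struct region mem[] = {
            { 0x00000000, 0x40000000, false },  /* outside: dropped */
            { 0x40000000, 0x10000000, true  },  /* NOMAP: kept      */
            { 0x80000000, 0x40000000, false },  /* in window: kept  */
        };
        printf("%d kept\n",
               cap_memory_range(mem, 3, 0x80000000, 0x40000000));  /* 2 */
        return 0;
    }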

    Signed-off-by: AKASHI Takahiro
    Reviewed-by: Will Deacon
    Acked-by: Catalin Marinas
    Acked-by: Dennis Chen
    Cc: linux-mm@kvack.org
    Cc: Andrew Morton
    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Catalin Marinas

    AKASHI Takahiro
     
  • This function, with a combination of memblock_mark_nomap(), will be used
    in a later kdump patch for arm64 when it temporarily isolates some range
    of memory from the other memory blocks in order to create a specific
    kernel mapping at boot time.

    Signed-off-by: AKASHI Takahiro
    Reviewed-by: Ard Biesheuvel
    Signed-off-by: Catalin Marinas

    AKASHI Takahiro
     

10 Mar, 2017

1 commit

  • Obviously, we should not access memblock.memory.regions[right] if
    'right' is outside of [0..memblock.memory.cnt).

    Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
    Link: http://lkml.kernel.org/r/20170303023745.9104-1-takahiro.akashi@linaro.org
    Signed-off-by: AKASHI Takahiro
    Cc: Paul Burton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    AKASHI Takahiro
     

25 Feb, 2017

3 commits

  • Provide the name of each memblock type with struct memblock_type. This
    allows us to get rid of the function memblock_type_name() and avoids
    duplicating the type names in __memblock_dump_all().

    The only memblock_type usage out of mm/memblock.c seems to be
    arch/s390/kernel/crash_dump.c. While at it, give it a name.

    Link: http://lkml.kernel.org/r/20170120123456.46508-4-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens
    Cc: Philipp Hachtmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the
    memblock structure knows about a physical memory list.

    The physical memory list should also be dumped if memblock_dump_all() is
    called in case memblock_debug is switched on. This makes debugging a
    bit easier.

    Link: http://lkml.kernel.org/r/20170120123456.46508-3-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens
    Cc: Philipp Hachtmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Since commit 70210ed950b5 ("mm/memblock: add physical memory list") the
    memblock structure knows about a physical memory list.

    memblock_type_name() should return "physmem" instead of "unknown" if the
    name of the physmem memblock_type is being asked for.

    Link: http://lkml.kernel.org/r/20170120123456.46508-2-heiko.carstens@de.ibm.com
    Signed-off-by: Heiko Carstens
    Cc: Philipp Hachtmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

23 Feb, 2017

4 commits

  • There is no variable named flags in memblock_add() and
    memblock_reserve() so remove it from the log messages.

    This patch also cleans up the type casting for phys_addr_t by using %pa
    to print them.
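
    For reference, a kernel-style sketch of the resulting form (not a
    verbatim hunk from the patch): the %pa specifier takes a pointer to a
    phys_addr_t, so no cast is needed regardless of the type's width:

    phys_addr_t base = 0x1000, end = 0x1fff;

    /* %pa dereferences the pointer and prints the physical address */
    pr_debug("memblock_reserve: [%pa-%pa]\n", &base, &end);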

    Link: http://lkml.kernel.org/r/1484720165-25403-1-git-send-email-miles.chen@mediatek.com
    Signed-off-by: Miles Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miles Chen
     
  • memblock_reserve() would add a new range to memblock.reserved in case
    the new range is not totally covered by any of the current
    memblock.reserved range. If the memblock.reserved is full and can't
    resize, memblock_reserve() would fail.

    This doesn't happen in the real world now; I observed this during code
    review. Theoretically, though, it has a chance to happen. And if it
    happens, others would think this range of memory is still available and
    may corrupt the memory.

    This patch checks the return value and only jumps to "done" after it
    succeeds.
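
    The shape of the fix, as a hedged sketch (not the verbatim hunk):
    treat the allocation as successful only when the reservation itself
    succeeded:

    /* memblock_reserve() returns 0 on success; only then is the
     * candidate range really taken and safe to hand out. */
    if (memblock_reserve(alloc, size) == 0)
        goto done;
    /* otherwise fall through to the failure path */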

    Link: http://lkml.kernel.org/r/1482363033-24754-3-git-send-email-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • memblock_is_region_memory() invokes memblock_search() to see whether
    the base address is in a memory region. If that fails, idx will be -1,
    and the function returns 0.

    If the memblock_search() returns a valid index, it means the base
    address is guaranteed to be in the range memblock.memory.regions[idx].
    Because of this, it is not necessary to check the base again.

    This patch removes the check on "base".
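
    A hedged sketch of the simplified function (abbreviated; assumed to be
    close to, but not verbatim, the post-patch mm/memblock.c):

    int __init_memblock memblock_is_region_memory(phys_addr_t base,
                                                  phys_addr_t size)
    {
        int idx = memblock_search(&memblock.memory, base);
        phys_addr_t end = base + memblock_cap_size(base, &size);

        if (idx == -1)
            return 0;
        /* base is guaranteed inside regions[idx]; only the end of the
         * range still needs checking */
        return memblock.memory.regions[idx].base +
               memblock.memory.regions[idx].size >= end;
    }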

    Link: http://lkml.kernel.org/r/1482363033-24754-2-git-send-email-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Yang
     
  • When using a sparse memory model, memmap_init_zone(), when invoked with
    the MEMMAP_EARLY context, will skip over pages which aren't valid - ie.
    which aren't in a populated region of the sparse memory map. However,
    if the memory map is extremely sparse then it can spend a long time
    linearly checking each PFN in a large non-populated region of the
    memory map and skipping each in turn.

    When CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, we have sufficient
    information to quickly discover the next valid PFN given an invalid one
    by searching through the list of memory regions & skipping forwards to
    the first PFN covered by the memory region to the right of the
    non-populated region. Implement this in order to speed up
    memmap_init_zone() for systems with extremely sparse memory maps.

    James said: "I have tested this patch on a virtual model of a Samurai
    CPU with a sparse memory map. The kernel boot time drops from 109 to
    62 seconds."
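
    A userspace model of the described skip (illustrative only): given an
    invalid pfn, binary-search the sorted region list and jump straight to
    the base pfn of the first region to its right:

    #include <stdio.h>

    struct region { unsigned long start_pfn, end_pfn; };  /* sorted */

    static unsigned long next_valid_pfn(unsigned long pfn,
                                        const struct region *r, int cnt)
    {
        int lo = 0, hi = cnt;

        while (lo < hi) {               /* first region ending > pfn */
            int mid = (lo + hi) / 2;
            if (r[mid].end_pfn <= pfn)
                lo = mid + 1;
            else
                hi = mid;
        }
        if (lo == cnt)
            return ~0UL;                /* no more valid pfns */
        return pfn < r[lo].start_pfn ? r[lo].start_pfn : pfn;
    }

    int main(void)
    {
        struct region mem[] = { { 16, 32 }, { 4096, 8192 } };

        /* instead of probing 33, 34, ... one at a time: */
        printf("%lu\n", next_valid_pfn(32, mem, 2));    /* 4096 */
        return 0;
    }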

    Link: http://lkml.kernel.org/r/20161125185518.29885-1-paul.burton@imgtec.com
    Signed-off-by: Paul Burton
    Tested-by: James Hartley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Burton
     

12 Oct, 2016

1 commit

  • Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
    physical address to a virtual one using __va(). However, such physical
    addresses may sometimes be located in highmem and using __va() is
    incorrect, leading to inconsistent object tracking in kmemleak.

    The following functions have been added to the kmemleak API and they take
    a physical address as the object pointer. They only perform the
    corresponding action if the address has a lowmem mapping:

    kmemleak_alloc_phys
    kmemleak_free_part_phys
    kmemleak_not_leak_phys
    kmemleak_ignore_phys

    The affected calling places have been updated to use the new kmemleak
    API.
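
    An assumed usage sketch (kernel-style, not a verbatim hunk) of the
    conversion in a caller such as memblock:

    /* before: kmemleak_alloc(__va(found), size, 0, 0);
     * __va() is wrong when 'found' lies in highmem.
     * after: the phys variant checks for a lowmem mapping itself. */
    kmemleak_alloc_phys(found, size, 0, 0);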

    Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Reported-by: Vignesh R
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

08 Oct, 2016

1 commit

  • The total reserved memory in a system is accounted but not available
    for use outside mm/memblock.c. By exposing the total reserved memory,
    systems can better calculate the size of large hashes.
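
    A sketch of the exposed accessor (assumed shape, mirroring the existing
    memblock_phys_mem_size()):

    phys_addr_t __init_memblock memblock_reserved_size(void)
    {
        return memblock.reserved.total_size;
    }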

    Link: http://lkml.kernel.org/r/1472476010-4709-3-git-send-email-srikar@linux.vnet.ibm.com
    Signed-off-by: Srikar Dronamraju
    Suggested-by: Mel Gorman
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Michael Ellerman
    Cc: Mahesh Salgaonkar
    Cc: Hari Bathini
    Cc: Dave Hansen
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srikar Dronamraju
     

05 Aug, 2016

2 commits

  • A NULL dereference error, and a failure to get type_a->regions[0] info,
    occur if the type_b parameter of __next_mem_range_rev() is NULL.

    Fix this by checking type_b before dereferencing it, and by
    initializing idx_b to 0.

    The approach was tested by dumping all types of region via
    __memblock_dump_all() and the fixed __next_mem_range_rev() to a UART
    separately; the result looks okay after checking the logs.

    Link: http://lkml.kernel.org/r/57A0320D.6070102@zoho.com
    Signed-off-by: zijun_hu
    Tested-by: zijun_hu
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zijun_hu
     
  • s/accomodate/accommodate/

    Link: http://lkml.kernel.org/r/20160804121824.18100-1-kuleshovmail@gmail.com
    Signed-off-by: Alexander Kuleshov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     

29 Jul, 2016

3 commits

  • Fix a region index adjustment error when the type_b parameter of
    __next_mem_range_rev() is NULL.

    Signed-off-by: zijun_hu
    Cc: Alexander Kuleshov
    Cc: Ard Biesheuvel
    Cc: Tang Chen
    Cc: Wei Yang
    Cc: Tang Chen
    Cc: Richard Leitner
    Cc: David Gibson
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zijun_hu
     
  • In some cases, memblock is queried by kernel to determine whether a
    specified address is RAM or not. For example, the ACPI core needs this
    information to determine which attributes to use when mapping ACPI
    regions (acpi_os_ioremap). Use of incorrect memory types can result in
    faults, data corruption, or other issues.

    Removing memory with memblock_enforce_memory_limit() throws away this
    information, and so a kernel booted with 'mem=' may suffer from the
    issues described above. To avoid this, we need to keep those NOMAP
    regions instead of removing all above the limit, which preserves the
    information we need while preventing other use of those regions.

    This patch adds new infrastructure to retain all NOMAP memblock regions
    while removing others, to cater for this.

    Link: http://lkml.kernel.org/r/1468475036-5852-2-git-send-email-dennis.chen@arm.com
    Signed-off-by: Dennis Chen
    Acked-by: Steve Capper
    Cc: Catalin Marinas
    Cc: Ard Biesheuvel
    Cc: Pekka Enberg
    Cc: Mel Gorman
    Cc: Tang Chen
    Cc: Tony Luck
    Cc: Ingo Molnar
    Cc: Rafael J. Wysocki
    Cc: Will Deacon
    Cc: Mark Rutland
    Cc: Matt Fleming
    Cc: Kaly Xin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dennis Chen
     
  • asm-generic headers are generic implementations for architecture-specific
    code and should not be included by common code. Thus use the asm/
    version of sections.h to get at the linker sections.

    Link: http://lkml.kernel.org/r/1468285103-7470-1-git-send-email-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

21 May, 2016

2 commits

  • Comparing a u64 variable to >= 0 always returns true and the comparison
    can therefore be removed. This issue was detected using the
    -Wtype-limits gcc flag.

    This patch fixes following type-limits warning:

    mm/memblock.c: In function `__next_reserved_mem_region':
    mm/memblock.c:843:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
    if (*idx >= 0 && *idx < type->cnt) {
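
    The reduction, as a sketch: *idx is a u64, so the first clause is
    vacuously true and the condition collapses to the upper-bound check
    alone:

    /* before: if (*idx >= 0 && *idx < type->cnt)
     * the first test is always true for an unsigned index, so: */
    if (*idx < type->cnt) {
        /* region lookup proceeds as before */
    }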

    Link: http://lkml.kernel.org/r/20160510103625.3a7f8f32@g0hl1n.net
    Signed-off-by: Richard Leitner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Leitner
     
  • memblock_add_region() and memblock_reserve_region() do nothing specific
    before the call of memblock_add_range(), only print debug output.

    We can do the same in memblock_add() and memblock_reserve() since both
    memblock_add_region() and memblock_reserve_region() are not used by
    anybody outside of memblock.c and memblock_{add,reserve}() have the same
    set of flags and nids.

    Since memblock_add_region() and memblock_reserve_region() will be
    inlined, there will be no functional changes, but code readability will
    improve a little.

    Signed-off-by: Alexander Kuleshov
    Acked-by: Ard Biesheuvel
    Cc: Mel Gorman
    Cc: Pekka Enberg
    Cc: Tony Luck
    Cc: Tang Chen
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     

18 Mar, 2016

1 commit

  • Kernel style prefers a single string over split strings when the string is
    'user-visible'.

    Miscellanea:

    - Add a missing newline
    - Realign arguments

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

16 Mar, 2016

1 commit

  • We define struct memblock_type *type in the memblock_add_region() and
    memblock_reserve_region() functions only for passing it to the
    memblock_add_range() and memblock_reserve_range() functions. Let's
    remove these variables and pass the type directly.

    Signed-off-by: Alexander Kuleshov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     

06 Feb, 2016

1 commit

  • At the moment memblock_phys_mem_size() is marked as __init, and so is
    discarded after boot. This is different from most of the memblock
    functions which are marked __init_memblock, and are only discarded after
    boot if memory hotplug is not configured.

    To allow for upcoming code which will need memblock_phys_mem_size() in
    the hotplug path, change it from __init to __init_memblock.

    Signed-off-by: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
     

15 Jan, 2016

3 commits

  • We already have the for_each_memblock() macro in <linux/memblock.h>
    which provides the ability to iterate over memblock regions of a known
    type. However, the for_each_memblock() macro requires the name of the
    type rather than a pointer to a struct memblock_type.

    This patch introduces a new macro, for_each_memblock_type(), which
    allows us to iterate over memblock regions given only a pointer to the
    type.
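
    A sketch of the introduced macro (assumed shape; it relies on an idx
    variable in the enclosing scope):

    #define for_each_memblock_type(memblock_type, rgn)              \
            for (idx = 0, rgn = &memblock_type->regions[0];         \
                 idx < memblock_type->cnt;                          \
                 idx++, rgn = &memblock_type->regions[idx])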

    Signed-off-by: Alexander Kuleshov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     
  • Remove rgnbase and rgnsize variables from memblock_overlaps_region().
    We use these variables only for passing to the memblock_addrs_overlap()
    function and that's all. Let's remove them.

    Signed-off-by: Alexander Kuleshov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     
  • Make memblock_is_memory() and memblock_is_reserved() return bool to
    improve readability, since these particular functions only use either
    one or zero as their return value.

    No functional change.

    Signed-off-by: Yaowei Bai
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yaowei Bai
     

10 Dec, 2015

1 commit

  • This introduces the MEMBLOCK_NOMAP attribute and the required plumbing
    to make it usable as an indicator that some parts of normal memory
    should not be covered by the kernel direct mapping. It is up to the
    arch to actually honor the attribute when laying out this mapping,
    but the memblock code itself is modified to disregard these regions
    for allocations and other general use.
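
    A sketch of the new attribute's plumbing (assumed shape): a flag bit,
    a marker, and a predicate that the allocator and iterators consult:

    /* flag bit on a memblock region */
    MEMBLOCK_NOMAP = 0x4,       /* don't add to kernel direct mapping */

    /* mark a range as NOMAP */
    int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);

    /* predicate used when building the direct mapping and when
     * deciding whether a region is usable for allocation */
    static inline bool memblock_is_nomap(struct memblock_region *m)
    {
        return m->flags & MEMBLOCK_NOMAP;
    }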

    Cc: linux-mm@kvack.org
    Cc: Alexander Kuleshov
    Cc: Andrew Morton
    Reviewed-by: Matt Fleming
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Will Deacon

    Ard Biesheuvel
     

09 Sep, 2015

5 commits

  • Signed-off-by: Alexander Kuleshov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     
  • s/succees/success/

    Signed-off-by: Alexander Kuleshov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     
  • Since commit e3239ff92a17 ("memblock: Rename memblock_region to
    memblock_type and memblock_property to memblock_region"), all local
    variables of the memblock_type type were renamed to 'type'. This commit
    renames all remaining local variables of the memblock_type type in the
    same way.

    Signed-off-by: Alexander Kuleshov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     
  • When parsing SRAT, all memory ranges are added into numa_meminfo. In
    numa_init(), before entering numa_cleanup_meminfo(), all possible memory
    ranges are in numa_meminfo. And numa_cleanup_meminfo() removes all
    ranges that are over max_pfn or empty.

    But, this only works if the nodes are contiguous. Let's have a look at
    the following example:

    We have an SRAT like this:
    SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
    SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
    SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
    SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
    SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
    SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
    SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
    SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
    SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug

    On boot, only node 0,1,2,3 exist.

    And the numa_meminfo will look like this:
    numa_meminfo.nr_blks = 9
    1. on node 0: [0, 60000000]
    2. on node 0: [100000000, 20000000000]
    3. on node 1: [20000000000, 40000000000]
    4. on node 4: [40000000000, 60000000000]
    5. on node 5: [60000000000, 80000000000]
    6. on node 2: [80000000000, a0000000000]
    7. on node 3: [a0000000000, a0800000000]
    8. on node 6: [c0000000000, a0800000000]
    9. on node 7: [e0000000000, a0800000000]

    And numa_cleanup_meminfo() will merge 1 and 2, and remove 8 and 9
    because their end addresses are over max_pfn, which is a0800000000.
    But 4 and 5 are not removed because their end addresses are less than
    max_pfn. But in fact, nodes 4 and 5 don't exist.

    In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.

    Since memory ranges in node 4 and 5 are in numa_meminfo, in
    numa_register_memblks(), node 4 and 5 will be mistakenly set to online.

    If you run lscpu, it will show:
    NUMA node0 CPU(s): 0-14,128-142
    NUMA node1 CPU(s): 15-29,143-157
    NUMA node2 CPU(s):
    NUMA node3 CPU(s):
    NUMA node4 CPU(s): 62-76,190-204
    NUMA node5 CPU(s): 78-92,206-220

    In this patch, we use memblock_overlaps_region() to check if ranges in
    numa_meminfo overlap with ranges in memory_block. Since memory_block
    contains all available memory at boot time, if they overlap, it means the
    ranges exist. If not, then remove them from numa_meminfo.

    After this patch, lscpu will show:
    NUMA node0 CPU(s): 0-14,128-142
    NUMA node1 CPU(s): 15-29,143-157
    NUMA node4 CPU(s): 62-76,190-204
    NUMA node5 CPU(s): 78-92,206-220
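
    A self-contained model of the described check (illustrative only;
    memblock contains just the memory present at boot, so SRAT-only
    hotplug ranges fail the overlap test and are dropped):

    #include <stdbool.h>
    #include <stdio.h>

    struct range { unsigned long long start, end; };

    static bool overlaps(struct range a, const struct range *mb, int cnt)
    {
        for (int i = 0; i < cnt; i++)
            if (a.start < mb[i].end && mb[i].start < a.end)
                return true;
        return false;
    }

    int main(void)
    {
        /* memory actually present at boot (nodes 0-3) */
        struct range memblock_mem[] = {
            { 0x0ULL,           0x40000000000ULL },
            { 0x80000000000ULL, 0xa0800000000ULL },
        };
        struct range node4 = { 0x40000000000ULL, 0x60000000000ULL };
        struct range node2 = { 0x80000000000ULL, 0xa0000000000ULL };

        printf("node4: %s\n",
               overlaps(node4, memblock_mem, 2) ? "kept" : "dropped");
        printf("node2: %s\n",
               overlaps(node2, memblock_mem, 2) ? "kept" : "dropped");
        return 0;
    }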

    Signed-off-by: Tang Chen
    Reviewed-by: Yasuaki Ishimatsu
    Cc: Thomas Gleixner
    Cc: Tejun Heo
    Cc: Luiz Capitulino
    Cc: Xishi Qiu
    Cc: Will Deacon
    Cc: Vladimir Murzin
    Cc: Fabian Frederick
    Cc: Alexander Kuleshov
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • memblock_overlaps_region() checks if the given memblock region
    intersects a region in memblock. If so, it returns the index of the
    intersected region.

    But its only caller is memblock_is_region_reserved(), which merely
    returns 0 if false and non-zero if true.

    Both of these functions should return bool.

    Signed-off-by: Tang Chen
    Cc: Thomas Gleixner
    Cc: Tejun Heo
    Cc: Yasuaki Ishimatsu
    Cc: Luiz Capitulino
    Cc: Xishi Qiu
    Cc: Will Deacon
    Cc: Vladimir Murzin
    Cc: Fabian Frederick
    Cc: Alexander Kuleshov
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen