12 Oct, 2016

1 commit

  • Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
    physical address to a virtual one using __va(). However, such physical
    addresses may sometimes be located in highmem and using __va() is
    incorrect, leading to inconsistent object tracking in kmemleak.

    The following functions have been added to the kmemleak API and they take
    a physical address as the object pointer. They only perform the
    corresponding action if the address has a lowmem mapping:

    kmemleak_alloc_phys
    kmemleak_free_part_phys
    kmemleak_not_leak_phys
    kmemleak_ignore_phys

    The affected calling places have been updated to use the new kmemleak
    API.

    Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Reported-by: Vignesh R
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
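
    A minimal sketch of the idea behind these wrappers, assuming the lowmem
    test is a PFN comparison against max_low_pfn (the exact check and the
    __ref annotation are assumptions, not the committed code):

        /* Sketch: only pass the object to kmemleak if __va() is valid for it. */
        void __ref kmemleak_alloc_phys(phys_addr_t phys, size_t size,
                                       int min_count, gfp_t gfp)
        {
                if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
                        kmemleak_alloc(__va(phys), size, min_count, gfp);
                /* highmem addresses are ignored rather than misconverted */
        }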
     

08 Oct, 2016

2 commits

  • Commit b4def3509d18 ("mm, nobootmem: clean-up of free_low_memory_core_early()")
    removed the unnecessary nodeid argument; since then, this comment has
    become confusing. Move it to the right place.

    Fixes: b4def3509d18c1db9 ("mm, nobootmem: clean-up of free_low_memory_core_early()")
    Link: http://lkml.kernel.org/r/1473996082-14603-1-git-send-email-wanlong.gao@gmail.com
    Signed-off-by: Wanlong Gao
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanlong Gao
     
  • Fix the following bugs:

    - the same ARCH_LOW_ADDRESS_LIMIT statements are duplicated between the
    header and the relevant source file

    - an ARCH_LOW_ADDRESS_LIMIT defined by an architecture in asm/processor.h
    is not guaranteed to take precedence over the default in linux/bootmem.h,
    because the former header isn't included by the latter

    Link: http://lkml.kernel.org/r/e046aeaa-e160-6d9e-dc1b-e084c2fd999f@zoho.com
    Signed-off-by: zijun_hu
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zijun_hu
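
    A sketch of the intended pattern in linux/bootmem.h: pull in the
    architecture header first, then fall back to a default (the 4 GB default
    shown here is an assumption for illustration):

        /* linux/bootmem.h (sketch) */
        #include <asm/processor.h>      /* may define ARCH_LOW_ADDRESS_LIMIT */

        #ifndef ARCH_LOW_ADDRESS_LIMIT
        #define ARCH_LOW_ADDRESS_LIMIT  0xffffffffUL
        #endif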
     

18 Mar, 2016

1 commit

  • Most of the mm subsystem uses pr_<level>(), so make it consistent.

    Miscellanea:

    - Realign arguments
    - Add missing newline to format
    - kmemleak-test.c has a "kmemleak: " prefix added to the
    "Kmemleak testing" logging message via pr_fmt

    Signed-off-by: Joe Perches
    Acked-by: Tejun Heo [percpu]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
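
    For illustration, the usual pr_fmt pattern that gives every pr_<level>()
    call in a file a common prefix; the init function here is a made-up
    example, only the prefix string comes from the commit above:

        /* must be defined before the first include that pulls in printk.h */
        #define pr_fmt(fmt) "kmemleak: " fmt

        #include <linux/printk.h>
        #include <linux/init.h>

        static int __init kmemleak_test_init(void)
        {
                pr_info("Kmemleak testing\n");  /* logs "kmemleak: Kmemleak testing" */
                return 0;
        }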
     

06 Dec, 2015

1 commit

  • max_possible_pfn will be used for tracking the maximum possible PFN for
    memory that isn't present in the E820 table and could be hotplugged later.

    By default max_possible_pfn is initialized with max_pfn, but later it may
    be updated with the highest PFN of the hotpluggable memory ranges declared
    in the ACPI SRAT table, if any are present.

    Signed-off-by: Igor Mammedov
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akataria@vmware.com
    Cc: fujita.tomonori@lab.ntt.co.jp
    Cc: konrad.wilk@oracle.com
    Cc: pbonzini@redhat.com
    Cc: revers@redhat.com
    Cc: riel@redhat.com
    Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com
    Signed-off-by: Ingo Molnar

    Igor Mammedov
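
    A rough sketch of the intended use; the SRAT update site and the
    hot_range_end variable are assumptions for illustration:

        /* at boot, once the E820-derived max_pfn is known */
        max_possible_pfn = max_pfn;

        /* later, while parsing hotpluggable ranges from the ACPI SRAT table */
        max_possible_pfn = max(max_possible_pfn, PFN_UP(hot_range_end));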
     

01 Jul, 2015

2 commits

  • __free_pages_bootmem prepares a page for release to the buddy allocator
    and assumes that the struct page is initialised. Parallel initialisation
    of struct pages defers initialisation and __free_pages_bootmem can be
    called for struct pages that cannot yet map struct page to PFN. This
    patch passes PFN to __free_pages_bootmem with no other functional change.

    Signed-off-by: Mel Gorman
    Tested-by: Nate Zimmer
    Tested-by: Waiman Long
    Tested-by: Daniel J Blueman
    Acked-by: Pekka Enberg
    Cc: Robin Holt
    Cc: Nate Zimmer
    Cc: Dave Hansen
    Cc: Waiman Long
    Cc: Scott Norton
    Cc: "Luck, Tony"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Currently each page struct is set as reserved upon initialization. This
    patch leaves the reserved bit clear and only sets the reserved bit when it
    is known the memory was allocated by the bootmem allocator. This makes it
    easier to distinguish between uninitialised struct pages and reserved
    struct pages in later patches.

    Signed-off-by: Robin Holt
    Signed-off-by: Nathan Zimmer
    Signed-off-by: Mel Gorman
    Tested-by: Nate Zimmer
    Tested-by: Waiman Long
    Tested-by: Daniel J Blueman
    Acked-by: Pekka Enberg
    Cc: Robin Holt
    Cc: Dave Hansen
    Cc: Waiman Long
    Cc: Scott Norton
    Cc: "Luck, Tony"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Zimmer
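
    A hedged sketch of the new behaviour: leave pages unreserved when struct
    pages are initialised and mark only the memblock-reserved ranges (the
    helper name is an assumption based on the description above):

        /* Sketch: set PageReserved only on ranges the boot allocator reserved. */
        static void __init mark_bootmem_reserved(void)
        {
                phys_addr_t start, end;
                u64 i;

                for_each_reserved_mem_region(i, &start, &end) {
                        unsigned long pfn = PFN_DOWN(start);
                        unsigned long end_pfn = PFN_UP(end);

                        for (; pfn < end_pfn; pfn++)
                                SetPageReserved(pfn_to_page(pfn));
                }
        }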
     

25 Jun, 2015

2 commits

  • Try to allocate all boot time kernel data structures from mirrored
    memory.

    If we run out of mirrored memory print warnings, but fall back to using
    non-mirrored memory to make sure that we still boot.

    By number of bytes, most of what we allocate at boot time is the page
    structures. 64 bytes per 4K page on x86_64 ... or about 1.5% of total
    system memory. For workloads where the bulk of memory is allocated to
    applications this may represent a useful improvement to system
    availability since 1.5% of total memory might be a third of the memory
    allocated to the kernel.

    Signed-off-by: Tony Luck
    Cc: Xishi Qiu
    Cc: Hanjun Guo
    Cc: Xiexiuqi
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
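
    A hedged sketch of the fallback described above; the wrapper name
    alloc_boot_data() is hypothetical, only the MEMBLOCK_MIRROR flag and the
    warn-then-retry shape follow the series:

        static phys_addr_t __init alloc_boot_data(phys_addr_t size, phys_addr_t align,
                                                  phys_addr_t start, phys_addr_t end,
                                                  int nid)
        {
                unsigned long flags = MEMBLOCK_MIRROR;  /* prefer mirrored memory */
                phys_addr_t found;

        again:
                found = memblock_find_in_range_node(size, align, start, end, nid, flags);
                if (found || !(flags & MEMBLOCK_MIRROR))
                        return found;

                /* warn, then fall back to non-mirrored memory so we still boot */
                pr_warn("Could not allocate %pap bytes of mirrored memory\n", &size);
                flags &= ~MEMBLOCK_MIRROR;
                goto again;
        }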
     
  • Some high end Intel Xeon systems report uncorrectable memory errors as a
    recoverable machine check. Linux has included code for some time to
    process these and just signal the affected processes (or even recover
    completely if the error was in a read only page that can be replaced by
    reading from disk).

    But we have no recovery path for errors encountered during kernel code
    execution. Except for some very specific cases, we are unlikely to ever be
    able to recover.

    Enter memory mirroring. Actually, the 3rd generation of memory mirroring.

    Gen1: All memory is mirrored
      Pro: No s/w enabling - h/w just gets good data from the other side of
      the mirror
      Con: Halves effective memory capacity available to OS/applications

    Gen2: Partial memory mirror - just mirror memory behind some memory
    controllers
      Pro: Keep more of the capacity
      Con: Nightmare to enable. Have to choose between allocating from
      mirrored memory for safety vs. NUMA-local memory for performance

    Gen3: Address range partial memory mirror - some mirror on each memory
    controller
      Pro: Can tune the amount of mirror and keep NUMA performance
      Con: I have to write memory management code to implement it

    The current plan is just to use mirrored memory for kernel allocations.
    This has been broken into two phases:

    1) This patch series - find the mirrored memory, use it for boot time
    allocations

    2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
    unused mirrored memory from mm/memblock.c and only give it out to
    select kernel allocations (this is still being scoped because
    page_alloc.c is scary).

    This patch (of 3):

    Add extra "flags" to memblock to allow selection of memory based on
    attribute. No functional changes

    Signed-off-by: Tony Luck
    Cc: Xishi Qiu
    Cc: Hanjun Guo
    Cc: Xiexiuqi
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
     

14 Nov, 2014

1 commit

  • In free_area_init_core(), zone->managed_pages is set to an approximate
    value for lowmem, and will be adjusted when the bootmem allocator frees
    pages into the buddy system.

    But free_area_init_core() is also called by hotadd_new_pgdat() when
    hot-adding memory. As a result, zone->managed_pages of the newly added
    node's pgdat is set to an approximate value in the very beginning.

    Even if the memory on that node has not been onlined,
    /sys/device/system/node/nodeXXX/meminfo has the wrong values:

    hot-add node2 (memory not onlined)
    cat /sys/device/system/node/node2/meminfo
    Node 2 MemTotal: 33554432 kB
    Node 2 MemFree: 0 kB
    Node 2 MemUsed: 33554432 kB
    Node 2 Active: 0 kB

    This patch fixes the problem by resetting the node's managed pages to 0
    after hot-adding a new node.

    1. Move reset_managed_pages_done from reset_node_managed_pages() to
    reset_all_zones_managed_pages()
    2. Make reset_node_managed_pages() non-static
    3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
    is initialized

    Signed-off-by: Tang Chen
    Signed-off-by: Yasuaki Ishimatsu
    Cc: [3.16+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
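
    A hedged sketch of steps 2 and 3 above (the zone walk mirrors the
    description; the call site in hotadd_new_pgdat() is shown as a comment):

        /* Zero the managed_pages estimate for every zone of the new node. */
        void reset_node_managed_pages(pg_data_t *pgdat)
        {
                struct zone *z;

                for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
                        z->managed_pages = 0;
        }

        /* hotadd_new_pgdat(): call reset_node_managed_pages(pgdat) right after
         * free_area_init_node() has initialised the pgdat. */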
     

11 Sep, 2014

1 commit

  • Let memblock skip hotpluggable memory regions in __next_mem_range(); this
    is used to prevent memblock from allocating hotpluggable memory for the
    kernel early in boot. The code is the same as in __next_mem_range_rev().

    Clear the hotpluggable flag before releasing free pages to the buddy
    allocator. If we don't clear it in free_low_memory_core_early(), memory
    marked with the hotpluggable flag will never be freed to the buddy
    allocator, because __next_mem_range() will skip it.

    free_low_memory_core_early
        for_each_free_mem_range
            for_each_mem_range
                __next_mem_range

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Xishi Qiu
    Cc: Tejun Heo
    Cc: Tang Chen
    Cc: Zhang Yanfei
    Cc: Wen Congyang
    Cc: "Rafael J. Wysocki"
    Cc: "H. Peter Anvin"
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
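
    A hedged sketch of the two pieces described above, shown as fragments of
    the functions they land in:

        /* __next_mem_range() (sketch): skip hotpluggable regions early in boot */
        if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
                continue;

        /* free_low_memory_core_early() (sketch): clear the flag first, otherwise
         * the iterator above would skip these regions and never free them */
        memblock_clear_hotplug(0, -1);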
     

07 Jun, 2014

1 commit

  • Kmemleak could ignore memory blocks allocated via memblock_alloc()
    leading to false positives during scanning. This patch adds the
    corresponding callbacks and removes kmemleak_free_* calls in
    mm/nobootmem.c to avoid duplication.

    The kmemleak_alloc() in mm/nobootmem.c is kept since
    __alloc_memory_core_early() does not use memblock_alloc() directly.

    Signed-off-by: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

04 Apr, 2014

1 commit

  • Mark function as static in nobootmem.c because it is not used outside
    this file.

    This eliminates the following warning in mm/nobootmem.c:

    mm/nobootmem.c:324:15: warning: no previous prototype for `___alloc_bootmem_node' [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rashika Kheria
     

24 Jan, 2014

3 commits

  • get_allocated_memblock_reserved_regions_info() should work if it is
    compiled in. Extended the ifdef around
    get_allocated_memblock_memory_regions_info() to include
    get_allocated_memblock_reserved_regions_info() as well. Similar changes
    in nobootmem.c/free_low_memory_core_early() where the two functions are
    called.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Philipp Hachtmann
    Cc: qiuxishi
    Cc: David Howells
    Cc: Daeseok Youn
    Cc: Jiang Liu
    Acked-by: Yinghai Lu
    Cc: Zhang Yanfei
    Cc: Santosh Shilimkar
    Cc: Grygorii Strashko
    Cc: Tang Chen
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
     
  • When calling free_all_bootmem() the free areas under memblock's control
    are released to the buddy allocator. Additionally the reserved list is
    freed if it was reallocated by memblock. The same should apply for the
    memory list.

    Signed-off-by: Philipp Hachtmann
    Reviewed-by: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Tang Chen
    Cc: Toshi Kani
    Cc: Jianguo Wu
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
     
  • When memblock_reserve() fails because memblock.reserved.regions cannot
    be resized, the caller (e.g. alloc_bootmem()) is not informed of the
    failed allocation. Therefore alloc_bootmem() silently returns the same
    pointer again and again.

    This patch adds a check for the return value of memblock_reserve() in
    __alloc_memory_core().

    Signed-off-by: Philipp Hachtmann
    Reviewed-by: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Johannes Weiner
    Cc: Tang Chen
    Cc: Toshi Kani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Hachtmann
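
    A hedged sketch of the check; the surrounding allocator is heavily
    abbreviated:

        static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
                                                       u64 goal, u64 limit)
        {
                u64 addr;

                addr = memblock_find_in_range_node(size, align, goal, limit, nid);
                if (!addr)
                        return NULL;

                /* propagate failure instead of silently reusing the same range */
                if (memblock_reserve(addr, size))
                        return NULL;

                return phys_to_virt(addr);
        }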
     

22 Jan, 2014

2 commits

  • It's recommended to use NUMA_NO_NODE everywhere to select "process any
    node" behavior or to indicate that "no node id specified".

    Hence, update the __next_free_mem_range*() APIs to accept both NUMA_NO_NODE
    and MAX_NUMNODES, but emit a one-time warning on MAX_NUMNODES, and correct
    the corresponding API documentation to describe the new behavior. Also,
    update other memblock/nobootmem APIs where MAX_NUMNODES is used
    directly.

    The change was suggested by Tejun Heo.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
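
    A hedged sketch of the compatibility shim: MAX_NUMNODES keeps working for
    now, but callers are warned once to switch to NUMA_NO_NODE:

        if (WARN_ONCE(nid == MAX_NUMNODES,
                      "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
                nid = NUMA_NO_NODE;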
     
  • Reorder parameters of memblock_find_in_range_node to be consistent with
    other memblock APIs.

    The change was suggested by Tejun Heo.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: Santosh Shilimkar
    Cc: Yinghai Lu
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Grygorii Strashko
     

13 Nov, 2013

1 commit

  • On large memory machines it can take a few minutes to get through
    free_all_bootmem().

    Currently, when free_all_bootmem() calls __free_pages_memory(), the number
    of contiguous pages that __free_pages_memory() passes to the buddy
    allocator is limited to BITS_PER_LONG. BITS_PER_LONG was originally
    chosen to keep things similar to mm/nobootmem.c. But it is more efficient
    to limit it to MAX_ORDER.

             base    new     change
    8TB      202s    172s    30s
    16TB     401s    351s    50s

    That is around a 1%-3% improvement in total boot time.

    This patch was spun off from the boot time rfc Robin and I had been
    working on.

    Signed-off-by: Robin Holt
    Signed-off-by: Nathan Zimmer
    Cc: Robin Holt
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Mike Travis
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Acked-by: Johannes Weiner
    Reviewed-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
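
    A hedged sketch of freeing in the largest buddy-aligned chunks instead of
    fixed BITS_PER_LONG-sized ones:

        static void __init __free_pages_memory(unsigned long start, unsigned long end)
        {
                int order;

                while (start < end) {
                        /* largest order allowed by the alignment of 'start' and MAX_ORDER */
                        order = min(MAX_ORDER - 1UL, __ffs(start));

                        while (start + (1UL << order) > end)
                                order--;

                        __free_pages_bootmem(pfn_to_page(start), order);

                        start += 1UL << order;
                }
        }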
     

04 Jul, 2013

2 commits

  • Concentrate the code that modifies totalram_pages into the mm core, so the
    arch memory initialization code doesn't need to take care of it. With these
    changes applied, only the following functions from the mm core modify the
    global variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
    free_all_bootmem_node(), adjust_managed_page_count().

    With this patch applied, it will be much easier for us to keep
    totalram_pages and zone->managed_pages consistent.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
    that all highmem pages will be freed into the buddy system by function
    mem_init(). But that's not always true, some architectures may reserve
    some highmem pages during boot. For example PPC may allocate highmem
    pages for giagant HugeTLB pages, and several architectures have code to
    check PageReserved flag to exclude highmem pages allocated during boot
    when freeing highmem pages into the buddy system.

    So treat highmem pages in the same way as normal pages, that is to:
    1) reset zone->managed_pages to zero in mem_init().
    2) recalculate managed_pages when freeing pages into the buddy system.

    Signed-off-by: Jiang Liu
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     

30 Apr, 2013

2 commits


30 Jan, 2013

2 commits


13 Dec, 2012

1 commit

  • Currently a zone's present_pages is calculated as below, which is
    inaccurate and may cause trouble for memory hotplug.

    spanned_pages - absent_pages - memmap_pages - dma_reserve.

    While fixing bugs caused by the inaccurate zone->present_pages, we found
    that zone->present_pages has been abused. The field zone->present_pages
    may have different meanings in different contexts:

    1) pages existing in a zone.
    2) pages managed by the buddy system.

    For more discussions about the issue, please refer to:
    http://lkml.org/lkml/2012/11/5/866
    https://patchwork.kernel.org/patch/1346751/

    This patchset tries to introduce a new field named "managed_pages" to
    struct zone, which counts "pages managed by the buddy system", and reverts
    zone->present_pages to counting "physical pages existing in a zone", which
    is also consistent with pgdat->node_present_pages.

    We will set an initial value for zone->managed_pages in function
    free_area_init_core() and will adjust it later if the initial value is
    inaccurate.

    For DMA/normal zones, the initial value is set to:

    (spanned_pages - absent_pages - memmap_pages - dma_reserve)

    Later zone->managed_pages will be adjusted to the accurate value when the
    bootmem allocator frees all free pages to the buddy system in function
    free_all_bootmem_node() and free_all_bootmem().

    The bootmem allocator doesn't touch highmem pages, so highmem zones'
    managed_pages is set to the accurate value "spanned_pages - absent_pages"
    in function free_area_init_core() and won't be updated anymore.

    This patch also adds a new field "managed_pages" to /proc/zoneinfo
    and sysrq showmem.

    [akpm@linux-foundation.org: small comment tweaks]
    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: David Rientjes
    Cc: Maciej Rutecki
    Tested-by: Chris Clayton
    Cc: "Rafael J . Wysocki"
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Jianguo Wu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
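
    A hedged sketch of the initial values described above, as set in
    free_area_init_core() (variable names are illustrative):

        if (is_highmem_idx(j))
                /* bootmem never touches highmem: this value is already final */
                zone->managed_pages = spanned_pages - absent_pages;
        else
                /* estimate; adjusted when bootmem releases pages to the buddy system */
                zone->managed_pages = spanned_pages - absent_pages
                                      - memmap_pages - dma_reserve;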
     

18 Nov, 2012

1 commit

  • Now the NO_BOOTMEM version of free_all_bootmem_node() does not really do
    free_bootmem at all; it only calls register_page_bootmem_info_node() for
    online nodes instead.

    That is confusing.

    We can kill free_all_bootmem_node() after we kill its two callers in x86
    and sparc.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1353123563-3103-46-git-send-email-yinghai@kernel.org
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

17 Nov, 2012

1 commit

  • Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages")

    That patch tried to fix an issue when calculating zone->present_pages,
    but it caused a regression on 32bit systems with HIGHMEM. With that
    change, reset_zone_present_pages() resets all zone->present_pages to
    zero, and fixup_zone_present_pages() is called to recalculate
    zone->present_pages when the boot allocator frees core memory pages into
    the buddy allocator. Because highmem pages are not freed by the bootmem
    allocator, all highmem zones' present_pages become zero.

    Various options for improving the situation are being discussed but for
    now, let's return to the 3.6 code.

    Cc: Jianguo Wu
    Cc: Jiang Liu
    Cc: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Acked-by: David Rientjes
    Tested-by: Chris Clayton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

09 Oct, 2012

2 commits

  • I think zone->present_pages should indicate the pages that the buddy
    system can manage; it should be:

    zone->present_pages = spanned pages - absent pages - bootmem pages,

    but it is now:
    zone->present_pages = spanned pages - absent pages - memmap pages.

    spanned pages: total size, including holes.
    absent pages: holes.
    bootmem pages: pages used in system boot, managed by bootmem allocator.
    memmap pages: pages used by page structs.

    This may cause zone->present_pages to be less than it should be. For
    example, NUMA node 1 has ZONE_NORMAL and ZONE_MOVABLE; its memmap and
    other bootmem are allocated from ZONE_MOVABLE, so ZONE_NORMAL's
    present_pages should be spanned pages - absent pages, but it currently
    also subtracts the memmap pages (free_area_init_core), which are actually
    allocated from ZONE_MOVABLE. When offlining all memory of such a zone,
    this causes zone->present_pages to go below 0; because present_pages is of
    unsigned long type, it actually becomes a very large integer. This
    indirectly causes zone->watermark[WMARK_MIN] to become a large integer
    (setup_per_zone_wmarks()), which in turn causes totalreserve_pages to
    become a large integer (calculate_totalreserve_pages()), and finally
    causes memory allocation failures when forking processes
    (__vm_enough_memory()).

    [root@localhost ~]# dmesg
    -bash: fork: Cannot allocate memory

    I think the bug described in

    http://marc.info/?l=linux-mm&m=134502182714186&w=2

    is also caused by wrong zone present pages.

    This patch intends to fix up zone->present_pages when memory is freed to
    the buddy system on the x86_64 and IA64 platforms.

    Signed-off-by: Jianguo Wu
    Signed-off-by: Jiang Liu
    Reported-by: Petr Tesarik
    Tested-by: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
     
  • Commit 0ee332c14518 ("memblock: Kill early_node_map[]") removed
    early_node_map[]. Clean up the comments to comply with that change.

    Signed-off-by: Wanpeng Li
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Gavin Shan
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

12 Jul, 2012

2 commits

  • memblock_free_reserved_regions() calls memblock_free(), but memblock_free()
    may double reserved.regions as well, so we could end up freeing the old
    range backing reserved.regions.

    Also tj said there is another bug which could be related to this.

    | I don't think we're saving any noticeable
    | amount by doing this "free - give it to page allocator - reserve
    | again" dancing. We should just allocate regions aligned to page
    | boundaries and free them later when memblock is no longer in use.

    In that case, with DEBUG_PAGEALLOC enabled, we will get a panic:

    memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
    BUG: unable to handle kernel paging request at ffff88102febd948
    IP: [] __next_free_mem_range+0x9b/0x155
    PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU 0
    Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
    RIP: 0010:[] [] __next_free_mem_range+0x9b/0x155

    See the discussion at https://lkml.org/lkml/2012/6/13/469

    So try to allocate with PAGE_SIZE alignment and free it later.

    Reported-by: Sasha Levin
    Acked-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Yinghai Lu
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
    in alloc_bootmem_section"), usemap allocations may easily be placed
    outside the optimal section that holds the node descriptor, even if
    there is space available in that section. This results in unnecessary
    hotplug dependencies that need to have the node unplugged before the
    section holding the usemap.

    The reason is that the bootmem allocator doesn't guarantee a linear
    search starting from the passed allocation goal but may start out at a
    much higher address absent an upper limit.

    Fix this by trying the allocation with the limit at the section end,
    then retry without if that fails. This keeps the fix from f5bf18fa22f8
    of not panicking if the allocation does not fit in the section, but
    still makes sure to try to stay within the section at first.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Johannes Weiner
    Cc: [3.3.x, 3.4.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
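
    A hedged sketch of the retry: first constrain the allocation to the
    section holding the node descriptor, then drop the limit if that fails:

        /* goal points into the section that already holds the pgdat */
        limit = goal + (1UL << PA_SECTION_SHIFT);
        again:
        p = ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size,
                                          SMP_CACHE_BYTES, goal, limit);
        if (!p && limit) {
                limit = 0;      /* no room in that section: retry anywhere */
                goto again;
        }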
     

30 May, 2012

3 commits

  • alloc_bootmem_section() derives allocation area constraints from the
    specified sparsemem section. This is a bit specific for a generic memory
    allocator like bootmem, though, so move it over to sparsemem.

    As __alloc_bootmem_node_nopanic() already retries failed allocations with
    relaxed area constraints, the fallback code in sparsemem.c can be removed
    and the code becomes a bit more compact overall.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the non-panicking
    one, like the node-agnostic interface, so they always behave the same way
    apart from how they deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
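
    A hedged sketch of the resulting shape: the panicking variant is simply
    the non-panicking one plus the failure handling:

        void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
                                           unsigned long align, unsigned long goal)
        {
                void *ptr;

                ptr = __alloc_bootmem_node_nopanic(pgdat, size, align, goal);
                if (ptr)
                        return ptr;

                pr_alert("bootmem alloc of %lu bytes failed!\n", size);
                panic("Out of memory");
                return NULL;
        }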
     
  • __alloc_bootmem_node and __alloc_bootmem_low_node documentation claims
    the functions panic on allocation failure. Do it.

    Signed-off-by: Johannes Weiner
    Acked-by: Yinghai Lu
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

11 May, 2012

1 commit

  • Systems with 8 TBytes of memory or greater can hit a problem where only
    the first 8 TB of memory shows up. This is due to "int i" being
    smaller than "unsigned long start_aligned", causing the high bits to be
    dropped.

    The fix is to change `i' to unsigned long to match start_aligned
    and end_aligned.

    Thanks to Jack Steiner for assistance tracking this down.

    Signed-off-by: Russ Anderson
    Cc: Jack Steiner
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Cc: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russ Anderson
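
    A hedged sketch of the shape of the loop and the fix:

        unsigned long i;        /* was "int i": assigning start_aligned truncated
                                 * PFNs beyond the first 8 TB */

        for (i = start_aligned; i + BITS_PER_LONG <= end_aligned;
             i += BITS_PER_LONG)
                __free_pages_bootmem(pfn_to_page(i), ilog2(BITS_PER_LONG));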
     

26 Apr, 2012

1 commit

  • The comments above __alloc_bootmem_node() claim that the code will
    first try the allocation using 'goal' and if that fails it will
    try again but with the 'goal' requirement dropped.

    Unfortunately, this is not what the code does, so fix it to do so.

    This is important for nobootmem conversions to architectures such
    as sparc where MAX_DMA_ADDRESS is infinity.

    On such architectures all of the allocations done by generic spots,
    such as the sparse-vmemmap implementation, will pass in:

    __pa(MAX_DMA_ADDRESS)

    as the goal, and with the limit given as "-1" this will always fail
    unless we add the appropriate fallback logic here.

    Signed-off-by: David S. Miller
    Acked-by: Yinghai Lu
    Signed-off-by: Linus Torvalds

    David Miller
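
    A hedged sketch of the fallback order after the fix (details abbreviated;
    the node-id constant follows later kernels):

        static void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
                                                           unsigned long size,
                                                           unsigned long align,
                                                           unsigned long goal,
                                                           unsigned long limit)
        {
                void *ptr;

        again:
                /* 1) node-local at/above the goal, 2) any node at/above the goal */
                ptr = __alloc_memory_core_early(pgdat->node_id, size, align, goal, limit);
                if (ptr)
                        return ptr;

                ptr = __alloc_memory_core_early(NUMA_NO_NODE, size, align, goal, limit);
                if (ptr)
                        return ptr;

                /* 3) drop the goal (e.g. __pa(MAX_DMA_ADDRESS) on sparc) and retry */
                if (goal) {
                        goal = 0;
                        goto again;
                }

                return NULL;
        }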
     

29 Nov, 2011

1 commit

  • Conflicts & resolutions:

    * arch/x86/xen/setup.c

    dc91c728fd "xen: allow extra memory to be in multiple regions"
    24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

    conflicted on xen_add_extra_mem() updates. The resolution is
    trivial as the latter just wants to replace
    memblock_x86_reserve_range() with memblock_reserve().

    * drivers/pci/intel-iommu.c

    166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
    5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."

    conflicted as the former moved the file under drivers/iommu/.
    Resolved by applying the changes from the latter to the moved
    file.

    * mm/Kconfig

    6661672053a "memblock: add NO_BOOTMEM config symbol"
    c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

    conflicted trivially. Both added config options. Just
    letting both add their own options resolves the conflict.

    * mm/memblock.c

    d1f0ece6cdc "mm/memblock.c: small function definition fixes"
    ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"

    conflicted. The former updates a function removed by the latter.
    Resolution is trivial.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

31 Oct, 2011

1 commit


15 Jul, 2011

1 commit

  • Other than sanity checks and debug messages, the x86-specific versions of
    the memblock reserve/free functions are simple wrappers around the generic
    versions - memblock_reserve()/memblock_free().

    This patch adds debug messages with caller identification to the generic
    versions, replaces the x86-specific ones with them, and kills the latter.
    arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty
    after this change and are removed.

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org
    Cc: Yinghai Lu
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: H. Peter Anvin

    Tejun Heo
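
    A hedged sketch of the generic function with the added caller-identifying
    debug message (the internal helper it forwards to is an assumption for
    that era of the code):

        int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
        {
                /* memblock_dbg() prints only when booted with memblock=debug;
                 * _RET_IP_ identifies the caller */
                memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
                             (unsigned long long)base,
                             (unsigned long long)(base + size),
                             (void *)_RET_IP_);

                return memblock_add_region(&memblock.reserved, base, size);
        }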