
04 Jul, 2013

3 commits

  • Now nobody makes use of free_all_bootmem_node(); kill it.

    Signed-off-by: Jiang Liu
    Cc: Johannes Weiner
    Cc: "David S. Miller"
    Cc: Yinghai Lu
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Concentrate the code that modifies totalram_pages into the mm core, so
    the arch memory initialization code doesn't need to take care of it.
    With these changes applied, only the following functions from the mm
    core modify the global variable totalram_pages: free_bootmem_late(),
    free_all_bootmem(), free_all_bootmem_node(),
    adjust_managed_page_count().

    With this patch applied, it will be much easier for us to keep
    totalram_pages and zone->managed_pages consistent.

    Signed-off-by: Jiang Liu
    Acked-by: David Howells
    Cc: "H. Peter Anvin"
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Joonsoo Kim
    Cc: Kamezawa Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Marek Szyprowski
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
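
    [editor's note: for illustration, a minimal sketch of the accounting
    helper this series funnels the updates through, modeled on
    mm/page_alloc.c of that era; treat the body as a sketch, not a quote
    of the patch]

        void adjust_managed_page_count(struct page *page, long count)
        {
                spin_lock(&managed_page_count_lock);
                page_zone(page)->managed_pages += count;
                totalram_pages += count;
        #ifdef CONFIG_HIGHMEM
                if (PageHighMem(page))
                        totalhigh_pages += count;
        #endif
                spin_unlock(&managed_page_count_lock);
        }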
     
  • Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
    that all highmem pages will be freed into the buddy system by function
    mem_init(). But that's not always true: some architectures may reserve
    some highmem pages during boot. For example, PPC may allocate highmem
    pages for giant HugeTLB pages, and several architectures have code to
    check the PageReserved flag to exclude highmem pages allocated during
    boot when freeing highmem pages into the buddy system.

    So treat highmem pages in the same way as normal pages, that is:
    1) reset zone->managed_pages to zero in mem_init().
    2) recalculate managed_pages when freeing pages into the buddy system.

    Signed-off-by: Jiang Liu
    Cc: "H. Peter Anvin"
    Cc: Tejun Heo
    Cc: Joonsoo Kim
    Cc: Yinghai Lu
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Kamezawa Hiroyuki
    Cc: Marek Szyprowski
    Cc: "Michael S. Tsirkin"
    Cc:
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Cc: Chris Metcalf
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Cc: Jianguo Wu
    Cc: Konrad Rzeszutek Wilk
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Tang Chen
    Cc: Thomas Gleixner
    Cc: Wen Congyang
    Cc: Will Deacon
    Cc: Yasuaki Ishimatsu
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
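
    [editor's note: a sketch of step 2 above, modeled on the
    free_highmem_page() helper of that era: a boot-reserved highmem page
    is handed to the buddy system late, and managed_pages is recalculated
    as each page is actually freed]

        void free_highmem_page(struct page *page)
        {
                __free_reserved_page(page);       /* clear PageReserved, free */
                totalram_pages++;
                page_zone(page)->managed_pages++; /* recalculate as we free */
                totalhigh_pages++;
        }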
     


12 Jan, 2013

1 commit

  • Currently free_all_bootmem_core ignores that node_min_pfn may not be a
    multiple of BITS_PER_LONG. E.g. commit 6dccdcbe2c3e ("mm: bootmem: fix
    checking the bitmap when finally freeing bootmem") shifts vec by the
    lower bits of start instead of the lower bits of idx. Also

    if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL)

    assumes that vec bit 0 corresponds to start pfn, which is only true when
    node_min_pfn is a multiple of BITS_PER_LONG. Also, the loop in the else
    clause can double-free pages (e.g. with node_min_pfn == start == 1 and
    map[0] == ~0 on a 32-bit machine, page 32 will be double-freed).

    This bug causes the following message during xtensa kernel boot:

    bootmem::free_all_bootmem_core nid=0 start=1 end=8000
    BUG: Bad page state in process swapper pfn:00001
    page:d04bd020 count:0 mapcount:-127 mapping: (null) index:0x2
    page flags: 0x0()
    Call Trace:
    bad_page+0x8c/0x9c
    free_pages_prepare+0x5e/0x88
    free_hot_cold_page+0xc/0xa0
    __free_pages+0x24/0x38
    __free_pages_bootmem+0x54/0x56
    free_all_bootmem_core$part$11+0xeb/0x138
    free_all_bootmem+0x46/0x58
    mem_init+0x25/0xa4
    start_kernel+0x11e/0x25c
    should_never_return+0x0/0x3be7

    The fix is the following:
    - always align vec so that its bit 0 corresponds to start
    - provide BITS_PER_LONG bits in vec, if those bits are available in the
    map
    - don't free pages past the next start position in the else clause.

    Signed-off-by: Max Filippov
    Cc: Gavin Shan
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Cc: Yinghai Lu
    Cc: Joonsoo Kim
    Cc: Prasad Koya
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Max Filippov
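
    [editor's note: a sketch of the aligned-vec construction the fix list
    describes, using the mm/bootmem.c variable names; bit 0 of vec now
    always corresponds to start, and up to BITS_PER_LONG bits are pulled
    from the map when available]

        idx = start - bdata->node_min_pfn;
        shift = idx & (BITS_PER_LONG - 1);
        /* vec holds at most BITS_PER_LONG map bits, bit 0 == start */
        vec = ~map[idx / BITS_PER_LONG];
        if (shift) {
                vec >>= shift;
                if (end - start >= BITS_PER_LONG)
                        vec |= ~map[idx / BITS_PER_LONG + 1] <<
                                (BITS_PER_LONG - shift);
        }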
     

13 Dec, 2012

4 commits

  • reserve_bootmem_generic() has no callers, so remove it.

    Signed-off-by: Lin Feng
    Acked-by: Johannes Weiner
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lin Feng
     
  • Currently a zone's present_pages is calculated as below, which is
    inaccurate and may cause trouble for memory hotplug.

    spanned_pages - absent_pages - memmap_pages - dma_reserve.

    During fixing bugs caused by inaccurate zone->present_pages, we found
    zone->present_pages has been abused. The field zone->present_pages may
    have different meanings in different contexts:

    1) pages existing in a zone.
    2) pages managed by the buddy system.

    For more discussions about the issue, please refer to:
    http://lkml.org/lkml/2012/11/5/866
    https://patchwork.kernel.org/patch/1346751/

    This patchset tries to introduce a new field named "managed_pages" to
    struct zone, which counts "pages managed by the buddy system", and
    reverts zone->present_pages to counting "physical pages existing in a
    zone", which also keeps it consistent with pgdat->node_present_pages.

    We will set an initial value for zone->managed_pages in function
    free_area_init_core() and will adjust it later if the initial value is
    inaccurate.

    For DMA/normal zones, the initial value is set to:

    (spanned_pages - absent_pages - memmap_pages - dma_reserve)

    Later zone->managed_pages will be adjusted to the accurate value when the
    bootmem allocator frees all free pages to the buddy system in function
    free_all_bootmem_node() and free_all_bootmem().

    The bootmem allocator doesn't touch highmem pages, so highmem zones'
    managed_pages is set to the accurate value "spanned_pages - absent_pages"
    in function free_area_init_core() and won't be updated anymore.

    This patch also adds a new field "managed_pages" to /proc/zoneinfo
    and sysrq showmem.

    [akpm@linux-foundation.org: small comment tweaks]
    Signed-off-by: Jiang Liu
    Cc: Wen Congyang
    Cc: David Rientjes
    Cc: Maciej Rutecki
    Tested-by: Chris Clayton
    Cc: "Rafael J . Wysocki"
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Jianguo Wu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
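
    [editor's note: the initial values described above, condensed into a
    rough sketch; the identifiers are illustrative of free_area_init_core()
    of that era, not a quote of the patch]

        unsigned long present = spanned_pages - absent_pages;

        if (is_highmem_idx(j))
                /* bootmem never touches highmem: this value is final */
                zone->managed_pages = present;
        else
                /* refined later, when bootmem frees pages to the buddy */
                zone->managed_pages = present - memmap_pages - dma_reserve;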
     
  • The name of this function is not suitable, and removing the function
    and open-coding it into each call site makes the code more
    understandable.

    Additionally, we shouldn't do an allocation from bootmem when
    slab_is_available(), so directly return kmalloc()'s return value.

    Signed-off-by: Joonsoo Kim
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Johannes Weiner
    Cc: FUJITA Tomonori
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
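
    [editor's note: the open-coded form at a call site, sketched; the code
    of that era used kzalloc() with GFP_NOWAIT here, which preserves
    bootmem's zeroing semantics]

        if (WARN_ON_ONCE(slab_is_available()))
                return kzalloc(size, GFP_NOWAIT);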
     
  • There is no implementation of bootmem_arch_preferred_node() and a call to
    this function will cause a compilation error. So remove it.

    Signed-off-by: Joonsoo Kim
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Johannes Weiner
    Cc: FUJITA Tomonori
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

12 Dec, 2012

1 commit

  • It is strange that alloc_bootmem() returns a virtual address while
    free_bootmem() requires a physical address. In any case, free_bootmem()'s
    first parameter should be a physical address.

    There are some call sites that pass free_bootmem() a virtual address,
    so fix them.

    [akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_late() documentation]
    Signed-off-by: Joonsoo Kim
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Johannes Weiner
    Cc: FUJITA Tomonori
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
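
    [editor's note: the shape of the fix at an affected call site, as an
    illustration]

        /* before: a virtual address, which free_bootmem() misinterprets */
        free_bootmem((unsigned long)ptr, size);

        /* after: convert to a physical address first */
        free_bootmem(__pa(ptr), size);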
     

17 Nov, 2012

1 commit

  • Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages")

    That patch tried to fix an issue when calculating zone->present_pages,
    but it caused a regression on 32-bit systems with HIGHMEM. With that
    change, reset_zone_present_pages() resets all zone->present_pages to
    zero, and fixup_zone_present_pages() is called to recalculate
    zone->present_pages when the boot allocator frees core memory pages
    into the buddy allocator. Because highmem pages are not freed by the
    bootmem allocator, all highmem zones' present_pages become zero.

    Various options for improving the situation are being discussed but for
    now, let's return to the 3.6 code.

    Cc: Jianguo Wu
    Cc: Jiang Liu
    Cc: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Acked-by: David Rientjes
    Tested-by: Chris Clayton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

09 Oct, 2012

1 commit

  • I think zone->present_pages indicates pages that the buddy system can
    manage; it should be:

    zone->present_pages = spanned pages - absent pages - bootmem pages,

    but is now:
    zone->present_pages = spanned pages - absent pages - memmap pages.

    spanned pages: total size, including holes.
    absent pages: holes.
    bootmem pages: pages used during system boot, managed by the bootmem
    allocator.
    memmap pages: pages used by page structs.

    This may cause zone->present_pages to be less than it should be. For
    example, numa node 1 has ZONE_NORMAL and ZONE_MOVABLE; its memmap and
    other bootmem data will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
    present_pages should be spanned pages - absent pages, but it now also
    subtracts memmap pages (free_area_init_core), which are actually
    allocated from ZONE_MOVABLE. When offlining all memory of a zone, this
    will cause zone->present_pages to go below 0. Because present_pages is
    of unsigned long type, it actually wraps around to a very large integer,
    which indirectly causes zone->watermark[WMARK_MIN] to become a large
    integer (setup_per_zone_wmarks()), which in turn causes
    totalreserve_pages to become a large integer
    (calculate_totalreserve_pages()), and finally causes memory allocation
    failures when forking processes (__vm_enough_memory()).

    [root@localhost ~]# dmesg
    -bash: fork: Cannot allocate memory

    I think the bug described in

    http://marc.info/?l=linux-mm&m=134502182714186&w=2

    is also caused by wrong zone present pages.

    This patch intends to fix up zone->present_pages when memory is freed
    to the buddy system on the x86_64 and IA64 platforms.

    Signed-off-by: Jianguo Wu
    Signed-off-by: Jiang Liu
    Reported-by: Petr Tesarik
    Tested-by: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianguo Wu
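
    [editor's note: a self-contained illustration of the unsigned
    wrap-around described above]

        unsigned long present = 100;

        present -= 200;  /* "below 0" wraps: present is now ULONG_MAX - 99,
                            i.e. 0xffffffffffffff9c on 64-bit, so the
                            watermark and totalreserve calculations see an
                            enormous value */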
     


18 Jul, 2012

1 commit

  • In reaction to commit 99ab7b19440a ("mm: sparse: fix usemap allocation
    above node descriptor section") Johannes said:
    | while backporting the below patch, I realised that your fix busted
    | f5bf18fa22f8 again. The problem was not a panicking version on
    | allocation failure but when the usemap size was too large such that
    | goal + size > limit triggers the BUG_ON in the bootmem allocator. So
    | we need a version that passes limit ONLY if the usemap is smaller than
    | the section.

    After checking the code, it turns out that the name of
    ___alloc_bootmem_node_nopanic() does not reflect its actual behavior.

    Make bootmem really not panic.

    Hopefully this will help kill bootmem sooner.

    Signed-off-by: Yinghai Lu
    Cc: Johannes Weiner
    Cc: [3.3.x, 3.4.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

12 Jul, 2012

1 commit

  • After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
    in alloc_bootmem_section"), usemap allocations may easily be placed
    outside the optimal section that holds the node descriptor, even if
    there is space available in that section. This results in unnecessary
    hotplug dependencies that need to have the node unplugged before the
    section holding the usemap.

    The reason is that the bootmem allocator doesn't guarantee a linear
    search starting from the passed allocation goal but may start out at a
    much higher address absent an upper limit.

    Fix this by trying the allocation with the limit at the section end,
    then retry without if that fails. This keeps the fix from f5bf18fa22f8
    of not panicking if the allocation does not fit in the section, but
    still makes sure to try to stay within the section at first.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Johannes Weiner
    Cc: [3.3.x, 3.4.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
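
    [editor's note: the retry pattern in sketch form; the names are
    illustrative of the sparsemem usemap allocation path of that era]

        limit = goal + (1UL << PA_SECTION_SHIFT);  /* end of the section */
    again:
        usemap = ___alloc_bootmem_node_nopanic(pgdat, size, SMP_CACHE_BYTES,
                                               goal, limit);
        if (!usemap && limit) {
                limit = 0;  /* didn't fit in the section: retry without */
                goto again;
        }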
     

30 May, 2012

9 commits

  • The objects of "struct bootmem_data_t" are linked together to form a
    doubly linked list, ordered by minimal page frame number.

    The current implementation only implicitly supports the following
    cases, which means the insertion point for the current bootmem data
    depends on how "list_for_each" works. That makes the code a little
    hard to read. Besides, "list_for_each" and "list_entry" can be
    replaced with "list_for_each_entry".

    - The linked list is empty.
    - No entry in the linked list has a minimal page frame number bigger
    than the current one.

    Signed-off-by: Gavin Shan
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
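
    [editor's note: the resulting insertion logic, sketched with
    list_for_each_entry as the commit suggests]

        static void __init link_bootmem(bootmem_data_t *bdata)
        {
                bootmem_data_t *ent;

                list_for_each_entry(ent, &bdata_list, list) {
                        if (bdata->node_min_pfn < ent->node_min_pfn) {
                                list_add_tail(&bdata->list, &ent->list);
                                return;
                        }
                }
                list_add_tail(&bdata->list, &bdata_list);
        }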
     
  • alloc_bootmem_section() derives allocation area constraints from the
    specified sparsemem section. This is a bit specific for a generic memory
    allocator like bootmem, though, so move it over to sparsemem.

    As __alloc_bootmem_node_nopanic() already retries failed allocations with
    relaxed area constraints, the fallback code in sparsemem.c can be removed
    and the code becomes a bit more compact overall.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pass the node descriptor instead of the more specific bootmem node
    descriptor down the call stack, like nobootmem does, since there is no
    good reason for the two to be different.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • While the panicking node-specific allocation function tries to satisfy
    node+goal, goal, node, anywhere, the non-panicking function still does
    node+goal, goal, anywhere.

    Make it simpler: define the panicking version in terms of the
    non-panicking one, like the node-agnostic interface, so they always behave
    the same way apart from how to deal with allocation failure.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
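
    [editor's note: the unified shape in sketch form: the panicking
    variant is just the non-panicking one plus a panic on failure]

        void * __init ___alloc_bootmem_node(pg_data_t *pgdat,
                                            unsigned long size,
                                            unsigned long align,
                                            unsigned long goal,
                                            unsigned long limit)
        {
                void *ptr;

                ptr = ___alloc_bootmem_node_nopanic(pgdat, size, align,
                                                    goal, limit);
                if (ptr)
                        return ptr;

                printk(KERN_ALERT "bootmem alloc of %lu bytes failed!\n",
                       size);
                panic("Out of memory");
        }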
     
  • Match the nobootmem version of __alloc_bootmem_node. Try to satisfy both
    the node and the goal, then just the goal, then just the node, then
    allocate anywhere before panicking.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Matching the desired goal to the right node is one thing; dropping the
    goal when it cannot be satisfied is another. Split this into separate
    functions so that subsequent patches can use the node-finding but drop
    and handle the goal fallback on their own terms.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Call sites need to provide a bootmem_data_t *, so make the naming more
    descriptive.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Cc: Gavin Shan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned BITS_PER_LONG-pages chunk of memory
    to the page allocator, it checks the bitmap for whether there are
    still unreserved pages in the chunk (set bits), but also whether the
    offset in the chunk already indicates BITS_PER_LONG loop iterations.

    But since the consulted bitmap is only a one-word excerpt of the full
    per-node bitmap, there cannot be more than BITS_PER_LONG bits set in
    it. The additional offset check is unnecessary.

    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When bootmem releases an unaligned chunk of memory at the beginning of a
    node to the page allocator, it iterates from that unaligned PFN but
    checks an aligned word of the page bitmap. The checked bits do not
    correspond to the PFNs and, as a result, reserved pages can be freed.

    Properly shift the bitmap word so that the lowest bit corresponds to the
    starting PFN before entering the freeing loop.

    This bug has been around since commit 41546c17418f ("bootmem: clean up
    free_all_bootmem_core") (2.6.27) without known reports.

    Signed-off-by: Gavin Shan
    Signed-off-by: Johannes Weiner
    Acked-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
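
    [editor's note: the fix in sketch form; after the shift, bit 0 of the
    consulted word corresponds to the starting PFN. Note that the 12 Jan,
    2013 entry above later refines this to shift by the low bits of idx,
    for nodes whose node_min_pfn is itself unaligned]

        /* before */
        vec = ~map[idx / BITS_PER_LONG];
        /* after */
        vec = ~map[idx / BITS_PER_LONG] >> (start & (BITS_PER_LONG - 1));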
     

22 Mar, 2012

1 commit

  • While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
    Overcommit) on powerpc, we tripped the following:

    kernel BUG at mm/bootmem.c:483!
    cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
    pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
    lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
    sp: c000000000c03bc0
    msr: 8000000000021032
    current = 0xc000000000b0cce0
    paca = 0xc000000001d80000
    pid = 0, comm = swapper
    kernel BUG at mm/bootmem.c:483!
    enter ? for help
    [c000000000c03c80] c000000000a64bcc
    .sparse_early_usemaps_alloc_node+0x84/0x29c
    [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
    [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
    [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
    [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c

    This is

    BUG_ON(limit && goal + size > limit);

    and after some debugging, it seems that

    goal = 0x7ffff000000
    limit = 0x80000000000

    and sparse_early_usemaps_alloc_node ->
    sparse_early_usemaps_alloc_pgdat_section calls

    return alloc_bootmem_section(usemap_size() * count, section_nr);

    This is on a system with 8TB available via the AMS pool, and as a quirk
    of AMS in firmware, all of that memory shows up in node 0. So, we end
    up with an allocation that will fail the goal/limit constraints.

    In theory, we could "fall-back" to alloc_bootmem_node() in
    sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
    defined, we'll BUG_ON() instead. A simple solution appears to be to
    unconditionally remove the limit condition in alloc_bootmem_section,
    meaning allocations are allowed to cross section boundaries (necessary
    for systems of this size).

    Johannes Weiner pointed out that if alloc_bootmem_section() no longer
    guarantees section-locality, we need check_usemap_section_nr() to print
    possible cross-dependencies between node descriptors and the usemaps
    allocated through it. That makes the two loops in
    sparse_early_usemaps_alloc_node() identical, so re-factor the code a
    bit.

    [akpm@linux-foundation.org: code simplification]
    Signed-off-by: Nishanth Aravamudan
    Cc: Dave Hansen
    Cc: Anton Blanchard
    Cc: Paul Mackerras
    Cc: Ben Herrenschmidt
    Cc: Robert Jennings
    Acked-by: Johannes Weiner
    Acked-by: Mel Gorman
    Cc: [3.3.1]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nishanth Aravamudan
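
    [editor's note: the arithmetic of the trap, for concreteness:
    limit - goal = 0x80000000000 - 0x7ffff000000 = 0x1000000, i.e. only
    16 MiB of room below the limit, while the single usemap allocation for
    all sections of an 8 TB node was large enough that goal + size
    exceeded it, firing the BUG_ON.]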
     

11 Jan, 2012

3 commits

  • The loop that frees pages to the page allocator while bootstrapping tries
    to free higher-order blocks only when the starting address is aligned to
    that block size. Otherwise it will free all pages on that node
    one-by-one.

    Change it to free individual pages up to the first aligned block and then
    try higher-order frees from there.

    Signed-off-by: Johannes Weiner
    Cc: Uwe Kleine-König
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
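
    [editor's note: the idea in sketch form, not the literal patch: pick
    the largest block size that both divides the current address and fits
    before end, so single pages are freed until start becomes aligned and
    larger blocks take over from there; assumes start > 0]

        while (start < end) {
                /* largest order allowed by start's alignment... */
                int order = min_t(int, ilog2(BITS_PER_LONG), __ffs(start));

                /* ...that still fits before end */
                while (start + (1UL << order) > end)
                        order--;

                __free_pages_bootmem(pfn_to_page(start), order);
                start += 1UL << order;
        }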
     
  • The area node_bootmem_map represents is aligned to BITS_PER_LONG, and
    all bits in any aligned word of that map are valid. When the
    represented area extends beyond the end of the node, the non-existent
    pages will be marked as reserved.

    As a result, when freeing a page block, doing an explicit range check for
    whether that block is within the node's range is redundant as the bitmap
    is consulted anyway to see whether all pages in the block are unreserved.

    Signed-off-by: Johannes Weiner
    Cc: Uwe Kleine-König
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The first entry of bdata->node_bootmem_map holds the data for
    bdata->node_min_pfn up to bdata->node_min_pfn + BITS_PER_LONG - 1. So the
    test for freeing all pages of a single map entry can be slightly relaxed.

    Moreover, use DIV_ROUND_UP in another place instead of open-coding it.

    Signed-off-by: Uwe Kleine-König
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     


24 Mar, 2011

1 commit

  • …p_elfcorehdr and saved_max_pfn

    The Xen PV drivers in a crashed HVM guest can not connect to the dom0
    backend drivers because both frontend and backend drivers are still in
    connected state. To run the connection reset function only in case of a
    crashdump, the is_kdump_kernel() function needs to be available for the PV
    driver modules.

    Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
    kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel()
    usable for modules.

    Leave 'elfcorehdr' as early_param(). This changes powerpc from __setup()
    to early_param(). It adds an address range check from x86 also on ia64
    and powerpc.

    [akpm@linux-foundation.org: additional #includes]
    [akpm@linux-foundation.org: remove elfcorehdr_addr export]
    [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Cc: Russell King <rmk@arm.linux.org.uk>
    Cc: "Luck, Tony" <tony.luck@intel.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mundt <lethal@linux-sh.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Olaf Hering
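
    [editor's note: the check the PV drivers need, roughly as defined in
    include/linux/crash_dump.h at the time; a kdump (crash) kernel is
    marked by a valid ELF core header address having been passed in]

        static inline int is_kdump_kernel(void)
        {
                return elfcorehdr_addr != ELFCORE_ADDR_MAX;
        }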
     

24 Feb, 2011

2 commits

  • Now that bootmem.c and nobootmem.c are separate, it's cleaner to
    define contig_page_data in each file than in page_alloc.c with #ifdef.
    Move it.

    This patch doesn't introduce any behavior change.

    -v2: According to Andrew, fixed the struct layout.
    -tj: Updated commit description.

    Signed-off-by: Yinghai Lu
    Acked-by: Andrew Morton
    Signed-off-by: Tejun Heo

    Yinghai Lu
     
  • mm/bootmem.c contained code paths for both the bootmem and no-bootmem
    configurations. They implement about the same set of APIs in different
    ways, and as a result bootmem.c contains a massive amount of
    #ifdef CONFIG_NO_BOOTMEM.

    Separate out the CONFIG_NO_BOOTMEM code into mm/nobootmem.c. As the
    common part is relatively small, duplicate it in nobootmem.c instead
    of creating a common file or ifdef'ing in bootmem.c.

    The following are duplicated.

    * {min|max}_low_pfn, max_pfn, saved_max_pfn
    * free_bootmem_late()
    * ___alloc_bootmem()
    * __alloc_bootmem_low()

    The following are applicable only to nobootmem and are moved verbatim.

    * __free_pages_memory()
    * free_all_memory_core_early()

    The following are not applicable to nobootmem and are omitted in
    nobootmem.c.

    * reserve_bootmem_node()
    * reserve_bootmem()

    For the rest, the function bodies are split according to
    CONFIG_NO_BOOTMEM.

    The Makefile is updated so that either bootmem.c or nobootmem.c is
    built, according to CONFIG_NO_BOOTMEM.

    This patch doesn't introduce any behavior change.

    -tj: Rewrote commit description.

    Suggested-by: Ingo Molnar
    Signed-off-by: Yinghai Lu
    Acked-by: Andrew Morton
    Signed-off-by: Tejun Heo

    Yinghai Lu
     

28 Aug, 2010

3 commits

  • 1. Include linux/memblock.h directly, so we can later reduce e820.h
    references.
    2. This patch is done mainly by sed scripts.

    -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL

    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • 1. Replace find_e820_area with memblock_find_in_range.
    2. Replace reserve_early with memblock_x86_reserve_range.
    3. Replace free_early with memblock_x86_free_range.
    4. NO_BOOTMEM will switch to using memblock too.
    5. Use the _e820/_early wrappers in this patch; a following patch will
    replace them all.
    6. Because memblock_x86_free_range supports partial free, we can remove
    some special-case handling.
    7. Make sure that memblock_find_in_range() is called after
    memblock_x86_fill(), so adjust some calls later in
    setup.c::setup_arch() -- corruption_check and mptable_update.

    -v2: Move reserve_brk() early, before fill_memblock_area, to avoid an
    overlap between brk and memblock_find_in_range() that could happen
    when we have more than 128 RAM entries in the E820 tables, in which
    case memblock_x86_fill() could use memblock_find_in_range() to find a
    new place for the memblock.memory.region array. And since we don't
    need to use extend_brk() after fill_memblock_area(), move
    reserve_brk() early, before fill_memblock_area().
    -v3: Move find_smp_config early, to make sure memblock_find_in_range
    doesn't find the wrong place if the BIOS doesn't put the mptable in
    the right place.
    -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are
    already in memblock.reserved. Use __NOT_KEEP_MEMBLOCK to make sure
    memblock-related code can be freed later.
    -v5: The generic __memblock_find_in_range() goes from high to low, and
    the active_region for 32bit does include high pages, so replace the
    limit with memblock.default_alloc_limit, aka get_max_mapped().
    -v6: Use current_limit instead.
    -v7: Check with MEMBLOCK_ERROR instead of -1ULL or -1L.
    -v8: Set memblock_can_resize early to handle EFI with more RAM entries.
    -v9: Update after the kmemleak changes in mainline.

    Suggested-by: David S. Miller
    Suggested-by: Benjamin Herrenschmidt
    Suggested-by: Thomas Gleixner
    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • It will be used for the memblock_x86_to_bootmem conversion.

    It is a wrapper for reserve_bootmem, and x86 64-bit uses a special
    version of it.

    Also clean up that version for x86_64. We don't need to take care of
    the numa path for it; bootmem can handle that now.

    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

21 Jul, 2010

1 commit

  • Borislav Petkov reported that his 32-bit numa system has a problem:

    [ 0.000000] Reserving total of 4c00 pages for numa KVA remap
    [ 0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
    [ 0.000000] max_pfn = 238000
    [ 0.000000] 8202MB HIGHMEM available.
    [ 0.000000] 885MB LOWMEM available.
    [ 0.000000] mapped low ram: 0 - 375fe000
    [ 0.000000] low ram: 0 - 375fe000
    [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
    [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
    [ 0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
    [ 0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
    [ 0.000000] BUG: unable to handle kernel paging request at 40000000
    [ 0.000000] IP: [] __alloc_memory_core_early+0x147/0x1d6
    [ 0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
    ...
    [ 0.000000] Call Trace:
    [ 0.000000] [] ? __alloc_bootmem_node+0x216/0x22f
    [ 0.000000] [] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
    [ 0.000000] [] ? sparse_init+0x1dc/0x499
    [ 0.000000] [] ? paging_init+0x168/0x1df
    [ 0.000000] [] ? native_pagetable_setup_start+0xef/0x1bb

    It looks like it allocates bootmem from too high an address.

    Try to cap the limit with get_max_mapped().

    Reported-by: Borislav Petkov
    Tested-by: Conny Seidel
    Signed-off-by: Yinghai Lu
    Cc: [2.6.34.x]
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Johannes Weiner
    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

08 Apr, 2010

1 commit

  • …git/x86/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
    x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
    x86: Increase CONFIG_NODES_SHIFT max to 10
    ibft, x86: Change reserve_ibft_region() to find_ibft_region()
    x86, hpet: Fix bug in RTC emulation
    x86, hpet: Erratum workaround for read after write of HPET comparator
    bootmem, x86: Fix 32bit numa system without RAM on node 0
    nobootmem, x86: Fix 32bit numa system without RAM on node 0
    x86: Handle overlapping mptables
    x86: Make e820_remove_range to handle all covered case
    x86-32, resume: do a global tlb flush in S4 resume

    Linus Torvalds
     

02 Apr, 2010

1 commit

  • When 32-bit numa is used, free_all_bootmem() will still only iterate
    over node id 0.

    If node 0 doesn't have RAM installed, the lowest populated node
    becomes low RAM.

    This one fixes the BOOTMEM path by iterating over the bdata_list.

    -v3: add more comments, and fix the bootmem path too.
    -v4: separate from one big patch

    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
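
    [editor's note: the shape of the fix, sketched: instead of freeing
    only node 0's bootmem, walk every registered node on the bdata_list]

        unsigned long __init free_all_bootmem(void)
        {
                unsigned long total_pages = 0;
                bootmem_data_t *bdata;

                list_for_each_entry(bdata, &bdata_list, list)
                        total_pages += free_all_bootmem_core(bdata);

                return total_pages;
        }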