11 Jan, 2012

3 commits

  • The loop that frees pages to the page allocator while bootstrapping tries
    to free higher-order blocks only when the starting address is aligned to
    that block size. Otherwise it will free all pages on that node
    one-by-one.

    Change it to free individual pages up to the first aligned block and then
    try higher-order frees from there.

    Signed-off-by: Johannes Weiner
    Cc: Uwe Kleine-König
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
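
The head/body/tail split described in this commit can be sketched as follows. This is a minimal standalone model, not the kernel code: `BLOCK_PAGES`, `struct free_stats`, and `free_range()` are hypothetical names, the bitmap of reserved pages is ignored, and the function only counts how many single-page and whole-block frees a PFN range would produce.

```c
#include <assert.h>

/* Hypothetical block size in pages; must be a power of two. */
#define BLOCK_PAGES 64UL

struct free_stats {
    unsigned long single_pages; /* pages freed one at a time */
    unsigned long blocks;       /* aligned blocks freed in one go */
};

/* Round pfn up to the next multiple of BLOCK_PAGES. */
static unsigned long block_align_up(unsigned long pfn)
{
    return (pfn + BLOCK_PAGES - 1) & ~(BLOCK_PAGES - 1);
}

/* Model freeing the PFN range [start, end). */
static struct free_stats free_range(unsigned long start, unsigned long end)
{
    struct free_stats st = { 0, 0 };
    unsigned long first_aligned = block_align_up(start);
    unsigned long pfn = start;

    /* Head: individual pages up to the first aligned block. */
    while (pfn < end && pfn < first_aligned) {
        st.single_pages++;
        pfn++;
    }

    /* Body: whole aligned blocks while a full block still fits. */
    while (end - pfn >= BLOCK_PAGES) {
        st.blocks++;
        pfn += BLOCK_PAGES;
    }

    /* Tail: whatever is left after the last whole block. */
    while (pfn < end) {
        st.single_pages++;
        pfn++;
    }

    return st;
}
```

With an unaligned start such as `free_range(10, 200)`, only the pages before PFN 64 and after PFN 192 are handled one by one; the two blocks in between are freed as blocks instead of degrading the whole range to single-page frees.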
     
  • The area node_bootmem_map represents is aligned to BITS_PER_LONG, and all
    bits in any aligned word of that map are valid. When the represented area
    extends beyond the end of the node, the nonexistent pages will be marked
    as reserved.

    As a result, when freeing a page block, doing an explicit range check for
    whether that block is within the node's range is redundant as the bitmap
    is consulted anyway to see whether all pages in the block are unreserved.

    Signed-off-by: Johannes Weiner
    Cc: Uwe Kleine-König
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
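
Why the range check is redundant can be illustrated with a small model of one bitmap word. This is only a sketch under stated assumptions: a set bit means "reserved", and `reserve_beyond_node()` and `word_fully_free()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Mark every bit at position >= valid_bits as reserved, modelling pages
 * of the last map word that lie beyond the end of the node.
 */
static unsigned long reserve_beyond_node(unsigned long word,
                                         unsigned long valid_bits)
{
    if (valid_bits >= BITS_PER_LONG)
        return word; /* the whole word is backed by real pages */
    return word | (~0UL << valid_bits);
}

/* A block may be freed in one go only if no page in it is reserved. */
static bool word_fully_free(unsigned long word)
{
    return word == 0;
}
```

A word that extends past the node end can never pass `word_fully_free()`, so the bitmap test alone already rejects out-of-range blocks.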
     
  • The first entry of bdata->node_bootmem_map holds the data for
    bdata->node_min_pfn up to bdata->node_min_pfn + BITS_PER_LONG - 1. So the
    test for freeing all pages of a single map entry can be slightly relaxed.

    Moreover use DIV_ROUND_UP in another place instead of open coding it.

    Signed-off-by: Uwe Kleine-König
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
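
The mapping between PFNs and map entries mentioned here can be sketched like this. `DIV_ROUND_UP` is written in the usual form of the kernel macro; `map_word()`, `map_bit()`, and `map_words()` are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Divide n by d, rounding up instead of truncating. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Index of the map word that holds the bit for pfn. */
static unsigned long map_word(unsigned long node_min_pfn, unsigned long pfn)
{
    return (pfn - node_min_pfn) / BITS_PER_LONG;
}

/* Bit position of pfn inside its map word. */
static unsigned long map_bit(unsigned long node_min_pfn, unsigned long pfn)
{
    return (pfn - node_min_pfn) % BITS_PER_LONG;
}

/* Number of map words needed to describe the given number of pages. */
static unsigned long map_words(unsigned long pages)
{
    return DIV_ROUND_UP(pages, BITS_PER_LONG);
}
```

Because the index is taken relative to `node_min_pfn`, the first word covers `node_min_pfn` through `node_min_pfn + BITS_PER_LONG - 1` even when `node_min_pfn` itself is not a multiple of `BITS_PER_LONG`.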
     

31 Oct, 2011

1 commit


24 Mar, 2011

1 commit

  • …p_elfcorehdr and saved_max_pfn

    The Xen PV drivers in a crashed HVM guest can not connect to the dom0
    backend drivers because both frontend and backend drivers are still in
    connected state. To run the connection reset function only in case of a
    crashdump, the is_kdump_kernel() function needs to be available for the PV
    driver modules.

    Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
    kernel/crash_dump.c. Also export elfcorehdr_addr to make is_kdump_kernel()
    usable for modules.

    Leave 'elfcorehdr' as early_param(). This changes powerpc from __setup()
    to early_param(). It adds an address range check from x86 also on ia64
    and powerpc.

    [akpm@linux-foundation.org: additional #includes]
    [akpm@linux-foundation.org: remove elfcorehdr_addr export]
    [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Cc: Russell King <rmk@arm.linux.org.uk>
    Cc: "Luck, Tony" <tony.luck@intel.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mundt <lethal@linux-sh.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Olaf Hering
     

24 Feb, 2011

2 commits

  • Now that bootmem.c and nobootmem.c are separate, it's cleaner to
    define contig_page_data in each file than in page_alloc.c with #ifdef.
    Move it.

    This patch doesn't introduce any behavior change.

    -v2: According to Andrew, fixed the struct layout.
    -tj: Updated commit description.

    Signed-off-by: Yinghai Lu
    Acked-by: Andrew Morton
    Signed-off-by: Tejun Heo

    Yinghai Lu
     
  • mm/bootmem.c contained code paths for both bootmem and no bootmem
    configurations. They implement about the same set of APIs in
    different ways and as a result bootmem.c contains a massive amount of
    #ifdef CONFIG_NO_BOOTMEM.

    Separate out CONFIG_NO_BOOTMEM code into mm/nobootmem.c. As the
    common part is relatively small, duplicate them in nobootmem.c instead
    of creating a common file or ifdef'ing in bootmem.c.

    The following are duplicated.

    * {min|max}_low_pfn, max_pfn, saved_max_pfn
    * free_bootmem_late()
    * ___alloc_bootmem()
    * __alloc_bootmem_low()

    The following are applicable only to nobootmem and moved verbatim.

    * __free_pages_memory()
    * free_all_memory_core_early()

    The following are not applicable to nobootmem and omitted in
    nobootmem.c.

    * reserve_bootmem_node()
    * reserve_bootmem()

    The rest split function bodies according to CONFIG_NO_BOOTMEM.

    Makefile is updated so that only either bootmem.c or nobootmem.c is
    built according to CONFIG_NO_BOOTMEM.

    This patch doesn't introduce any behavior change.

    -tj: Rewrote commit description.

    Suggested-by: Ingo Molnar
    Signed-off-by: Yinghai Lu
    Acked-by: Andrew Morton
    Signed-off-by: Tejun Heo

    Yinghai Lu
     

28 Aug, 2010

3 commits

  • 1. Include linux/memblock.h directly, so that e820.h references can be reduced later.
    2. This patch was mainly done by sed scripts.

    -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL

    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • 1. replace find_e820_area with memblock_find_in_range
    2. replace reserve_early with memblock_x86_reserve_range
    3. replace free_early with memblock_x86_free_range.
    4. NO_BOOTMEM will switch to use memblock too.
    5. Use _e820 and _early wrappers in this patch; a following patch will
    replace them all.
    6. Because memblock_x86_free_range() supports partial frees, some special-case handling can be removed.
    7. memblock_find_in_range() must be called after memblock_x86_fill(),
    so some calls are moved later in setup.c::setup_arch()
    -- corruption_check and mptable_update

    -v2: Move reserve_brk() early, before fill_memblock_area(), to avoid overlap
    between brk and memblock_find_in_range(). That could happen when there are
    more than 128 RAM entries in the E820 tables, because memblock_x86_fill()
    could then use memblock_find_in_range() to find a new place for the
    memblock.memory.region array. We don't need to use extend_brk() after
    fill_memblock_area(), so reserve_brk() can move before it.
    -v3: Move find_smp_config early, to make sure memblock_find_in_range() does not
    find the wrong place if the BIOS doesn't put the mptable in the right place.
    -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are already in
    memblock.reserved.
    Use __NOT_KEEP_MEMBLOCK to make sure memblock-related code can be freed later.
    -v5: The generic __memblock_find_in_range() goes from high to low, and on 32bit
    active_region does include high pages, so the limit needs to be replaced
    with memblock.default_alloc_limit, aka get_max_mapped()
    -v6: Use current_limit instead
    -v7: Check with MEMBLOCK_ERROR instead of -1ULL or -1L
    -v8: Set memblock_can_resize early to handle EFI with more RAM entries
    -v9: update after kmemleak changes in mainline

    Suggested-by: David S. Miller
    Suggested-by: Benjamin Herrenschmidt
    Suggested-by: Thomas Gleixner
    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • It will be used in the memblock_x86_to_bootmem conversion.

    It is a wrapper for reserve_bootmem, and x86 64bit is using a special one.

    Also clean up that version for x86_64. We don't need to take care of the numa
    path for that; bootmem can handle it now.

    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

21 Jul, 2010

1 commit

  • Borislav Petkov reported his 32bit numa system has problem:

    [ 0.000000] Reserving total of 4c00 pages for numa KVA remap
    [ 0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
    [ 0.000000] max_pfn = 238000
    [ 0.000000] 8202MB HIGHMEM available.
    [ 0.000000] 885MB LOWMEM available.
    [ 0.000000] mapped low ram: 0 - 375fe000
    [ 0.000000] low ram: 0 - 375fe000
    [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
    [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
    [ 0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
    [ 0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
    [ 0.000000] BUG: unable to handle kernel paging request at 40000000
    [ 0.000000] IP: [] __alloc_memory_core_early+0x147/0x1d6
    [ 0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
    ...
    [ 0.000000] Call Trace:
    [ 0.000000] [] ? __alloc_bootmem_node+0x216/0x22f
    [ 0.000000] [] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
    [ 0.000000] [] ? sparse_init+0x1dc/0x499
    [ 0.000000] [] ? paging_init+0x168/0x1df
    [ 0.000000] [] ? native_pagetable_setup_start+0xef/0x1bb

    It looks like it allocates from too high an address for bootmem.

    Try to cut limit with get_max_mapped()

    Reported-by: Borislav Petkov
    Tested-by: Conny Seidel
    Signed-off-by: Yinghai Lu
    Cc: [2.6.34.x]
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Johannes Weiner
    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

08 Apr, 2010

1 commit

  • …git/x86/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
    x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
    x86: Increase CONFIG_NODES_SHIFT max to 10
    ibft, x86: Change reserve_ibft_region() to find_ibft_region()
    x86, hpet: Fix bug in RTC emulation
    x86, hpet: Erratum workaround for read after write of HPET comparator
    bootmem, x86: Fix 32bit numa system without RAM on node 0
    nobootmem, x86: Fix 32bit numa system without RAM on node 0
    x86: Handle overlapping mptables
    x86: Make e820_remove_range to handle all covered case
    x86-32, resume: do a global tlb flush in S4 resume

    Linus Torvalds
     

02 Apr, 2010

2 commits

  • When 32bit numa is used, free_all_bootmem() will still only go over with
    node id 0.

    If node 0 doesn't have RAM installed, the lowest populated node
    becomes low RAM.

    This one fixes BOOTMEM path by iterating over the bdata_list.

    -v3: add more comments, and fix bootmem path too.
    -v4: separate from one big patch

    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • On one system without RAM on node0, got following boot dump with a 32
    bit NUMA kernel:

    early_node_map[4] active PFN ranges
    1: 0x00000010 -> 0x00000099
    1: 0x00000100 -> 0x0007da00
    1: 0x0007e800 -> 0x0007ffa0
    1: 0x0007ffae -> 0x0007ffb0
    ...
    Subtract (29 early reservations)
    #000 [0000001000 - 0000002000]
    #001 [0000089000 - 000008f000]
    #002 [0000091000 - 0000093500]
    ...
    #027 [007cbfef40 - 007e800000]
    #028 [007e9ca000 - 007ff95000]
    (0 free memory ranges)
    Initializing HighMem for node 0 (00000000:00000000)
    Initializing HighMem for node 1 (00000000:00000000)
    Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
    ...
    Checking if this processor honours the WP bit even in supervisor mode...Ok.
    swapper: page allocation failure. order:0, mode:0x0
    Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
    Call Trace:
    [] ? printk+0xf/0x11
    [] __alloc_pages_nodemask+0x417/0x487
    [] new_slab+0xe2/0x1fe
    [] kmem_cache_open+0x185/0x358
    [] T.954+0x1c/0x60
    [] kmem_cache_init+0x24/0x113
    [] start_kernel+0x166/0x2e4
    [] ? unknown_bootoption+0x0/0x18e
    [] i386_start_kernel+0xce/0xd5
    Mem-Info:
    Node 1 DMA per-cpu:
    CPU 0: hi: 0, btch: 1 usd: 0
    Node 1 Normal per-cpu:
    CPU 0: hi: 0, btch: 1 usd: 0
    active_anon:0 inactive_anon:0 isolated_anon:0
    active_file:0 inactive_file:0 isolated_file:0
    unevictable:0 dirty:0 writeback:0 unstable:0
    free:0 slab_reclaimable:0 slab_unreclaimable:0
    mapped:0 shmem:0 pagetables:0 bounce:0

    When 32bit NUMA is used, free_all_bootmem() will still only go over with
    node id 0.

    If node 0 doesn't have RAM installed, we need to go with node 1
    because early_node_map still uses 1 for all ranges, and RAM from node 1
    becomes low RAM.

    Use MAX_NUMNODES like 64-bit NUMA does.

    Note: the BOOTMEM path has the same problem.
    This bug existed before we had NO_BOOTMEM support.

    -v3: add more comments, and fix bootmem path too.
    -v4: separate bootmem path fix

    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit

  • Commit 08677214e318297 ("x86: Make 64 bit use early_res instead
    of bootmem before slab") introduced early_res replacement for
    bootmem, but left code in __free_pages_memory() which dumps all
    the ranges that are being freed, without any additional
    information, causing some noise in dmesg during bootup.

    Just remove printing of the ranges, that doesn't provide
    anything useful anyway.

    While at it, remove other commented-out KERN_DEBUG messages in
    the NO_BOOTMEM code as well.

    Signed-off-by: Jiri Kosina
    Found-OK-by: Andrew Morton
    Cc: Johannes Weiner
    Cc: Yinghai Lu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jiri Kosina
     

13 Feb, 2010

1 commit


16 Dec, 2009

1 commit


10 Nov, 2009

1 commit

  • Add a new function for freeing bootmem after the bootmem
    allocator has been released and the unreserved pages given to
    the page allocator.

    This allows us to reserve bootmem and then release it if we
    later discover it was not needed.

    ( This new API will be used by the swiotlb code to recover
    a significant amount of RAM (64MB). )

    Signed-off-by: FUJITA Tomonori
    Acked-by: Pekka Enberg
    Cc: chrisw@sous-sol.org
    Cc: dwmw2@infradead.org
    Cc: joerg.roedel@amd.com
    Cc: muli@il.ibm.com
    Cc: hannes@cmpxchg.org
    Cc: tj@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    FUJITA Tomonori
     

27 Aug, 2009

1 commit


08 Jul, 2009

1 commit

  • This patch adds kmemleak_alloc/free callbacks to the bootmem allocator.
    This would allow scanning of such blocks and help avoiding a whole class
    of false positives and more kmemleak annotations.

    Signed-off-by: Catalin Marinas
    Cc: Ingo Molnar
    Acked-by: Pekka Enberg
    Reviewed-by: Johannes Weiner

    Catalin Marinas
     

20 Jun, 2009

1 commit


12 Jun, 2009

2 commits

  • If the user requested bootmem allocation on a specific node, we should use
    kzalloc_node() for the fallback allocation.

    Cc: Ingo Molnar
    Cc: Johannes Weiner
    Cc: Linus Torvalds
    Cc: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     
  • As a preparation for initializing the slab allocator early, make sure the
    bootmem allocator does not crash and burn if someone calls it after slab is up;
    otherwise we'd need a flag day for switching to early slab.

    Acked-by: Johannes Weiner
    Acked-by: Linus Torvalds
    Cc: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Matt Mackall
    Cc: Nick Piggin
    Cc: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     

01 Mar, 2009

1 commit

  • Impact: fix new breakages introduced by previous fix

    Commit c132937556f56ee4b831ef4b23f1846e05fde102 tried to clean up
    bootmem arch wrapper but it wasn't quite correct. Before the commit,
    the following were broken.

    * Low level interface functions prefixed with __ ignored arch
    preference.

    * reserve_bootmem(...) can't be mapped into
    reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
    not a preference here. The region specified MUST fall into the
    specified region; otherwise, it will panic.

    After the commit,

    * If allocation fails for the arch preferred node, it should fall back
    to whatever is available. Instead, it simply failed allocation.

    There are too many internal details to allow generic wrapping and
    still keep things simple for archs. Plus, all that arch wants is a
    way to prefer certain node over another.

    This patch drops the generic wrapping around alloc_bootmem_core() and
    adds alloc_arch_preferred_bootmem() instead. If necessary, arch can define
    bootmem_arch_preferred_node() macro or function which takes all
    allocation information and returns the preferred node. bootmem
    generic code will always try the preferred node first and then
    fall back to other nodes as usual.

    Breakages noted and changes reviewed by Johannes Weiner.

    Signed-off-by: Tejun Heo
    Acked-by: Johannes Weiner

    Tejun Heo
     

24 Feb, 2009

1 commit

  • Impact: cleaner and consistent bootmem wrapping

    By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
    arch-specific wrappers for bootmem allocation. However, this is done
    a bit strangely in that only the high level convenience macros can be
    changed while lower level, but still exported, interface functions
    can't be wrapped. This not only is messy but also leads to strange
    situation where alloc_bootmem() does what the arch wants it to do but
    the equivalent __alloc_bootmem() call doesn't although they should be
    able to be used interchangeably.

    This patch updates bootmem such that archs can override / wrap the
    backend function - alloc_bootmem_core() instead of the highlevel
    interface functions to allow simpler and consistent wrapping. Also,
    HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.

    Signed-off-by: Tejun Heo
    Cc: Johannes Weiner

    Tejun Heo
     

07 Jan, 2009

1 commit


17 Oct, 2008

1 commit


21 Aug, 2008

1 commit

  • Absolute alignment requirements may never be applied to node-relative
    offsets. Andreas Herrmann spotted this flaw when a bootmem allocation on
    an unaligned node was itself not aligned because the combination of an
    unaligned node with an aligned offset into that node is not guaranteed to
    be aligned itself.

    This patch introduces two helper functions that align a node-relative
    index or offset with respect to the node's starting address so that the
    absolute PFN or virtual address that results from combining the two
    satisfies the requested alignment.

    Then all the broken ALIGN()s in alloc_bootmem_core() are replaced by these
    helpers.

    Signed-off-by: Johannes Weiner
    Reported-by: Andreas Herrmann
    Debugged-by: Andreas Herrmann
    Reviewed-by: Andreas Herrmann
    Tested-by: Andreas Herrmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
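
The idea behind the two helpers can be sketched as follows. This is an illustrative model rather than the kernel source: `ALIGN_UP` stands in for the kernel's `ALIGN()`, and `align_rel()` is a hypothetical name that covers both the index and the offset case by taking the node's base (a PFN or a physical address) as a plain number.

```c
#include <assert.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * Align a node-relative value so that base + result is aligned, rather
 * than aligning the relative value on its own.
 */
static unsigned long align_rel(unsigned long base, unsigned long rel,
                               unsigned long align)
{
    return ALIGN_UP(base + rel, align) - base;
}
```

For a node starting at the unaligned base 3, aligning the relative index 4 on its own leaves it at 4, which gives the absolute value 7. `align_rel(3, 4, 4)` returns 5 instead, so the absolute value becomes 8 and satisfies the requested alignment.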
     

15 Aug, 2008

1 commit

  • This is the minimal sequence that jams the allocator:

    void *p, *q, *r;
    p = alloc_bootmem(PAGE_SIZE);
    q = alloc_bootmem(64);
    free_bootmem(p, PAGE_SIZE);
    p = alloc_bootmem(PAGE_SIZE);
    r = alloc_bootmem(64);

    After this sequence (assuming that the allocator was empty or page-aligned
    before), pointer "q" will be equal to pointer "r".

    What's happening inside the allocator:
    p = alloc_bootmem(PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000...
    q = alloc_bootmem(64);
    in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000...
    free_bootmem(p, PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000...
    p = alloc_bootmem(PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000...
    r = alloc_bootmem(64);

    and now:

    it finds bit "2", as a place where to allocate (sidx)

    it hits the condition

    if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx)
            start_off = ALIGN(bdata->last_end_off, align);

    You can see that the condition is true, so it assigns start_off =
    ALIGN(bdata->last_end_off, align) (that is, PAGE_SIZE) and allocates
    over the already allocated block.

    With the patch it tries to continue at the end of previous allocation only
    if the previous allocation ended in the middle of the page.

    Signed-off-by: Mikulas Patocka
    Acked-by: Johannes Weiner
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
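
The difference between the old check and the behaviour described in the last paragraph can be modelled in isolation. This is a sketch under assumptions: a 4 KiB page size, and two hypothetical predicates `merge_old()` and `merge_new()` that take the allocator state as plain arguments instead of reading `bdata`.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assumed 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Old check: continue after the previous allocation whenever one exists. */
static bool merge_old(unsigned long last_end_off, unsigned long sidx)
{
    return last_end_off && PFN_DOWN(last_end_off) + 1 == sidx;
}

/*
 * Patched behaviour as described above: continue only if the previous
 * allocation ended in the middle of a page.
 */
static bool merge_new(unsigned long last_end_off, unsigned long sidx)
{
    return (last_end_off & (PAGE_SIZE - 1)) &&
           PFN_DOWN(last_end_off) + 1 == sidx;
}
```

In the failing sequence, `last_end_off == PAGE_SIZE` and `sidx == 2`: the old predicate accepts this and reuses page 1, while the new one rejects it because the previous allocation ended exactly on a page boundary.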
     

25 Jul, 2008

10 commits

  • Almost all users of this field need a PFN instead of a physical address,
    so replace node_boot_start with node_min_pfn.

    [Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
    Signed-off-by: Johannes Weiner
    Cc:
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Since alloc_bootmem_core does no goal-fallback anymore and just returns
    NULL if the allocation fails, we might now use it in alloc_bootmem_section
    without all the fixup code for a misplaced allocation.

    Also, the limit can be the first PFN of the next section as the semantics
    is that the limit is _above_ the allocated region, not within.

    Signed-off-by: Johannes Weiner
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • __alloc_bootmem_node already does this, make the interface consistent.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The old node-agnostic code tried allocating on all nodes starting from the
    one with the lowest range. alloc_bootmem_core retried without the goal if
    it could not satisfy it and so the goal was only respected at all when it
    happened to be on the first (lowest page numbers) node (or theoretically
    if allocations failed on all nodes before the one holding the goal).

    Introduce a non-panicking helper that starts allocating from the node
    holding the goal and falls back only after all these tries failed, thus
    moving the goal fallback code out of alloc_bootmem_core.

    Make all other allocation functions benefit from this new helper.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Introduce new helpers that mark a range that resides completely on a node
    or node-agnostic ranges that might also span node boundaries.

    The free/reserve API functions will then directly use these helpers.

    Note that the free/reserve semantics become more strict: while the prior
    code took basically arbitrary range arguments and marked the PFNs that
    happen to fall into that range, the new code requires node-specific ranges
    to be completely on the node. The node-agnostic requests might span node
    boundaries as long as the nodes are contiguous.

    Passing ranges that do not satisfy these criteria is a bug.

    [akpm@linux-foundation.org: fix printk warnings]
    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Factor out the common operation of marking a range on the bitmap.

    [akpm@linux-foundation.org: fix various warnings]
    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • alloc_bootmem_core has become quite nasty to read over time. This is a
    clean rewrite that keeps the semantics.

    bdata->last_pos has been dropped.

    bdata->last_success has been renamed to hint_idx and it is now an index
    relative to the node's range. Since further block searching might start
    at this index, it is now set to the end of a succeeded allocation rather
    than its beginning.

    bdata->last_offset has been renamed to last_end_off to be more clear that
    it represents the ending address of the last allocation relative to the
    node.

    [y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Rewrite the code in a more concise way using fewer variables.

    [akpm@linux-foundation.org: fix printk warnings]
    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • link_bootmem handles an insertion of a new descriptor into the sorted list
    in more or less three explicit branches; empty list, insert in between and
    append. These cases can be expressed implicitly.

    Also mark the sorted list as initdata as it can be thrown away after boot
    as well.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
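
How the three branches collapse into one can be shown with a sorted insert that walks a pointer to the link being examined. This is only an analogy: the kernel uses its own list primitives, whereas `struct node_desc` and `link_sorted()` below are hypothetical and use a singly linked list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-node bootmem descriptor. */
struct node_desc {
    unsigned long min_pfn;
    struct node_desc *next;
};

/*
 * Insert new_desc so the list stays sorted by min_pfn. Walking a pointer
 * to the link covers the empty list, insertion in between, and appending
 * without a dedicated branch for each case.
 */
static void link_sorted(struct node_desc **head, struct node_desc *new_desc)
{
    struct node_desc **link = head;

    while (*link && (*link)->min_pfn <= new_desc->min_pfn)
        link = &(*link)->next;

    new_desc->next = *link;
    *link = new_desc;
}
```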
     
  • Reincarnate get_mapsize as bootmap_bytes and implement
    bootmem_bootmap_pages on top of it.

    Adjust users of these helpers and make free_all_bootmem_core use
    bootmem_bootmap_pages instead of open-coding it.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
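
The relationship between the two helpers can be sketched as below: one bit per page, rounded up to whole longs, then to whole pages. The bodies are an illustration under assumptions (4 KiB pages, `ALIGN_UP` standing in for the kernel's alignment macros) and are not copied from the kernel source.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumed 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Bytes needed for a bitmap with one bit per page, in whole longs. */
static unsigned long bootmap_bytes(unsigned long pages)
{
    unsigned long bytes = (pages + 7) / 8;

    return ALIGN_UP(bytes, (unsigned long)sizeof(long));
}

/* Pages needed to hold that bitmap. */
static unsigned long bootmem_bootmap_pages(unsigned long pages)
{
    unsigned long bytes = bootmap_bytes(pages);

    return ALIGN_UP(bytes, PAGE_SIZE) >> PAGE_SHIFT;
}
```

Under these assumptions, 32768 pages need exactly 4096 bytes of bitmap and therefore one page, while one more page pushes the bitmap into a second page.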