21 Aug, 2008

1 commit

  • Absolute alignment requirements may never be applied to node-relative
    offsets. Andreas Herrmann spotted this flaw when a bootmem allocation on
    an unaligned node was itself not aligned, because the combination of an
    unaligned node with an aligned offset into that node is not guaranteed to
    be aligned itself.

    This patch introduces two helper functions that align a node-relative
    index or offset with respect to the node's starting address so that the
    absolute PFN or virtual address that results from combining the two
    satisfies the requested alignment.

    Then all the broken ALIGN()s in alloc_bootmem_core() are replaced by these
    helpers.
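
    A minimal sketch of such helpers, modeled on the patch (the node_min_pfn
    field is assumed from the bootmem rewrite of that time):

    static unsigned long __init align_idx(struct bootmem_data *bdata,
                                          unsigned long idx, unsigned long step)
    {
            unsigned long base = bdata->node_min_pfn;

            /*
             * Align the index with respect to the node start so that the
             * combination of both satisfies the requested alignment.
             */
            return ALIGN(base + idx, step) - base;
    }

    static unsigned long __init align_off(struct bootmem_data *bdata,
                                          unsigned long off, unsigned long align)
    {
            unsigned long base = PFN_PHYS(bdata->node_min_pfn);

            /* Same thing for byte offsets relative to the node start address. */
            return ALIGN(base + off, align) - base;
    }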

    Signed-off-by: Johannes Weiner
    Reported-by: Andreas Herrmann
    Debugged-by: Andreas Herrmann
    Reviewed-by: Andreas Herrmann
    Tested-by: Andreas Herrmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

15 Aug, 2008

1 commit

  • This is the minimal sequence that jams the allocator:

    void *p, *q, *r;
    p = alloc_bootmem(PAGE_SIZE);
    q = alloc_bootmem(64);
    free_bootmem(p, PAGE_SIZE);
    p = alloc_bootmem(PAGE_SIZE);
    r = alloc_bootmem(64);

    After this sequence (assuming that the allocator was empty or page-aligned
    before), pointer "q" will be equal to pointer "r".

    What is happening inside the allocator:
    p = alloc_bootmem(PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000...
    q = alloc_bootmem(64);
    in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000...
    free_bootmem(p, PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000...
    p = alloc_bootmem(PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000...
    r = alloc_bootmem(64);

    and now:

    it finds bit "2" as the place to allocate (sidx)

    it hits the condition

    if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx)
            start_off = ALIGN(bdata->last_end_off, align);

    You can see that the condition is true, so it assigns start_off =
    ALIGN(bdata->last_end_off, align) (that is, PAGE_SIZE) and allocates over
    the already allocated block.

    With the patch, it tries to continue at the end of the previous
    allocation only if the previous allocation ended in the middle of a page.
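
    A sketch of the fixed check, assuming the patch simply tests whether the
    previous allocation ended mid-page:

    /* Only merge with the previous allocation if it ended mid-page. */
    if (bdata->last_end_off & (PAGE_SIZE - 1) &&
                    PFN_DOWN(bdata->last_end_off) + 1 == sidx)
            start_off = ALIGN(bdata->last_end_off, align);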

    Signed-off-by: Mikulas Patocka
    Acked-by: Johannes Weiner
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     

25 Jul, 2008

20 commits

  • Almost all users of this field need a PFN instead of a physical address,
    so replace node_boot_start with node_min_pfn.
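
    For illustration, the descriptor after this change looks roughly like
    this (a sketch; the exact field set depends on the rest of the series):

    typedef struct bootmem_data {
            unsigned long node_min_pfn;   /* was node_boot_start, a physical address */
            unsigned long node_low_pfn;
            void *node_bootmem_map;
            unsigned long last_end_off;
            unsigned long hint_idx;
            struct list_head list;
    } bootmem_data_t;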

    [Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
    Signed-off-by: Johannes Weiner
    Cc:
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Since alloc_bootmem_core no longer falls back from the goal and just
    returns NULL if the allocation fails, we can now use it in
    alloc_bootmem_section without all the fixup code for a misplaced
    allocation.

    Also, the limit can be the first PFN of the next section, as the
    semantics are that the limit lies _above_ the allocated region, not
    within it.
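
    A sketch of the simplified function under these semantics (assuming the
    usual sparsemem helpers):

    void * __init alloc_bootmem_section(unsigned long size,
                                        unsigned long section_nr)
    {
            bootmem_data_t *bdata;
            unsigned long pfn, goal, limit;

            pfn = section_nr_to_pfn(section_nr);
            goal = pfn << PAGE_SHIFT;
            /* the limit lies above the region: first PFN of the next section */
            limit = section_nr_to_pfn(section_nr + 1) << PAGE_SHIFT;
            bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];

            return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
    }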

    Signed-off-by: Johannes Weiner
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • __alloc_bootmem_node already does this; make the interface consistent.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The old node-agnostic code tried allocating on all nodes, starting from
    the one with the lowest range. alloc_bootmem_core retried without the
    goal if it could not satisfy it, so the goal was only respected at all
    when it happened to be on the first node (the one with the lowest page
    numbers), or theoretically if allocations failed on all nodes before the
    one holding the goal.

    Introduce a non-panicking helper that starts allocating from the node
    holding the goal and falls back only after all these tries have failed,
    thus moving the goal-fallback code out of alloc_bootmem_core.

    Make all other allocation functions benefit from this new helper.
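
    A sketch of such a helper; the restart label implements the goal
    fallback that used to live in alloc_bootmem_core (bdata_list is the
    sorted list of descriptors):

    static void * __init ___alloc_bootmem_nopanic(unsigned long size,
                                                  unsigned long align,
                                                  unsigned long goal,
                                                  unsigned long limit)
    {
            bootmem_data_t *bdata;

    restart:
            /* bdata_list is sorted; skip nodes that lie entirely below the goal */
            list_for_each_entry(bdata, &bdata_list, list) {
                    void *region;

                    if (goal && bdata->node_low_pfn <= PFN_DOWN(goal))
                            continue;

                    region = alloc_bootmem_core(bdata, size, align, goal, limit);
                    if (region)
                            return region;
            }

            if (goal) {
                    /* retry all nodes without the goal */
                    goal = 0;
                    goto restart;
            }

            return NULL;
    }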

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Introduce new helpers that mark either a range residing completely on
    one node, or a node-agnostic range that might also span node boundaries.

    The free/reserve API functions will then directly use these helpers.

    Note that the free/reserve semantics become more strict: while the prior
    code took basically arbitrary range arguments and marked the PFNs that
    happen to fall into that range, the new code requires node-specific ranges
    to be completely on the node. The node-agnostic requests might span node
    boundaries as long as the nodes are contiguous.

    Passing ranges that do not satisfy these criteria is a bug.
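
    A sketch of the node-specific helper illustrating the stricter semantics
    (__reserve/__free are the factored-out bitmap-marking helpers from the
    commit below):

    static int __init mark_bootmem_node(bootmem_data_t *bdata,
                                        unsigned long start, unsigned long end,
                                        int reserve, int flags)
    {
            unsigned long sidx, eidx;

            /* Node-specific ranges must lie completely on the node. */
            BUG_ON(start < bdata->node_min_pfn);
            BUG_ON(end > bdata->node_low_pfn);

            sidx = start - bdata->node_min_pfn;
            eidx = end - bdata->node_min_pfn;

            if (reserve)
                    return __reserve(bdata, sidx, eidx, flags);
            else
                    __free(bdata, sidx, eidx);
            return 0;
    }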

    [akpm@linux-foundation.org: fix printk warnings]
    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Factor out the common operation of marking a range on the bitmap.

    [akpm@linux-foundation.org: fix various warnings]
    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • alloc_bootmem_core has become quite nasty to read over time. This is a
    clean rewrite that keeps the semantics.

    bdata->last_pos has been dropped.

    bdata->last_success has been renamed to hint_idx, and it is now an index
    relative to the node's range. Since further block searching might start
    at this index, it is now set to the end of a successful allocation
    rather than its beginning.

    bdata->last_offset has been renamed to last_end_off to make it clearer
    that it represents the end address of the last allocation relative to
    the node.

    [y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Rewrite the code in a more concise way using fewer variables.

    [akpm@linux-foundation.org: fix printk warnings]
    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • link_bootmem handles the insertion of a new descriptor into the sorted
    list in more or less three explicit branches: empty list, insert in
    between, and append. These cases can be expressed implicitly.

    Also mark the sorted list as initdata, as it can be thrown away after
    boot as well.
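
    A sketch of the implicit version: walk the sorted list until a
    descriptor with a higher start is found and insert before it, which
    covers all three cases at once (list_add_tail before the list head
    appends when the list is empty or exhausted):

    static void __init link_bootmem(bootmem_data_t *bdata)
    {
            struct list_head *iter;

            list_for_each(iter, &bdata_list) {
                    bootmem_data_t *ent;

                    ent = list_entry(iter, bootmem_data_t, list);
                    if (bdata->node_min_pfn < ent->node_min_pfn)
                            break;
            }
            list_add_tail(&bdata->list, iter);
    }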

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Reincarnate get_mapsize as bootmap_bytes and implement
    bootmem_bootmap_pages on top of it.

    Adjust users of these helpers and make free_all_bootmem_core use
    bootmem_bootmap_pages instead of open-coding it.
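
    A sketch of the two helpers (one bit per page, padded to word size):

    static unsigned long __init bootmap_bytes(unsigned long pages)
    {
            unsigned long bytes = (pages + 7) / 8;

            return ALIGN(bytes, sizeof(long));
    }

    unsigned long __init bootmem_bootmap_pages(unsigned long pages)
    {
            unsigned long bytes = bootmap_bytes(pages);

            return PAGE_ALIGN(bytes) >> PAGE_SHIFT;
    }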

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Introduce the bootmem_debug kernel parameter that enables very verbose
    diagnostics regarding all range operations of bootmem as well as the
    initialization and release of nodes.
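
    The parameter is passed on the kernel command line (booting with
    bootmem_debug). A sketch of the hookup, assuming the usual early_param
    mechanism:

    static int bootmem_debug;

    static int __init bootmem_debug_setup(char *buf)
    {
            bootmem_debug = 1;
            return 0;
    }
    early_param("bootmem_debug", bootmem_debug_setup);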

    [akpm@linux-foundation.org: fix printk warnings]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Change the description, move a misplaced comment about the allocator
    itself and add me to the list of copyright holders.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This only reorders functions so that further patches will be easier to
    read. No code changed.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • A straightforward, non-panicking variant of the existing
    __alloc_bootmem_node, used by a subsequent patch when allocating giant
    hugepages at boot -- we don't want to panic if we can't allocate as many
    as the user asked for.
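
    A sketch of such a variant (the name __alloc_bootmem_node_nopanic and
    the direct call into the core allocator are assumptions; the point is
    that it returns NULL instead of panicking):

    void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat,
                                               unsigned long size,
                                               unsigned long align,
                                               unsigned long goal)
    {
            /* Like __alloc_bootmem_node, but NULL on failure instead of panic. */
            return alloc_bootmem_core(pgdat->bdata, size, align, goal, 0);
    }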

    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • This function has no external callers, so unexport it. Also fix its naming
    inconsistency.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • All _core functions only need the bootmem data, not the whole node
    descriptor. Adjust the two functions that needlessly take the node
    descriptor.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The check for node_boot_start is bogus because we start freeing at the
    corresponding PFN. So instead check, in a more readable way, that the
    PFN is properly aligned, and adjust the documentation.

    Also remove an unneeded accounting variable.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There are a lot of places that define either a single bootmem descriptor or an
    array of them. Use only one central array with MAX_NUMNODES items instead.
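
    The central definition and a typical per-arch hookup might look like
    this (a sketch):

    bootmem_data_t bootmem_node_data[MAX_NUMNODES] __initdata;

    /* in arch code, instead of a private descriptor: */
    NODE_DATA(nid)->bdata = &bootmem_node_data[nid];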

    Signed-off-by: Johannes Weiner
    Acked-by: Ralf Baechle
    Cc: Ingo Molnar
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Tony Luck
    Cc: Hirokazu Takata
    Cc: Geert Uytterhoeven
    Cc: Kyle McMartin
    Cc: Paul Mackerras
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Yinghai Lu
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There are a number of different views of how much memory is currently
    active: the arch-independent zone-sizing view, the bootmem allocator's
    view, and the memory model's view.

    Architectures register this information at different times, and it is
    not necessarily in sync, particularly with respect to some SPARSEMEM
    limitations.

    This patch introduces mminit_validate_memmodel_limits(), which can
    validate and correct PFN ranges with respect to the memory model.
    Currently only SPARSEMEM validates itself.
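
    A sketch of the SPARSEMEM validation, which clamps registered ranges to
    the maximum PFN the memory model can represent:

    void __init mminit_validate_memmodel_limits(unsigned long *start_pfn,
                                                unsigned long *end_pfn)
    {
            unsigned long max_sparsemem_pfn =
                    1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT);

            /* Trim any range that exceeds what SPARSEMEM can represent. */
            if (*start_pfn > max_sparsemem_pfn) {
                    *start_pfn = max_sparsemem_pfn;
                    *end_pfn = max_sparsemem_pfn;
            } else if (*end_pfn > max_sparsemem_pfn) {
                    *end_pfn = max_sparsemem_pfn;
            }
    }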

    Signed-off-by: Mel Gorman
    Cc: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

22 Jun, 2008

1 commit

  • This patch changes the function reserve_bootmem_node() from void to int,
    returning -ENOMEM if the allocation fails.

    This fixes a build problem on x86 with CONFIG_KEXEC=y and
    CONFIG_NEED_MULTIPLE_NODES=y.
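
    Roughly, the signature change (the flags parameter had been added by an
    earlier patch):

    /* before */
    void reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
                              unsigned long size, int flags);

    /* after: callers can now detect a failed reservation */
    int reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
                             unsigned long size, int flags);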

    Signed-off-by: Bernhard Walle
    Reported-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     

28 Apr, 2008

2 commits

  • alloc_bootmem_section() can allocate from a specified section's area.
    A later patch uses this for the usemap, to keep it in the same section
    as its pgdat.
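
    A usage sketch from the sparsemem side (usemap_size() and the section
    number pnum are assumed from that later patch):

    usemap = alloc_bootmem_section(usemap_size(), pnum);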

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This patch set frees pages that were allocated by bootmem, for memory
    hot-remove. Some memory-management structures are allocated by bootmem,
    e.g. the memmap.

    To remove memory physically, some of them must be freed depending on the
    circumstances. This patch set lays the groundwork for freeing those
    pages, and frees the memmaps.

    The basic idea is to use the remaining members of struct page to
    remember information about the user of a bootmem page (the section
    number or node id). When a section is being removed, the kernel can
    check this information, which solves several issues:

    1) When the memmap of a section being removed was allocated on another
    section by bootmem, it should and can be freed.
    2) When the memmap of a section being removed was allocated on the
    same section, it should not be freed, because the section must already
    be logically offlined and all of its pages isolated from the page
    allocator. If it were freed, the page allocator might hand out memory
    that will soon be removed physically.
    3) When a section being removed holds another section's memmap, the
    kernel will be able to easily show the user which section should be
    removed before it. (Not implemented yet.)
    4) In case 2) above, page isolation will be able to check for and skip
    the memmap's pages during logical memory offline (offline_pages()).
    The current page isolation code fails in this case because these pages
    are just reserved pages and it cannot distinguish whether they can be
    removed or not. This patch will make that possible. (Not implemented
    yet.)
    5) Node information such as the pgdat has similar issues, which this
    approach will also be able to solve. (Not implemented yet, but by
    remembering the node id in the pages.)

    Fortunately, the current bootmem allocator just sets the PageReserved
    flag and does not use any other members of struct page; the users of
    bootmem do not use them either.

    This patch:

    Register the node or section id so that the kernel can distinguish which
    node/section uses the pages allocated by bootmem. This is the basis for
    hot-removing sections or nodes.
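
    A sketch of how the id can be stashed in otherwise-unused struct page
    members (the helper name and member choice are illustrative):

    static void get_page_bootmem(unsigned long info, struct page *page,
                                 int type)
    {
            /* Remember who uses this bootmem page: section number or node id. */
            page->lru.next = (struct list_head *)(unsigned long)type;
            SetPagePrivate(page);
            set_page_private(page, info);
            atomic_inc(&page->_count);
    }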

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

27 Apr, 2008

3 commits

  • Split reserve_bootmem_core() into two functions: one that checks for
    conflicts, and one that sets the bits.

    Also make reserve_bootmem() loop over bdata_list so the reservation can
    cross nodes.

    Users could be crashkernel and ramdisk..., in case the range provided by
    those externalities crosses node boundaries.
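
    A sketch of the two-pass structure (function names follow the split
    described above):

    int __init reserve_bootmem(unsigned long addr, unsigned long size,
                               int flags)
    {
            bootmem_data_t *bdata;
            int ret;

            /* First pass: check every node for conflicts ... */
            list_for_each_entry(bdata, &bdata_list, list) {
                    ret = can_reserve_bootmem_core(bdata, addr, size, flags);
                    if (ret < 0)
                            return ret;
            }

            /* ... second pass: actually set the bits on every node touched. */
            list_for_each_entry(bdata, &bdata_list, list)
                    reserve_bootmem_core(bdata, addr, size, flags);

            return 0;
    }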

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Offset alignment is needed when node_boot_start is aligned less strictly
    than the requested alignment.

    Use a local node_boot_start to match the alignment, so that no extra
    operation is added to the search loop.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Make nodes other than node 0 use bdata->last_success for fast searching
    too.

    We need to use __alloc_bootmem_core() for vmemmap allocation on other
    nodes when NUMA and sparsemem/vmemmap are enabled.

    Also, make the fail_block path increase i by incr only after the ALIGN,
    to avoid an extra increase when size is larger than align.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

25 Mar, 2008

1 commit

  • With NUMA enabled, some callers could have a range of memory on one node
    but try to free it on another node, which can cause some pages to be
    freed wrongly.

    For example: when we try to allocate 128g of boot RAM early for
    gart/swiotlb, and free that range later so gart/swiotlb can get some
    range afterwards.

    With this patch, we don't need to care which node holds the range; we
    just loop over all online nodes and call free_bootmem_node.

    This patch makes free_bootmem_core() more robust by trimming sidx and
    eidx according to the RAM range that the node has, and makes
    free_bootmem_core handle this out-of-range case. We could use bdata_list
    to make sure the range can be freed for sure; then next time we wouldn't
    need to loop over online nodes and could use free_bootmem directly.
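
    A simplified sketch of the trimming in free_bootmem_core() (signed
    indexes so that out-of-range values can be detected):

    unsigned long start = PFN_DOWN(bdata->node_boot_start);
    unsigned long end = bdata->node_low_pfn;
    long sidx = PFN_UP(addr) - start;
    long eidx = PFN_DOWN(addr + size) - start;

    /* The range lies entirely outside this node: nothing to do here. */
    if (sidx >= (long)(end - start) || eidx <= 0)
            return;

    /* Trim the indexes to the RAM range this node actually has. */
    if (sidx < 0)
            sidx = 0;
    if (eidx > (long)(end - start))
            eidx = end - start;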

    Signed-off-by: Yinghai Lu
    Cc: Andi Kleen
    Cc: Yasunori Goto
    Cc: KAMEZAWA Hiroyuki
    Acked-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

08 Feb, 2008

1 commit

  • This patch set adds a flags parameter to reserve_bootmem() and uses the
    BOOTMEM_EXCLUSIVE flag in the crashkernel reservation code to detect
    collisions between the crashkernel area and already-used memory.

    This patch:

    Change the reserve_bootmem() function to accept a new flag,
    BOOTMEM_EXCLUSIVE. If that flag is set, the function returns -EBUSY if
    the memory has already been reserved in the past. This is to avoid
    conflicts.

    Because that code runs before SMP initialisation, there's no race condition
    inside reserve_bootmem_core().
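
    A hedged usage sketch from the crashkernel reservation side:

    if (reserve_bootmem(crash_base, crash_size, BOOTMEM_EXCLUSIVE) < 0) {
            printk(KERN_INFO "crashkernel reservation failed - "
                   "memory is in use\n");
            return;
    }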

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix powerpc build]
    Signed-off-by: Bernhard Walle
    Cc:
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     
