28 Apr, 2008

2 commits

  • alloc_bootmem_section() can allocate memory from a specified section's
    area. A later patch uses this for the usemap, to keep it in the same
    section as its pgdat.

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
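
    A minimal user-space sketch of what "allocating from a specific
    section" means, assuming the usual sparsemem arithmetic; the constants
    and helper below are illustrative, not the kernel's exact interface:

        #include <stdio.h>

        /* Illustrative constants: one section = 2^SECTION_SHIFT bytes. */
        #define PAGE_SHIFT        12
        #define SECTION_SHIFT     27   /* 128 MB sections */
        #define PAGES_PER_SECTION (1UL << (SECTION_SHIFT - PAGE_SHIFT))

        /* Turn a section number into the [goal, limit) physical range
         * that a bootmem allocation would be constrained to. */
        static void section_range(unsigned long section_nr,
                                  unsigned long *goal, unsigned long *limit)
        {
            unsigned long pfn = section_nr * PAGES_PER_SECTION;

            *goal  = pfn << PAGE_SHIFT;                       /* start */
            *limit = (pfn + PAGES_PER_SECTION) << PAGE_SHIFT; /* end   */
        }

        int main(void)
        {
            unsigned long goal, limit;

            section_range(3, &goal, &limit);
            printf("section 3: goal=%#lx limit=%#lx\n", goal, limit);
            return 0;
        }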
     
  • This patch set frees pages that were allocated by bootmem, for memory
    hot-remove. Some memory-management structures (the memmap, for example)
    are allocated by bootmem.

    To remove memory physically, some of them must be freed depending on the
    circumstances. This patch set lays the groundwork for freeing those
    pages, and frees the memmaps.

    The basic idea is to use the otherwise unused members of struct page to
    remember which user of bootmem (section number or node id) owns each
    page. When a section is being removed, the kernel can check this
    information. It lets us solve several issues:

    1) When the memmap of a section being removed was allocated in another
    section by bootmem, it should and can be freed.
    2) When the memmap of a section being removed was allocated in the
    same section, it must not be freed. The section must already have been
    logically offlined, with all of its pages isolated from the page
    allocator; if the memmap were freed, the page allocator might hand out
    pages that are about to be removed physically.
    3) When a section being removed holds another section's memmap, the
    kernel will be able to easily show the user which section should be
    removed first. (Not implemented yet.)
    4) In case 2) above, page isolation will be able to recognise and skip
    the memmap's pages during logical memory offline (offline_pages()).
    The current page isolation code fails in this case because such a page
    is just a reserved page, and it cannot tell whether the page can be
    removed or not. This patch will make that possible.
    (Not implemented yet.)
    5) Node information such as the pgdat has similar issues, which this
    approach will also be able to solve.
    (Not implemented yet, but the node id is remembered in the pages.)

    Fortunately, the current bootmem allocator just keeps the PageReserved
    flag and doesn't use any other members of struct page; the users of
    bootmem don't use them either.

    This patch:

    This registers the node or section id in the pages, so the kernel can
    distinguish which node/section uses the pages allocated by bootmem. This
    is the basis for hot-removing sections or nodes.

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
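
    A rough model of the bookkeeping this series describes, assuming a
    simplified struct page; the kernel reuses otherwise-unused page
    members, but the field names, packing scheme, and helpers below are
    illustrative only:

        #include <stdio.h>

        /* Stand-in for struct page: bootmem pages are reserved and their
         * other members are unused, so one can hold owner information. */
        struct page {
            unsigned int reserved : 1;
            unsigned long private;   /* section number or node id */
        };

        enum owner_type { OWNER_SECTION, OWNER_NODE };

        /* Remember which section/node a bootmem page belongs to. */
        static void register_bootmem_info(struct page *pg,
                                          enum owner_type t,
                                          unsigned long id)
        {
            pg->reserved = 1;
            pg->private = (id << 1) | t;   /* pack type and id */
        }

        /* At hot-remove time, decide whether the page may be freed: a
         * memmap page living in the section being removed must stay. */
        static int may_free_on_remove(const struct page *pg,
                                      unsigned long removing_section)
        {
            if ((pg->private & 1) != OWNER_SECTION)
                return 0;
            return (pg->private >> 1) != removing_section;
        }

        int main(void)
        {
            struct page p;

            register_bootmem_info(&p, OWNER_SECTION, 5);
            printf("free while removing section 5? %d\n",
                   may_free_on_remove(&p, 5)); /* 0: same section */
            printf("free while removing section 7? %d\n",
                   may_free_on_remove(&p, 7)); /* 1: lives elsewhere */
            return 0;
        }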
     

27 Apr, 2008

3 commits

  • Split reserve_bootmem_core() into two functions: one that checks for
    conflicts, and one that sets the bits.

    Also make reserve_bootmem() loop over bdata_list, so reservations can
    cross nodes.

    Users could be crashkernel and ramdisk..., in case the range provided by
    those externalities crosses node boundaries.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
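
    A toy illustration of the check/set split on a page bitmap;
    can_reserve() and do_reserve() are made-up names standing in for the
    two halves of reserve_bootmem_core():

        #include <stdio.h>
        #include <string.h>

        #define NPAGES 64
        static unsigned char boot_map[NPAGES];  /* 1 = already reserved */

        /* First half: only check for conflicts, touch nothing. */
        static int can_reserve(unsigned long sidx, unsigned long eidx)
        {
            for (unsigned long i = sidx; i < eidx; i++)
                if (boot_map[i])
                    return 0;   /* conflict */
            return 1;
        }

        /* Second half: actually set the bits. */
        static void do_reserve(unsigned long sidx, unsigned long eidx)
        {
            for (unsigned long i = sidx; i < eidx; i++)
                boot_map[i] = 1;
        }

        int main(void)
        {
            memset(boot_map, 0, sizeof(boot_map));
            /* Checking first and committing second means a reservation
             * that spans several maps either fully succeeds or fails. */
            if (can_reserve(10, 20))
                do_reserve(10, 20);
            printf("second attempt conflicts: %d\n", !can_reserve(15, 25));
            return 0;
        }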
     
  • We need an offset alignment when node_boot_start's alignment is less
    than the alignment requested.

    Use a local, aligned copy of node_boot_start to match the alignment, so
    no extra operation is added inside the search loop.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
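
    A sketch of the offset-alignment arithmetic, assuming a standard
    power-of-two ALIGN(); the point is that when node_boot_start itself is
    not sufficiently aligned, bitmap indexes must be shifted by a
    pre-computed offset so that an aligned index yields an aligned
    physical address (the values below are invented for the demo):

        #include <stdio.h>

        #define PAGE_SHIFT 12
        #define ALIGN(x, a) \
            (((x) + ((a) - 1)) & ~((unsigned long)(a) - 1))

        int main(void)
        {
            unsigned long node_boot_start = 0x8000; /* 32 KB aligned  */
            unsigned long align = 0x10000;          /* want 64 KB     */

            /* Computed once, outside the search loop: how many pages
             * indexes must be shifted by so that "aligned index" really
             * means "aligned physical address". */
            unsigned long aligned_start = ALIGN(node_boot_start, align);
            unsigned long offset =
                (aligned_start - node_boot_start) >> PAGE_SHIFT;
            unsigned long incr = align >> PAGE_SHIFT;

            for (unsigned long i = offset; i < 64; i += incr) {
                unsigned long phys = node_boot_start + (i << PAGE_SHIFT);
                printf("candidate idx %lu -> phys %#lx\n", i, phys);
            }
            return 0;
        }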
     
  • Make nodes other than node 0 use bdata->last_success for fast searching
    too.

    We need to use __alloc_bootmem_core() for vmemmap allocation on other
    nodes when NUMA and sparsemem/vmemmap are enabled.

    Also, make the fail_block path increase i by incr only after the ALIGN
    step, to avoid an extra increase when size is larger than align.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
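
    A simplified model of the last_success hint and the fail_block
    stepping; the data structures are toys, not the kernel's:

        #include <stdio.h>

        #define NPAGES 64
        #define ALIGN(x, a) \
            (((x) + ((a) - 1)) & ~((unsigned long)(a) - 1))

        static unsigned char boot_map[NPAGES];  /* 1 = page in use    */
        static unsigned long last_success;      /* per-node hint      */

        /* Find and take `pages` free pages, stepping candidates by
         * `incr`; start at last_success so repeated allocations skip
         * known-busy space. */
        static long find_free_run(unsigned long pages, unsigned long incr)
        {
            unsigned long i = ALIGN(last_success, incr);

            while (i + pages <= NPAGES) {
                unsigned long j;
                for (j = i; j < i + pages; j++)
                    if (boot_map[j])
                        goto fail_block;
                for (j = i; j < i + pages; j++)
                    boot_map[j] = 1;        /* commit the run */
                last_success = i + pages;   /* hint for next caller */
                return (long)i;
        fail_block:
                /* The fix: ALIGN the failing position first and only
                 * then step, so a size larger than the alignment does
                 * not over-advance past valid candidates. */
                i = ALIGN(j + 1, incr);
            }
            return -1;
        }

        int main(void)
        {
            boot_map[0] = 1;   /* pretend page 0 is already taken */
            printf("run of 4 at index %ld\n", find_free_run(4, 4));
            printf("next run of 4 at index %ld\n", find_free_run(4, 4));
            return 0;
        }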
     

25 Mar, 2008

1 commit

  • With NUMA enabled, some callers could have a range of memory on one node
    but try to free it on another node, which can cause some pages to be
    freed wrongly.

    For example: we may try to allocate 128G of boot RAM early for
    gart/swiotlb, and free that range later so gart/swiotlb can get some
    range afterwards.

    With this patch, we don't need to care which node holds the range; we
    just loop over all online nodes, calling free_bootmem_node() for each.

    This patch makes free_bootmem_core() more robust by trimming sidx and
    eidx according to the RAM range that the node actually has.

    It also makes free_bootmem_core() handle the out-of-range case. We could
    use bdata_list to make sure the range can be freed for sure; then, next
    time, we wouldn't need to loop over the online nodes and could use
    free_bootmem() directly.

    Signed-off-by: Yinghai Lu
    Cc: Andi Kleen
    Cc: Yasunori Goto
    Cc: KAMEZAWA Hiroyuki
    Acked-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
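
    A sketch of the sidx/eidx trimming idea in plain C, with an invented
    per-node descriptor; the real bootmem code works on bitmaps, but the
    clamping logic is the same in spirit:

        #include <stdio.h>

        #define PAGE_SHIFT 12

        /* Toy per-node bootmem descriptor. */
        struct bdata {
            unsigned long node_boot_start; /* physical start of node */
            unsigned long node_low_pfn;    /* first pfn past node    */
        };

        /* Clamp a physical range to the node before freeing, so a
         * caller may pass the same range to every online node and each
         * frees only the part it owns (out-of-range frees nothing). */
        static void free_bootmem_core(struct bdata *b,
                                      unsigned long addr,
                                      unsigned long size)
        {
            unsigned long start_pfn = addr >> PAGE_SHIFT;
            unsigned long end_pfn = (addr + size) >> PAGE_SHIFT;
            unsigned long node_start = b->node_boot_start >> PAGE_SHIFT;

            if (start_pfn < node_start)    start_pfn = node_start;
            if (end_pfn > b->node_low_pfn) end_pfn = b->node_low_pfn;
            if (start_pfn >= end_pfn)
                return;            /* nothing of ours in the range */

            printf("node @%#lx frees pfns [%lu, %lu)\n",
                   b->node_boot_start, start_pfn, end_pfn);
        }

        int main(void)
        {
            struct bdata n0 = { 0x0,      0x100 }; /* pfns [0, 256)   */
            struct bdata n1 = { 0x100000, 0x200 }; /* pfns [256, 512) */

            /* Free one range spanning both nodes by looping. */
            free_bootmem_core(&n0, 0xF0000, 0x20000);
            free_bootmem_core(&n1, 0xF0000, 0x20000);
            return 0;
        }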
     

08 Feb, 2008

1 commit

  • This patchset adds a flags argument to reserve_bootmem() and uses the
    BOOTMEM_EXCLUSIVE flag in the crashkernel reservation code to detect
    collisions between the crashkernel area and already-used memory.

    This patch:

    Change the reserve_bootmem() function to accept a new flag,
    BOOTMEM_EXCLUSIVE. If that flag is set, the function returns -EBUSY if
    the memory has already been reserved in the past. This is to avoid
    conflicts.

    Because that code runs before SMP initialisation, there's no race
    condition inside reserve_bootmem_core().

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix powerpc build]
    Signed-off-by: Bernhard Walle
    Cc:
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
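
    A small model of the BOOTMEM_EXCLUSIVE behaviour, using a toy bitmap
    in place of the real reservation map:

        #include <stdio.h>
        #include <errno.h>

        #define NPAGES 64
        #define BOOTMEM_DEFAULT   0
        #define BOOTMEM_EXCLUSIVE 1

        static unsigned char boot_map[NPAGES];  /* 1 = already reserved */

        /* With BOOTMEM_EXCLUSIVE, refuse (rather than silently merge
         * with) an overlap with an earlier reservation. */
        static int reserve_bootmem(unsigned long sidx, unsigned long eidx,
                                   int flags)
        {
            unsigned long i;

            if (flags & BOOTMEM_EXCLUSIVE)
                for (i = sidx; i < eidx; i++)
                    if (boot_map[i])
                        return -EBUSY;

            for (i = sidx; i < eidx; i++)
                boot_map[i] = 1;
            return 0;
        }

        int main(void)
        {
            reserve_bootmem(10, 20, BOOTMEM_DEFAULT);
            /* A crashkernel area colliding with used memory is caught: */
            printf("exclusive overlap -> %d (expect %d)\n",
                   reserve_bootmem(15, 25, BOOTMEM_EXCLUSIVE), -EBUSY);
            return 0;
        }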
     

10 Apr, 2006

1 commit

  • The node setup code would try to allocate the node metadata in the node
    itself, but that fails if there is no memory in there.

    This can happen with memory hotplug, when the hotplug area defines a
    so-far-empty node.

    Now use bootmem to try to allocate the mem_map in other nodes.

    If that fails, don't panic; just ignore the node.

    To make this work, I added a new __alloc_bootmem_nopanic function that
    does what its name implies.

    TBD: we should try to use nearby nodes here. Currently we just use any.
    It's hard to do better because bootmem doesn't have proper fallback
    lists yet.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
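
    A sketch of the panic/nopanic split, with malloc standing in for a
    per-node bootmem allocator; the structure, not the names, is the
    point:

        #include <stdio.h>
        #include <stdlib.h>

        #define NNODES 3

        /* Pretend per-node allocator: nodes 0 and 1 are empty here. */
        static void *alloc_from_node(int nid, size_t size)
        {
            return nid == 2 ? malloc(size) : NULL;
        }

        /* Try every node; return NULL instead of panicking on total
         * failure, so callers like the node-setup code can simply skip
         * an empty node. */
        static void *__alloc_bootmem_nopanic(size_t size)
        {
            for (int nid = 0; nid < NNODES; nid++) {
                void *p = alloc_from_node(nid, size);
                if (p)
                    return p;
            }
            return NULL;   /* caller decides what failure means */
        }

        static void *__alloc_bootmem(size_t size)
        {
            void *p = __alloc_bootmem_nopanic(size);
            if (!p) {
                fprintf(stderr, "bootmem alloc of %zu failed!\n", size);
                exit(1);   /* stands in for panic() */
            }
            return p;
        }

        int main(void)
        {
            printf("nopanic got %p\n", __alloc_bootmem_nopanic(64));
            printf("panicking variant got %p\n", __alloc_bootmem(64));
            return 0;
        }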
     

28 Mar, 2006

1 commit

  • Add a list_head to bootmem_data_t and make the bootmem nodes use it. The
    bootmem list is sorted by node_boot_start.

    Only nodes for which init_bootmem() is called are linked into the list.
    (i386 allocates bootmem only from one node (0), not from all online
    nodes.)

    A summary:
    1. for_each_online_pgdat() traverses all *online* nodes.
    2. alloc_bootmem() allocates memory only from nodes initialised for
    bootmem.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
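
    A user-space sketch of sorted insertion into the bootmem list, with a
    plain next pointer standing in for the kernel's list_head:

        #include <stdio.h>

        /* Toy bootmem_data_t with an intrusive next pointer. */
        struct bootmem_data {
            unsigned long node_boot_start;
            struct bootmem_data *next;
        };

        static struct bootmem_data *bdata_list;

        /* init_bootmem() links the node in, keeping the list sorted by
         * node_boot_start so allocators walk nodes in address order. */
        static void link_bootmem(struct bootmem_data *b)
        {
            struct bootmem_data **p = &bdata_list;

            while (*p && (*p)->node_boot_start < b->node_boot_start)
                p = &(*p)->next;
            b->next = *p;
            *p = b;
        }

        int main(void)
        {
            struct bootmem_data n2 = { 0x200000, NULL };
            struct bootmem_data n0 = { 0x000000, NULL };
            struct bootmem_data n1 = { 0x100000, NULL };

            link_bootmem(&n2);
            link_bootmem(&n0);
            link_bootmem(&n1);

            for (struct bootmem_data *b = bdata_list; b; b = b->next)
                printf("node at %#lx\n", b->node_boot_start);
            return 0;
        }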
     

07 Jan, 2006

2 commits

  • The attached patch cleans up the way the bootmem allocator frees pages.

    A new function, __free_pages_bootmem(), is provided in mm/page_alloc.c
    and is called from mm/bootmem.c to turn pages over to the main
    allocator. All the bits of code that initialise pages (clearing
    PG_reserved and setting the page count) are moved here. The checks on
    page validity are removed, on the assumption that the struct page arrays
    will have been prepared correctly.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
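
    A toy model of the handover this describes: clear the reserved flag,
    give the page a refcount, and release it into the allocator. The
    types and helper are invented stand-ins, not the kernel's:

        #include <stdio.h>

        /* Minimal struct page stand-in. */
        struct page {
            unsigned int reserved : 1;
            int count;
        };

        /* Stand-in for the buddy allocator's free path. */
        static void free_to_buddy(struct page *pg)
        {
            printf("page %p entered the main allocator\n", (void *)pg);
        }

        /* All page initialisation for the handover lives in one place:
         * clear the reserved bit, give the page a reference, then drop
         * it. Validity checks are gone: the memmap is trusted. */
        static void __free_pages_bootmem(struct page *pg)
        {
            pg->reserved = 0;
            pg->count = 1;      /* like set_page_count(page, 1) */
            pg->count--;        /* release the reference        */
            if (pg->count == 0)
                free_to_buddy(pg);
        }

        int main(void)
        {
            struct page pg = { .reserved = 1, .count = 0 };
            __free_pages_bootmem(&pg);
            return 0;
        }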
     
  • This patch cleans up the alloc_bootmem fix for swiotlb. It removes the
    alloc_bootmem_*_limit APIs and fixes the alloc_bootmem_low* APIs to do
    the right thing: allocate from low 32-bit memory.

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     

13 Dec, 2005

1 commit

  • We hit the BUG_ON() in __alloc_bootmem_core() when there is no free page
    available in the first node's memory. In the case of kdump on PPC64 (a
    Power 4 machine), the capture kernel uses two memory regions: memory for
    the TCE tables (tce-base and tce-size, at the top of RAM and reserved)
    and the capture-kernel memory region itself (crashk_base and
    crashk_size). Since we reserve the memory in the first node, we should
    be returning from __alloc_bootmem_core() to search the next node
    (pgdat).

    Currently, find_next_zero_bit() returns the n-th bit (eidx) when there
    is no free page. test_bit() then fails, since we initially set 0xff only
    for the actual size (in init_bootmem_core()) even though
    bdata->node_bootmem_map is rounded up to one page. We hit the BUG_ON
    after failing to enter the second "for" loop.

    Signed-off-by: Haren Myneni
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haren Myneni
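
    A sketch of the failure mode and fix, assuming find_next_zero_bit()-
    like semantics where the search returns the size of the search space
    when nothing is free; the bitmap and names are toys:

        #include <stdio.h>

        #define EIDX 16
        static unsigned char boot_map[EIDX];   /* 1 = page in use */

        /* Mimics find_next_zero_bit(): returns the size of the search
         * space when no zero bit exists, so the result MUST be
         * bounds-checked by the caller. */
        static unsigned long find_next_zero(unsigned long start)
        {
            for (unsigned long i = start; i < EIDX; i++)
                if (!boot_map[i])
                    return i;
            return EIDX;
        }

        static void *alloc_one_page(void)
        {
            unsigned long i = find_next_zero(0);

            /* The fix in spirit: a fully reserved node returns NULL so
             * the caller can fall back to the next pgdat, instead of
             * tripping a BUG_ON further down. */
            if (i >= EIDX)
                return NULL;
            boot_map[i] = 1;
            return &boot_map[i];
        }

        int main(void)
        {
            for (int k = 0; k < EIDX; k++)
                alloc_one_page();              /* exhaust the node */
            printf("exhausted node -> %p\n", alloc_one_page());
            return 0;
        }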
     

30 Oct, 2005

1 commit

  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE mmaps of VM_RESERVED regions are tentatively
    being deprecated. We never handled them completely correctly anyway, and
    they can be reintroduced in the future if required (Hugh has a proof of
    concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
    (and thus mapcounted and counted towards shared rss). These writes to
    the struct page could cause excessive cacheline bouncing on big systems.
    There are a number of ways this could be addressed if it turns out to be
    an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

20 Oct, 2005

1 commit

  • This introduces a limit parameter to the core bootmem allocator; the new
    parameter indicates that physical memory allocated by the bootmem
    allocator should be within the requested limit.

    We also introduce the alloc_bootmem_low_pages_limit,
    alloc_bootmem_node_limit and alloc_bootmem_low_pages_node_limit APIs,
    but alloc_bootmem_low_pages_limit is the only one used for swiotlb.

    The existing alloc_bootmem_low_pages() API could instead have been
    changed to pass the right limit to the core allocator. But that would
    make the patch more intrusive for 2.6.14, as other arches use
    alloc_bootmem_low_pages(). We may do that post-2.6.14 as a cleanup.

    With this, swiotlb gets memory within 4G on both the x86_64 and ia64
    arches.

    Signed-off-by: Yasunori Goto
    Cc: Ravikiran G Thirumalai
    Signed-off-by: Linus Torvalds

    Yasunori Goto
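
    A sketch of how a limit constrains the search, with a toy first-fit
    scan over pfns; the signature is invented, and only the limit check
    mirrors the idea:

        #include <stdio.h>

        #define PAGE_SHIFT 12

        /* Scan candidate pfns, rejecting any block whose end would
         * exceed the physical limit; limit == 0 means "no limit". */
        static long find_below_limit(unsigned long start_pfn,
                                     unsigned long end_pfn,
                                     unsigned long pages,
                                     unsigned long long limit)
        {
            for (unsigned long pfn = start_pfn;
                 pfn + pages <= end_pfn; pfn++) {
                unsigned long long end_addr =
                    (unsigned long long)(pfn + pages) << PAGE_SHIFT;
                if (limit && end_addr > limit)
                    break;          /* everything further is higher */
                return (long)pfn;   /* first fit (toy policy)       */
            }
            return -1;
        }

        int main(void)
        {
            unsigned long long four_gb = 1ULL << 32;

            /* A swiotlb-style request must stay under 4G: */
            printf("low node:  pfn %ld\n",
                   find_below_limit(0x00100, 0x80000, 16, four_gb));
            printf("high node: pfn %ld\n",
                   find_below_limit(0x200000, 0x280000, 16, four_gb));
            return 0;
        }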
     

01 Oct, 2005

1 commit

  • As requested by Thomas Gleixner :

    "5d3d0f7704ed0bc7eaca0501eeae3e5da1ea6c87 breaks a couple of ARM
    boards, which depend on the historical bootmem allocation order.
    There is a cleaner solution around to remove the pgdat list
    completely, but this is a topic for post 2.6.14

    Andi signalled ACK already."

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

13 Sep, 2005

1 commit

  • This makes bootmem allocate first from node 0 instead of from the last
    node. It avoids swiotlb allocating on the last node, which doesn't
    really work on a machine with more than 4GB.

    Note: there is a better patch around from someone else that gets
    rid of the pgdat list completely.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

26 Jun, 2005

2 commits

  • This patch makes use of ALIGN() to remove duplicate round-up code.

    Signed-off-by: Nick Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Wilson
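
    For reference, the classic power-of-two ALIGN() round-up that this
    patch uses in place of hand-rolled duplicates, shown here as a
    standalone demo:

        #include <stdio.h>

        /* Round x up to a multiple of a (a must be a power of two):
         * adding a-1 and masking replaces duplicated open-coded math. */
        #define ALIGN(x, a) \
            (((x) + ((a) - 1)) & ~((unsigned long)(a) - 1))

        int main(void)
        {
            printf("ALIGN(0x1234, 0x1000) = %#lx\n",
                   ALIGN(0x1234UL, 0x1000));   /* 0x2000 */
            printf("ALIGN(0x1000, 0x1000) = %#lx\n",
                   ALIGN(0x1000UL, 0x1000));   /* unchanged */
            printf("ALIGN(1, 8)           = %lu\n",
                   ALIGN(1UL, 8));             /* 8 */
            return 0;
        }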
     
  • This patch retrieves the max_pfn used by the previous kernel and stores
    it in a safe location (saved_max_pfn) before it is overwritten by the
    user-defined memory map. This pfn is used to make sure that the user
    does not try to read physical memory beyond saved_max_pfn.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
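
    A minimal sketch of how saved_max_pfn bounds dump reads;
    read_oldmem_pfn() is an invented stand-in for the real read path:

        #include <stdio.h>

        /* max_pfn of the crashed kernel, saved before the capture
         * kernel's user-supplied memory map overwrites max_pfn. */
        static unsigned long saved_max_pfn;

        /* Refuse dump reads past the old kernel's last valid pfn. */
        static int read_oldmem_pfn(unsigned long pfn)
        {
            if (pfn >= saved_max_pfn)
                return -1;      /* beyond the crashed kernel's RAM */
            printf("would copy pfn %lu from old memory\n", pfn);
            return 0;
        }

        int main(void)
        {
            saved_max_pfn = 0x40000;   /* e.g. 1 GB of old RAM */
            read_oldmem_pfn(0x1000);
            printf("past end -> %d\n", read_oldmem_pfn(0x50000));
            return 0;
        }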
     

24 Jun, 2005

1 commit

  • Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
    mem_map[] is needed by discontiguous memory machines (like in the old
    CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
    replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
    become a complete replacement.

    A significant advantage over DISCONTIGMEM is that it's completely
    separated from CONFIG_NUMA. When producing this patch, it became
    apparent that NUMA and DISCONTIG are often confused.

    Another advantage is that sparse doesn't require each NUMA node's ranges to be
    contiguous. It can handle overlapping ranges between nodes with no problems,
    where DISCONTIGMEM currently throws away that memory.

    Sparsemem uses an array to provide different pfn_to_page() translations for
    each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
    to be chopped up.

    In order to do quick pfn_to_page() operations, the section number of the page
    is encoded in page->flags. Part of the sparsemem infrastructure enables
    sharing of these bits more dynamically (at compile-time) between the
    page_zone() and sparsemem operations. However, on 32-bit architectures, the
    number of bits is quite limited, and may require growing the size of the
    page->flags type in certain conditions. Several things might force this to
    occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
    memory), an increase in the physical address space, or an increase in the
    number of used page->flags.

    One thing to note is that, once sparsemem is present, the NUMA node
    information no longer needs to be stored in the page->flags. It might provide
    speed increases on certain platforms and will be stored there if there is
    room. But, if out of room, an alternate (theoretically slower) mechanism is
    used.

    This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
    there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
    often have to compile out the same areas of code.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Adrian Bunk
    Signed-off-by: Yasunori Goto
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
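
    A compact model of the section-indexed pfn_to_page() translation that
    sparsemem enables; the section size and the lookup table here are
    illustrative, not the kernel's exact layout:

        #include <stdio.h>

        #define PAGE_SHIFT        12
        #define SECTION_SHIFT     27   /* 128 MB sections (demo)     */
        #define PFN_SECTION_SHIFT (SECTION_SHIFT - PAGE_SHIFT)
        #define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
        #define NR_SECTIONS       64

        struct page { unsigned long flags; };

        /* One mem_map fragment per present section: this is what lets
         * the global mem_map[] be chopped up and allocated piecemeal. */
        static struct page *section_mem_map[NR_SECTIONS];

        static struct page *pfn_to_page(unsigned long pfn)
        {
            struct page *base = section_mem_map[pfn >> PFN_SECTION_SHIFT];
            return base ? base + (pfn & (PAGES_PER_SECTION - 1)) : NULL;
        }

        int main(void)
        {
            static struct page map_for_section3[PAGES_PER_SECTION];

            section_mem_map[3] = map_for_section3; /* section present */

            unsigned long pfn = 3 * PAGES_PER_SECTION + 42;
            printf("pfn %lu -> page %p\n", pfn, (void *)pfn_to_page(pfn));
            printf("absent section -> %p\n", (void *)pfn_to_page(7));
            return 0;
        }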
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds