22 Sep, 2009

1 commit

  • This is being done by allowing boot-time allocations to specify that
    they may want a sub-page-sized amount of memory.

    Overall this seems more consistent with the other hash table allocations,
    and allows making two supposedly mm-only variables really mm-only
    (nr_{kernel,all}_pages).
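
    A hedged sketch of what such a call site could look like (the
    HASH_SMALL flag name and the surrounding identifiers are assumptions
    for illustration, not taken from this log):

        /* Illustrative boot-time hash that may be smaller than a page. */
        static struct hlist_head *example_hash;
        static unsigned int example_shift;

        static void __init example_hash_init(void)
        {
                example_hash = alloc_large_system_hash("Example",
                                sizeof(*example_hash),
                                0,                       /* auto-size */
                                18,                      /* scale */
                                HASH_EARLY | HASH_SMALL, /* assumed flag */
                                &example_shift,
                                NULL,                    /* no mask wanted */
                                4096);                   /* limit */
        }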

    Signed-off-by: Jan Beulich
    Cc: Ingo Molnar
    Cc: "Eric W. Biederman"
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

01 Apr, 2009

1 commit

  • On PowerPC we allocate large boot time hashes on node 0. This leads to an
    imbalance in the free memory, for example on a 64GB box (4 x 16GB nodes):

    Free memory:
    Node 0: 97.03%
    Node 1: 98.54%
    Node 2: 98.42%
    Node 3: 98.53%

    If we switch to using vmalloc (like ia64 and x86-64), things are more
    balanced:

    Free memory:
    Node 0: 97.53%
    Node 1: 98.35%
    Node 2: 98.33%
    Node 3: 98.33%

    For many HPC applications we are limited by the available free memory
    on the smallest node, so even though the same amount of memory is
    used, the better balancing helps.

    Since all 64-bit NUMA-capable architectures should have sufficient
    vmalloc space, it makes sense to enable this via CONFIG_64BIT.
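
    A minimal sketch of such a config-driven default (HASHDIST_DEFAULT is
    the existing knob consumed by alloc_large_system_hash(); the exact
    condition here is an assumption based on this description):

        #if defined(CONFIG_NUMA) && defined(CONFIG_64BIT)
        #define HASHDIST_DEFAULT 1
        #else
        #define HASHDIST_DEFAULT 0
        #endif

        extern int hashdist;    /* distribute hashes across NUMA nodes */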

    Signed-off-by: Anton Blanchard
    Acked-by: David S. Miller
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Ralf Baechle
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Ivan Kokshaysky
    Cc: Richard Henderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     

24 Feb, 2009

2 commits

  • Impact: cleanup and addition of missing interface wrapper

    The interface functions in bootmem.h were not ordered very
    consistently. Reorder them such that:

    * functions allocating from the same area are grouped together -
    i.e. the alloc_bootmem group and the alloc_bootmem_low group;

    * functions without a node parameter come before the ones with one;

    * nopanic variants are immediately below their panicky counterparts.

    While at it, add alloc_bootmem_pages_node_nopanic() which was missing.
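
    A hedged sketch of the resulting ordering (shown as plain prototypes
    for brevity; the real header largely wraps __alloc_bootmem* in
    macros, so treat the exact forms as illustrative):

        /* alloc_bootmem group: no node parameter first, each nopanic
         * variant immediately below its panicky counterpart... */
        void *alloc_bootmem(unsigned long size);
        void *alloc_bootmem_nopanic(unsigned long size);
        void *alloc_bootmem_pages(unsigned long size);
        void *alloc_bootmem_pages_nopanic(unsigned long size);
        /* ...then the node-aware variants... */
        void *alloc_bootmem_node(pg_data_t *pgdat, unsigned long size);
        void *alloc_bootmem_pages_node(pg_data_t *pgdat, unsigned long size);
        void *alloc_bootmem_pages_node_nopanic(pg_data_t *pgdat,
                                               unsigned long size);

        /* ...and the alloc_bootmem_low group follows the same pattern. */
        void *alloc_bootmem_low(unsigned long size);
        void *alloc_bootmem_low_pages(unsigned long size);
        void *alloc_bootmem_low_pages_node(pg_data_t *pgdat,
                                           unsigned long size);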

    Signed-off-by: Tejun Heo
    Cc: Johannes Weiner

    Tejun Heo
     
  • Impact: cleaner and consistent bootmem wrapping

    By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
    arch-specific wrappers for bootmem allocation. However, this is done
    a bit strangely: only the high-level convenience macros can be
    changed, while lower-level - but still exported - interface functions
    can't be wrapped. This is not only messy but also leads to a strange
    situation where alloc_bootmem() does what the arch wants it to do but
    the equivalent __alloc_bootmem() call doesn't, although the two
    should be interchangeable.

    This patch updates bootmem so that archs override/wrap the backend
    function, alloc_bootmem_core(), instead of the high-level interface
    functions, which allows simpler and more consistent wrapping. Also,
    HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.
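
    A hypothetical sketch of the wrapping this enables (the hook name and
    signature are illustrative, not the kernel's actual identifiers):

        #ifdef CONFIG_HAVE_ARCH_BOOTMEM
        /* The arch may satisfy or redirect a request before the generic
         * backend runs. */
        void *arch_bootmem_alloc(bootmem_data_t *bdata, unsigned long size,
                                 unsigned long align, unsigned long goal);
        #else
        static inline void *arch_bootmem_alloc(bootmem_data_t *bdata,
                                               unsigned long size,
                                               unsigned long align,
                                               unsigned long goal)
        {
                return NULL;    /* no arch preference; generic path */
        }
        #endif

    With such a hook consulted from alloc_bootmem_core(), both
    alloc_bootmem() and __alloc_bootmem() would see the same arch
    behaviour.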

    Signed-off-by: Tejun Heo
    Cc: Johannes Weiner

    Tejun Heo
     

25 Jul, 2008

6 commits

  • Almost all users of this field need a PFN instead of a physical address,
    so replace node_boot_start with node_min_pfn.
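
    A minimal sketch of the field change (only the affected member is
    shown; the conversion between the two representations is a PFN
    shift):

        typedef struct bootmem_data {
                unsigned long node_min_pfn;     /* was node_boot_start, a
                                                 * physical address:
                                                 * node_boot_start ==
                                                 * PFN_PHYS(node_min_pfn) */
                /* ...remaining members unchanged... */
        } bootmem_data_t;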

    [Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • alloc_bootmem_core has become quite nasty to read over time. This is a
    clean rewrite that keeps the semantics.

    bdata->last_pos has been dropped.

    bdata->last_success has been renamed to hint_idx, and it is now an
    index relative to the node's range. Since further block searching
    might start at this index, it is now set to the end of a successful
    allocation rather than its beginning.

    bdata->last_offset has been renamed to last_end_off to make it
    clearer that it represents the end of the last allocation relative to
    the node.

    [y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This only reorders functions so that further patches will be easier to
    read. No code changed.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • A straightforward variant of the existing __alloc_bootmem_node; the
    only difference is that it does not panic on failure. It is used by a
    subsequent patch when allocating giant hugepages at boot -- we don't
    want to panic if we can't allocate as many as the user asked for.
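
    A hedged sketch of such a variant (the body and the fallback policy
    are illustrative, not the patch's actual code):

        void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat,
                                                   unsigned long size,
                                                   unsigned long align,
                                                   unsigned long goal)
        {
                void *ptr;

                /* Prefer the requested node... */
                ptr = alloc_bootmem_core(pgdat->bdata, size, align,
                                         goal, 0);
                if (ptr)
                        return ptr;
                /* ...then any node; return NULL instead of panicking. */
                return __alloc_bootmem_nopanic(size, align, goal);
        }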

    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • This function has no external callers, so unexport it. Also fix its naming
    inconsistency.

    Signed-off-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There are a lot of places that define either a single bootmem descriptor or an
    array of them. Use only one central array with MAX_NUMNODES items instead.
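
    A minimal sketch of the centralization (the array name and placement
    are assumptions; each arch then points its pgdat at a slot instead of
    defining its own descriptor):

        /* One descriptor per possible node. */
        bootmem_data_t bootmem_node_data[MAX_NUMNODES] __initdata;

        /* Hypothetical helper showing how a node gets wired up. */
        static void __init link_node_bootmem(int nid)
        {
                NODE_DATA(nid)->bdata = &bootmem_node_data[nid];
        }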

    Signed-off-by: Johannes Weiner
    Acked-by: Ralf Baechle
    Cc: Ingo Molnar
    Cc: Richard Henderson
    Cc: Russell King
    Cc: Tony Luck
    Cc: Hirokazu Takata
    Cc: Geert Uytterhoeven
    Cc: Kyle McMartin
    Cc: Paul Mackerras
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Yinghai Lu
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

22 Jun, 2008

1 commit

  • This patch changes the function reserve_bootmem_node() from void to int,
    returning -ENOMEM if the allocation fails.

    This fixes a build problem on x86 with CONFIG_KEXEC=y and
    CONFIG_NEED_MULTIPLE_NODES=y.
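
    A hedged sketch of the changed signature (the body assumes the
    flags-taking reserve_bootmem_core() described in the 08 Feb, 2008
    entry below):

        int __init reserve_bootmem_node(pg_data_t *pgdat,
                                        unsigned long physaddr,
                                        unsigned long size, int flags)
        {
                /* Propagate failure instead of returning void, so
                 * callers such as the kexec code can check it. */
                return reserve_bootmem_core(pgdat->bdata, physaddr,
                                            size, flags);
        }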

    Signed-off-by: Bernhard Walle
    Reported-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     

28 Apr, 2008

1 commit

  • alloc_bootmem_section() allocates memory from within a specified
    section. A later patch uses this for the usemap, so that it is kept
    in the same section as its pgdat.
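
    A hedged sketch of the idea - derive a goal and limit that pin the
    allocation inside one section (simplified; helper names are from that
    era's kernel, the body is illustrative):

        void * __init alloc_bootmem_section(unsigned long size,
                                            unsigned long section_nr)
        {
                unsigned long pfn = section_nr_to_pfn(section_nr);
                unsigned long goal = PFN_PHYS(pfn);     /* section start */
                unsigned long limit =
                        PFN_PHYS(section_nr_to_pfn(section_nr + 1)) - 1;

                /* Constrain the bootmem search window to [goal, limit]. */
                return __alloc_bootmem_core(
                                NODE_DATA(early_pfn_to_nid(pfn))->bdata,
                                size, SMP_CACHE_BYTES, goal, limit);
        }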

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

08 Feb, 2008

1 commit

  • This patchset adds a flags parameter to reserve_bootmem() and uses
    the BOOTMEM_EXCLUSIVE flag in the crashkernel reservation code to
    detect collisions between the crashkernel area and already-used
    memory.

    This patch:

    Change the reserve_bootmem() function to accept a new flag,
    BOOTMEM_EXCLUSIVE. If that flag is set, the function returns -EBUSY
    if the memory has already been reserved in the past. This is to avoid
    conflicts.

    Because that code runs before SMP initialisation, there's no race condition
    inside reserve_bootmem_core().
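
    A usage sketch from the crashkernel side (close to, but not verbatim,
    the x86 reservation code; variable names are illustrative):

        static void __init reserve_crashkernel_mem(unsigned long crash_base,
                                                   unsigned long crash_size)
        {
                int ret = reserve_bootmem(crash_base, crash_size,
                                          BOOTMEM_EXCLUSIVE);
                if (ret < 0)    /* -EBUSY: already reserved by someone */
                        printk(KERN_INFO "crashkernel reservation failed - "
                               "memory is in use\n");
        }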

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix powerpc build]
    Signed-off-by: Bernhard Walle
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernhard Walle
     

30 Oct, 2007

1 commit

  • This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6.

    First off, testing in Fedora has shown it to cause boot failures,
    bisected down by Martin Ebourne and reported by Dave Jones. So the
    commit will likely be reverted in the 2.6.23 stable kernels.

    Secondly, in the 2.6.24 model, x86-64 has now grown support for
    SPARSEMEM_VMEMMAP, which disables the relevant code anyway. So while
    the bug is no longer visible, that is only because the code has
    become irrelevant and is no longer enabled on the only architecture
    this ever affected.

    Reported-by: Dave Jones
    Tested-by: Martin Ebourne
    Cc: Zou Nan hai
    Cc: Suresh Siddha
    Cc: Andrew Morton
    Acked-by: Andy Whitcroft
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Jun, 2007

1 commit

  • On systems with huge amounts of physical memory, the VFS cache and
    the memmap may eat all available system memory below 4G; the system
    may then fail to allocate the swiotlb bounce buffer.

    There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix
    does not cover the sparsemem model.

    This patch adds the fix to the sparsemem model by first trying to
    allocate the memmap above 4G.
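
    A minimal sketch of the idea (not the patch's exact code; the wrapper
    name is hypothetical): pass a bootmem goal at the 4G boundary when
    allocating the section memmap, and let the allocator fall back to
    lower addresses if the goal cannot be met.

        static struct page * __init sparse_alloc_memmap(int nid,
                                                unsigned long memmap_size)
        {
                unsigned long goal = 0x100000000UL;     /* prefer >= 4G */

                return __alloc_bootmem_node(NODE_DATA(nid), memmap_size,
                                            PAGE_SIZE, goal);
        }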

    Signed-off-by: Zou Nan hai
    Acked-by: Suresh Siddha
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zou Nan hai
     

03 May, 2007

1 commit

  • Enable system hashtable memory to be distributed among nodes on x86_64 NUMA

    Forcing the kernel to use node-interleaved vmalloc instead of bootmem
    for the system hashtable memory (alloc_large_system_hash) reduces the
    memory imbalance on node 0 by around 40MB on an 8-node x86_64 NUMA
    box:

    Before the following patch, on bootup of an 8-node box:

    Node 0 MemTotal: 3407488 kB
    Node 0 MemFree: 3206296 kB
    Node 0 MemUsed: 201192 kB
    Node 0 Active: 7012 kB
    Node 0 Inactive: 512 kB
    Node 0 Dirty: 0 kB
    Node 0 Writeback: 0 kB
    Node 0 FilePages: 1912 kB
    Node 0 Mapped: 420 kB
    Node 0 AnonPages: 5612 kB
    Node 0 PageTables: 468 kB
    Node 0 NFS_Unstable: 0 kB
    Node 0 Bounce: 0 kB
    Node 0 Slab: 5408 kB
    Node 0 SReclaimable: 644 kB
    Node 0 SUnreclaim: 4764 kB

    After the patch (or using hashdist=1 on the kernel command line):

    Node 0 MemTotal: 3407488 kB
    Node 0 MemFree: 3247608 kB
    Node 0 MemUsed: 159880 kB
    Node 0 Active: 3012 kB
    Node 0 Inactive: 616 kB
    Node 0 Dirty: 0 kB
    Node 0 Writeback: 0 kB
    Node 0 FilePages: 2424 kB
    Node 0 Mapped: 380 kB
    Node 0 AnonPages: 1200 kB
    Node 0 PageTables: 396 kB
    Node 0 NFS_Unstable: 0 kB
    Node 0 Bounce: 0 kB
    Node 0 Slab: 6304 kB
    Node 0 SReclaimable: 1596 kB
    Node 0 SUnreclaim: 4708 kB

    I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA
    since x86_64 has no dearth of vmalloc space? Or maybe enable hash
    distribution for all 64bit NUMA arches? The following patch does it only
    for x86_64.
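
    For reference, the mechanism being toggled; a simplified sketch of
    the branch inside alloc_large_system_hash() (the wrapper function is
    hypothetical):

        static void *alloc_hash_table(unsigned long size)
        {
                /* hashdist routes the table through vmalloc, whose
                 * pages interleave across nodes; bootmem lands on
                 * node 0. */
                if (hashdist)
                        return __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
                return alloc_bootmem(size);
        }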

    I ran an HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite
    a bit of memory and uses up TLB entries. This was on a 4-way,
    2-socket Tyan AMD box (non-vsmp) with 8G total memory (4G per node).

    The results with and without hash distribution are:

    1. Vanilla - runtime of 1188.000s
    2. With hashdist=1 - runtime of 1154.000s

    Oprofile output for the duration of run is:

    1. Vanilla:
    CPU: AMD64 processors, speed 2411.16 MHz (estimated)
    Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
    mask of 0x00 (No unit mask) count 500
    samples % app name symbol name
    163054 6.5513 libansys1.so MultiFront::decompose(int, int,
    Elemset *, int *, int, int, int)
    162061 6.5114 libansys3.so blockSaxpy6L_fd
    162042 6.5107 libansys3.so blockInnerProduct6L_fd
    156286 6.2794 libansys3.so maxb33_
    87879 3.5309 libansys1.so elmatrixmultpcg_
    84857 3.4095 libansys4.so saxpy_pcg
    58637 2.3560 libansys4.so .st4560
    46612 1.8728 libansys4.so .st4282
    43043 1.7294 vmlinux-t copy_user_generic_string
    41326 1.6604 libansys3.so blockSaxpyBackSolve6L_fd
    41288 1.6589 libansys3.so blockInnerProductBackSolve6L_fd

    2. With hashdist=1
    CPU: AMD64 processors, speed 2411.13 MHz (estimated)
    Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
    mask of 0x00 (No unit mask) count 500
    samples % app name symbol name
    162993 6.9814 libansys1.so MultiFront::decompose(int, int,
    Elemset *, int *, int, int, int)
    160799 6.8874 libansys3.so blockInnerProduct6L_fd
    160459 6.8729 libansys3.so blockSaxpy6L_fd
    156018 6.6826 libansys3.so maxb33_
    84700 3.6279 libansys4.so saxpy_pcg
    83434 3.5737 libansys1.so elmatrixmultpcg_
    58074 2.4875 libansys4.so .st4560
    46000 1.9703 libansys4.so .st4282
    41166 1.7632 libansys3.so blockSaxpyBackSolve6L_fd
    41033 1.7575 libansys3.so blockInnerProductBackSolve6L_fd
    35762 1.5318 libansys1.so inner_product_sub
    35591 1.5245 libansys1.so inner_product_sub2
    28259 1.2104 libansys4.so addVectors

    Signed-off-by: Pravin B. Shelar
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andi Kleen
    Acked-by: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Ravikiran G Thirumalai
     

23 Mar, 2007

1 commit

  • Fix unannotated variable declarations. Variables that have allocation
    section annotations (such as __meminitdata) on their definitions must also
    have them on their declarations as not doing so may affect the addressing
    mode used by the compiler and may result in a linker error.
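
    A minimal sketch of the rule (the variable is borrowed from the
    11 Jul, 2006 FRV entry below; treat placement as illustrative):

        /* definition, e.g. in mm/page_alloc.c: */
        unsigned long __meminitdata nr_kernel_pages;

        /* the declaration must repeat the annotation; without it the
         * compiler may assume small-data (GP-relative) addressing and
         * the link fails: */
        extern unsigned long __meminitdata nr_kernel_pages;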

    Signed-off-by: David Howells
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

23 Sep, 2006

1 commit

  • The grow algorithm is simple; we grow if (sketched below):

    1) we see a hash chain collision at insert, and
    2) we haven't hit the hash size limit (currently 1*1024*1024 slots), and
    3) the number of xfrm_state objects is > the current hash mask
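
    A hedged sketch of those three conditions as a predicate (names are
    illustrative, not the xfrm code's identifiers):

        static inline int xfrm_hash_should_grow(unsigned int hmask,
                                                unsigned int nr_states,
                                                int saw_collision)
        {
                return saw_collision &&           /* 1) chain collision */
                       (hmask + 1) < (1 << 20) && /* 2) below 1M slots  */
                       nr_states > hmask;         /* 3) objects > mask  */
        }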

    All of this needs some tweaking.

    Remove __initdata from "hashdist" so we can use it safely at run time.

    Signed-off-by: David S. Miller

    David S. Miller
     

11 Jul, 2006

1 commit

  • Fix some FRV arch compile errors, including:

    (*) Marking nr_kernel_pages as __meminitdata so that references to it end up
    being properly calculated rather than being assumed to be in the small
    data section (and thus calculated wrt the GP register). Not doing this
    causes the linker to emit errors as the offset is too big to fit into the
    load instruction.

    (*) Move pm_power_off into an unconditionally compiled .c file as it's now
    unconditionally accessed.

    (*) Declare frv_change_cmode() in a header file rather than in a .c file, and
    declare it asmlinkage.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

10 Apr, 2006

1 commit

  • The node setup code would try to allocate the node metadata in the
    node itself, but that fails if there is no memory there.

    This can happen with memory hotplug, when the hotplug area defines a
    so-far empty node.

    Now use bootmem to try to allocate the mem_map in other nodes.

    And if that fails, don't panic but just ignore the node.

    To make this work I added a new __alloc_bootmem_nopanic function that
    does what its name implies.

    TBD: we should try to use nearby nodes here; currently we just use
    any. It's hard to do better because bootmem doesn't have proper
    fallback lists yet.
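
    A hedged usage sketch of the fallback this enables (the helper is
    hypothetical, not the patch's code):

        static int __init setup_node_mem_map(int nid, unsigned long size)
        {
                void *map = __alloc_bootmem_nopanic(size, PAGE_SIZE, 0);

                if (!map) {
                        /* Don't panic: just ignore the empty node. */
                        printk(KERN_ERR "Node %d: no memory for mem_map\n",
                               nid);
                        return -ENOMEM;
                }
                NODE_DATA(nid)->node_mem_map = map;
                return 0;
        }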

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

28 Mar, 2006

1 commit

  • Add a list_head to bootmem_data_t and make the bootmems use it. The
    bootmem list is sorted by node_boot_start.

    Only nodes against which init_bootmem() is called are linked into the
    list. (i386 allocates bootmem only from node 0, not from all online
    nodes.)

    A summary:
    1. for_each_online_pgdat() traverses all *online* nodes.
    2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes.
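
    A minimal sketch of the added linkage (the field and list-head names
    are assumptions):

        typedef struct bootmem_data {
                unsigned long node_boot_start;
                /* ...existing members... */
                struct list_head list;  /* kept sorted by node_boot_start */
        } bootmem_data_t;

        /* Only nodes that went through init_bootmem() appear here. */
        static LIST_HEAD(bdata_list);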

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

20 Oct, 2005

1 commit

  • This introduces a limit parameter to the core bootmem allocator; the
    new parameter indicates that physical memory allocated by the bootmem
    allocator should be within the requested limit.

    We also introduce the alloc_bootmem_low_pages_limit,
    alloc_bootmem_node_limit, and alloc_bootmem_low_pages_node_limit
    APIs, but alloc_bootmem_low_pages_limit is the only API used for
    swiotlb.

    The existing alloc_bootmem_low_pages() API could instead have been
    changed and made to pass the right limit to the core allocator. But
    that would make the patch more intrusive for 2.6.14, as other arches
    use alloc_bootmem_low_pages(). We may do that post-2.6.14 as a
    cleanup.

    With this, swiotlb gets memory within 4G for both x86_64 and ia64
    arches.
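
    A hedged usage sketch on the swiotlb side (io_tlb_* and IO_TLB_SHIFT
    are swiotlb's identifiers; the wrapper and exact call form are
    illustrative):

        static char *io_tlb_start;

        static void __init swiotlb_alloc_pool(unsigned long io_tlb_nslabs)
        {
                /* Keep the bounce buffer below 4G so 32-bit-DMA devices
                 * can reach it. */
                io_tlb_start = alloc_bootmem_low_pages_limit(
                                        io_tlb_nslabs << IO_TLB_SHIFT,
                                        0x100000000UL);
        }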

    Signed-off-by: Yasunori Goto
    Cc: Ravikiran G Thirumalai
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

26 Jun, 2005

1 commit

  • This patch retrieves the max_pfn being used by the previous kernel
    and stores it in a safe location (saved_max_pfn) before it is
    overwritten by the user-defined memory map. This pfn is used to make
    sure that the user does not try to read the physical memory beyond
    saved_max_pfn.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     

24 Jun, 2005

1 commit

  • Introduce a simple allocator for the NUMA remap space (sketched after
    the issues list below). This space is very scarce, used for
    structures which are best allocated node-local.

    This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
    the pgdat->node_mem_map initialized in a consistent place for all
    architectures.

    Issues:
    o alloc_remap takes a node_id where we might expect a pgdat; this was
    intended to allow allocating the pgdats themselves via this
    mechanism, which we do not yet do. We could have alloc_remap_node()
    and alloc_remap_nid() for this purpose.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds