01 Jun, 2007

1 commit

  • On systems with a huge amount of physical memory, the VFS cache and the
    memmap may consume all available system memory below 4G, after which the
    system may fail to allocate the swiotlb bounce buffer.

    There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
    not cover the sparsemem model.

    This patch adds the fix to the sparsemem model by first trying to allocate
    the memmap above 4G.

    Signed-off-by: Zou Nan hai
    Acked-by: Suresh Siddha
    Cc: Andi Kleen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zou Nan hai
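
The "try above 4G first" idea in this commit can be sketched in plain C. This is a minimal user-space model, not the kernel code: the `toy_zone` type and both functions are hypothetical names invented for illustration, and address 0 is assumed to mean "allocation failed".

```c
#include <stdint.h>

#define TOY_FOUR_GB (1ULL << 32)

/* Hypothetical toy model: a zone is a contiguous range handed out bump-style. */
struct toy_zone {
    uint64_t next; /* next free physical address */
    uint64_t end;  /* one past the last usable address */
};

/* Take size bytes from a zone; returns 0 if the zone cannot satisfy it. */
static uint64_t toy_zone_alloc(struct toy_zone *z, uint64_t size)
{
    if (z->end - z->next < size)
        return 0;
    uint64_t addr = z->next;
    z->next += size;
    return addr;
}

/*
 * Prefer memory above 4G for the memmap so that low memory stays free
 * for users that really need it (such as a bounce buffer); fall back to
 * low memory only when the high zone is exhausted.
 */
static uint64_t toy_alloc_memmap(struct toy_zone *low, struct toy_zone *high,
                                 uint64_t size)
{
    uint64_t addr = toy_zone_alloc(high, size);
    if (addr == 0)
        addr = toy_zone_alloc(low, size);
    return addr;
}
```

With a high zone that has room, `toy_alloc_memmap()` returns an address at or above `TOY_FOUR_GB`; the low zone is touched only after the high zone runs out.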
     

03 May, 2007

1 commit

  • Enable system hashtable memory to be distributed among nodes on x86_64 NUMA

    Forcing the kernel to use node interleaved vmalloc instead of bootmem for
    the system hashtable memory (alloc_large_system_hash) reduces the memory
    imbalance on node 0 by around 40MB on an 8-node x86_64 NUMA box:

    Before the following patch, on bootup of an 8-node box:

    Node 0 MemTotal: 3407488 kB
    Node 0 MemFree: 3206296 kB
    Node 0 MemUsed: 201192 kB
    Node 0 Active: 7012 kB
    Node 0 Inactive: 512 kB
    Node 0 Dirty: 0 kB
    Node 0 Writeback: 0 kB
    Node 0 FilePages: 1912 kB
    Node 0 Mapped: 420 kB
    Node 0 AnonPages: 5612 kB
    Node 0 PageTables: 468 kB
    Node 0 NFS_Unstable: 0 kB
    Node 0 Bounce: 0 kB
    Node 0 Slab: 5408 kB
    Node 0 SReclaimable: 644 kB
    Node 0 SUnreclaim: 4764 kB

    After the patch (or using hashdist=1 on the kernel command line):

    Node 0 MemTotal: 3407488 kB
    Node 0 MemFree: 3247608 kB
    Node 0 MemUsed: 159880 kB
    Node 0 Active: 3012 kB
    Node 0 Inactive: 616 kB
    Node 0 Dirty: 0 kB
    Node 0 Writeback: 0 kB
    Node 0 FilePages: 2424 kB
    Node 0 Mapped: 380 kB
    Node 0 AnonPages: 1200 kB
    Node 0 PageTables: 396 kB
    Node 0 NFS_Unstable: 0 kB
    Node 0 Bounce: 0 kB
    Node 0 Slab: 6304 kB
    Node 0 SReclaimable: 1596 kB
    Node 0 SUnreclaim: 4708 kB

    I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA
    since x86_64 has no dearth of vmalloc space? Or maybe enable hash
    distribution for all 64bit NUMA arches? The following patch does it only
    for x86_64.

    I ran an HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit
    of memory and uses up tlb entries. This was on a 4-way, 2-socket
    Tyan AMD box (non vsmp), with 8G total memory (4G per node).

    The results with and without hash distribution are:

    1. Vanilla - runtime of 1188.000s
    2. With hashdist=1 runtime of 1154.000s

    Oprofile output for the duration of the run is:

    1. Vanilla:
    CPU: AMD64 processors, speed 2411.16 MHz (estimated)
    Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
    mask of 0x00 (No unit mask) count 500
    samples  %       app name      symbol name
    163054   6.5513  libansys1.so  MultiFront::decompose(int, int, Elemset *, int *, int, int, int)
    162061   6.5114  libansys3.so  blockSaxpy6L_fd
    162042   6.5107  libansys3.so  blockInnerProduct6L_fd
    156286   6.2794  libansys3.so  maxb33_
    87879    3.5309  libansys1.so  elmatrixmultpcg_
    84857    3.4095  libansys4.so  saxpy_pcg
    58637    2.3560  libansys4.so  .st4560
    46612    1.8728  libansys4.so  .st4282
    43043    1.7294  vmlinux-t     copy_user_generic_string
    41326    1.6604  libansys3.so  blockSaxpyBackSolve6L_fd
    41288    1.6589  libansys3.so  blockInnerProductBackSolve6L_fd

    2. With hashdist=1
    CPU: AMD64 processors, speed 2411.13 MHz (estimated)
    Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit
    mask of 0x00 (No unit mask) count 500
    samples  %       app name      symbol name
    162993   6.9814  libansys1.so  MultiFront::decompose(int, int, Elemset *, int *, int, int, int)
    160799   6.8874  libansys3.so  blockInnerProduct6L_fd
    160459   6.8729  libansys3.so  blockSaxpy6L_fd
    156018   6.6826  libansys3.so  maxb33_
    84700    3.6279  libansys4.so  saxpy_pcg
    83434    3.5737  libansys1.so  elmatrixmultpcg_
    58074    2.4875  libansys4.so  .st4560
    46000    1.9703  libansys4.so  .st4282
    41166    1.7632  libansys3.so  blockSaxpyBackSolve6L_fd
    41033    1.7575  libansys3.so  blockInnerProductBackSolve6L_fd
    35762    1.5318  libansys1.so  inner_product_sub
    35591    1.5245  libansys1.so  inner_product_sub2
    28259    1.2104  libansys4.so  addVectors

    Signed-off-by: Pravin B. Shelar
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andi Kleen
    Acked-by: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Ravikiran G Thirumalai
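
The "around 40MB" figure and the benchmark difference quoted in this commit message follow directly from the numbers it lists. The small C helpers below redo that arithmetic; the function names are hypothetical and exist only for this illustration.

```c
/* Difference in node 0 MemUsed (in kB) before and after the patch. */
static long toy_node0_saving_kb(long used_before_kb, long used_after_kb)
{
    return used_before_kb - used_after_kb;
}

/* Relative runtime improvement, in percent of the original runtime. */
static double toy_percent_faster(double before_s, double after_s)
{
    return (before_s - after_s) / before_s * 100.0;
}
```

Using the values from the message, `toy_node0_saving_kb(201192, 159880)` gives 41312 kB, which is about 40.3 MB, and `toy_percent_faster(1188.0, 1154.0)` comes out slightly under 2.9 percent.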
     

23 Mar, 2007

1 commit

  • Fix unannotated variable declarations. Variables that have allocation
    section annotations (such as __meminitdata) on their definitions must also
    have them on their declarations as not doing so may affect the addressing
    mode used by the compiler and may result in a linker error.

    Signed-off-by: David Howells
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

08 Dec, 2006

1 commit


26 Sep, 2006

5 commits


23 Sep, 2006

1 commit

  • The grow algorithm is simple; we grow if:

    1) we see a hash chain collision at insert, and
    2) we haven't hit the hash size limit (currently 1*1024*1024 slots), and
    3) the number of xfrm_state objects is > the current hash mask

    All of this needs some tweaking.

    Remove __initdata from "hashdist" so we can use it safely at run time.

    Signed-off-by: David S. Miller

    David S. Miller
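
The three grow conditions listed in this commit can be written as a single predicate. This is a sketch only: the function name and parameters are hypothetical, and it assumes the table has `hmask + 1` slots, which the message does not state explicitly.

```c
#include <stdbool.h>

/* Size limit quoted in the commit message: 1*1024*1024 slots. */
#define TOY_HASH_LIMIT_SLOTS (1u * 1024 * 1024)

/*
 * Decide whether to grow the hash table at insert time.
 *   collision  - the insert landed on a non-empty hash chain
 *   hmask      - current hash mask (assumed: number of slots minus one)
 *   num_states - number of objects currently stored
 */
static bool toy_should_grow(bool collision, unsigned int hmask,
                            unsigned int num_states)
{
    return collision &&
           (hmask + 1u) < TOY_HASH_LIMIT_SLOTS &&
           num_states > hmask;
}
```

All three conditions must hold at once, so a collision in a sparsely filled table, or any insert into a table already at the limit, does not trigger a resize.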
     

11 Jul, 2006

1 commit

  • Fix some FRV arch compile errors, including:

    (*) Marking nr_kernel_pages as __meminitdata so that references to it end up
    being properly calculated rather than being assumed to be in the small
    data section (and thus calculated wrt the GP register). Not doing this
    causes the linker to emit errors as the offset is too big to fit into the
    load instruction.

    (*) Move pm_power_off into an unconditionally compiled .c file as it's now
    unconditionally accessed.

    (*) Declare frv_change_cmode() in a header file rather than in a .c file, and
    declare it asmlinkage.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

23 Jun, 2006

1 commit


10 Apr, 2006

1 commit

  • The node setup code would try to allocate the node metadata in the node
    itself, but that fails if there is no memory in there.

    This can happen with memory hotplug when the hotplug area defines a
    so-far empty node.

    Now use bootmem to try to allocate the mem_map in other nodes.

    And if it fails don't panic, but just ignore the node.

    To make this work I added a new __alloc_bootmem_nopanic function that
    does what its name implies.

    TBD should try to use nearby nodes here. Currently we just use any.
    It's hard to do it better because bootmem doesn't have proper fallback
    lists yet.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
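
The behaviour described in this commit (try the node itself, fall back to any other node, and report failure instead of panicking) can be modelled in a few lines of C. Everything here is a hypothetical stand-in: `toy_node` and `toy_alloc_nopanic()` are not the kernel's types or the real `__alloc_bootmem_nopanic`.

```c
#include <stddef.h>

/* Hypothetical toy node: tracks only how many bytes are still free. */
struct toy_node {
    size_t free_bytes;
};

/* Reserve size bytes on one node; returns 1 on success, 0 on failure. */
static int toy_node_reserve(struct toy_node *n, size_t size)
{
    if (n->free_bytes < size)
        return 0;
    n->free_bytes -= size;
    return 1;
}

/*
 * Try the preferred node first, then any other node in index order.
 * Returns the index of the node that satisfied the request, or -1 if
 * none could. The caller decides what to do on failure (for example,
 * ignore the node) instead of the allocator panicking.
 */
static int toy_alloc_nopanic(struct toy_node *nodes, int nr_nodes,
                             int preferred, size_t size)
{
    if (preferred >= 0 && preferred < nr_nodes &&
        toy_node_reserve(&nodes[preferred], size))
        return preferred;
    for (int i = 0; i < nr_nodes; i++) {
        if (i != preferred && toy_node_reserve(&nodes[i], size))
            return i;
    }
    return -1;
}
```

As in the commit's TBD note, the fallback here takes any node with space rather than the nearest one.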
     

28 Mar, 2006

1 commit

  • Add a list_head to bootmem_data_t and make bootmems use it. bootmem list is
    sorted by node_boot_start.

    Only nodes against which init_bootmem() is called are linked to the list.
    (i386 allocates bootmem only from one node(0) not from all online nodes.)

    A summary:
    1. for_each_online_pgdat() traverses all *online* nodes.
    2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
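
Keeping a list sorted by `node_boot_start` amounts to an ordered insert. The sketch below uses a simple singly linked list in user-space C rather than the kernel's `list_head`; `toy_bdata` and `toy_link_bootmem()` are hypothetical names for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for bootmem_data_t with only the fields needed. */
struct toy_bdata {
    unsigned long node_boot_start;
    struct toy_bdata *next;
};

/*
 * Link b into the list so that entries stay in ascending order of
 * node_boot_start. Only nodes passed to this function appear in the
 * list, mirroring "only initialized-for-bootmem nodes are linked".
 */
static void toy_link_bootmem(struct toy_bdata **head, struct toy_bdata *b)
{
    struct toy_bdata **pp = head;

    while (*pp != NULL && (*pp)->node_boot_start < b->node_boot_start)
        pp = &(*pp)->next;
    b->next = *pp;
    *pp = b;
}
```

An allocator walking this list from the head then visits initialized nodes from the lowest start address upward, regardless of the order in which they were registered.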
     

26 Mar, 2006

1 commit


07 Jan, 2006

1 commit


20 Oct, 2005

1 commit

  • This introduces a limit parameter to the core bootmem allocator. The new
    parameter indicates that physical memory allocated by the bootmem
    allocator should be within the requested limit.

    We also introduce the alloc_bootmem_low_pages_limit,
    alloc_bootmem_node_limit and alloc_bootmem_low_pages_node_limit APIs, but
    alloc_bootmem_low_pages_limit is the only API used for swiotlb.

    The existing alloc_bootmem_low_pages() API could instead have been
    changed and made to pass the right limit to the core allocator. But that
    would make the patch more intrusive for 2.6.14, as other arches use
    alloc_bootmem_low_pages(). That may be done post 2.6.14 as a
    cleanup.

    With this, swiotlb gets memory within 4G for both x86_64 and ia64
    arches.

    Signed-off-by: Yasunori Goto
    Cc: Ravikiran G Thirumalai
    Signed-off-by: Linus Torvalds

    Yasunori Goto
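
The effect of a limit parameter on an allocator can be shown with a toy bump allocator. This is only a sketch: `toy_bootmem` and `toy_alloc_limit()` are hypothetical, and treating a limit of 0 as "no limit" and a return value of 0 as "failed" are assumptions of this example.

```c
#include <stdint.h>

/* Hypothetical toy allocator state: one contiguous free range. */
struct toy_bootmem {
    uint64_t next; /* next free physical address */
    uint64_t end;  /* one past the last usable address */
};

/*
 * Allocate size bytes such that the whole allocation lies below limit.
 * A limit of 0 is treated as "no limit" in this sketch.
 * Returns the start address, or 0 on failure.
 */
static uint64_t toy_alloc_limit(struct toy_bootmem *b, uint64_t size,
                                uint64_t limit)
{
    uint64_t end = b->end;

    if (limit != 0 && limit < end)
        end = limit;
    if (b->next > end || end - b->next < size)
        return 0;
    uint64_t addr = b->next;
    b->next += size;
    return addr;
}
```

A caller that needs memory addressable with 32 bits would pass a limit of `1ULL << 32`; the request then fails cleanly once the free range below 4G is used up, even if memory above 4G remains.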
     

26 Jun, 2005

1 commit

  • This patch retrieves the max_pfn used by the previous kernel and stores it
    in a safe location (saved_max_pfn) before it is overwritten by a
    user-defined memory map. This pfn is used to make sure that the user does
    not try to read physical memory beyond saved_max_pfn.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
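
The save-then-check pattern in this commit can be sketched as follows. The names are hypothetical and the 4 kB page size is an assumption; the check rejects any page frame number greater than the saved value, which is one reading of "beyond saved_max_pfn".

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumed 4 kB pages */

static uint64_t toy_max_pfn;       /* highest pfn currently in use */
static uint64_t toy_saved_max_pfn; /* copy taken before it is overwritten */

/* Remember the old max_pfn, then let the user-defined map replace it. */
static void toy_apply_user_memmap(uint64_t user_max_pfn)
{
    toy_saved_max_pfn = toy_max_pfn;
    toy_max_pfn = user_max_pfn;
}

/* Allow a read only if the address lies at or below the saved max pfn. */
static int toy_read_allowed(uint64_t phys_addr)
{
    uint64_t pfn = phys_addr >> TOY_PAGE_SHIFT;

    return pfn <= toy_saved_max_pfn;
}
```

The point of the saved copy is that the bound for reads reflects the memory the previous kernel used, not the smaller range the current kernel was told to use.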
     

24 Jun, 2005

1 commit

  • Introduce a simple allocator for the NUMA remap space. This space is very
    scarce, used for structures which are best allocated node local.

    This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
    the pgdat->node_mem_map initialized in a consistent place for all
    architectures.

    Issues:
    o alloc_remap takes a node_id where we might expect a pgdat. This was
    intended to allow us to allocate the pgdats themselves using this
    mechanism, which we do not yet do. Could have alloc_remap_node() and
    alloc_remap_nid() for this purpose.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
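
A "simple allocator" for a small per-node region is typically a bump allocator keyed by node id. The sketch below is a user-space model with made-up sizes; `toy_alloc_remap()` and the constants are hypothetical, and the 64-byte alignment and zeroing of returned memory are assumptions of this example rather than details given in the commit message.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NR_NODES   2
#define TOY_REMAP_SIZE 256 /* deliberately tiny: the space is scarce */
#define TOY_ALIGN      64  /* assumed alignment of each allocation */

static unsigned char toy_remap_space[TOY_NR_NODES][TOY_REMAP_SIZE];
static size_t toy_remap_used[TOY_NR_NODES];

/*
 * Hand out zeroed memory from the given node's remap region.
 * Returns NULL when the node id is invalid or the region is exhausted,
 * so callers can fall back to another allocator.
 */
static void *toy_alloc_remap(int nid, size_t size)
{
    if (nid < 0 || nid >= TOY_NR_NODES)
        return NULL;
    size = (size + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
    if (size > TOY_REMAP_SIZE - toy_remap_used[nid])
        return NULL;
    void *p = &toy_remap_space[nid][toy_remap_used[nid]];
    toy_remap_used[nid] += size;
    memset(p, 0, size);
    return p;
}
```

Taking a node id rather than a node structure, as the Issues note in the commit discusses, means the allocator can be called before any per-node structure exists.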
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds