23 Jul, 2007

1 commit

  • Fix the following warning:
    WARNING: vmlinux.o(.text+0x188ea): Section mismatch: reference to .init.text:__alloc_bootmem_core (between 'alloc_bootmem_high_node' and 'get_gate_vma')

    alloc_bootmem_high_node() is only used from __init scope, so declare it
    __init, and declare the weak variant __init as well.
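
    A minimal sketch of the change, assuming the 2007-era signature of
    alloc_bootmem_high_node() (illustrative, not the verbatim diff): both
    the weak stub and the x86_64 implementation gain the __init annotation,
    so their references into .init.text become legitimate.

        #include <linux/init.h>
        #include <linux/bootmem.h>

        /* Weak default stub; now __init, so it is discarded after boot
         * together with the .init.text it references. */
        void * __init __attribute__((weak))
        alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
        {
                return NULL;
        }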

    Signed-off-by: Sam Ravnborg
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

01 Jun, 2007

1 commit

  • On systems with a huge amount of physical memory, the VFS cache and the
    memory memmap may eat all available system memory under 4G; the system
    may then fail to allocate the swiotlb bounce buffer.

    There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix
    does not cover the sparsemem model.

    This patch adds the fix to the sparsemem model by first trying to
    allocate the memmap above 4G.
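
    A minimal sketch of the idea, loosely following the mm/sparse.c of the
    time (the helper name comes from the companion x86_64 patch; details
    are simplified): try to place the section memmap above 4G first, and
    fall back to a normal node-local bootmem allocation.

        static struct page * __init sparse_early_mem_map_alloc(unsigned long pnum)
        {
                int nid = sparse_early_nid(__nr_to_section(pnum));
                unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
                struct page *map;

                /* keep memory below 4G free for the swiotlb bounce buffer */
                map = alloc_bootmem_high_node(NODE_DATA(nid), size);
                if (map)
                        return map;

                return alloc_bootmem_node(NODE_DATA(nid), size);
        }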

    Signed-off-by: Zou Nan hai
    Acked-by: Suresh Siddha
    Cc: Andi Kleen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

19 May, 2007

1 commit

  • modpost had two cases hardcoded for mm/.
    Shift over to __init_refok and kill the
    hardcoded function names in modpost.

    This has the drawback that the functions
    will always be kept, no matter the configuration.
    With the previous code the functions were placed
    in the init section if the configuration allowed it.
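
    An illustrative (hypothetical) use of the marker; the function below is
    not from the patch:

        #include <linux/init.h>
        #include <linux/bootmem.h>

        /* __init_refok: tell modpost that this non-init function's
         * reference into .init.text is intentional and reviewed. */
        static void * __init_refok alloc_section_table(unsigned long size)
        {
                return alloc_bootmem(size);     /* boot-time-only path */
        }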

    Signed-off-by: Sam Ravnborg

09 May, 2007

2 commits

  • This patch adds a white list to modpost.c for some functions and ia64's
    .machvec section, to fix section mismatches.

    sparse_index_alloc() and zone_wait_table_init() call the bootmem
    allocator at boot time, and kmalloc/vmalloc at hotplug time. If memory
    hotplug is configured in, there are references to the bootmem allocator
    (init text) from them (normal text). This is the cause of the section
    mismatches.

    Bootmem is called by many functions and must be used only at boot time.
    I think their __init annotations should be kept for the section mismatch
    check. So, I would like to register sparse_index_alloc() and
    zone_wait_table_init() in the white list.

    In addition, ia64's .machvec section is a function table for some
    platform-dependent code. It is a mixture of .init.text and normal text,
    so its references to __init functions are valid too.
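
    An illustrative sketch of such a white list (not the verbatim
    scripts/mod/modpost.c code): references from the listed symbols into
    init sections are accepted instead of warned about.

        #include <string.h>

        static const char *const secref_whitelist[] = {
                "sparse_index_alloc",
                "zone_wait_table_init",
                NULL,
        };

        static int symbol_is_whitelisted(const char *name)
        {
                const char *const *s;

                for (s = secref_whitelist; *s; s++)
                        if (strcmp(*s, name) == 0)
                                return 1;
                return 0;
        }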

    Signed-off-by: Yasunori Goto
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • This fixes many section mismatches in code related to memory hotplug.
    I checked that it compiles with memory hotplug on and off on ia64 and
    x86-64 boxes.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

08 Dec, 2006

2 commits

  • NUMA node ids are passed as either int or unsigned int almost
    exclusively, yet page_to_nid and zone_to_nid both return unsigned long.
    This is a throwback to when page_to_nid was a #define and was thus
    exposing the real type of the page-flags field.

    In addition to fixing up the definitions of page_to_nid and zone_to_nid,
    I audited the users of these functions, identifying the following
    incorrect uses:

    1) mm/page_alloc.c show_node() -- printk dumping the node id,
    2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
       against numa_node_id(), which returns an int from cpu_to_node(), and
    3) mm/mempolicy.c check_pte_range() -- used as an index in node_isset(),
       which uses test_bit(), which in generic code takes an int.
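
    A sketch of the corrected accessor (simplified from the era's
    include/linux/mm.h): returning int keeps the width of page->flags from
    leaking into callers.

        static inline int page_to_nid(struct page *page)
        {
                return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
        }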

    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • The zone table is mostly not needed. If we have a node in the page flags
    then we can get to the zone via NODE_DATA() which is much more likely to be
    already in the cpu cache.

    In the SMP and UP cases NODE_DATA() is a constant pointer, which allows
    us to access an exact replica of the zonetable in the node_zones field.
    In all of the above cases there will be no need at all for the zone
    table.

    The only remaining case is a NUMA system in which the node numbers do
    not fit into the page flags. In that case we make sparse generate a
    table that maps sections to nodes and use that table to figure out the
    node number. This table is sized to fit in a single cache line for the
    known 32-bit NUMA platforms, which makes it very likely that the
    information can be obtained without a cache miss.

    For sparsemem the zone table seems to have been fairly large, based on
    the maximum possible number of sections and the number of zones per
    node. There is some memory saving from removing the zone_table, but the
    main benefit is reducing the cache footprint of the VM from the frequent
    lookups of zones. Plus it simplifies the page allocator.
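
    A sketch of the fallback described above (names follow the commit's
    description; the guard condition is an assumption): a small per-section
    table supplies the node id when it cannot live in page->flags.

        #if MAX_NUMNODES > 1 && NODES_WIDTH == 0  /* node not in page->flags */
        static u8 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned;

        int page_to_nid(struct page *page)
        {
                return section_to_node_table[page_to_section(page)];
        }
        #endif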

    [akpm@osdl.org: build fix]
    Signed-off-by: Christoph Lameter
    Cc: Dave Hansen
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

29 Oct, 2006

1 commit

  • Add the __GFP_NOWARN flag to the __alloc_pages() call in
    __kmalloc_section_memmap(). It reduces noisy failure messages.

    On ia64 the section size is 1 GB, which means order-8 page allocations
    are necessary for each section's memmap. That is often a very hard
    requirement under heavy memory pressure, as you know. So __alloc_pages()
    gives up the allocation and shows many noisy stack traces, meaning no
    pages could be found for those sections. (My current environment shows
    the stack trace 32 times....)

    But __kmalloc_section_memmap() calls vmalloc() after such a failure, and
    that allocation of the memmap can succeed. So the stack trace warning is
    just noise; I suppose it shouldn't be shown.
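
    A sketch of the quieted allocation path (simplified from
    __kmalloc_section_memmap(); the function name here is illustrative):

        static struct page *alloc_section_memmap(unsigned long memmap_size)
        {
                struct page *page;

                /* __GFP_NOWARN: no stack trace if the high-order
                 * allocation fails; vmalloc() below usually succeeds. */
                page = alloc_pages(GFP_KERNEL | __GFP_NOWARN,
                                   get_order(memmap_size));
                if (page)
                        return (struct page *)page_address(page);

                return (struct page *)vmalloc(memmap_size);
        }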

    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

28 Jun, 2006

1 commit

  • locking init cleanups:

    - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
    - convert rwlocks in a similar manner

    this patch was generated automatically.

    Motivation:

    - cleanliness
    - lockdep needs control of lock initialization, which the open-coded
    variants do not give
    - it's also useful for -rt and for lock debugging in general
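
    An illustrative before/after for the conversion (hypothetical lock
    name); spin_lock_init() is the runtime equivalent for dynamically
    allocated locks.

        /* before: open-coded initializer, invisible to lockdep */
        /* static spinlock_t table_lock = SPIN_LOCK_UNLOCKED; */

        /* after: lockdep gains control of the lock's initialization */
        static DEFINE_SPINLOCK(table_lock);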

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

23 Jun, 2006

1 commit

  • Record the node id as we mark sections for instantiation. Use this nid
    during instantiation to direct allocations.
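
    A sketch of the scheme (the shift value is an assumption; helper layout
    simplified): the otherwise unused bits of section_mem_map carry the
    node id between memory_present() and instantiation.

        #define SECTION_NID_SHIFT       2

        static inline void sparse_mark_present(struct mem_section *ms, int nid)
        {
                ms->section_mem_map = SECTION_MARKED_PRESENT |
                                      ((unsigned long)nid << SECTION_NID_SHIFT);
        }

        static inline int sparse_early_nid(struct mem_section *section)
        {
                return section->section_mem_map >> SECTION_NID_SHIFT;
        }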

    Signed-off-by: Andy Whitcroft
    Cc: Mike Kravetz
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Bob Picco
    Cc: Jack Steiner
    Cc: Yasunori Goto
    Cc: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

22 May, 2006

1 commit

  • A bad calculation/loop in __section_nr() could result in incorrect section
    information being put into sysfs memory entries. This primarily impacts
    memory add operations as the sysfs information is used while onlining new
    memory.

    Fix suggested by Dave Hansen.

    Note that the bug may not be obvious from the patch. It actually occurs in
    the function's return statement:

    return (root_nr * SECTIONS_PER_ROOT) + (ms - root);

    In the existing code, root_nr has already been multiplied by
    SECTIONS_PER_ROOT.
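
    A sketch of the corrected function (simplified from mm/sparse.c): the
    loop now iterates over root indices, so the multiplication in the
    return statement is applied exactly once.

        int __section_nr(struct mem_section *ms)
        {
                unsigned long root_nr;
                struct mem_section *root;

                for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) {
                        root = __nr_to_section(root_nr * SECTIONS_PER_ROOT);
                        if (!root)
                                continue;
                        if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT)))
                                break;
                }
                return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
        }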

    Signed-off-by: Mike Kravetz
    Cc: Dave Hansen
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

02 May, 2006

1 commit

  • This patch fixes two bugs with the way sparsemem interacts with memory add.
    They are:

    - memory leak if memmap for section already exists

    - calling alloc_bootmem_node() after boot

    These bugs were discovered, and a first cut at the fixes provided, by
    Arnd Bergmann and Joel Schopp.
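
    A sketch of both fixes (simplified; error handling and locking
    trimmed): bail out if the root's section array already exists, and use
    the bootmem allocator only before the slab is up.

        static int sparse_index_init(unsigned long section_nr, int nid)
        {
                unsigned long root = SECTION_NR_TO_ROOT(section_nr);
                unsigned long array_size = SECTIONS_PER_ROOT *
                                           sizeof(struct mem_section);
                struct mem_section *section;

                if (mem_section[root])
                        return -EEXIST;         /* fix 1: no duplicate/leak */

                if (slab_is_available()) {      /* fix 2: no bootmem after boot */
                        section = kmalloc_node(array_size, GFP_KERNEL, nid);
                        if (section)
                                memset(section, 0, array_size);
                } else {
                        section = alloc_bootmem_node(NODE_DATA(nid), array_size);
                }
                if (!section)
                        return -ENOMEM;

                mem_section[root] = section;
                return 0;
        }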

    Signed-off-by: Mike Kravetz
    Signed-off-by: Joel Schopp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

09 Jan, 2006

1 commit

  • ____cacheline_maxaligned_in_smp is currently used to align critical
    structures and avoid false sharing. It uses the per-arch
    L1_CACHE_SHIFT_MAX, and people find L1_CACHE_SHIFT_MAX useless.

    However, we have been using ____cacheline_maxaligned_in_smp to align
    structures on the internode cacheline size. As per Andi's suggestion,
    the following patch kills ____cacheline_maxaligned_in_smp and introduces
    INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches.
    Arches needing L3/internode cacheline alignment can define
    INTERNODE_CACHE_SHIFT in their arch asm/cache.h. The patch replaces
    ____cacheline_maxaligned_in_smp with
    ____cacheline_internodealigned_in_smp.

    With this patch, L1_CACHE_SHIFT_MAX can be killed.
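
    A sketch of the resulting definitions (close to the commit's
    description of linux/cache.h):

        #ifndef INTERNODE_CACHE_SHIFT
        #define INTERNODE_CACHE_SHIFT   L1_CACHE_SHIFT  /* arch may override */
        #endif

        #ifdef CONFIG_SMP
        #define ____cacheline_internodealigned_in_smp \
                __attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT))))
        #else
        #define ____cacheline_internodealigned_in_smp
        #endif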

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

05 Sep, 2005

3 commits

  • This splits up sparse_index_alloc() into two pieces. This is needed
    because we'll allocate the memory for the second level in a different
    place from where we actually consume it, to keep the allocation from
    happening underneath a lock.
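
    A sketch of the split (simplified): the allocation half can run before
    any lock is taken; the consuming half installs the result.

        /* piece 1: allocation only, safe outside the lock */
        static struct mem_section * __init sparse_index_alloc(int nid)
        {
                return alloc_bootmem_node(NODE_DATA(nid),
                                SECTIONS_PER_ROOT * sizeof(struct mem_section));
        }

        /* piece 2: consumption, done where the locking rules allow */
        static int __init sparse_index_init(unsigned long section_nr, int nid)
        {
                struct mem_section *section = sparse_index_alloc(nid);

                if (!section)
                        return -ENOMEM;
                mem_section[SECTION_NR_TO_ROOT(section_nr)] = section;
                return 0;
        }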

    Signed-off-by: Dave Hansen
    Signed-off-by: Bob Picco
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • With cleanups from Dave Hansen

    SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
    mem_sections. This two level layout scheme is able to achieve smaller
    memory requirements for SPARSEMEM with the tradeoff of an additional shift
    and load when fetching the memory section. The current SPARSEMEM
    implementation is a one dimensional array of mem_sections which is the
    default SPARSEMEM configuration. The patch attempts to isolate the
    implementation details of the physical layout of the sparsemem section
    array.

    SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
    memory_present() calls. This is not always feasible, so architectures
    which do not need it may allocate everything statically by using
    SPARSEMEM_STATIC.
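
    A sketch of the two layouts and the shared lookup (close to the
    include/linux/mmzone.h of the time):

        #ifdef CONFIG_SPARSEMEM_EXTREME
        extern struct mem_section *mem_section[NR_SECTION_ROOTS];
        #else
        extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
        #endif

        static inline struct mem_section *__nr_to_section(unsigned long nr)
        {
                /* EXTREME pays one extra load (and a possible NULL root) */
                if (!mem_section[SECTION_NR_TO_ROOT(nr)])
                        return NULL;
                return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
        }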

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Bob Picco
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME. Architecture
    platforms with a very sparse physical address space would likely want to
    select this option. For those architecture platforms that don't select the
    option, the code generated is equivalent to SPARSEMEM currently in -mm.
    I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.

    ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
    pointers to mem_sections. This two level layout scheme is able to achieve
    smaller memory requirements for SPARSEMEM with the tradeoff of an
    additional shift and load when fetching the memory section. The current
    SPARSEMEM -mm implementation is a one dimensional array of mem_sections
    which is the default SPARSEMEM configuration. The patch attempts to
    isolate
    the implementation details of the physical layout of the sparsemem section
    array.

    ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.

    I've boot-tested ia64 under aim load, configured for
    ARCH_SPARSEMEM_EXTREME. I've also boot-tested a 4-way Opteron machine
    with !ARCH_SPARSEMEM_EXTREME and tested it with aim.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Bob Picco
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

24 Jun, 2005

2 commits

  • Make sparse's initialization accessible at runtime. This allows sparse
    mappings to be created after boot, in a hotplug situation.

    This patch is separated from the previous one just to give an indication
    of how much of the sparse infrastructure is *just* for hotplug memory.

    The section_mem_map doesn't really store a pointer. It stores something that
    is convenient to do some math against to get a pointer. It isn't valid to
    just do *section_mem_map, so I don't think it should be stored as a pointer.

    There are a couple of things I'd like to store about a section. First of all,
    the fact that it is !NULL does not mean that it is present. There could be
    such a combination where section_mem_map *is* NULL, but the math gets you
    properly to a real mem_map. So, I don't think that check is safe.

    Since we're storing 32-bit-aligned structures, we have a few bits in the
    bottom of the pointer to play with. Use one bit to encode whether there's
    really a mem_map there, and the other one to tell whether there's a valid
    section there. We need to distinguish between the two because sometimes
    there's a gap between when a section is discovered to be present and when we
    can get the mem_map for it.
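
    A sketch of the encoding (bit assignments per the description above;
    helper names simplified):

        #define SECTION_MARKED_PRESENT  (1UL << 0)
        #define SECTION_HAS_MEM_MAP     (1UL << 1)
        #define SECTION_MAP_MASK        (~(SECTION_MARKED_PRESENT | \
                                           SECTION_HAS_MEM_MAP))

        static inline int present_section(struct mem_section *section)
        {
                return section &&
                       (section->section_mem_map & SECTION_MARKED_PRESENT);
        }

        static inline int valid_section(struct mem_section *section)
        {
                return section &&
                       (section->section_mem_map & SECTION_HAS_MEM_MAP);
        }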

    Signed-off-by: Dave Hansen
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Jack Steiner
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

  • Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
    mem_map[] is needed by discontiguous memory machines (like in the old
    CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
    replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
    become a complete replacement.

    A significant advantage over DISCONTIGMEM is that it's completely separated
    from CONFIG_NUMA. When producing this patch, it became apparent that
    NUMA and DISCONTIG are often confused.

    Another advantage is that sparse doesn't require each NUMA node's ranges to be
    contiguous. It can handle overlapping ranges between nodes with no problems,
    where DISCONTIGMEM currently throws away that memory.

    Sparsemem uses an array to provide different pfn_to_page() translations for
    each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
    to be chopped up.

    In order to do quick pfn_to_page() operations, the section number of the page
    is encoded in page->flags. Part of the sparsemem infrastructure enables
    sharing of these bits more dynamically (at compile-time) between the
    page_zone() and sparsemem operations. However, on 32-bit architectures, the
    number of bits is quite limited, and may require growing the size of the
    page->flags type in certain conditions. Several things might force this to
    occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
    memory), an increase in the physical address space, or an increase in the
    number of used page->flags.
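
    A sketch of the resulting translations (close to the later
    asm-generic/memory_model.h form): the section number kept in
    page->flags recovers the section, whose encoded mem_map value is offset
    directly by the pfn.

        #define __pfn_to_page(pfn)                                        \
        ({      unsigned long __pfn = (pfn);                              \
                __section_mem_map_addr(__pfn_to_section(__pfn)) + __pfn;  \
        })

        #define __page_to_pfn(pg)                                         \
        ({      struct page *__pg = (pg);                                 \
                __pg - __section_mem_map_addr(__nr_to_section(            \
                        page_to_section(__pg)));                          \
        })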

    One thing to note is that, once sparsemem is present, the NUMA node
    information no longer needs to be stored in the page->flags. It might provide
    speed increases on certain platforms and will be stored there if there is
    room. But, if out of room, an alternate (theoretically slower) mechanism is
    used.

    This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
    there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
    often have to compile out the same areas of code.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Adrian Bunk
    Signed-off-by: Yasunori Goto
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
