06 Feb, 2008

2 commits

  • Fix the following warning:
    WARNING: mm/built-in.o(.text+0x22069): Section mismatch in reference from the function sparse_early_usemap_alloc() to the function .init.text:__alloc_bootmem_node()

    The static function sparse_early_usemap_alloc() is used only by
    sparse_init(), and with sparse_init() annotated __init it is safe to
    annotate sparse_early_usemap_alloc() with __init too.
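
    As a minimal sketch of the pattern being described (the bodies are
    illustrative, not the actual mm/sparse.c code):

    /* Caller and callee both live in .init.text, so the section
     * mismatch warning disappears once the callee is __init too. */
    static unsigned long * __init sparse_early_usemap_alloc(unsigned long pnum)
    {
            /* ... allocate the usemap from bootmem ... */
    }

    void __init sparse_init(void)
    {
            /* ... the only caller, and it runs only at boot ... */
    }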

    Signed-off-by: Sam Ravnborg
    Cc: Andy Whitcroft
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sam Ravnborg
     
  • Checking if an address is a vmalloc address is done in a couple of places.
    Define a common version in mm.h and replace the other checks.

    Again the include structures suck. The definition of VMALLOC_START and
    VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
    there.
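
    A sketch of the common helper described, as it would live in mm.h where
    VMALLOC_START/VMALLOC_END are reachable (the real definition may differ,
    e.g. for !CONFIG_MMU):

    /* True if the address lies inside the vmalloc virtual range. */
    static inline int is_vmalloc_addr(const void *x)
    {
            unsigned long addr = (unsigned long)x;

            return addr >= VMALLOC_START && addr < VMALLOC_END;
    }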

    Signed-off-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

18 Dec, 2007

2 commits

  • Improve the error handling for mm/sparse.c::sparse_add_one_section(). And I
    see no reason to check 'usemap' until holding the 'pgdat_resize_lock'.

    [geoffrey.levand@am.sony.com: sparse_index_init() returns -EEXIST]
    Cc: Christoph Lameter
    Acked-by: Dave Hansen
    Cc: Rik van Riel
    Acked-by: Yasunori Goto
    Cc: Andy Whitcroft
    Signed-off-by: WANG Cong
    Signed-off-by: Geoff Levand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • Since sparse_index_alloc() can return NULL on memory allocation failure,
    we must deal with the failure condition when calling it.
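
    The shape of the check is simply (a sketch; the calling context is
    illustrative):

    section = sparse_index_alloc(nid);
    if (!section)           /* allocation failed, propagate the error */
            return -ENOMEM;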

    Signed-off-by: WANG Cong
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     

30 Oct, 2007

1 commit

  • This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6.

    First off, testing in Fedora has shown it to cause boot failures,
    bisected down by Martin Ebourne, and reported by Dave Jones. So the
    commit will likely be reverted in the 2.6.23 stable kernels.

    Secondly, in the 2.6.24 model, x86-64 has now grown support for
    SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
    bug is not visible any more, it's become invisible due to the code just
    being irrelevant and no longer enabled on the only architecture that
    this ever affected.

    Reported-by: Dave Jones
    Tested-by: Martin Ebourne
    Cc: Zou Nan hai
    Cc: Suresh Siddha
    Cc: Andrew Morton
    Acked-by: Andy Whitcroft
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Oct, 2007

5 commits

  • This patch avoids a panic when memory hot-add is executed with
    sparsemem-vmemmap. The current vmemmap-sparsemem code doesn't support
    memory hot-add; the vmemmap must be populated at hot-add time. This is for
    2.6.23-rc2-mm2.

    Todo: # Even if this patch is applied, the message "[xxxx-xxxx] potential
    offnode page_structs" is displayed. To allocate the memmap on its own node,
    the memmap (and pgdat) must be initialized itself, a chicken-and-egg
    relationship.

    # vmemmap_unpopulate will be necessary for the following:
    - Cancelling a hot-add due to an error.
    - Unplugging.

    Signed-off-by: Yasunori Goto
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • The use of SPARSEMEM and pageblock flags causes problems on ia64.

    The first part of the problem is that the units are incorrect in the
    SECTION_BLOCKFLAGS_BITS computation. This results in a mem_section's
    section_mem_map being treated as part of a bitmap, which isn't good. This
    was evident as an invalid virtual address when mem_init attempted to free
    bootmem pages while relinquishing control from the bootmem allocator.

    The second part of the problem occurs because the pageblock flags bitmap
    is located within the mem_section. The SECTIONS_PER_ROOT computation using
    sizeof(mem_section) may then not be a power of 2, depending on the size of
    the bitmap. This renders masks and other such things no longer
    power-of-2 based.
    This issue was seen with SPARSEMEM_EXTREME on ia64. This patch moves the
    bitmap outside of mem_section and uses a pointer instead in the
    mem_section. The bitmaps are allocated when the section is being
    initialised.
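
    Roughly, the layout change looks like this sketch (written from the
    description above, not copied from the patch):

    struct mem_section {
            unsigned long section_mem_map;
            /* Pointer to the pageblock flags bitmap, allocated separately
             * when the section is initialised, rather than embedding the
             * bitmap and inflating sizeof(struct mem_section). */
            unsigned long *pageblock_flags;
    };

    With the bitmap out of line, sizeof(struct mem_section) stays a power of 2
    and the SECTIONS_PER_ROOT computation and its masks remain power-of-2 based.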

    Note that sparse_early_usemap_alloc() does not use alloc_remap() like
    sparse_early_mem_map_alloc(). The allocation required for the bitmap on
    x86, the only architecture that uses alloc_remap, is typically smaller
    than a cache line. alloc_remap() pads out allocations to the cache size,
    which
    would be a needless waste.

    Credit to Bob Picco for identifying the original problem and effecting a
    fix for the SECTION_BLOCKFLAGS_BITS calculation. Credit to Andy Whitcroft
    for devising the best way of allocating the bitmaps only when required for
    the section.

    [wli@holomorphy.com: warning fix]
    Signed-off-by: Bob Picco
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Mel Gorman
    Cc: "Luck, Tony"
    Signed-off-by: William Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • SPARSEMEM is a pretty nice framework that unifies quite a bit of code over all
    the arches. It would be great if it could be the default so that we can get
    rid of various forms of DISCONTIG and other variations on memory maps. So far
    what has hindered this are the additional lookups that SPARSEMEM introduces
    for virt_to_page and page_address. This goes so far that the code to do this
    has to be kept in a separate function and cannot be used inline.

    This patch introduces a virtual memmap mode for SPARSEMEM, in which the
    memmap is mapped into a virtually contiguous area; only the active sections
    are physically backed. This allows virt_to_page, page_address and cohorts
    to become simple shift/add operations. No page flag fields, no table
    lookups, nothing involving memory is required.

    The two key operations pfn_to_page and page_to_pfn become:

    #define __pfn_to_page(pfn) (vmemmap + (pfn))
    #define __page_to_pfn(page) ((page) - vmemmap)

    By having a virtual mapping for the memmap we allow simple access without
    wasting physical memory. As kernel memory is typically already mapped 1:1
    this introduces no additional overhead. The virtual mapping must be big
    enough to allow a struct page to be allocated and mapped for all valid
    physical pages. This will make a virtual memmap difficult to use on 32 bit
    platforms that support 36 address bits.

    However, if there is enough virtual space available and the arch already maps
    its 1-1 kernel space using TLBs (e.g. true of IA64 and x86_64) then this
    technique makes SPARSEMEM lookups even more efficient than CONFIG_FLATMEM.
    FLATMEM needs to read the contents of the mem_map variable to get the start of
    the memmap and then add the offset to the required entry. vmemmap is a
    constant to which we can simply add the offset.
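
    For comparison, a sketch of the two translations (the FLATMEM form is
    paraphrased from include/asm-generic/memory_model.h and may differ in
    detail):

    /* FLATMEM: load the global mem_map pointer, then index into it. */
    #define __pfn_to_page(pfn) (mem_map + ((pfn) - ARCH_PFN_OFFSET))

    /* SPARSEMEM_VMEMMAP: vmemmap is a constant base address. */
    #define __pfn_to_page(pfn) (vmemmap + (pfn))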

    This patch has the potential to allow us to make SPARSEMEM the default (and
    even the only) option for most systems. It should be optimal on UP, SMP and
    NUMA on most platforms. Then we may even be able to remove the other memory
    models: FLATMEM, DISCONTIG etc.

    [apw@shadowen.org: config cleanups, resplit code etc]
    [kamezawa.hiroyu@jp.fujitsu.com: Fix sparsemem_vmemmap init]
    [apw@shadowen.org: vmemmap: remove excess debugging]
    [apw@shadowen.org: simplify initialisation code and reduce duplication]
    [apw@shadowen.org: pull out the vmemmap code into its own file]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andy Whitcroft
    Acked-by: Mel Gorman
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: "David S. Miller"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • We have flags to indicate whether a section actually has a valid mem_map
    associated with it. This is never set and we rely solely on the present bit
    to indicate a section is valid. By definition a section is not valid if it
    has no mem_map and there is a window during init where the present bit is set
    but there is no mem_map, during which pfn_valid() will return true
    incorrectly.

    Use the existing SECTION_HAS_MEM_MAP flag to indicate the presence of a valid
    mem_map. Switch valid_section{,_nr} and pfn_valid() to this bit. Add a new
    present_section{,_nr} and pfn_present() interfaces for those users who care to
    know that a section is going to be valid.
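
    A sketch of the distinction, following the description above (details may
    differ from the final mmzone.h):

    #define SECTION_MARKED_PRESENT  (1UL << 0)
    #define SECTION_HAS_MEM_MAP     (1UL << 1)

    /* The section was discovered at boot or hot-add time... */
    static inline int present_section(struct mem_section *section)
    {
            return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
    }

    /* ...but only counts as valid once a mem_map has been attached. */
    static inline int valid_section(struct mem_section *section)
    {
            return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
    }

    pfn_valid() tests valid_section(); pfn_present() tests present_section().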

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andy Whitcroft
    Acked-by: Mel Gorman
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: "David S. Miller"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • SPARSEMEM is a pretty nice framework that unifies quite a bit of code over all
    the arches. It would be great if it could be the default so that we can get
    rid of various forms of DISCONTIG and other variations on memory maps. So far
    what has hindered this are the additional lookups that SPARSEMEM introduces
    for virt_to_page and page_address. This goes so far that the code to do this
    has to be kept in a separate function and cannot be used inline.

    This patch introduces a virtual memmap mode for SPARSEMEM, in which the
    memmap is mapped into a virtually contiguous area; only the active sections
    are physically backed. This allows virt_to_page, page_address and cohorts
    to become simple shift/add operations. No page flag fields, no table
    lookups, nothing involving memory is required.

    The two key operations pfn_to_page and page_to_pfn become:

    #define __pfn_to_page(pfn) (vmemmap + (pfn))
    #define __page_to_pfn(page) ((page) - vmemmap)

    By having a virtual mapping for the memmap we allow simple access without
    wasting physical memory. As kernel memory is typically already mapped 1:1
    this introduces no additional overhead. The virtual mapping must be big
    enough to allow a struct page to be allocated and mapped for all valid
    physical pages. This will make a virtual memmap difficult to use on 32 bit
    platforms that support 36 address bits.

    However, if there is enough virtual space available and the arch already maps
    its 1-1 kernel space using TLBs (e.g. true of IA64 and x86_64) then this
    technique makes SPARSEMEM lookups even more efficient than CONFIG_FLATMEM.
    FLATMEM needs to read the contents of the mem_map variable to get the start of
    the memmap and then add the offset to the required entry. vmemmap is a
    constant to which we can simply add the offset.

    This patch has the potential to allow us to make SPARSEMEM the default (and
    even the only) option for most systems. It should be optimal on UP, SMP and
    NUMA on most platforms. Then we may even be able to remove the other memory
    models: FLATMEM, DISCONTIG etc.

    The current aim is to bring a common virtually mapped mem_map to all
    architectures. This should facilitate the removal of the bespoke
    implementations from the architectures. This also brings performance
    improvements for most architectures, making sparsemem vmemmap the more desirable
    memory model. The ultimate aim of this work is to expand sparsemem support to
    encompass all the features of the other memory models. This could allow us to
    drop support for and remove the other models in the longer term.

    Below are some comparative kernbench numbers for various architectures,
    comparing default memory model against SPARSEMEM VMEMMAP. All but ia64 show
    marginal improvement; we expect the ia64 figures to be sorted out when the
    larger mapping support returns.

    x86-64 non-NUMA
                 Base   VMEMAP   % change (-ve good)
    User        85.07    84.84   -0.26
    System      34.32    33.84   -1.39
    Total      119.38   118.68   -0.59

    ia64
                 Base   VMEMAP   % change (-ve good)
    User      1016.41  1016.93    0.05
    System      50.83    51.02    0.36
    Total     1067.25  1067.95    0.07

    x86-64 NUMA
                 Base   VMEMAP   % change (-ve good)
    User       430.77   431.73    0.22
    System      45.39    43.98   -3.11
    Total      476.17   475.71   -0.10

    ppc64
                 Base   VMEMAP   % change (-ve good)
    User       488.77   488.35   -0.09
    System      56.92    56.37   -0.97
    Total      545.69   544.72   -0.18

    Below are some AIM benchmarks on IA64 and x86-64 (thanks, Bob). They seem
    pretty much flat, as you would expect.

    ia64 results 2 cpu non-numa 4Gb SCSI disk

    Benchmark Version Machine Run Date
    AIM Multiuser Benchmark - Suite VII "1.1" extreme Jun 1 07:17:24 2007

    Tasks Jobs/Min JTI Real CPU Jobs/sec/task
    1 98.9 100 58.9 1.3 1.6482
    101 5547.1 95 106.0 79.4 0.9154
    201 6377.7 95 183.4 158.3 0.5288
    301 6932.2 95 252.7 237.3 0.3838
    401 7075.8 93 329.8 316.7 0.2941
    501 7235.6 94 403.0 396.2 0.2407
    600 7387.5 94 472.7 475.0 0.2052

    Benchmark Version Machine Run Date
    AIM Multiuser Benchmark - Suite VII "1.1" vmemmap Jun 1 09:59:04 2007

    Tasks Jobs/Min JTI Real CPU Jobs/sec/task
    1 99.1 100 58.8 1.2 1.6509
    101 5480.9 95 107.2 79.2 0.9044
    201 6490.3 95 180.2 157.8 0.5382
    301 6886.6 94 254.4 236.8 0.3813
    401 7078.2 94 329.7 316.0 0.2942
    501 7250.3 95 402.2 395.4 0.2412
    600 7399.1 94 471.9 473.9 0.2055

    open power 710 2 cpu, 4 Gb, SCSI and configured physically

    Benchmark Version Machine Run Date
    AIM Multiuser Benchmark - Suite VII "1.1" extreme May 29 15:42:53 2007

    Tasks Jobs/Min JTI Real CPU Jobs/sec/task
    1 25.7 100 226.3 4.3 0.4286
    101 1096.0 97 536.4 199.8 0.1809
    201 1236.4 96 946.1 389.1 0.1025
    301 1280.5 96 1368.0 582.3 0.0709
    401 1270.2 95 1837.4 771.0 0.0528
    501 1251.4 96 2330.1 955.9 0.0416
    601 1252.6 96 2792.4 1139.2 0.0347
    701 1245.2 96 3276.5 1334.6 0.0296
    918 1229.5 96 4345.4 1728.7 0.0223

    Benchmark Version Machine Run Date
    AIM Multiuser Benchmark - Suite VII "1.1" vmemmap May 30 07:28:26 2007

    Tasks Jobs/Min JTI Real CPU Jobs/sec/task
    1 25.6 100 226.9 4.3 0.4275
    101 1049.3 97 560.2 198.1 0.1731
    201 1199.1 97 975.6 390.7 0.0994
    301 1261.7 96 1388.5 591.5 0.0699
    401 1256.1 96 1858.1 771.9 0.0522
    501 1220.1 96 2389.7 955.3 0.0406
    601 1224.6 96 2856.3 1133.4 0.0340
    701 1252.0 96 3258.7 1314.1 0.0298
    915 1232.8 96 4319.7 1704.0 0.0225

    amd64 2 2-core, 4Gb and SATA

    Benchmark Version Machine Run Date
    AIM Multiuser Benchmark - Suite VII "1.1" extreme Jun 2 03:59:48 2007

    Tasks Jobs/Min JTI Real CPU Jobs/sec/task
    1 13.0 100 446.4 2.1 0.2173
    101 533.4 97 1102.0 110.2 0.0880
    201 578.3 97 2022.8 220.8 0.0480
    301 583.8 97 3000.6 332.3 0.0323
    401 580.5 97 4020.1 442.2 0.0241
    501 574.8 98 5072.8 558.8 0.0191
    600 566.5 98 6163.8 671.0 0.0157

    Benchmark Version Machine Run Date
    AIM Multiuser Benchmark - Suite VII "1.1" vmemmap Jun 3 04:19:31 2007

    Tasks Jobs/Min JTI Real CPU Jobs/sec/task
    1 13.0 100 447.8 2.0 0.2166
    101 536.5 97 1095.6 109.7 0.0885
    201 567.7 97 2060.5 219.3 0.0471
    301 582.1 96 3009.4 330.2 0.0322
    401 578.2 96 4036.4 442.4 0.0240
    501 585.1 98 4983.2 555.1 0.0195
    600 565.5 98 6175.2 660.6 0.0157

    This patch:

    Fix some spelling errors.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andy Whitcroft
    Acked-by: Mel Gorman
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: "David S. Miller"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     

23 Aug, 2007

1 commit

  • Booting SPARSEMEM on NUMA systems trips a BUG in page_alloc.c:

    Initializing HighMem for node 0 (00038000:00100000)
    Initializing HighMem for node 1 (00100000:001ffe00)
    ------------[ cut here ]------------
    kernel BUG at /home/apw/git/linux-2.6/mm/page_alloc.c:456!
    [...]

    This occurs because the section to node id mapping is not being
    setup correctly during init under SPARSEMEM_STATIC, leading to an
    attempt to free pages from all nodes into the zones on node 0.

    When the zone_table[] was removed in the following commit, a new
    section to node mapping table was introduced:

    commit 89689ae7f95995723fbcd5c116c47933a3bb8b13
    [PATCH] Get rid of zone_table[]

    That conversion inadvertently only initialised the node mapping in
    SPARSEMEM_EXTREME. Ensure we initialise the node mapping in
    SPARSEMEM_STATIC.
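
    A sketch of the kind of initialisation being restored; the names follow
    mm/sparse.c but are written from the description above, not copied from
    the patch:

    #ifdef NODE_NOT_IN_PAGE_FLAGS
    /* Section-to-node table used when the node id cannot live in page->flags. */
    static inline void set_section_nid(unsigned long section_nr, int nid)
    {
            section_to_node_table[section_nr] = nid;
    }
    #else
    static inline void set_section_nid(unsigned long section_nr, int nid)
    {
            /* nothing to record: the node id is encoded in page->flags */
    }
    #endif

    The point of the fix is that the node mapping gets initialised under
    SPARSEMEM_STATIC as well as SPARSEMEM_EXTREME.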

    [akpm@linux-foundation.org: make the stubs static inline]
    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     

23 Jul, 2007

1 commit

  • Fix the following warning:
    WARNING: vmlinux.o(.text+0x188ea): Section mismatch: reference to .init.text:__alloc_bootmem_core (between 'alloc_bootmem_high_node' and 'get_gate_vma')

    alloc_bootmem_high_node() is only used from __init scope so declare it __init.
    And in addition declare the weak variant __init too.

    Signed-off-by: Sam Ravnborg
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Sam Ravnborg
     

01 Jun, 2007

1 commit

  • On systems with a huge amount of physical memory, the VFS cache and the
    memory map (memmap) may eat all available system memory under 4G; the
    system may then fail to allocate the swiotlb bounce buffer.

    There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
    not cover the sparsemem model.

    This patch adds the fix to the sparsemem model by first trying to allocate
    the memmap above 4G.

    Signed-off-by: Zou Nan hai
    Acked-by: Suresh Siddha
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zou Nan hai
     

19 May, 2007

1 commit

  • modpost had two cases hardcoded for mm/.
    Shift over to __init_refok and kill the
    hardcoded function names in modpost.

    This has the drawback that the functions
    will always be kept no matter the configuration.
    With the previous code the functions were placed
    in the init section if the configuration allowed it.

    Signed-off-by: Sam Ravnborg

    Sam Ravnborg
     

09 May, 2007

2 commits

  • This patch adds a white list to modpost.c for some functions and an ia64
    section, to fix section mismatches.

    sparse_index_alloc() and zone_wait_table_init() call the bootmem allocator
    at boot time, and kmalloc/vmalloc at hotplug time. If memory hotplug is
    configured on, there are references to the bootmem allocator (init text)
    from them (normal text). This is the cause of the section mismatch.
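
    The boot-versus-hotplug split that creates these references looks roughly
    like the following sketch (the real sparse_index_alloc() differs in
    detail):

    static struct mem_section *sparse_index_alloc(int nid)
    {
            unsigned long array_size = SECTIONS_PER_ROOT *
                                       sizeof(struct mem_section);

            if (slab_is_available())
                    /* hotplug time: the slab allocator is up */
                    return kmalloc_node(array_size, GFP_KERNEL, nid);

            /* boot time: only the bootmem allocator (init text) exists */
            return alloc_bootmem_node(NODE_DATA(nid), array_size);
    }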

    Bootmem is called by many functions and must be used only at boot time. I
    think the __init annotations on them should be kept for the section
    mismatch check. So I would like to register sparse_index_alloc() and
    zone_wait_table_init() in the white list.

    In addition, ia64's .machvec section is a function table for some
    platform-dependent code. It is a mixture of .init.text and normal text;
    these references to __init functions are valid too.

    Signed-off-by: Yasunori Goto
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This fixes many section mismatches in code related to memory hotplug.
    I checked that it compiles with memory hotplug on and off on ia64 and
    x86-64 boxes.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

08 Dec, 2006

2 commits

  • NUMA node ids are passed as either int or unsigned int almost exclusively,
    yet page_to_nid and zone_to_nid both return unsigned long. This is a
    throwback to when page_to_nid was a #define and was thus exposing the real
    type of the page flags field.

    In addition to fixing up the definitions of page_to_nid and zone_to_nid, I
    audited the users of these functions, identifying the following incorrect
    uses:

    1) mm/page_alloc.c show_node() -- printk dumping the node id,
    2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
    against numa_node_id(), which returns an int from cpu_to_node(), and
    3) mm/mempolicy.c check_pte_range() -- used as an index in node_isset,
    which uses bit_set, which in generic code takes an int.

    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • The zone table is mostly not needed. If we have a node in the page flags
    then we can get to the zone via NODE_DATA() which is much more likely to be
    already in the cpu cache.

    In case of SMP and UP NODE_DATA() is a constant pointer which allows us to
    access an exact replica of zonetable in the node_zones field. In all of
    the above cases there will be no need at all for the zone table.

    The only remaining case is if, in a NUMA system, the node numbers do not
    fit into the page flags. In that case we make sparse generate a table that
    maps sections to nodes and use that table to figure out the node number.
    This table is sized to fit in a single cache line for the known 32 bit
    NUMA platform, which makes it very likely that the information can be
    obtained without a cache miss.

    For sparsemem the zone table seems to have been fairly large, based on
    the maximum possible number of sections and the number of zones per node.
    There is some memory saving by removing zone_table. The main benefit is to
    reduce the cache footprint of the VM from the frequent lookups of zones.
    Plus it simplifies the page allocator.
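
    The lookup that replaces the zone table is, in sketch form (close to, but
    not necessarily identical to, the resulting mm.h):

    static inline struct zone *page_zone(struct page *page)
    {
            /* node id from page->flags (or the section-to-node table),
             * then index that node's own zone array. */
            return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
    }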

    [akpm@osdl.org: build fix]
    Signed-off-by: Christoph Lameter
    Cc: Dave Hansen
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

29 Oct, 2006

1 commit

  • Add the __GFP_NOWARN flag to the __alloc_pages() call in
    __kmalloc_section_memmap(). It reduces the noisy failure messages.

    On ia64 the section size is 1 GB, which means that an order-8 allocation is
    necessary for each section's memmap. That is often a very hard requirement
    under heavy memory pressure, as you know. So __alloc_pages() gives up and
    shows many noisy stack traces meaning that no pages are available for the
    sections. (My current environment shows the stack trace 32 times....)

    But __kmalloc_section_memmap() falls back to vmalloc() after that failure,
    and that can succeed in allocating the memmap. So the stack trace warning
    is just noise; I suppose it shouldn't be shown.
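
    A sketch of the allocation with the warning suppressed and the vmalloc()
    fallback described above (simplified; memmap_size stands for the size of
    one section's memmap):

    page = alloc_pages(GFP_KERNEL | __GFP_NOWARN, get_order(memmap_size));
    if (page)
            return page_address(page);

    /* order-8 pages are often unavailable; fall back quietly to vmalloc() */
    return vmalloc(memmap_size);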

    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

28 Jun, 2006

1 commit

  • locking init cleanups:

    - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
    - convert rwlocks in a similar manner

    this patch was generated automatically.
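
    For mm/sparse.c the conversion amounts to something like this (the lock
    name is illustrative):

    /* before */
    static spinlock_t index_init_lock = SPIN_LOCK_UNLOCKED;

    /* after */
    static DEFINE_SPINLOCK(index_init_lock);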

    Motivation:

    - cleanliness
    - lockdep needs control of lock initialization, which the open-coded
    variants do not give
    - it's also useful for -rt and for lock debugging in general

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

23 Jun, 2006

1 commit

  • Record the node id as we mark sections for instantiation. Use this nid
    during instantiation to direct allocations.

    Signed-off-by: Andy Whitcroft
    Cc: Mike Kravetz
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Bob Picco
    Cc: Jack Steiner
    Cc: Yasunori Goto
    Cc: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     

22 May, 2006

1 commit

  • A bad calculation/loop in __section_nr() could result in incorrect section
    information being put into sysfs memory entries. This primarily impacts
    memory add operations as the sysfs information is used while onlining new
    memory.

    Fix suggested by Dave Hansen.

    Note that the bug may not be obvious from the patch. It actually occurs in
    the function's return statement:

    return (root_nr * SECTIONS_PER_ROOT) + (ms - root);

    In the existing code, root_nr has already been multiplied by
    SECTIONS_PER_ROOT.
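
    For reference, the fixed function has roughly this shape (a sketch
    reconstructed from the description; details may differ from the patch):

    int __section_nr(struct mem_section *ms)
    {
            unsigned long root_nr;
            struct mem_section *root;

            for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) {
                    root = __nr_to_section(root_nr * SECTIONS_PER_ROOT);
                    if (!root)
                            continue;
                    if ((ms >= root) && (ms < (root + SECTIONS_PER_ROOT)))
                            break;
            }

            /* root_nr counts roots, so it is scaled to sections exactly once */
            return (root_nr * SECTIONS_PER_ROOT) + (ms - root);
    }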

    Signed-off-by: Mike Kravetz
    Cc: Dave Hansen
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

02 May, 2006

1 commit

  • This patch fixes two bugs with the way sparsemem interacts with memory add.
    They are:

    - memory leak if memmap for section already exists

    - calling alloc_bootmem_node() after boot

    These bugs were discovered, and a first cut at the fixes was provided, by
    Arnd Bergmann and Joel Schopp.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Joel Schopp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     

09 Jan, 2006

1 commit

  • ____cacheline_maxaligned_in_smp is currently used to align critical structures
    and avoid false sharing. It uses per-arch L1_CACHE_SHIFT_MAX and people find
    L1_CACHE_SHIFT_MAX useless.

    However, we have been using ____cacheline_maxaligned_in_smp to align
    structures on the internode cacheline size. As per Andi's suggestion, the
    following patch kills ____cacheline_maxaligned_in_smp and introduces
    INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches.
    Arches needing L3/internode cacheline alignment can define
    INTERNODE_CACHE_SHIFT in the arch asm/cache.h. The patch replaces
    ____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp.

    With this patch, L1_CACHE_SHIFT_MAX can be killed.
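
    The replacement boils down to something like this sketch of
    include/linux/cache.h (simplified):

    #ifndef INTERNODE_CACHE_SHIFT
    /* Arches with a larger internode (e.g. L3) line override this in asm/cache.h */
    #define INTERNODE_CACHE_SHIFT L1_CACHE_SHIFT
    #endif

    #if defined(CONFIG_SMP)
    #define ____cacheline_internodealigned_in_smp \
            __attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT))))
    #else
    #define ____cacheline_internodealigned_in_smp
    #endif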

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     

05 Sep, 2005

3 commits

  • This splits up sparse_index_alloc() into two pieces. This is needed
    because we'll allocate the memory for the second level in a different place
    from where we actually consume it, to keep the allocation from happening
    underneath a lock.
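
    The resulting shape is roughly this sketch of sparse_index_init() (error
    handling and freeing of the loser's allocation are omitted):

    static int sparse_index_init(unsigned long section_nr, int nid)
    {
            static DEFINE_SPINLOCK(index_init_lock);
            unsigned long root = SECTION_NR_TO_ROOT(section_nr);
            struct mem_section *section;
            int ret = 0;

            if (mem_section[root])
                    return -EEXIST;

            /* Allocate outside the lock: this may sleep or call bootmem. */
            section = sparse_index_alloc(nid);

            /* The lock only protects publication of the new root pointer. */
            spin_lock(&index_init_lock);
            if (mem_section[root])
                    ret = -EEXIST;
            else
                    mem_section[root] = section;
            spin_unlock(&index_init_lock);

            return ret;
    }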

    Signed-off-by: Dave Hansen
    Signed-off-by: Bob Picco
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • With cleanups from Dave Hansen

    SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
    mem_sections. This two level layout scheme is able to achieve smaller
    memory requirements for SPARSEMEM with the tradeoff of an additional shift
    and load when fetching the memory section. The current SPARSEMEM
    implementation is a one-dimensional array of mem_sections, which is the
    default SPARSEMEM configuration. The patch attempts to isolate the
    implementation details of the physical layout of the sparsemem section
    array.

    SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
    memory_present() calls. This is not always feasible, so architectures
    which do not need it may allocate everything statically by using
    SPARSEMEM_STATIC.
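
    A sketch of the two layouts and the extra indirection (macro names follow
    the patch description; details may differ):

    #ifdef CONFIG_SPARSEMEM_EXTREME
    /* Roots are pointers, populated on demand (from bootmem at boot). */
    extern struct mem_section *mem_section[NR_SECTION_ROOTS];
    #else
    /* SPARSEMEM_STATIC: everything is allocated statically up front. */
    extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];
    #endif

    static inline struct mem_section *__nr_to_section(unsigned long nr)
    {
            if (!mem_section[SECTION_NR_TO_ROOT(nr)])
                    return NULL;
            return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
    }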

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Bob Picco
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Picco
     
  • A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME. Architecture
    platforms with a very sparse physical address space would likely want to
    select this option. For those architecture platforms that don't select the
    option, the code generated is equivalent to SPARSEMEM currently in -mm.
    I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.

    ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
    pointers to mem_sections. This two level layout scheme is able to achieve
    smaller memory requirements for SPARSEMEM with the tradeoff of an
    additional shift and load when fetching the memory section. The current
    SPARSEMEM -mm implementation is a one-dimensional array of mem_sections,
    which is the default SPARSEMEM configuration. The patch attempts to isolate
    the implementation details of the physical layout of the sparsemem section
    array.

    ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.

    I've boot tested ia64 under aim load, configured for ARCH_SPARSEMEM_EXTREME.
    I've also boot tested a 4-way Opteron machine with !ARCH_SPARSEMEM_EXTREME
    and tested with aim.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Bob Picco
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Picco
     

24 Jun, 2005

2 commits

  • Make sparse's initialization accessible at runtime. This allows sparse
    mappings to be created after boot in a hotplug situation.

    This patch is separated from the previous one just to give an indication how
    much of the sparse infrastructure is *just* for hotplug memory.

    The section_mem_map doesn't really store a pointer. It stores something that
    is convenient to do some math against to get a pointer. It isn't valid to
    just do *section_mem_map, so I don't think it should be stored as a pointer.

    There are a couple of things I'd like to store about a section. First of all,
    the fact that it is !NULL does not mean that it is present. There could be
    such a combination where section_mem_map *is* NULL, but the math gets you
    properly to a real mem_map. So, I don't think that check is safe.

    Since we're storing 32-bit-aligned structures, we have a few bits in the
    bottom of the pointer to play with. Use one bit to encode whether there's
    really a mem_map there, and the other one to tell whether there's a valid
    section there. We need to distinguish between the two because sometimes
    there's a gap between when a section is discovered to be present and when we
    can get the mem_map for it.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Jack Steiner
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
    mem_map[] is needed by discontiguous memory machines (like in the old
    CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
    replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
    become a complete replacement.

    A significant advantage over DISCONTIGMEM is that it's completely separated
    from CONFIG_NUMA. When producing this patch, it became apparent that NUMA
    and DISCONTIG are often confused.

    Another advantage is that sparse doesn't require each NUMA node's ranges to be
    contiguous. It can handle overlapping ranges between nodes with no problems,
    where DISCONTIGMEM currently throws away that memory.

    Sparsemem uses an array to provide different pfn_to_page() translations for
    each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
    to be chopped up.

    In order to do quick pfn_to_page() operations, the section number of the page
    is encoded in page->flags. Part of the sparsemem infrastructure enables
    sharing of these bits more dynamically (at compile-time) between the
    page_zone() and sparsemem operations. However, on 32-bit architectures, the
    number of bits is quite limited, and may require growing the size of the
    page->flags type in certain conditions. Several things might force this to
    occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
    memory), an increase in the physical address space, or an increase in the
    number of used page->flags.
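
    Under classic SPARSEMEM the translation is therefore indirect, along these
    lines (a sketch; the real macros in memory_model.h differ slightly):

    /* The section number stored in page->flags selects the section, whose
     * encoded mem_map (already biased by the section's first pfn) gives the
     * page array to index. */
    #define __pfn_to_page(pfn)                                        \
    ({      unsigned long __pfn = (pfn);                              \
            __section_mem_map_addr(__pfn_to_section(__pfn)) + __pfn;  \
    })

    #define __page_to_pfn(pg)                                         \
    ({      struct page *__pg = (pg);                                 \
            __pg - __section_mem_map_addr(                            \
                    __nr_to_section(page_to_section(__pg)));          \
    })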

    One thing to note is that, once sparsemem is present, the NUMA node
    information no longer needs to be stored in the page->flags. It might provide
    speed increases on certain platforms and will be stored there if there is
    room. But, if out of room, an alternate (theoretically slower) mechanism is
    used.

    This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
    there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
    often have to compile out the same areas of code.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Adrian Bunk
    Signed-off-by: Yasunori Goto
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft