07 Jan, 2009

2 commits

  • GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
    that harder to track down: remove it, and its out-of-work brothers
    GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

    Since we're making that improvement to hotremove_migrate_alloc(), I think
    we can now also remove one of the "o"s from its comment.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Show node to memory section relationship with symlinks in sysfs

    Add /sys/devices/system/node/nodeX/memoryY symlinks for all
    the memory sections located on nodeX. For example:
    /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
    indicates that memory section 135 resides on node1.

    It also revises the documentation to cover this change, and updates
    Documentation/ABI/testing/sysfs-devices-memory to include descriptions
    of the memory hotremove files 'phys_device', 'phys_index', and 'state'
    that were previously not described there.

    In addition to it always being a good policy to provide users with
    the maximum possible amount of physical location information for
    resources that can be hot-added and/or hot-removed, the following
    are some (but likely not all) of the user benefits provided by
    this change.
    Immediate:
    - Provides information needed to determine the specific node
    on which a defective DIMM is located. This will reduce system
    downtime when the node or defective DIMM is swapped out.
    - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM. This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node. The consequences of reintroducing the defective memory
    could be ugly.
    - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
    Future:
    - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

    Symlink creation during boot was tested on 2-node x86_64, 2-node
    ppc64, and 2-node ia64 systems. Symlink creation during physical
    memory hot-add tested on a 2-node x86_64 system.

    Signed-off-by: Gary Hade
    Signed-off-by: Badari Pulavarty
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gary Hade
     

01 Dec, 2008

1 commit


20 Nov, 2008

1 commit

  • After adding a node into the machine, top cpuset's mems isn't updated.

    By reviewing the code, we found that the update function

    cpuset_track_online_nodes()

    was invoked after node_states[N_ONLINE] changes. That is wrong because
    N_ONLINE just means the node has a pgdat; when a node has (or gains)
    memory, we use N_HIGH_MEMORY. So we should invoke the update function
    after node_states[N_HIGH_MEMORY] changes, just as its commit message
    says.

    This patch fixes it, and uses the memory hotplug notifier instead of
    calling cpuset_track_online_nodes() directly.

    Signed-off-by: Miao Xie
    Acked-by: Yasunori Goto
    Cc: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Linus Torvalds

    Miao Xie
     

20 Oct, 2008

3 commits

  • During hotplug memory remove, memory regions should be released in
    PAGES_PER_SECTION-sized chunks. This mirrors the code in add_memory(),
    where resources are requested in PAGES_PER_SECTION-sized chunks.

    Attempting to release the entire memory region fails because there is no
    single resource covering the total number of pages being removed. Instead,
    the resources for the pages are split into PAGES_PER_SECTION-sized chunks,
    as requested during memory add.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Badari Pulavarty
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Fontenot
     
  • On large memory systems, the VM can spend way too much time scanning
    through pages that it cannot (or should not) evict from memory. Not only
    does it use up CPU time, but it also provokes lock contention and can
    leave large systems under memory pressure in a catatonic state.

    This patch series improves VM scalability by:

    1) putting filesystem backed, swap backed and unevictable pages
    onto their own LRUs, so the system only scans the pages that it
    can/should evict from memory

    2) switching to two handed clock replacement for the anonymous LRUs,
    so the number of pages that need to be scanned when the system
    starts swapping is bound to a reasonable number

    3) keeping unevictable pages off the LRU completely, so the
    VM does not waste CPU time scanning them. ramfs, ramdisk,
    SHM_LOCKED shared memory segments and mlock()ed VMA pages
    are kept on the unevictable list.

    This patch:

    isolate_lru_page logically belongs in vmscan.c rather than migrate.c.

    It is tough, because we don't need that function without memory
    migration, so there is a valid argument for keeping it in migrate.c.
    However, a subsequent patch needs to use it in the core mm, so we can
    happily move it to vmscan.c.

    Also, make the function a little more generic by not requiring that it
    adds an isolated page to a given list. Callers can do that.

    Note that we now have '__isolate_lru_page()', that does
    something quite different, visible outside of vmscan.c
    for use with memory controller. Methinks we need to
    rationalize these names/purposes. --lts

    [akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
    Signed-off-by: Nick Piggin
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • There is nothing architecture-specific about remove_memory(); the
    function is common to all architectures which support hotplug memory
    remove. Instead of duplicating it in every architecture, collapse the
    copies into one arch-neutral function.

    [akpm@linux-foundation.org: fix the export]
    Signed-off-by: Badari Pulavarty
    Cc: Yasunori Goto
    Cc: Gary Hade
    Cc: Mel Gorman
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

25 Jul, 2008

5 commits

  • Memory may be hot-removed on a per-memory-block basis, particularly on
    POWER where the SPARSEMEM section size often matches the memory-block
    size. A user-level agent must be able to identify which sections of
    memory are likely to be removable before attempting the potentially
    expensive operation. This patch adds a file called "removable" to the
    memory directory in sysfs to help such an agent. In this patch, a memory
    block is considered removable if:

    o It contains only MOVABLE pageblocks
    o It contains only pageblocks with free pages regardless of pageblock type

    On the other hand, a memory block starting with a PageReserved() page will
    never be considered removable. Without this patch, the user-agent is
    forced to choose a memory block to remove randomly.

    Sample output of the sysfs files:

    ./memory/memory0/removable: 0
    ./memory/memory1/removable: 0
    ./memory/memory2/removable: 0
    ./memory/memory3/removable: 0
    ./memory/memory4/removable: 0
    ./memory/memory5/removable: 0
    ./memory/memory6/removable: 0
    ./memory/memory7/removable: 1
    ./memory/memory8/removable: 0
    ./memory/memory9/removable: 0
    ./memory/memory10/removable: 0
    ./memory/memory11/removable: 0
    ./memory/memory12/removable: 0
    ./memory/memory13/removable: 0
    ./memory/memory14/removable: 0
    ./memory/memory15/removable: 0
    ./memory/memory16/removable: 0
    ./memory/memory17/removable: 1
    ./memory/memory18/removable: 1
    ./memory/memory19/removable: 1
    ./memory/memory20/removable: 1
    ./memory/memory21/removable: 1
    ./memory/memory22/removable: 1

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • If the zonelist needs to be rebuilt in online_pages(), there is no need
    to recalculate vm_total_pages in that function, as it has already been
    updated by the call to build_all_zonelists().

    Signed-off-by: Kent Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Liu
     
  • - Change some naming
    * Magic -> types
    * MIX_INFO -> MIX_SECTION_INFO
    * Define bootmem types symbolically instead of as direct hex values

    - __free_pages_bootmem() becomes __meminit.

    Signed-off-by: Yasunori Goto
    Cc: Andy Whitcroft
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Make the needlessly global register_page_bootmem_info_section() static.

    Signed-off-by: Adrian Bunk
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • free_area_init_node() gets passed in the node id as well as the node
    descriptor. This is redundant as the function can trivially get the node
    descriptor itself by means of NODE_DATA() and the node's id.

    I checked all the users and NODE_DATA() seems to be usable everywhere
    from where this function is called.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

15 May, 2008

3 commits

  • Trying to online a new memory section that was added via memory hotplug
    sometimes results in crashes when the new pages are added via __free_page.
    Reason for that is that the pageblock bitmap isn't initialized and hence
    contains random stuff. That means that get_pageblock_migratetype()
    returns random stuff as well, and therefore

    list_add(&page->lru,
    &zone->free_area[order].free_list[migratetype]);

    in __free_one_page() tries to do a list_add to something that isn't even
    necessarily a list.

    This happens since 86051ca5eaf5e560113ec7673462804c54284456 ("mm: fix
    usemap initialization") which makes sure that the pageblock bitmap gets
    only initialized for pages present in a zone. Unfortunately, for
    hot-added memory the zones "grow" after the memmap and the pageblock
    bitmap have been initialized, which means that the new pages have an
    uninitialized bitmap. To solve this, the calls to grow_zone_span() and
    grow_pgdat_span() are moved into __add_zone(), just before the
    initialization happens.

    The patch also moves the two functions since __add_zone() is the only
    caller and I didn't want to add a forward declaration.

    Signed-off-by: Heiko Carstens
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Add a check to online_pages() to test for failure of
    walk_memory_resource(). This fixes a condition where a failure
    of walk_memory_resource() can lead to online_pages() returning
    success without the requested pages being onlined.

    Signed-off-by: Geoff Levand
    Cc: Yasunori Goto
    Cc: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Cc: Keith Mannthey
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geoff Levand
     
  • __add_zone() calls memmap_init_zone() twice if memory gets attached to an
    empty zone: once via init_currently_empty_zone() and once explicitly right
    after that call.

    Looks like this is currently not a bug; however, the call is superfluous
    and might lead to subtle bugs if memmap_init_zone() gets changed. So make
    sure it is called only once.

    Cc: Yasunori Goto
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

29 Apr, 2008

1 commit

  • This patch fixes the following compile error caused by commit
    04753278769f3b6c3b79a080edb52f21d83bf6e2 ("memory hotplug: register
    section/node id to free"):

    CC mm/memory_hotplug.o
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c: In function ‘put_page_bootmem’:
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c:82: error: implicit declaration of function ‘__free_pages_bootmem’
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c: At top level:
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c:87: warning: no previous prototype for ‘register_page_bootmem_info_section’
    make[2]: *** [mm/memory_hotplug.o] Error 1

    [ Andrew: "Argh. The -mm-only memory-hotplug-add-removable-to-sysfs-
    to-show-memblock-removability.patch debugging patch adds that include
    so nobody hit this before." ]

    Signed-off-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

28 Apr, 2008

4 commits

  • This patch frees memmaps which were allocated by bootmem.

    Freeing the usemap is not necessary: the pages of a usemap may still be
    needed by other sections.

    If the section being removed is the last section on the node, that section
    is the final user of the usemap page. (Usemaps are allocated on their own
    section by the previous patch.) But it shouldn't be freed either, because
    the section must be in the logically offline state, with all its pages
    isolated from the page allocator. If the usemap page were freed, the page
    allocator might use it even though it will be removed physically soon,
    which would be a disaster. So this patch keeps it as it is.

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This patch set frees pages which were allocated by bootmem, for
    memory hot-remove. Some memory-management structures are allocated by
    bootmem, e.g. the memmap.

    To remove memory physically, some of them must be freed according to
    circumstance. This patch set lays the groundwork for freeing those pages,
    and frees the memmaps.

    My basic idea is to use the remaining members of struct page to remember
    information about the bootmem user (section number or node id). When a
    section is being removed, the kernel can check that information, which
    solves several issues:

    1) When the memmap of the section being removed was allocated on another
    section by bootmem, it should and can be freed.
    2) When the memmap of the section being removed was allocated on the
    same section, it shouldn't be freed, because the section must already
    be logically offlined and all its pages isolated from the page
    allocator. If the memmap were freed, the page allocator might use it,
    and it will be removed physically soon.
    3) When the section being removed holds another section's memmap, the
    kernel will be able to easily show the user which section should be
    removed first. (Not implemented yet.)
    4) In case 2) above, page isolation will be able to recognize and skip
    the memmap's pages during logical memory offline (offline_pages()).
    The current page isolation code fails in this case because the page is
    just a reserved page, and it can't distinguish whether the page can be
    removed or not. This patch will make that possible.
    (Not implemented yet.)
    5) Node information such as pgdat has similar issues, which this
    approach will also be able to solve.
    (Not implemented yet, but node ids are remembered in the pages.)

    Fortunately, the current bootmem allocator just sets the PageReserved
    flag and doesn't use any other members of struct page. The users of
    bootmem don't use them either.

    This patch:

    This patch registers the node or section id in the pages, so that the
    kernel can distinguish which node/section uses the pages allocated by
    bootmem. This is the basis for hot-removing sections or nodes.

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • All architectures use an effectively identical definition of online_page(), so
    just make it common code. x86-64, ia64, powerpc and sh are actually
    identical; x86-32 is slightly different.

    x86-32's differences arise because it puts its hotplug pages in the
    highmem zone. We can handle this in the generic code by inspecting the
    page to see if it's in highmem, and updating the totalhigh_pages count
    appropriately. This leaves init_32.c:free_new_highpage() with a single
    caller, so I folded it into add_one_highpage_init().

    I also removed an incorrect comment referring to the NUMA case; any NUMA
    details have already been dealt with by the time online_page() is called.

    [akpm@linux-foundation.org: fix indenting]
    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Dave Hansen
    Reviewed-by: KAMEZAWA Hiroyuki
    Tested-by: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc: Christoph Lameter
    Acked-by: Ingo Molnar
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
     
  • Generic helper function to remove section mappings and sysfs entries for
    the section of the memory we are removing. offline_pages() has already
    correctly adjusted the zone and marked the pages reserved.

    TODO: Yasunori Goto is working on patches to free up allocations from bootmem.

    Signed-off-by: Badari Pulavarty
    Acked-by: Yasunori Goto
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

20 Apr, 2008

1 commit


06 Feb, 2008

1 commit

  • - Add comments explaining how drain_pages() works.

    - Eliminate useless functions

    - Rename drain_all_local_pages() to drain_all_pages(); it drains
    all pages, not only those of the local processor.

    - Eliminate useless interrupt off/on sequences. drain_pages()
    disables interrupts on its own, and the execution thread is
    pinned to a processor by the caller, so there is no need to
    disable interrupts.

    - Put drain_all_pages() declaration in gfp.h and remove the
    declarations from suspend.h and from mm/memory_hotplug.c

    - Make software suspend call drain_all_pages(). Draining only the
    processor-local pages may not be the right approach if software
    suspend wants to support SMP. If it calls drain_all_pages(),
    then we can make drain_pages() static.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Christoph Lameter
    Acked-by: Mel Gorman
    Cc: "Rafael J. Wysocki"
    Cc: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

15 Nov, 2007

2 commits

  • i386 and x86-64 register System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.

    But ia64 registers it as IORESOURCE_MEM only. In addition, the memory
    hotplug code registers new memory as IORESOURCE_MEM too.

    This difference causes a failure of memory unplug on x86-64. This patch
    fixes it.

    The patch adds IORESOURCE_BUSY to avoid potential overlap mapping by PCI
    devices.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Badari Pulavarty
    Cc: "Luck, Tony"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • We should unset the "ISOLATE" migrate type when we have successfully
    removed memory, but the current code has a bug and does not work
    correctly.

    This patch also includes a bugfix changing get_pageblock_flags() to
    get_pageblock_migratetype().

    Thanks to Badari Pulavarty for finding this.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Oct, 2007

1 commit

  • The current memory notifier still has some defects. (Fortunately, nothing
    uses it yet.) This patch fixes and rearranges them.

    - Add start_pfn, nr_pages, and node id information for callback
    functions when node status changes from/to a memoryless node.
    Callbacks can't do anything without that information.
    - Add a going-online status notification. It is necessary for creating
    per-node structures before the node's pages become available.
    - Move the GOING_OFFLINE status notification to after page isolation.
    It is a good place for callbacks to return memory, such as caches,
    because the returned pages will not be used again.
    - Add CANCEL events for rolling back when an error occurs.
    - Delete the MEM_MAPPING_INVALID notification. It will not be used.
    - Fix the compile error in (un)register_memory_notifier().

    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

20 Oct, 2007

1 commit


17 Oct, 2007

4 commits

  • Currently, the arch-dependent code around CONFIG_MEMORY_HOTREMOVE is a
    mess. This patch cleans it up. It is against 2.6.23-rc6-mm1.

    - Fix the compile failure in the ia64 CONFIG_MEMORY_HOTPLUG &&
    !CONFIG_MEMORY_HOTREMOVE case.
    - For !CONFIG_MEMORY_HOTREMOVE, add a generic no-op remove_memory(),
    which returns -EINVAL.
    - Remove remove_pages(), which was only used in powerpc.
    - Remove the no-op remove_memory() in i386, sh, sparc64, x86_64.
    - Only powerpc returned -ENOSYS at memory hot remove (no-op); change it
    to return -EINVAL.

    Note:
    Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
    archs if there are requirements and testers.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Logic:
    - Set all pages in [start, end) to the isolated migrate type;
    by this, all free pages in the range become not-for-use.
    - Migrate all LRU pages in the range.
    - Check that the refcount of every page in the range is zero.

    Todo:
    - Allocate migration destination pages from a better area.
    - Confirm that a page with page_count(page) == 0 && PageReserved(page)
    is safe to be freed. (I don't like this kind of page, but...)
    - Find out which pages cannot be migrated.
    - More running tests.
    - Use reclaim for unplugging other memory-type areas.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • A cleanup patch for the "scanning memory resource [start, end)" operation.

    Now, the find_next_system_ram() function is used in memory hotplug, but
    this interface is not easy to use and the code is complicated.

    This patch adds a walk_memory_resource(start, len, arg, func) function.
    The function 'func' is called for each valid memory resource range in
    [start, start + len).

    [pbadari@us.ibm.com: Error handling in walk_memory_resource()]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • It is necessary to know whether nodes have memory, since we have recently
    begun to add support for memoryless nodes. For that purpose we introduce
    two new node states: N_HIGH_MEMORY and N_NORMAL_MEMORY.

    A node has its bit in N_HIGH_MEMORY set if it has any memory, regardless
    of the type of memory. If a node has memory, then it has at least one zone
    defined in its pgdat structure that is located in the pgdat itself.

    A node has its bit in N_NORMAL_MEMORY set if it has a zone lower than
    ZONE_HIGHMEM. This means it is possible to allocate memory on that node
    that is not subject to kmap.

    N_HIGH_MEMORY and N_NORMAL_MEMORY can then be used in various places to
    ensure that we do the right thing when we encounter a memoryless node.

    [akpm@linux-foundation.org: build fix]
    [Lee.Schermerhorn@hp.com: update N_HIGH_MEMORY node state for memory hotadd]
    [y-goto@jp.fujitsu.com: Fix memory hotplug + sparsemem build]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Christoph Lameter
    Acked-by: Bob Picco
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Yasunori Goto
    Signed-off-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Jun, 2007

1 commit


12 Jan, 2007

1 commit

  • Fix an oops experienced on the Cell architecture when init-time functions,
    early_*(), are called at runtime. The patch alters the call paths to make
    sure that callers explicitly say whether the call is being made on behalf
    of a hotplug event or is happening at boot time.

    It has been compile tested on ppc64, ia64, s390, i386 and x86_64.

    Acked-by: Arnd Bergmann
    Signed-off-by: Dave Hansen
    Cc: Yasunori Goto
    Acked-by: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: Martin Schwidefsky
    Acked-by: Heiko Carstens
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

08 Dec, 2006

1 commit

  • The zone table is mostly not needed. If we have a node in the page flags
    then we can get to the zone via NODE_DATA() which is much more likely to be
    already in the cpu cache.

    In case of SMP and UP NODE_DATA() is a constant pointer which allows us to
    access an exact replica of zonetable in the node_zones field. In all of
    the above cases there will be no need at all for the zone table.

    The only remaining case is if, in a NUMA system, the node numbers do not
    fit into the page flags. In that case we make sparse generate a table that
    maps sections to nodes and use that table to figure out the node number.
    This table is sized to fit in a single cache line for the known 32 bit
    NUMA platform which makes it very likely that the information can be
    obtained without a cache miss.

    For sparsemem, the zone table seems to have been fairly large, based on
    the maximum possible number of sections and the number of zones per node.
    There is some memory saving from removing the zone_table, but the main
    benefit is to reduce the cache footprint of the VM from the frequent
    lookups of zones. Plus it simplifies the page allocator.

    [akpm@osdl.org: build fix]
    Signed-off-by: Christoph Lameter
    Cc: Dave Hansen
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Oct, 2006

3 commits


30 Sep, 2006

2 commits

  • ratelimit_pages in page-writeback.c is recalculated (in set_ratelimit())
    every time a CPU is hot-added/removed. But this value is not recalculated
    when new pages are hot-added.

    This patch fixes that problem by calling set_ratelimit() when new pages
    are hot-added.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     
  • Change the list of memory nodes allowed to tasks in the top (root) cpuset
    to dynamically track which memory nodes are online, using a call to a
    cpuset hook from the memory hotplug code. Make this top cpuset's 'mems'
    file read-only.

    On systems that have cpusets configured in their kernel, but that aren't
    actively using cpusets (for some distros, this covers the majority of
    systems) all tasks end up in the top cpuset.

    If that system does support memory hotplug, then these tasks cannot make
    use of memory nodes that are added after system boot, because the memory
    nodes are not allowed in the top cpuset. This is a surprising regression
    over earlier kernels that didn't have cpusets enabled.

    One key motivation for this change is to remain consistent with the
    behaviour for the top_cpuset's 'cpus', which is also read-only, and which
    automatically tracks the cpu_online_map.

    This change also has the minor benefit that it fixes a long-standing,
    little-noticed, minor bug in cpusets. The cpuset performance tweak to
    short-circuit the cpuset_zone_allowed() check on systems with just a
    single cpuset (see 'number_of_cpusets' in linux/cpuset.h) meant that
    simply changing the 'mems' of the top_cpuset had no effect, even though
    the change (the write system call) appeared to succeed. With the following
    change, that write to the 'mems' file fails with -EACCES, and the 'mems'
    file stubbornly refuses to be changed via user-space writes. Thus no one
    should be misled into thinking they've changed the top_cpuset's 'mems'
    when in effect they haven't.

    In order to keep the behaviour of cpusets consistent between systems
    actively making use of them and systems not using them, this patch changes
    the behaviour of the 'mems' file in the top (root) cpuset, making it read
    only, and making it automatically track the value of node_online_map. Thus
    tasks in the top cpuset will have automatic use of hot plugged memory nodes
    allowed by their cpuset.

    [akpm@osdl.org: build fix]
    [bunk@stusta.de: build fix]
    Signed-off-by: Paul Jackson
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

06 Aug, 2006

2 commits

  • This patch enhances the collision check for memory hot add.

    It's better to do the resource collision check before doing memory hot
    add, which will touch memory-management structures.

    Also, add_section() should check whether the section already exists before
    calling sparse_add_one_section(). (sparse_add_one_section() will do
    another check anyway, but checking in memory_hotplug.c is easier to
    understand.)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: keith mannthey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • find_next_system_ram() is used to find an available memory resource when
    onlining newly added memory. This patch fixes the following problem.

    find_next_system_ram() cannot catch the case where the section's
    boundaries are not aligned with the resource's, e.g.:

    Resource :     (start)-------------(end)
    Section  :  (start)-------------------(end)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Keith Mannthey
    Cc: Yasunori Goto
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki