26 Jul, 2011

1 commit

  • This patch contains online_page_callback and apropriate functions for
    registering/unregistering online page callbacks. It allows to do some
    machine specific tasks during online page stage which is required to
    implement memory hotplug in virtual machines. Currently this patch is
    required by latest memory hotplug support for Xen balloon driver patch
    which will be posted soon.

    Additionally, originial online_page() function was splited into
    following functions doing "atomic" operations:

    - __online_page_set_limits() - set new limits for memory management code,
    - __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
    - __online_page_free() - free page to allocator.

    It was done to:
    - not duplicate existing code,
    - ease hotplug code devolpment by usage of well defined interface,
    - avoid stupid bugs which are unavoidable when the same code
    (by design) is developed in many places.

    [akpm@linux-foundation.org: use explicit indirect-call syntax]
    Signed-off-by: Daniel Kiper
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Ian Campbell
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     

15 Jan, 2011

1 commit


14 Jan, 2011

1 commit

  • PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
    be added to page->flags without overflowing (because of the sparse section
    bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
    has to move the memory hotplug code from _mapcount to lru.next to avoid
    any risk of clashes. We can't use lru.next for PG_buddy removal, but
    memory hotplug can use lru.next even more easily than the mapcount
    instead.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

11 Jan, 2011

1 commit

  • Now, memory_hotplug_(un)lock() is used for add/remove/offline pages
    for avoiding races with hibernation. But this should be held in
    online_pages(), too. It seems asymmetric.

    There are cases where one has to avoid a race with memory hotplug
    notifier and his own local code, and hotplug v.s. hotplug.
    This will add a generic solution for avoiding races. In other view,
    having lock here has no big impacts. online pages is tend to be
    done by udev script at el against each memory section one by one.

    Then, it's better to have lock here, too.

    Cc: # 2.6.37
    Reviewed-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Pekka Enberg

    KAMEZAWA Hiroyuki
     

03 Dec, 2010

1 commit

  • Presently hwpoison is using lock_system_sleep() to prevent a race with
    memory hotplug. However lock_system_sleep() is a no-op if
    CONFIG_HIBERNATION=n. Therefore we need a new lock.

    Signed-off-by: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Kamezawa Hiroyuki
    Suggested-by: Hugh Dickins
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

27 Oct, 2010

1 commit

  • Now, sysfs interface of memory hotplug shows whether the section is
    removable or not. But it checks only migrateype of pages and doesn't
    check details of cluster of pages.

    Next, memory hotplug's set_migratetype_isolate() has the same kind of
    check, too.

    This patch adds the function __count_unmovable_pages() and makes above 2
    checks to use the same logic. Then, is_removable and hotremove code uses
    the same logic. No changes in the hotremove logic itself.

    TODO: need to find a way to check RECLAMABLE. But, considering bit,
    calling shrink_slab() against a range before starting memory hotremove
    sounds better. If so, this patch's logic doesn't need to be changed.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Reported-by: Michal Hocko
    Cc: Wu Fengguang
    Cc: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

25 May, 2010

1 commit

  • Enable users to online CPUs even if the CPUs belongs to a numa node which
    doesn't have onlined local memory.

    The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
    in system boot/init period, or at the time of local memory online. For a
    numa node without onlined local memory, its zonelists are not initialized
    at present. As a result, any memory allocation operations executed by
    CPUs within this node will fail. In fact, an out-of-memory error is
    triggered when attempt to online CPUs before memory comes to online.

    This patch tries to create zonelists for such numa nodes, so that the
    memory allocation for this node can be fallback'ed to other nodes.

    [akpm@linux-foundation.org: remove unneeded export]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: minskey guo
    Cc: Minchan Kim
    Cc: Yasunori Goto
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    minskey guo
     

16 Dec, 2009

1 commit


23 Sep, 2009

1 commit

  • Originally, walk_memory_resource() was introduced to traverse all memory
    of "System RAM" for detecting memory hotplug/unplug range. For doing so,
    flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
    memory hotplug.

    But for using other purpose, /proc/kcore, this may includes some firmware
    area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
    check strict to find out busy "System RAM".

    Note: PPC64 keeps their own walk_memory_resouce(), which walk through
    ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
    patch makes no difference in behavior, finally.

    And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
    Because pfn_valid() just show "there is memmap or not* and cannot be used
    for "there is physical memory or not", this function is useful in generic
    to scan physical memory range.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: Américo Wang
    Cc: David Rientjes
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

07 Jan, 2009

1 commit

  • Show node to memory section relationship with symlinks in sysfs

    Add /sys/devices/system/node/nodeX/memoryY symlinks for all
    the memory sections located on nodeX. For example:
    /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
    indicates that memory section 135 resides on node1.

    Also revises documentation to cover this change as well as updating
    Documentation/ABI/testing/sysfs-devices-memory to include descriptions
    of memory hotremove files 'phys_device', 'phys_index', and 'state'
    that were previously not described there.

    In addition to it always being a good policy to provide users with
    the maximum possible amount of physical location information for
    resources that can be hot-added and/or hot-removed, the following
    are some (but likely not all) of the user benefits provided by
    this change.
    Immediate:
    - Provides information needed to determine the specific node
    on which a defective DIMM is located. This will reduce system
    downtime when the node or defective DIMM is swapped out.
    - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM. This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node. The consequences of reintroducing the defective memory
    could be ugly.
    - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
    Future:
    - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

    Symlink creation during boot was tested on 2-node x86_64, 2-node
    ppc64, and 2-node ia64 systems. Symlink creation during physical
    memory hot-add tested on a 2-node x86_64 system.

    Signed-off-by: Gary Hade
    Signed-off-by: Badari Pulavarty
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gary Hade
     

25 Jul, 2008

2 commits

  • Memory may be hot-removed on a per-memory-block basis, particularly on
    POWER where the SPARSEMEM section size often matches the memory-block
    size. A user-level agent must be able to identify which sections of
    memory are likely to be removable before attempting the potentially
    expensive operation. This patch adds a file called "removable" to the
    memory directory in sysfs to help such an agent. In this patch, a memory
    block is considered removable if;

    o It contains only MOVABLE pageblocks
    o It contains only pageblocks with free pages regardless of pageblock type

    On the other hand, a memory block starting with a PageReserved() page will
    never be considered removable. Without this patch, the user-agent is
    forced to choose a memory block to remove randomly.

    Sample output of the sysfs files:

    ./memory/memory0/removable: 0
    ./memory/memory1/removable: 0
    ./memory/memory2/removable: 0
    ./memory/memory3/removable: 0
    ./memory/memory4/removable: 0
    ./memory/memory5/removable: 0
    ./memory/memory6/removable: 0
    ./memory/memory7/removable: 1
    ./memory/memory8/removable: 0
    ./memory/memory9/removable: 0
    ./memory/memory10/removable: 0
    ./memory/memory11/removable: 0
    ./memory/memory12/removable: 0
    ./memory/memory13/removable: 0
    ./memory/memory14/removable: 0
    ./memory/memory15/removable: 0
    ./memory/memory16/removable: 0
    ./memory/memory17/removable: 1
    ./memory/memory18/removable: 1
    ./memory/memory19/removable: 1
    ./memory/memory20/removable: 1
    ./memory/memory21/removable: 1
    ./memory/memory22/removable: 1

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • - Change some naming
    * Magic -> types
    * MIX_INFO -> MIX_SECTION_INFO
    * Change definition of bootmem type from direct hex value

    - __free_pages_bootmem() becomes __meminit.

    Signed-off-by: Yasunori Goto
    Cc: Andy Whitcroft
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

09 Jun, 2008

1 commit

  • The ehea driver was recently changed[1] to use walk_memory_resource() to
    detect the system's memory layout. However, walk_memory_resource() is
    available only when memory hotplug is enabled. So CONFIG_EHEA was
    made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
    network driver to have such a dependency.

    Make the declaration of walk_memory_resource() and its powerpc
    implementation (ehea is powerpc-specific) unconditionally available.

    [1] 48cfb14f8b89d4d5b3df6c16f08b258686fb12ad
    "ehea: Add DLPAR memory remove support"

    [2] fb7b6ca2b6b7c23b52be143bdd5f55a23b9780c8
    "ehea: Add dependency to Kconfig"

    Signed-off-by: Nathan Lynch
    Acked-by: Badari Pulavarty
    Signed-off-by: Paul Mackerras

    Nathan Lynch
     

28 Apr, 2008

2 commits

  • This patch set is to free pages which is allocated by bootmem for
    memory-hotremove. Some structures of memory management are allocated by
    bootmem. ex) memmap, etc.

    To remove memory physically, some of them must be freed according to
    circumstance. This patch set makes basis to free those pages, and free
    memmaps.

    Basic my idea is using remain members of struct page to remember information
    of users of bootmem (section number or node id). When the section is
    removing, kernel can confirm it. By this information, some issues can be
    solved.

    1) When the memmap of removing section is allocated on other
    section by bootmem, it should/can be free.
    2) When the memmap of removing section is allocated on the
    same section, it shouldn't be freed. Because the section has to be
    logical memory offlined already and all pages must be isolated against
    page allocater. If it is freed, page allocator may use it which will
    be removed physically soon.
    3) When removing section has other section's memmap,
    kernel will be able to show easily which section should be removed
    before it for user. (Not implemented yet)
    4) When the above case 2), the page isolation will be able to check and skip
    memmap's page when logical memory offline (offline_pages()).
    Current page isolation code fails in this case because this page is
    just reserved page and it can't distinguish this pages can be
    removed or not. But, it will be able to do by this patch.
    (Not implemented yet.)
    5) The node information like pgdat has similar issues. But, this
    will be able to be solved too by this.
    (Not implemented yet, but, remembering node id in the pages.)

    Fortunately, current bootmem allocator just keeps PageReserved flags,
    and doesn't use any other members of page struct. The users of
    bootmem doesn't use them too.

    This patch:

    This is to register information which is node or section's id. Kernel can
    distinguish which node/section uses the pages allcated by bootmem. This is
    basis for hot-remove sections or nodes.

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Generic helper function to remove section mappings and sysfs entries for the
    section of the memory we are removing. offline_pages() correctly adjusted
    zone and marked the pages reserved.

    TODO: Yasunori Goto is working on patches to free up allocations from bootmem.

    Signed-off-by: Badari Pulavarty
    Acked-by: Yasunori Goto
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

17 Oct, 2007

4 commits

  • Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
    This patch cleans up them. This is against 2.6.23-rc6-mm1.

    - fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
    - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
    which returns -EINVAL.
    - removed remove_pages() only used in powerpc.
    - removed no-op remove_memory() in i386, sh, sparc64, x86_64.

    - only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
    to return -EINVAL.

    Note:
    Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
    archs if there are requirements and testers.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Logic.
    - set all pages in [start,end) as isolated migration-type.
    by this, all free pages in the range will be not-for-use.
    - Migrate all LRU pages in the range.
    - Test all pages in the range's refcnt is zero or not.

    Todo:
    - allocate migration destination page from better area.
    - confirm page_count(page)== 0 && PageReserved(page) page is safe to be freed..
    (I don't like this kind of page but..
    - Find out pages which cannot be migrated.
    - more running tests.
    - Use reclaim for unplugging other memory type area.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • A clean up patch for "scanning memory resource [start, end)" operation.

    Now, find_next_system_ram() function is used in memory hotplug, but this
    interface is not easy to use and codes are complicated.

    This patch adds walk_memory_resouce(start,len,arg,func) function.
    The function 'func' is called per valid memory resouce range in [start,pfn).

    [pbadari@us.ibm.com: Error handling in walk_memory_resource()]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • This patch cleans up duplicate includes in
    include/linux/memory_hotplug.h

    Signed-off-by: Jesper Juhl
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     

01 Oct, 2006

1 commit


28 Jun, 2006

5 commits

  • This is a patch to allocate pgdat and per node data area for ia64. The size
    for them can be calculated by compute_pernodesize().

    Signed-off-by: Yasunori Goto
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This is to refresh node_data[] array for ia64. As I mentioned previous
    patches, ia64 has copies of information of pgdat address array on each node as
    per node data.

    At v2 of node_add, this function used stop_machine_run() to update them. (I
    wished that they were copied safety as much as possible.) But, in this patch,
    this arrays are just copied simply, and set node_online_map bit after
    completion of pgdat initialization.

    So, kernel must touch NODE_DATA() macro after checking node_online_map().
    (Current code has already done it.) This is more simple way for just
    hot-add.....

    Note : It will be problem when hot-remove will occur,
    because, even if online_map bit is set, kernel may
    touch NODE_DATA() due to race condition. :-(

    Signed-off-by: Yasunori Goto
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Refresh NODE_DATA() for generic archs. In this case, NODE_DATA(nid) ==
    node_data[nid]. node_data[] is array of address of pgdat. So, refresh is
    quite simple.

    Signed-off-by: Yasunori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • For node hotplug, basically we have to allocate new pgdat. But, there are
    several types of implementations of pgdat.

    1. Allocate only pgdat.
    This style allocate only pgdat area.
    And its address is recorded in node_data[].
    It is most popular style.

    2. Static array of pgdat
    In this case, all of pgdats are static array.
    Some archs use this style.

    3. Allocate not only pgdat, but also per node data.
    To increase performance, each node has copy of some data as
    a per node data. So, this area must be allocated too.

    Ia64 is this style. Ia64 has the copies of node_data[] array
    on each per node data to increase performance.

    In this series of patches, treat (1) as generic arch.

    generic archs can use generic function. (2) and (3) should have
    its own if necessary.

    This patch defines pgdat allocator.
    Updating NODE_DATA() macro function is in other patch.

    Signed-off-by: Yasonori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Change the name of old add_memory() to arch_add_memory. And use node id to
    get pgdat for the node at NODE_DATA().

    Note: Powerpc's old add_memory() is defined as __devinit. However,
    add_memory() is usually called only after bootup.
    I suppose it may be redundant. But, I'm not well known about powerpc.
    So, I keep it. (But, __meminit is better at least.)

    Signed-off-by: Yasunori Goto
    Cc: Dave Hansen
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

20 Apr, 2006

1 commit

  • We don't have to #if guard prototypes.

    This also fixes a bug observed by Randy Dunlap due to a misspelled
    option in the #if.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

10 Apr, 2006

1 commit


07 Mar, 2006

1 commit

  • include/linux/memory_hotplug.h:53: warning: 'struct page' declared inside parameter list

    (akpm: I tossed in a couple more possibly-needed-sometime struct decls too)

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

30 Oct, 2005

3 commits

  • This adds generic memory add/remove and supporting functions for memory
    hotplug into a new file as well as a memory hotplug kernel config option.

    Individual architecture patches will follow.

    For now, disable memory hotplug when swsusp is enabled. There's a lot of
    churn there right now. We'll fix it up properly once it calms down.

    Signed-off-by: Matt Tolentino
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • See the "fixup bad_range()" patch for more information, but this actually
    creates a the lock to protect things making assumptions about a zone's size
    staying constant at runtime.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • pgdat->node_size_lock is basically only neeeded in one place in the normal
    code: show_mem(), which is the arch-specific sysrq-m printing function.

    Strictly speaking, the architectures not doing memory hotplug do no need this
    locking in show_mem(). However, they are all included for completeness. This
    should also make any future consolidation of all of the implementations a
    little more straightforward.

    This lock is also held in the sparsemem code during a memory removal, as
    sections are invalidated. This is the place there pfn_valid() is made false
    for a memory area that's being removed. The lock is only required when doing
    pfn_valid() operations on memory which the user does not already have a
    reference on the page, such as in show_mem().

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen