25 May, 2010

3 commits

  • Add global mutex zonelists_mutex to fix the possible race:

    CPU0 CPU1 CPU2
    (1) zone->present_pages += online_pages;
    (2) build_all_zonelists();
    (3) alloc_page();
    (4) free_page();
    (5) build_all_zonelists();
    (6) __build_all_zonelists();
    (7) zone->pageset = alloc_percpu();

    In step (3,4), zone->pageset still points to boot_pageset, so bad
    things may happen if 2+ nodes are in this state. Even if only 1 node
    is accessing the boot_pageset, (3) may still consume too much memory
    to fail the memory allocations in step (7).

    Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
    since there is a new fresh memory block added in step(6).

    [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • For each new populated zone of hotadded node, need to update its pagesets
    with dynamically allocated per_cpu_pageset struct for all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
    at end of __build_all_zonelists().

    2) Use mutex to protect zone->pageset when it's still
    shared in onlined_pages()

    Otherwises, multiple zones of different nodes would share same boot strapping
    boot_pageset for same CPU, which will finally cause below kernel panic:

    ------------[ cut here ]------------
    kernel BUG at mm/page_alloc.c:1239!
    invalid opcode: 0000 [#1] SMP
    ...
    Call Trace:
    [] __alloc_pages_nodemask+0x131/0x7b0
    [] alloc_pages_current+0x87/0xd0
    [] __page_cache_alloc+0x67/0x70
    [] __do_page_cache_readahead+0x120/0x260
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x166/0x2c0
    [] page_cache_async_readahead+0x80/0xa0
    [] generic_file_aio_read+0x364/0x670
    [] nfs_file_read+0xca/0x130
    [] do_sync_read+0xfa/0x140
    [] vfs_read+0xb5/0x1a0
    [] sys_read+0x51/0x80
    [] system_call_fastpath+0x16/0x1b
    RIP [] get_page_from_freelist+0x883/0x900
    RSP
    ---[ end trace 4bda28328b9990db ]

    [akpm@linux-foundation.org: merge fix]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Reviewed-by: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • Enable users to online CPUs even if the CPUs belongs to a numa node which
    doesn't have onlined local memory.

    The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
    in system boot/init period, or at the time of local memory online. For a
    numa node without onlined local memory, its zonelists are not initialized
    at present. As a result, any memory allocation operations executed by
    CPUs within this node will fail. In fact, an out-of-memory error is
    triggered when attempt to online CPUs before memory comes to online.

    This patch tries to create zonelists for such numa nodes, so that the
    memory allocation for this node can be fallback'ed to other nodes.

    [akpm@linux-foundation.org: remove unneeded export]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: minskey guo
    Cc: Minchan Kim
    Cc: Yasunori Goto
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    minskey guo
     

13 Mar, 2010

1 commit

  • - introduce dump_page() to print the page info for debugging some error
    condition.

    - convert three mm users: bad_page(), print_bad_pte() and memory offline
    failure.

    - print an extra field: the symbolic names of page->flags

    Example dump_page() output:

    [ 157.521694] page:ffffea0000a7cba8 count:2 mapcount:1 mapping:ffff88001c901791 index:0x147
    [ 157.525570] page flags: 0x100000000100068(uptodate|lru|active|swapbacked)

    Signed-off-by: Wu Fengguang
    Cc: Ingo Molnar
    Cc: Alex Chiang
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

07 Mar, 2010

1 commit

  • A memmap is a directory in sysfs which includes 3 text files: start, end
    and type. For example:

    start: 0x100000
    end: 0x7e7b1cff
    type: System RAM

    Interface firmware_map_add was not called explicitly. Remove it and add
    function firmware_map_add_hotplug as hotplug interface of memmap.

    Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
    does not export memmap entry for it. We add a call in function add_memory
    to function firmware_map_add_hotplug.

    Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
    will be called when initialize memmap and hot-add memory.

    [akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
    Signed-off-by: Shaohui Zheng
    Acked-by: Andi Kleen
    Acked-by: Yasunori Goto
    Reviewed-by: Wu Fengguang
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     

16 Dec, 2009

5 commits

  • __free_pages_bootmem() is a __meminit function - which has been called
    from put_pages_bootmem thus causes a section mismatch warning.

    We were warned by the following warning:

    LD mm/built-in.o
    WARNING: mm/built-in.o(.text+0x26b22): Section mismatch in reference
    from the function put_page_bootmem() to the function
    .meminit.text:__free_pages_bootmem()
    The function put_page_bootmem() references
    the function __meminit __free_pages_bootmem().
    This is often because put_page_bootmem lacks a __meminit
    annotation or the annotation of __free_pages_bootmem is wrong.

    Signed-off-by: Rakib Mullick
    Cc: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakib Mullick
     
  • It has no references outside memory_hotplug.c.

    Cc: "Rafael J. Wysocki"
    Cc: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The previous patch enables page migration of ksm pages, but that soon gets
    into trouble: not surprising, since we're using the ksm page lock to lock
    operations on its stable_node, but page migration switches the page whose
    lock is to be used for that. Another layer of locking would fix it, but
    do we need that yet?

    Do we actually need page migration of ksm pages? Yes, memory hotremove
    needs to offline sections of memory: and since we stopped allocating ksm
    pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
    candidates for migration.

    But KSM is currently unconscious of NUMA issues, happily merging pages
    from different NUMA nodes: at present the rule must be, not to use
    MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
    ksm pages does not make sense yet.

    So, to complete support for ksm swapping we need to make hotremove safe.
    ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
    release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
    are freed before migration reaches them, stable_nodes may be left still
    pointing to struct pages which have been removed from the system: the
    stable_node needs to identify a page by pfn rather than page pointer, then
    it can safely prune them when MEM_OFFLINE.

    And make NUMA migration skip PageKsm pages where it skips PageReserved.
    But it's only when we reach unmap_and_move() that the page lock is taken
    and we can be sure that raised pagecount has prevented a PageAnon from
    being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
    page when offlining (has sufficient locking) but reject it otherwise.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if
    there are no present pages left.

    In such a situation, kswapd must also be stopped since it has nothing left
    to do.

    Signed-off-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Yasunori Goto
    Cc: Mel Gorman
    Cc: Rafael J. Wysocki
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
    in right after isolate_page().

    This patch does it.

    Reviewed-by: Christoph Lameter
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

18 Nov, 2009

2 commits

  • Allow memory hotplug and hibernation in the same kernel

    Memory hotplug and hibernation were exclusive in Kconfig. This is
    obviously a problem for distribution kernels who want to support both in
    the same image.

    After some discussions with Rafael and others the only problem is with
    parallel memory hotadd or removal while a hibernation operation is in
    process. It was also working for s390 before.

    This patch removes the Kconfig level exclusion, and simply makes the
    memory add / remove functions grab the pm_mutex to exclude against
    hibernation.

    Fixes a regression - old kernels didn't exclude memory hotadd and
    hibernation.

    Signed-off-by: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • With CONFIG_MEMORY_HOTPLUG I got following warning:

    WARNING: vmlinux.o(.text+0x1276b0): Section mismatch in reference from
    the function hotadd_new_pgdat() to the function
    .meminit.text:free_area_init_node()
    The function hotadd_new_pgdat() references
    the function __meminit free_area_init_node().
    This is often because hotadd_new_pgdat lacks a __meminit
    annotation or the annotation of free_area_init_node is wrong.

    Use __ref to fix this.

    Signed-off-by: Hidetoshi Seto
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidetoshi Seto
     

23 Sep, 2009

1 commit

  • Originally, walk_memory_resource() was introduced to traverse all memory
    of "System RAM" for detecting memory hotplug/unplug range. For doing so,
    flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
    memory hotplug.

    But for using other purpose, /proc/kcore, this may includes some firmware
    area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
    check strict to find out busy "System RAM".

    Note: PPC64 keeps their own walk_memory_resouce(), which walk through
    ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
    patch makes no difference in behavior, finally.

    And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
    Because pfn_valid() just show "there is memmap or not* and cannot be used
    for "there is physical memory or not", this function is useful in generic
    to scan physical memory range.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: Américo Wang
    Cc: David Rientjes
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

2 commits

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    In line with that, the memory hotplug code should update num_physpages in
    a way that it retains its original (post-boot) meaning; in particular,
    decreasing the value should at best be done with great care - this patch
    doesn't try to ever decrease this value at all as it doesn't really seem
    meaningful to do so.

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Cc: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • In my test, 128M memory is hot added, but zone's pcp batch is 0, which is
    an obvious error. When pages are onlined, zone pcp should be updated
    accordingly.

    [akpm@linux-foundation.org: fix warnings]
    Signed-off-by: Shaohua Li
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Yakui Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

17 Jun, 2009

2 commits

  • Solve two problems.

    Whenever memory hotplug sucessfully happens, zone->present_pages
    have to be changed.

    1) Now memory hotplug calls setup_per_zone_wmark_min only when
    online_pages called, not offline_pages.

    It breaks balance.

    2) If zone->present_pages is changed, we also have to change
    zone->inactive_ratio. That's because inactive_ratio depends on
    zone->present_pages.

    Signed-off-by: Minchan Kim
    Acked-by: Yasunori Goto
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Change the names of two functions. It doesn't affect behavior.

    Presently, setup_per_zone_pages_min() changes low, high of zone as well as
    min. So a better name is setup_per_zone_wmarks(). That's because Mel
    changed zone->pages_[hig/low/min] to zone->watermark array in "page
    allocator: replace the watermark-related union in struct zone with a
    watermark[] array".

    * setup_per_zone_pages_min => setup_per_zone_wmarks

    Of course, we have to change init_per_zone_pages_min, too. There are not
    pages_min any more.

    * init_per_zone_pages_min => init_per_zone_wmark_min

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Minchan Kim
    Acked-by: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

07 Jan, 2009

2 commits

  • GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
    that harder to track down: remove it, and its out-of-work brothers
    GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

    Since we're making that improvement to hotremove_migrate_alloc(), I think
    we can now also remove one of the "o"s from its comment.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Show node to memory section relationship with symlinks in sysfs

    Add /sys/devices/system/node/nodeX/memoryY symlinks for all
    the memory sections located on nodeX. For example:
    /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
    indicates that memory section 135 resides on node1.

    Also revises documentation to cover this change as well as updating
    Documentation/ABI/testing/sysfs-devices-memory to include descriptions
    of memory hotremove files 'phys_device', 'phys_index', and 'state'
    that were previously not described there.

    In addition to it always being a good policy to provide users with
    the maximum possible amount of physical location information for
    resources that can be hot-added and/or hot-removed, the following
    are some (but likely not all) of the user benefits provided by
    this change.
    Immediate:
    - Provides information needed to determine the specific node
    on which a defective DIMM is located. This will reduce system
    downtime when the node or defective DIMM is swapped out.
    - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM. This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node. The consequences of reintroducing the defective memory
    could be ugly.
    - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
    Future:
    - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

    Symlink creation during boot was tested on 2-node x86_64, 2-node
    ppc64, and 2-node ia64 systems. Symlink creation during physical
    memory hot-add tested on a 2-node x86_64 system.

    Signed-off-by: Gary Hade
    Signed-off-by: Badari Pulavarty
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gary Hade
     

01 Dec, 2008

1 commit


20 Nov, 2008

1 commit

  • After adding a node into the machine, top cpuset's mems isn't updated.

    By reviewing the code, we found that the update function

    cpuset_track_online_nodes()

    was invoked after node_states[N_ONLINE] changes. It is wrong because
    N_ONLINE just means node has pgdat, and if node has/added memory, we use
    N_HIGH_MEMORY. So, We should invoke the update function after
    node_states[N_HIGH_MEMORY] changes, just like its commit says.

    This patch fixes it. And we use notifier of memory hotplug instead of
    direct calling of cpuset_track_online_nodes().

    Signed-off-by: Miao Xie
    Acked-by: Yasunori Goto
    Cc: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Linus Torvalds

    Miao Xie
     

20 Oct, 2008

3 commits

  • During hotplug memory remove, memory regions should be released on a
    PAGES_PER_SECTION size chunks. This mirrors the code in add_memory where
    resources are requested on a PAGES_PER_SECTION size.

    Attempting to release the entire memory region fails because there is not
    a single resource for the total number of pages being removed. Instead
    the resources for the pages are split in PAGES_PER_SECTION size chunks as
    requested during memory add.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Badari Pulavarty
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Fontenot
     
  • On large memory systems, the VM can spend way too much time scanning
    through pages that it cannot (or should not) evict from memory. Not only
    does it use up CPU time, but it also provokes lock contention and can
    leave large systems under memory presure in a catatonic state.

    This patch series improves VM scalability by:

    1) putting filesystem backed, swap backed and unevictable pages
    onto their own LRUs, so the system only scans the pages that it
    can/should evict from memory

    2) switching to two handed clock replacement for the anonymous LRUs,
    so the number of pages that need to be scanned when the system
    starts swapping is bound to a reasonable number

    3) keeping unevictable pages off the LRU completely, so the
    VM does not waste CPU time scanning them. ramfs, ramdisk,
    SHM_LOCKED shared memory segments and mlock()ed VMA pages
    are keept on the unevictable list.

    This patch:

    isolate_lru_page logically belongs to be in vmscan.c than migrate.c.

    It is tough, because we don't need that function without memory migration
    so there is a valid argument to have it in migrate.c. However a
    subsequent patch needs to make use of it in the core mm, so we can happily
    move it to vmscan.c.

    Also, make the function a little more generic by not requiring that it
    adds an isolated page to a given list. Callers can do that.

    Note that we now have '__isolate_lru_page()', that does
    something quite different, visible outside of vmscan.c
    for use with memory controller. Methinks we need to
    rationalize these names/purposes. --lts

    [akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
    Signed-off-by: Nick Piggin
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • There is nothing architecture specific about remove_memory().
    remove_memory() function is common for all architectures which support
    hotplug memory remove. Instead of duplicating it in every architecture,
    collapse them into arch neutral function.

    [akpm@linux-foundation.org: fix the export]
    Signed-off-by: Badari Pulavarty
    Cc: Yasunori Goto
    Cc: Gary Hade
    Cc: Mel Gorman
    Cc: Yasunori Goto
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

25 Jul, 2008

5 commits

  • Memory may be hot-removed on a per-memory-block basis, particularly on
    POWER where the SPARSEMEM section size often matches the memory-block
    size. A user-level agent must be able to identify which sections of
    memory are likely to be removable before attempting the potentially
    expensive operation. This patch adds a file called "removable" to the
    memory directory in sysfs to help such an agent. In this patch, a memory
    block is considered removable if;

    o It contains only MOVABLE pageblocks
    o It contains only pageblocks with free pages regardless of pageblock type

    On the other hand, a memory block starting with a PageReserved() page will
    never be considered removable. Without this patch, the user-agent is
    forced to choose a memory block to remove randomly.

    Sample output of the sysfs files:

    ./memory/memory0/removable: 0
    ./memory/memory1/removable: 0
    ./memory/memory2/removable: 0
    ./memory/memory3/removable: 0
    ./memory/memory4/removable: 0
    ./memory/memory5/removable: 0
    ./memory/memory6/removable: 0
    ./memory/memory7/removable: 1
    ./memory/memory8/removable: 0
    ./memory/memory9/removable: 0
    ./memory/memory10/removable: 0
    ./memory/memory11/removable: 0
    ./memory/memory12/removable: 0
    ./memory/memory13/removable: 0
    ./memory/memory14/removable: 0
    ./memory/memory15/removable: 0
    ./memory/memory16/removable: 0
    ./memory/memory17/removable: 1
    ./memory/memory18/removable: 1
    ./memory/memory19/removable: 1
    ./memory/memory20/removable: 1
    ./memory/memory21/removable: 1
    ./memory/memory22/removable: 1

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • If zonelist is required to be rebuilt in online_pages(), there is no need
    to recalculate vm_total_pages in that function, as it has been updated in
    the call build_all_zonelists().

    Signed-off-by: Kent Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Liu
     
  • - Change some naming
    * Magic -> types
    * MIX_INFO -> MIX_SECTION_INFO
    * Change definition of bootmem type from direct hex value

    - __free_pages_bootmem() becomes __meminit.

    Signed-off-by: Yasunori Goto
    Cc: Andy Whitcroft
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Make the needlessly global register_page_bootmem_info_section() static.

    Signed-off-by: Adrian Bunk
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • free_area_init_node() gets passed in the node id as well as the node
    descriptor. This is redundant as the function can trivially get the node
    descriptor itself by means of NODE_DATA() and the node's id.

    I checked all the users and NODE_DATA() seems to be usable everywhere
    from where this function is called.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

15 May, 2008

3 commits

  • Trying to online a new memory section that was added via memory hotplug
    sometimes results in crashes when the new pages are added via __free_page.
    Reason for that is that the pageblock bitmap isn't initialized and hence
    contains random stuff. That means that get_pageblock_migratetype()
    returns also random stuff and therefore

    list_add(&page->lru,
    &zone->free_area[order].free_list[migratetype]);

    in __free_one_page() tries to do a list_add to something that isn't even
    necessarily a list.

    This happens since 86051ca5eaf5e560113ec7673462804c54284456 ("mm: fix
    usemap initialization") which makes sure that the pageblock bitmap gets
    only initialized for pages present in a zone. Unfortunately for hot-added
    memory the zones "grow" after the memmap and the pageblock memmap have
    been initialized. Which means that the new pages have an unitialized
    bitmap. To solve this the calls to grow_zone_span() and grow_pgdat_span()
    are moved to __add_zone() just before the initialization happens.

    The patch also moves the two functions since __add_zone() is the only
    caller and I didn't want to add a forward declaration.

    Signed-off-by: Heiko Carstens
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Add a check to online_pages() to test for failure of
    walk_memory_resource(). This fixes a condition where a failure
    of walk_memory_resource() can lead to online_pages() returning
    success without the requested pages being onlined.

    Signed-off-by: Geoff Levand
    Cc: Yasunori Goto
    Cc: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Cc: Keith Mannthey
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geoff Levand
     
  • __add_zone calls memmap_init_zone twice if memory gets attached to an empty
    zone. Once via init_currently_empty_zone and once explictly right after that
    call.

    Looks like this is currently not a bug, however the call is superfluous and
    might lead to subtle bugs if memmap_init_zone gets changed. So make sure it
    is called only once.

    Cc: Yasunori Goto
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

29 Apr, 2008

1 commit

  • This patch fixes the following compile error caused by commit
    04753278769f3b6c3b79a080edb52f21d83bf6e2 ("memory hotplug: register
    section/node id to free"):

    CC mm/memory_hotplug.o
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c: In function ‘put_page_bootmem’:
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c:82: error: implicit declaration of function ‘__free_pages_bootmem’
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c: At top level:
    /home/bunk/linux/kernel-2.6/git/linux-2.6/mm/memory_hotplug.c:87: warning: no previous prototype for ‘register_page_bootmem_info_section’
    make[2]: *** [mm/memory_hotplug.o] Error 1

    [ Andrew: "Argh. The -mm-only memory-hotplug-add-removable-to-sysfs-
    to-show-memblock-removability.patch debugging patch adds that include
    so nobody hit this before. ]

    Signed-off-by: Adrian Bunk
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

28 Apr, 2008

4 commits

  • This patch is to free memmaps which is allocated by bootmem.

    Freeing usemap is not necessary. The pages of usemap may be necessary for
    other sections.

    If removing section is last section on the node, its section is the final user
    of usemap page. (usemaps are allocated on its section by previous patch.) But
    it shouldn't be freed too, because the section must be logical offline state
    which all pages are isolated against page allocater. If it is freed, page
    alloctor may use it which will be removed physically soon. It will be
    disaster. So, this patch keeps it as it is.

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This patch set is to free pages which is allocated by bootmem for
    memory-hotremove. Some structures of memory management are allocated by
    bootmem. ex) memmap, etc.

    To remove memory physically, some of them must be freed according to
    circumstance. This patch set makes basis to free those pages, and free
    memmaps.

    Basic my idea is using remain members of struct page to remember information
    of users of bootmem (section number or node id). When the section is
    removing, kernel can confirm it. By this information, some issues can be
    solved.

    1) When the memmap of removing section is allocated on other
    section by bootmem, it should/can be free.
    2) When the memmap of removing section is allocated on the
    same section, it shouldn't be freed. Because the section has to be
    logical memory offlined already and all pages must be isolated against
    page allocater. If it is freed, page allocator may use it which will
    be removed physically soon.
    3) When removing section has other section's memmap,
    kernel will be able to show easily which section should be removed
    before it for user. (Not implemented yet)
    4) When the above case 2), the page isolation will be able to check and skip
    memmap's page when logical memory offline (offline_pages()).
    Current page isolation code fails in this case because this page is
    just reserved page and it can't distinguish this pages can be
    removed or not. But, it will be able to do by this patch.
    (Not implemented yet.)
    5) The node information like pgdat has similar issues. But, this
    will be able to be solved too by this.
    (Not implemented yet, but, remembering node id in the pages.)

    Fortunately, current bootmem allocator just keeps PageReserved flags,
    and doesn't use any other members of page struct. The users of
    bootmem doesn't use them too.

    This patch:

    This is to register information which is node or section's id. Kernel can
    distinguish which node/section uses the pages allcated by bootmem. This is
    basis for hot-remove sections or nodes.

    Signed-off-by: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • All architectures use an effectively identical definition of online_page(), so
    just make it common code. x86-64, ia64, powerpc and sh are actually
    identical; x86-32 is slightly different.

    x86-32's differences arise because it puts its hotplug pages in the highmem
    zone. We can handle this in the generic code by inspecting the page to see if
    its in highmem, and update the totalhigh_pages count appropriately. This
    leaves init_32.c:free_new_highpage with a single caller, so I folded it into
    add_one_highpage_init.

    I also removed an incorrect comment referring to the NUMA case; any NUMA
    details have already been dealt with by the time online_page() is called.

    [akpm@linux-foundation.org: fix indenting]
    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Dave Hansen
    Reviewed-by: KAMEZAWA Hiroyuki
    Tested-by: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc: Christoph Lameter
    Acked-by: Ingo Molnar
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
     
  • Generic helper function to remove section mappings and sysfs entries for the
    section of the memory we are removing. offline_pages() correctly adjusted
    zone and marked the pages reserved.

    TODO: Yasunori Goto is working on patches to free up allocations from bootmem.

    Signed-off-by: Badari Pulavarty
    Acked-by: Yasunori Goto
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

20 Apr, 2008

1 commit


06 Feb, 2008

1 commit

  • - Add comments explaing how drain_pages() works.

    - Eliminate useless functions

    - Rename drain_all_local_pages to drain_all_pages(). It does drain
    all pages not only those of the local processor.

    - Eliminate useless interrupt off / on sequences. drain_pages()
    disables interrupts on its own. The execution thread is
    pinned to processor by the caller. So there is no need to
    disable interrupts.

    - Put drain_all_pages() declaration in gfp.h and remove the
    declarations from suspend.h and from mm/memory_hotplug.c

    - Make software suspend call drain_all_pages(). The draining
    of processor local pages is may not the right approach if
    software suspend wants to support SMP. If they call drain_all_pages
    then we can make drain_pages() static.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Christoph Lameter
    Acked-by: Mel Gorman
    Cc: "Rafael J. Wysocki"
    Cc: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

15 Nov, 2007

1 commit

  • i386 and x86-64 registers System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.

    But ia64 registers it as IORESOURCE_MEM only.
    In addition, memory hotplug code registers new memory as IORESOURCE_MEM too.

    This difference causes a failure of memory unplug of x86-64. This patch
    fixes it.

    This patch adds IORESOURCE_BUSY to avoid potential overlap mapping by PCI
    device.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Badari Pulavarty
    Cc: Luck, Tony"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto