03 Dec, 2010

1 commit

  • Presently hwpoison is using lock_system_sleep() to prevent a race with
    memory hotplug. However lock_system_sleep() is a no-op if
    CONFIG_HIBERNATION=n. Therefore we need a new lock.

    Signed-off-by: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Kamezawa Hiroyuki
    Suggested-by: Hugh Dickins
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

27 Oct, 2010

6 commits

  • Simple code for reducing list_empty(&source) check.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • If not_managed is true all pages will be putback to lru, so break the loop
    earlier to skip other pages isolate.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • Reported-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Now, sysfs interface of memory hotplug shows whether the section is
    removable or not. But it checks only migrateype of pages and doesn't
    check details of cluster of pages.

    Next, memory hotplug's set_migratetype_isolate() has the same kind of
    check, too.

    This patch adds the function __count_unmovable_pages() and makes above 2
    checks to use the same logic. Then, is_removable and hotremove code uses
    the same logic. No changes in the hotremove logic itself.

    TODO: need to find a way to check RECLAMABLE. But, considering bit,
    calling shrink_slab() against a range before starting memory hotremove
    sounds better. If so, this patch's logic doesn't need to be changed.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Reported-by: Michal Hocko
    Cc: Wu Fengguang
    Cc: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently update_nr_listpages() doesn't have a role. That's because lists
    passed is always empty just after calling migrate_pages. The
    migrate_pages cleans up page list which have failed to migrate before
    returning by aaa994b3.

    [PATCH] page migration: handle freeing of pages in migrate_pages()

    Do not leave pages on the lists passed to migrate_pages(). Seems that we will
    not need any postprocessing of pages. This will simplify the handling of
    pages by the callers of migrate_pages().

    At that time, we thought we don't need any postprocessing of pages. But
    the situation is changed. The compaction need to know the number of
    failed to migrate for COMPACTPAGEFAILED stat

    This patch makes new rule for caller of migrate_pages to call
    putback_lru_pages. So caller need to clean up the lists so it has a
    chance to postprocess the pages. [suggested by Christoph Lameter]

    Signed-off-by: Minchan Kim
    Cc: Hugh Dickins
    Cc: Andi Kleen
    Reviewed-by: Mel Gorman
    Reviewed-by: Wu Fengguang
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • scan_lru_pages returns pfn. So, it's type should be "unsigned long"
    not "int".

    Note: I guess this has been work until now because memory hotplug tester's
    machine has not very big memory....
    physical address < 32bit << PAGE_SHIFT.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

19 Oct, 2010

1 commit

  • lru_add_drain_all() uses schedule_on_each_cpu() which is synchronous.
    There is no reason to call flush_scheduled_work() after
    lru_add_drain_all(). Drop the spurious calls.

    This is to prepare for the deprecation and removal of
    flush_scheduled_work().

    Signed-off-by: Tejun Heo
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Acked-by: Mel Gorman

    Tejun Heo
     

10 Sep, 2010

1 commit

  • next_active_pageblock() is for finding next _used_ freeblock. It skips
    several blocks when it finds there are a chunk of free pages lager than
    pageblock. But it has 2 bugs.

    1. We have no lock. page_order(page) - pageblock_order can be minus.
    2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Wu Fengguang
    Cc: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

25 May, 2010

3 commits

  • Add global mutex zonelists_mutex to fix the possible race:

    CPU0 CPU1 CPU2
    (1) zone->present_pages += online_pages;
    (2) build_all_zonelists();
    (3) alloc_page();
    (4) free_page();
    (5) build_all_zonelists();
    (6) __build_all_zonelists();
    (7) zone->pageset = alloc_percpu();

    In step (3,4), zone->pageset still points to boot_pageset, so bad
    things may happen if 2+ nodes are in this state. Even if only 1 node
    is accessing the boot_pageset, (3) may still consume too much memory
    to fail the memory allocations in step (7).

    Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
    since there is a new fresh memory block added in step(6).

    [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • For each new populated zone of hotadded node, need to update its pagesets
    with dynamically allocated per_cpu_pageset struct for all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
    at end of __build_all_zonelists().

    2) Use mutex to protect zone->pageset when it's still
    shared in onlined_pages()

    Otherwises, multiple zones of different nodes would share same boot strapping
    boot_pageset for same CPU, which will finally cause below kernel panic:

    ------------[ cut here ]------------
    kernel BUG at mm/page_alloc.c:1239!
    invalid opcode: 0000 [#1] SMP
    ...
    Call Trace:
    [] __alloc_pages_nodemask+0x131/0x7b0
    [] alloc_pages_current+0x87/0xd0
    [] __page_cache_alloc+0x67/0x70
    [] __do_page_cache_readahead+0x120/0x260
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x166/0x2c0
    [] page_cache_async_readahead+0x80/0xa0
    [] generic_file_aio_read+0x364/0x670
    [] nfs_file_read+0xca/0x130
    [] do_sync_read+0xfa/0x140
    [] vfs_read+0xb5/0x1a0
    [] sys_read+0x51/0x80
    [] system_call_fastpath+0x16/0x1b
    RIP [] get_page_from_freelist+0x883/0x900
    RSP
    ---[ end trace 4bda28328b9990db ]

    [akpm@linux-foundation.org: merge fix]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Reviewed-by: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • Enable users to online CPUs even if the CPUs belongs to a numa node which
    doesn't have onlined local memory.

    The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
    in system boot/init period, or at the time of local memory online. For a
    numa node without onlined local memory, its zonelists are not initialized
    at present. As a result, any memory allocation operations executed by
    CPUs within this node will fail. In fact, an out-of-memory error is
    triggered when attempt to online CPUs before memory comes to online.

    This patch tries to create zonelists for such numa nodes, so that the
    memory allocation for this node can be fallback'ed to other nodes.

    [akpm@linux-foundation.org: remove unneeded export]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: minskey guo
    Cc: Minchan Kim
    Cc: Yasunori Goto
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    minskey guo
     

13 Mar, 2010

1 commit

  • - introduce dump_page() to print the page info for debugging some error
    condition.

    - convert three mm users: bad_page(), print_bad_pte() and memory offline
    failure.

    - print an extra field: the symbolic names of page->flags

    Example dump_page() output:

    [ 157.521694] page:ffffea0000a7cba8 count:2 mapcount:1 mapping:ffff88001c901791 index:0x147
    [ 157.525570] page flags: 0x100000000100068(uptodate|lru|active|swapbacked)

    Signed-off-by: Wu Fengguang
    Cc: Ingo Molnar
    Cc: Alex Chiang
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

07 Mar, 2010

1 commit

  • A memmap is a directory in sysfs which includes 3 text files: start, end
    and type. For example:

    start: 0x100000
    end: 0x7e7b1cff
    type: System RAM

    Interface firmware_map_add was not called explicitly. Remove it and add
    function firmware_map_add_hotplug as hotplug interface of memmap.

    Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
    does not export memmap entry for it. We add a call in function add_memory
    to function firmware_map_add_hotplug.

    Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
    will be called when initialize memmap and hot-add memory.

    [akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
    Signed-off-by: Shaohui Zheng
    Acked-by: Andi Kleen
    Acked-by: Yasunori Goto
    Reviewed-by: Wu Fengguang
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     

16 Dec, 2009

5 commits

  • __free_pages_bootmem() is a __meminit function - which has been called
    from put_pages_bootmem thus causes a section mismatch warning.

    We were warned by the following warning:

    LD mm/built-in.o
    WARNING: mm/built-in.o(.text+0x26b22): Section mismatch in reference
    from the function put_page_bootmem() to the function
    .meminit.text:__free_pages_bootmem()
    The function put_page_bootmem() references
    the function __meminit __free_pages_bootmem().
    This is often because put_page_bootmem lacks a __meminit
    annotation or the annotation of __free_pages_bootmem is wrong.

    Signed-off-by: Rakib Mullick
    Cc: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakib Mullick
     
  • It has no references outside memory_hotplug.c.

    Cc: "Rafael J. Wysocki"
    Cc: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The previous patch enables page migration of ksm pages, but that soon gets
    into trouble: not surprising, since we're using the ksm page lock to lock
    operations on its stable_node, but page migration switches the page whose
    lock is to be used for that. Another layer of locking would fix it, but
    do we need that yet?

    Do we actually need page migration of ksm pages? Yes, memory hotremove
    needs to offline sections of memory: and since we stopped allocating ksm
    pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
    candidates for migration.

    But KSM is currently unconscious of NUMA issues, happily merging pages
    from different NUMA nodes: at present the rule must be, not to use
    MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
    ksm pages does not make sense yet.

    So, to complete support for ksm swapping we need to make hotremove safe.
    ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
    release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
    are freed before migration reaches them, stable_nodes may be left still
    pointing to struct pages which have been removed from the system: the
    stable_node needs to identify a page by pfn rather than page pointer, then
    it can safely prune them when MEM_OFFLINE.

    And make NUMA migration skip PageKsm pages where it skips PageReserved.
    But it's only when we reach unmap_and_move() that the page lock is taken
    and we can be sure that raised pagecount has prevented a PageAnon from
    being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
    page when offlining (has sufficient locking) but reject it otherwise.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if
    there are no present pages left.

    In such a situation, kswapd must also be stopped since it has nothing left
    to do.

    Signed-off-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Yasunori Goto
    Cc: Mel Gorman
    Cc: Rafael J. Wysocki
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
    in right after isolate_page().

    This patch does it.

    Reviewed-by: Christoph Lameter
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

18 Nov, 2009

2 commits

  • Allow memory hotplug and hibernation in the same kernel

    Memory hotplug and hibernation were exclusive in Kconfig. This is
    obviously a problem for distribution kernels who want to support both in
    the same image.

    After some discussions with Rafael and others the only problem is with
    parallel memory hotadd or removal while a hibernation operation is in
    process. It was also working for s390 before.

    This patch removes the Kconfig level exclusion, and simply makes the
    memory add / remove functions grab the pm_mutex to exclude against
    hibernation.

    Fixes a regression - old kernels didn't exclude memory hotadd and
    hibernation.

    Signed-off-by: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • With CONFIG_MEMORY_HOTPLUG I got following warning:

    WARNING: vmlinux.o(.text+0x1276b0): Section mismatch in reference from
    the function hotadd_new_pgdat() to the function
    .meminit.text:free_area_init_node()
    The function hotadd_new_pgdat() references
    the function __meminit free_area_init_node().
    This is often because hotadd_new_pgdat lacks a __meminit
    annotation or the annotation of free_area_init_node is wrong.

    Use __ref to fix this.

    Signed-off-by: Hidetoshi Seto
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidetoshi Seto
     

23 Sep, 2009

1 commit

  • Originally, walk_memory_resource() was introduced to traverse all memory
    of "System RAM" for detecting memory hotplug/unplug range. For doing so,
    flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
    memory hotplug.

    But for using other purpose, /proc/kcore, this may includes some firmware
    area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
    check strict to find out busy "System RAM".

    Note: PPC64 keeps their own walk_memory_resouce(), which walk through
    ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
    patch makes no difference in behavior, finally.

    And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
    Because pfn_valid() just show "there is memmap or not* and cannot be used
    for "there is physical memory or not", this function is useful in generic
    to scan physical memory range.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: Américo Wang
    Cc: David Rientjes
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

2 commits

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    In line with that, the memory hotplug code should update num_physpages in
    a way that it retains its original (post-boot) meaning; in particular,
    decreasing the value should at best be done with great care - this patch
    doesn't try to ever decrease this value at all as it doesn't really seem
    meaningful to do so.

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Cc: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • In my test, 128M memory is hot added, but zone's pcp batch is 0, which is
    an obvious error. When pages are onlined, zone pcp should be updated
    accordingly.

    [akpm@linux-foundation.org: fix warnings]
    Signed-off-by: Shaohua Li
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Yakui Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

17 Jun, 2009

2 commits

  • Solve two problems.

    Whenever memory hotplug sucessfully happens, zone->present_pages
    have to be changed.

    1) Now memory hotplug calls setup_per_zone_wmark_min only when
    online_pages called, not offline_pages.

    It breaks balance.

    2) If zone->present_pages is changed, we also have to change
    zone->inactive_ratio. That's because inactive_ratio depends on
    zone->present_pages.

    Signed-off-by: Minchan Kim
    Acked-by: Yasunori Goto
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Change the names of two functions. It doesn't affect behavior.

    Presently, setup_per_zone_pages_min() changes low, high of zone as well as
    min. So a better name is setup_per_zone_wmarks(). That's because Mel
    changed zone->pages_[hig/low/min] to zone->watermark array in "page
    allocator: replace the watermark-related union in struct zone with a
    watermark[] array".

    * setup_per_zone_pages_min => setup_per_zone_wmarks

    Of course, we have to change init_per_zone_pages_min, too. There are not
    pages_min any more.

    * init_per_zone_pages_min => init_per_zone_wmark_min

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Minchan Kim
    Acked-by: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

07 Jan, 2009

2 commits

  • GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
    that harder to track down: remove it, and its out-of-work brothers
    GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

    Since we're making that improvement to hotremove_migrate_alloc(), I think
    we can now also remove one of the "o"s from its comment.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Show node to memory section relationship with symlinks in sysfs

    Add /sys/devices/system/node/nodeX/memoryY symlinks for all
    the memory sections located on nodeX. For example:
    /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
    indicates that memory section 135 resides on node1.

    Also revises documentation to cover this change as well as updating
    Documentation/ABI/testing/sysfs-devices-memory to include descriptions
    of memory hotremove files 'phys_device', 'phys_index', and 'state'
    that were previously not described there.

    In addition to it always being a good policy to provide users with
    the maximum possible amount of physical location information for
    resources that can be hot-added and/or hot-removed, the following
    are some (but likely not all) of the user benefits provided by
    this change.
    Immediate:
    - Provides information needed to determine the specific node
    on which a defective DIMM is located. This will reduce system
    downtime when the node or defective DIMM is swapped out.
    - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM. This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node. The consequences of reintroducing the defective memory
    could be ugly.
    - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
    Future:
    - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

    Symlink creation during boot was tested on 2-node x86_64, 2-node
    ppc64, and 2-node ia64 systems. Symlink creation during physical
    memory hot-add tested on a 2-node x86_64 system.

    Signed-off-by: Gary Hade
    Signed-off-by: Badari Pulavarty
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gary Hade
     

01 Dec, 2008

1 commit


20 Nov, 2008

1 commit

  • After adding a node into the machine, top cpuset's mems isn't updated.

    By reviewing the code, we found that the update function

    cpuset_track_online_nodes()

    was invoked after node_states[N_ONLINE] changes. It is wrong because
    N_ONLINE just means node has pgdat, and if node has/added memory, we use
    N_HIGH_MEMORY. So, We should invoke the update function after
    node_states[N_HIGH_MEMORY] changes, just like its commit says.

    This patch fixes it. And we use notifier of memory hotplug instead of
    direct calling of cpuset_track_online_nodes().

    Signed-off-by: Miao Xie
    Acked-by: Yasunori Goto
    Cc: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Linus Torvalds

    Miao Xie
     

20 Oct, 2008

3 commits

  • During hotplug memory remove, memory regions should be released on a
    PAGES_PER_SECTION size chunks. This mirrors the code in add_memory where
    resources are requested on a PAGES_PER_SECTION size.

    Attempting to release the entire memory region fails because there is not
    a single resource for the total number of pages being removed. Instead
    the resources for the pages are split in PAGES_PER_SECTION size chunks as
    requested during memory add.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Badari Pulavarty
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Fontenot
     
  • On large memory systems, the VM can spend way too much time scanning
    through pages that it cannot (or should not) evict from memory. Not only
    does it use up CPU time, but it also provokes lock contention and can
    leave large systems under memory presure in a catatonic state.

    This patch series improves VM scalability by:

    1) putting filesystem backed, swap backed and unevictable pages
    onto their own LRUs, so the system only scans the pages that it
    can/should evict from memory

    2) switching to two handed clock replacement for the anonymous LRUs,
    so the number of pages that need to be scanned when the system
    starts swapping is bound to a reasonable number

    3) keeping unevictable pages off the LRU completely, so the
    VM does not waste CPU time scanning them. ramfs, ramdisk,
    SHM_LOCKED shared memory segments and mlock()ed VMA pages
    are keept on the unevictable list.

    This patch:

    isolate_lru_page logically belongs to be in vmscan.c than migrate.c.

    It is tough, because we don't need that function without memory migration
    so there is a valid argument to have it in migrate.c. However a
    subsequent patch needs to make use of it in the core mm, so we can happily
    move it to vmscan.c.

    Also, make the function a little more generic by not requiring that it
    adds an isolated page to a given list. Callers can do that.

    Note that we now have '__isolate_lru_page()', that does
    something quite different, visible outside of vmscan.c
    for use with memory controller. Methinks we need to
    rationalize these names/purposes. --lts

    [akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
    Signed-off-by: Nick Piggin
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • There is nothing architecture specific about remove_memory().
    remove_memory() function is common for all architectures which support
    hotplug memory remove. Instead of duplicating it in every architecture,
    collapse them into arch neutral function.

    [akpm@linux-foundation.org: fix the export]
    Signed-off-by: Badari Pulavarty
    Cc: Yasunori Goto
    Cc: Gary Hade
    Cc: Mel Gorman
    Cc: Yasunori Goto
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

25 Jul, 2008

5 commits

  • Memory may be hot-removed on a per-memory-block basis, particularly on
    POWER where the SPARSEMEM section size often matches the memory-block
    size. A user-level agent must be able to identify which sections of
    memory are likely to be removable before attempting the potentially
    expensive operation. This patch adds a file called "removable" to the
    memory directory in sysfs to help such an agent. In this patch, a memory
    block is considered removable if;

    o It contains only MOVABLE pageblocks
    o It contains only pageblocks with free pages regardless of pageblock type

    On the other hand, a memory block starting with a PageReserved() page will
    never be considered removable. Without this patch, the user-agent is
    forced to choose a memory block to remove randomly.

    Sample output of the sysfs files:

    ./memory/memory0/removable: 0
    ./memory/memory1/removable: 0
    ./memory/memory2/removable: 0
    ./memory/memory3/removable: 0
    ./memory/memory4/removable: 0
    ./memory/memory5/removable: 0
    ./memory/memory6/removable: 0
    ./memory/memory7/removable: 1
    ./memory/memory8/removable: 0
    ./memory/memory9/removable: 0
    ./memory/memory10/removable: 0
    ./memory/memory11/removable: 0
    ./memory/memory12/removable: 0
    ./memory/memory13/removable: 0
    ./memory/memory14/removable: 0
    ./memory/memory15/removable: 0
    ./memory/memory16/removable: 0
    ./memory/memory17/removable: 1
    ./memory/memory18/removable: 1
    ./memory/memory19/removable: 1
    ./memory/memory20/removable: 1
    ./memory/memory21/removable: 1
    ./memory/memory22/removable: 1

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • If zonelist is required to be rebuilt in online_pages(), there is no need
    to recalculate vm_total_pages in that function, as it has been updated in
    the call build_all_zonelists().

    Signed-off-by: Kent Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Liu
     
  • - Change some naming
    * Magic -> types
    * MIX_INFO -> MIX_SECTION_INFO
    * Change definition of bootmem type from direct hex value

    - __free_pages_bootmem() becomes __meminit.

    Signed-off-by: Yasunori Goto
    Cc: Andy Whitcroft
    Cc: Badari Pulavarty
    Cc: Yinghai Lu
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • Make the needlessly global register_page_bootmem_info_section() static.

    Signed-off-by: Adrian Bunk
    Acked-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • free_area_init_node() gets passed in the node id as well as the node
    descriptor. This is redundant as the function can trivially get the node
    descriptor itself by means of NODE_DATA() and the node's id.

    I checked all the users and NODE_DATA() seems to be usable everywhere
    from where this function is called.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

15 May, 2008

2 commits

  • Trying to online a new memory section that was added via memory hotplug
    sometimes results in crashes when the new pages are added via __free_page.
    Reason for that is that the pageblock bitmap isn't initialized and hence
    contains random stuff. That means that get_pageblock_migratetype()
    returns also random stuff and therefore

    list_add(&page->lru,
    &zone->free_area[order].free_list[migratetype]);

    in __free_one_page() tries to do a list_add to something that isn't even
    necessarily a list.

    This happens since 86051ca5eaf5e560113ec7673462804c54284456 ("mm: fix
    usemap initialization") which makes sure that the pageblock bitmap gets
    only initialized for pages present in a zone. Unfortunately for hot-added
    memory the zones "grow" after the memmap and the pageblock memmap have
    been initialized. Which means that the new pages have an unitialized
    bitmap. To solve this the calls to grow_zone_span() and grow_pgdat_span()
    are moved to __add_zone() just before the initialization happens.

    The patch also moves the two functions since __add_zone() is the only
    caller and I didn't want to add a forward declaration.

    Signed-off-by: Heiko Carstens
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Cc: Yasunori Goto
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Add a check to online_pages() to test for failure of
    walk_memory_resource(). This fixes a condition where a failure
    of walk_memory_resource() can lead to online_pages() returning
    success without the requested pages being onlined.

    Signed-off-by: Geoff Levand
    Cc: Yasunori Goto
    Cc: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Cc: Keith Mannthey
    Cc: Christoph Lameter
    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geoff Levand