13 Jan, 2012

1 commit

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

31 Oct, 2011

1 commit


26 Jul, 2011

1 commit

  • This patch contains online_page_callback and apropriate functions for
    registering/unregistering online page callbacks. It allows to do some
    machine specific tasks during online page stage which is required to
    implement memory hotplug in virtual machines. Currently this patch is
    required by latest memory hotplug support for Xen balloon driver patch
    which will be posted soon.

    Additionally, originial online_page() function was splited into
    following functions doing "atomic" operations:

    - __online_page_set_limits() - set new limits for memory management code,
    - __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
    - __online_page_free() - free page to allocator.

    It was done to:
    - not duplicate existing code,
    - ease hotplug code devolpment by usage of well defined interface,
    - avoid stupid bugs which are unavoidable when the same code
    (by design) is developed in many places.

    [akpm@linux-foundation.org: use explicit indirect-call syntax]
    Signed-off-by: Daniel Kiper
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Ian Campbell
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     

23 Jun, 2011

2 commits

  • Commit 959ecc48fc75 ("mm/memory_hotplug.c: fix building of node hotplug
    zonelist") does not protect the build_all_zonelists() call with
    zonelists_mutex as needed. This can lead to races in constructing
    zonelist ordering if a concurrent build is underway. Protecting this
    with lock_memory_hotplug() is insufficient since zonelists can be
    rebuild though sysfs as well.

    Signed-off-by: David Rientjes
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The error handling in mem_online_node() is incorrect: hotadd_new_pgdat()
    returns NULL if the new pgdat could not have been allocated and a pointer
    to it otherwise.

    mem_online_node() should fail if hotadd_new_pgdat() fails, not the
    inverse. This fixes an issue when memoryless nodes are not onlined and
    their sysfs interface is not registered when their first cpu is brought
    up.

    The bug was introduced by commit cf23422b9d76 ("cpu/mem hotplug: enable
    CPUs online before local memory online") iow v2.6.35.

    Signed-off-by: David Rientjes
    Reviewed-by: KOSAKI Motohiro
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    David Rientjes
     

16 Jun, 2011

1 commit

  • During memory hotplug we refresh zonelists when we online a page in a new
    zone. It means that the node's zonelist is not initialized until pages
    are onlined. So for example, "nid" passed by MEM_GOING_ONLINE notifier
    will point to NODE_DATA(nid) which has no zone fallback list. Moreover,
    if we hot-add cpu-only nodes, alloc_pages() will do no fallback.

    This patch makes a zonelist when a new pgdata is available.

    Note: in production, at fujitsu, memory should be onlined before cpu
    and our server didn't have any memory-less nodes and had no problems.

    But recent changes in MEM_GOING_ONLINE+page_cgroup
    will access not initialized zonelist of node.
    Anyway, there are memory-less node and we need some care.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

25 May, 2011

4 commits

  • online_pages() is only compiled for CONFIG_MEMORY_HOTPLUG_SPARSE, so there
    is no need to support CONFIG_FLATMEM code within it.

    This patch removes code that is never used.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     
  • isolate_lru_page() must be called only with stable reference to page. So,
    let's grab normal page reference.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Currently, memory hotplug calls setup_per_zone_wmarks() and
    calculate_zone_inactive_ratio(), but doesn't call
    setup_per_zone_lowmem_reserve().

    It means the number of reserved pages aren't updated even if memory hot
    plug occur. This patch fixes it.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Commit bce7394a3e ("page-allocator: reset wmark_min and inactive ratio of
    zone when hotplug happens") introduced invalid section references. Now,
    setup_per_zone_inactive_ratio() is marked __init and then it can't be
    referenced from memory hotplug code.

    This patch marks it as __meminit and also marks caller as __ref.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Minchan Kim
    Cc: Yasunori Goto
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

15 Apr, 2011

1 commit

  • If CONFIG_FLATMEM is enabled pfn is calculated in online_page() more than
    once. It is possible to optimize that and use value established at
    beginning of that function.

    Signed-off-by: Daniel Kiper
    Acked-by: Dave Hansen
    Cc: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Acked-by: David Rientjes
    Reviewed-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kiper
     

31 Mar, 2011

1 commit


15 Jan, 2011

1 commit


14 Jan, 2011

3 commits

  • PG_buddy can be converted to _mapcount == -2. So the PG_compound_lock can
    be added to page->flags without overflowing (because of the sparse section
    bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y. This also
    has to move the memory hotplug code from _mapcount to lru.next to avoid
    any risk of clashes. We can't use lru.next for PG_buddy removal, but
    memory hotplug can use lru.next even more easily than the mapcount
    instead.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • With the introduction of the boolean sync parameter, the API looks a
    little inconsistent as offlining is still an int. Convert offlining to a
    bool for the sake of being tidy.

    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • …ompaction in the faster path

    Migration synchronously waits for writeback if the initial passes fails.
    Callers of memory compaction do not necessarily want this behaviour if the
    caller is latency sensitive or expects that synchronous migration is not
    going to have a significantly better success rate.

    This patch adds a sync parameter to migrate_pages() allowing the caller to
    indicate if wait_on_page_writeback() is allowed within migration or not.
    For reclaim/compaction, try_to_compact_pages() is first called
    asynchronously, direct reclaim runs and then try_to_compact_pages() is
    called synchronously as there is a greater expectation that it'll succeed.

    [akpm@linux-foundation.org: build/merge fix]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Rik van Riel <riel@redhat.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Andy Whitcroft <apw@shadowen.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

11 Jan, 2011

1 commit

  • Now, memory_hotplug_(un)lock() is used for add/remove/offline pages
    for avoiding races with hibernation. But this should be held in
    online_pages(), too. It seems asymmetric.

    There are cases where one has to avoid a race with memory hotplug
    notifier and his own local code, and hotplug v.s. hotplug.
    This will add a generic solution for avoiding races. In other view,
    having lock here has no big impacts. online pages is tend to be
    done by udev script at el against each memory section one by one.

    Then, it's better to have lock here, too.

    Cc: # 2.6.37
    Reviewed-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Pekka Enberg

    KAMEZAWA Hiroyuki
     

03 Dec, 2010

1 commit

  • Presently hwpoison is using lock_system_sleep() to prevent a race with
    memory hotplug. However lock_system_sleep() is a no-op if
    CONFIG_HIBERNATION=n. Therefore we need a new lock.

    Signed-off-by: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Kamezawa Hiroyuki
    Suggested-by: Hugh Dickins
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

27 Oct, 2010

6 commits

  • Simple code for reducing list_empty(&source) check.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • If not_managed is true all pages will be putback to lru, so break the loop
    earlier to skip other pages isolate.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • Reported-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Now, sysfs interface of memory hotplug shows whether the section is
    removable or not. But it checks only migrateype of pages and doesn't
    check details of cluster of pages.

    Next, memory hotplug's set_migratetype_isolate() has the same kind of
    check, too.

    This patch adds the function __count_unmovable_pages() and makes above 2
    checks to use the same logic. Then, is_removable and hotremove code uses
    the same logic. No changes in the hotremove logic itself.

    TODO: need to find a way to check RECLAMABLE. But, considering bit,
    calling shrink_slab() against a range before starting memory hotremove
    sounds better. If so, this patch's logic doesn't need to be changed.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Reported-by: Michal Hocko
    Cc: Wu Fengguang
    Cc: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently update_nr_listpages() doesn't have a role. That's because lists
    passed is always empty just after calling migrate_pages. The
    migrate_pages cleans up page list which have failed to migrate before
    returning by aaa994b3.

    [PATCH] page migration: handle freeing of pages in migrate_pages()

    Do not leave pages on the lists passed to migrate_pages(). Seems that we will
    not need any postprocessing of pages. This will simplify the handling of
    pages by the callers of migrate_pages().

    At that time, we thought we don't need any postprocessing of pages. But
    the situation is changed. The compaction need to know the number of
    failed to migrate for COMPACTPAGEFAILED stat

    This patch makes new rule for caller of migrate_pages to call
    putback_lru_pages. So caller need to clean up the lists so it has a
    chance to postprocess the pages. [suggested by Christoph Lameter]

    Signed-off-by: Minchan Kim
    Cc: Hugh Dickins
    Cc: Andi Kleen
    Reviewed-by: Mel Gorman
    Reviewed-by: Wu Fengguang
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • scan_lru_pages returns pfn. So, it's type should be "unsigned long"
    not "int".

    Note: I guess this has been work until now because memory hotplug tester's
    machine has not very big memory....
    physical address < 32bit << PAGE_SHIFT.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

19 Oct, 2010

1 commit

  • lru_add_drain_all() uses schedule_on_each_cpu() which is synchronous.
    There is no reason to call flush_scheduled_work() after
    lru_add_drain_all(). Drop the spurious calls.

    This is to prepare for the deprecation and removal of
    flush_scheduled_work().

    Signed-off-by: Tejun Heo
    Acked-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Acked-by: Mel Gorman

    Tejun Heo
     

10 Sep, 2010

1 commit

  • next_active_pageblock() is for finding next _used_ freeblock. It skips
    several blocks when it finds there are a chunk of free pages lager than
    pageblock. But it has 2 bugs.

    1. We have no lock. page_order(page) - pageblock_order can be minus.
    2. pageblocks_stride += is wrong. it should skip page_order(p) of pages.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Michal Hocko
    Cc: Wu Fengguang
    Cc: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

25 May, 2010

3 commits

  • Add global mutex zonelists_mutex to fix the possible race:

    CPU0 CPU1 CPU2
    (1) zone->present_pages += online_pages;
    (2) build_all_zonelists();
    (3) alloc_page();
    (4) free_page();
    (5) build_all_zonelists();
    (6) __build_all_zonelists();
    (7) zone->pageset = alloc_percpu();

    In step (3,4), zone->pageset still points to boot_pageset, so bad
    things may happen if 2+ nodes are in this state. Even if only 1 node
    is accessing the boot_pageset, (3) may still consume too much memory
    to fail the memory allocations in step (7).

    Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
    since there is a new fresh memory block added in step(6).

    [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • For each new populated zone of hotadded node, need to update its pagesets
    with dynamically allocated per_cpu_pageset struct for all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
    at end of __build_all_zonelists().

    2) Use mutex to protect zone->pageset when it's still
    shared in onlined_pages()

    Otherwises, multiple zones of different nodes would share same boot strapping
    boot_pageset for same CPU, which will finally cause below kernel panic:

    ------------[ cut here ]------------
    kernel BUG at mm/page_alloc.c:1239!
    invalid opcode: 0000 [#1] SMP
    ...
    Call Trace:
    [] __alloc_pages_nodemask+0x131/0x7b0
    [] alloc_pages_current+0x87/0xd0
    [] __page_cache_alloc+0x67/0x70
    [] __do_page_cache_readahead+0x120/0x260
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x166/0x2c0
    [] page_cache_async_readahead+0x80/0xa0
    [] generic_file_aio_read+0x364/0x670
    [] nfs_file_read+0xca/0x130
    [] do_sync_read+0xfa/0x140
    [] vfs_read+0xb5/0x1a0
    [] sys_read+0x51/0x80
    [] system_call_fastpath+0x16/0x1b
    RIP [] get_page_from_freelist+0x883/0x900
    RSP
    ---[ end trace 4bda28328b9990db ]

    [akpm@linux-foundation.org: merge fix]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Reviewed-by: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • Enable users to online CPUs even if the CPUs belongs to a numa node which
    doesn't have onlined local memory.

    The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
    in system boot/init period, or at the time of local memory online. For a
    numa node without onlined local memory, its zonelists are not initialized
    at present. As a result, any memory allocation operations executed by
    CPUs within this node will fail. In fact, an out-of-memory error is
    triggered when attempt to online CPUs before memory comes to online.

    This patch tries to create zonelists for such numa nodes, so that the
    memory allocation for this node can be fallback'ed to other nodes.

    [akpm@linux-foundation.org: remove unneeded export]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: minskey guo
    Cc: Minchan Kim
    Cc: Yasunori Goto
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    minskey guo
     

13 Mar, 2010

1 commit

  • - introduce dump_page() to print the page info for debugging some error
    condition.

    - convert three mm users: bad_page(), print_bad_pte() and memory offline
    failure.

    - print an extra field: the symbolic names of page->flags

    Example dump_page() output:

    [ 157.521694] page:ffffea0000a7cba8 count:2 mapcount:1 mapping:ffff88001c901791 index:0x147
    [ 157.525570] page flags: 0x100000000100068(uptodate|lru|active|swapbacked)

    Signed-off-by: Wu Fengguang
    Cc: Ingo Molnar
    Cc: Alex Chiang
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

07 Mar, 2010

1 commit

  • A memmap is a directory in sysfs which includes 3 text files: start, end
    and type. For example:

    start: 0x100000
    end: 0x7e7b1cff
    type: System RAM

    Interface firmware_map_add was not called explicitly. Remove it and add
    function firmware_map_add_hotplug as hotplug interface of memmap.

    Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
    does not export memmap entry for it. We add a call in function add_memory
    to function firmware_map_add_hotplug.

    Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
    will be called when initialize memmap and hot-add memory.

    [akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
    Signed-off-by: Shaohui Zheng
    Acked-by: Andi Kleen
    Acked-by: Yasunori Goto
    Reviewed-by: Wu Fengguang
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     

16 Dec, 2009

5 commits

  • __free_pages_bootmem() is a __meminit function - which has been called
    from put_pages_bootmem thus causes a section mismatch warning.

    We were warned by the following warning:

    LD mm/built-in.o
    WARNING: mm/built-in.o(.text+0x26b22): Section mismatch in reference
    from the function put_page_bootmem() to the function
    .meminit.text:__free_pages_bootmem()
    The function put_page_bootmem() references
    the function __meminit __free_pages_bootmem().
    This is often because put_page_bootmem lacks a __meminit
    annotation or the annotation of __free_pages_bootmem is wrong.

    Signed-off-by: Rakib Mullick
    Cc: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rakib Mullick
     
  • It has no references outside memory_hotplug.c.

    Cc: "Rafael J. Wysocki"
    Cc: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The previous patch enables page migration of ksm pages, but that soon gets
    into trouble: not surprising, since we're using the ksm page lock to lock
    operations on its stable_node, but page migration switches the page whose
    lock is to be used for that. Another layer of locking would fix it, but
    do we need that yet?

    Do we actually need page migration of ksm pages? Yes, memory hotremove
    needs to offline sections of memory: and since we stopped allocating ksm
    pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
    candidates for migration.

    But KSM is currently unconscious of NUMA issues, happily merging pages
    from different NUMA nodes: at present the rule must be, not to use
    MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
    ksm pages does not make sense yet.

    So, to complete support for ksm swapping we need to make hotremove safe.
    ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
    release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
    are freed before migration reaches them, stable_nodes may be left still
    pointing to struct pages which have been removed from the system: the
    stable_node needs to identify a page by pfn rather than page pointer, then
    it can safely prune them when MEM_OFFLINE.

    And make NUMA migration skip PageKsm pages where it skips PageReserved.
    But it's only when we reach unmap_and_move() that the page lock is taken
    and we can be sure that raised pagecount has prevented a PageAnon from
    being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
    page when offlining (has sufficient locking) but reject it otherwise.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if
    there are no present pages left.

    In such a situation, kswapd must also be stopped since it has nothing left
    to do.

    Signed-off-by: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Yasunori Goto
    Cc: Mel Gorman
    Cc: Rafael J. Wysocki
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
    in right after isolate_page().

    This patch does it.

    Reviewed-by: Christoph Lameter
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

18 Nov, 2009

2 commits

  • Allow memory hotplug and hibernation in the same kernel

    Memory hotplug and hibernation were exclusive in Kconfig. This is
    obviously a problem for distribution kernels who want to support both in
    the same image.

    After some discussions with Rafael and others the only problem is with
    parallel memory hotadd or removal while a hibernation operation is in
    process. It was also working for s390 before.

    This patch removes the Kconfig level exclusion, and simply makes the
    memory add / remove functions grab the pm_mutex to exclude against
    hibernation.

    Fixes a regression - old kernels didn't exclude memory hotadd and
    hibernation.

    Signed-off-by: Andi Kleen
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Cc: Yasunori Goto
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • With CONFIG_MEMORY_HOTPLUG I got following warning:

    WARNING: vmlinux.o(.text+0x1276b0): Section mismatch in reference from
    the function hotadd_new_pgdat() to the function
    .meminit.text:free_area_init_node()
    The function hotadd_new_pgdat() references
    the function __meminit free_area_init_node().
    This is often because hotadd_new_pgdat lacks a __meminit
    annotation or the annotation of free_area_init_node is wrong.

    Use __ref to fix this.

    Signed-off-by: Hidetoshi Seto
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidetoshi Seto
     

23 Sep, 2009

1 commit

  • Originally, walk_memory_resource() was introduced to traverse all memory
    of "System RAM" for detecting memory hotplug/unplug range. For doing so,
    flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
    memory hotplug.

    But for using other purpose, /proc/kcore, this may includes some firmware
    area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the
    check strict to find out busy "System RAM".

    Note: PPC64 keeps their own walk_memory_resouce(), which walk through
    ppc64's lmb informaton. Because old kclist_add() is called per lmb, this
    patch makes no difference in behavior, finally.

    And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
    Because pfn_valid() just show "there is memmap or not* and cannot be used
    for "there is physical memory or not", this function is useful in generic
    to scan physical memory range.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Cc: Américo Wang
    Cc: David Rientjes
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

1 commit

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    In line with that, the memory hotplug code should update num_physpages in
    a way that it retains its original (post-boot) meaning; in particular,
    decreasing the value should at best be done with great care - this patch
    doesn't try to ever decrease this value at all as it doesn't really seem
    meaningful to do so.

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Cc: Yasunori Goto
    Cc: Badari Pulavarty
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Dave Hansen
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich