24 Feb, 2013

1 commit

  • No functional change, but the only purpose of the offlining argument to
    migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a
    KSM page for memory hotremove (which took ksm_thread_mutex) but not for
    other callers. Now all cases are safe, remove the arg.

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Cc: Petr Holasek
    Cc: Andrea Arcangeli
    Cc: Izik Eidus
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 Dec, 2012

1 commit

  • Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
    "There are three implementations for NUMA balancing, this tree
    (balancenuma), numacore which has been developed in tip/master and
    autonuma which is in aa.git.

    In almost all respects balancenuma is the dumbest of the three because
    its main impact is on the VM side with no attempt to be smart about
    scheduling. In the interest of getting the ball rolling, it would be
    desirable to see this much merged for 3.8 with the view to building
    scheduler smarts on top and adapting the VM where required for 3.9.

    The most recent set of comparisons available from different people are

    mel: https://lkml.org/lkml/2012/12/9/108
    mingo: https://lkml.org/lkml/2012/12/7/331
    tglx: https://lkml.org/lkml/2012/12/10/437
    srikar: https://lkml.org/lkml/2012/12/10/397

    The results are a mixed bag. In my own tests, balancenuma does
    reasonably well. It's dumb as rocks and does not regress against
    mainline. On the other hand, Ingo's tests show that balancenuma is
    incapable of converging for the workloads driven by perf which is bad
    but is potentially explained by the lack of scheduler smarts. Thomas'
    results show balancenuma improves on mainline but falls far short of
    numacore or autonuma. Srikar's results indicate we all suffer on a
    large machine with imbalanced node sizes.

    My own testing showed that recent numacore results have improved
    dramatically, particularly in the last week but not universally.
    We've butted heads heavily on system CPU usage and high levels of
    migration even when it shows that overall performance is better.
    There are also cases where it regresses. Of interest is that for
    specjbb in some configurations it will regress for lower numbers of
    warehouses and show gains for higher numbers which is not reported by
    the tool by default and sometimes missed in reports. Recently I
    reported for numacore that the JVM was crashing with
    NullPointerExceptions but currently it's unclear what the source of
    this problem is. Initially I thought it was in how numacore batch
    handles PTEs but I no longer think this is the case. It's possible
    numacore is just able to trigger it due to higher rates of migration.

    These reports were quite late in the cycle so I/we would like to start
    with this tree as it contains much of the code we can agree on and has
    not changed significantly over the last 2-3 weeks."

    * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
    mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
    mm/rmap: Convert the struct anon_vma::mutex to an rwsem
    mm: migrate: Account a transhuge page properly when rate limiting
    mm: numa: Account for failed allocations and isolations as migration failures
    mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
    mm: numa: Add THP migration for the NUMA working set scanning fault case.
    mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
    mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
    mm: sched: numa: Control enabling and disabling of NUMA balancing
    mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
    mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely tasknode relationships
    mm: numa: migrate: Set last_nid on newly allocated page
    mm: numa: split_huge_page: Transfer last_nid on tail page
    mm: numa: Introduce last_nid to the page frame
    sched: numa: Slowly increase the scanning period as NUMA faults are handled
    mm: numa: Rate limit setting of pte_numa if node is saturated
    mm: numa: Rate limit the amount of memory that is migrated between nodes
    mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
    mm: numa: Migrate pages handled during a pmd_numa hinting fault
    mm: numa: Migrate on reference policy
    ...

    Linus Torvalds
     

12 Dec, 2012

3 commits

  • The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
    hacks around putback_lru_pages() in order to allow ballooned pages to be
    re-inserted on balloon page list as if a ballooned page was like a LRU page.

    As ballooned pages are not legitimate LRU pages, this patch introduces
    putback_movable_pages() to properly cope with cases where the isolated
    pageset contains ballooned pages and LRU pages, thus fixing the mentioned
    inelegant hack around putback_lru_pages().

    Signed-off-by: Rafael Aquini
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
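
    The dispatch described above can be sketched in userspace as follows.
    This is a toy model, not the kernel code: struct toy_page, its
    is_balloon field and the counters are hypothetical stand-ins, and the
    real function re-inserts pages rather than counting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an isolated page; not the kernel's struct page. */
struct toy_page {
    bool is_balloon;        /* true for a ballooned page, false for an LRU page */
    struct toy_page *next;  /* link in the isolated page set */
};

/* Where each page of a mixed page set ended up. */
struct toy_putback_counts {
    size_t to_balloon;
    size_t to_lru;
};

/*
 * Walk a mixed set of isolated pages and route each page to the list it
 * came from, instead of treating every page as an LRU page.
 */
void toy_putback_movable_pages(struct toy_page *head,
                               struct toy_putback_counts *out)
{
    assert(out != NULL);
    out->to_balloon = 0;
    out->to_lru = 0;

    while (head != NULL) {
        struct toy_page *next = head->next;

        head->next = NULL;          /* detach from the isolated set */
        if (head->is_balloon)
            out->to_balloon++;      /* would go back to the balloon page list */
        else
            out->to_lru++;          /* would go back to the LRU */
        head = next;
    }
}
```

    The point of the sketch is only the per-page branch: the caller hands
    over one list and no longer needs to know which kinds of pages it holds.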
     
  • Memory fragmentation introduced by ballooning might reduce significantly
    the number of 2MB contiguous memory blocks that can be used within a guest,
    thus imposing performance penalties associated with the reduced number of
    transparent huge pages that could be used by the guest workload.

    This patch introduces a common interface to help a balloon driver on
    making its page set movable to compaction, and thus allowing the system
    to better leverage the compaction efforts on memory defragmentation.

    [akpm@linux-foundation.org: use PAGE_FLAGS_CHECK_AT_PREP, s/__balloon_page_flags/page_flags_cleared/, small cleanups]
    [rientjes@google.com: allow balloon compaction for any system with memory compaction enabled, which is the defconfig]
    Signed-off-by: Rafael Aquini
    Acked-by: Mel Gorman
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • Memory fragmentation introduced by ballooning might reduce significantly
    the number of 2MB contiguous memory blocks that can be used within a
    guest, thus imposing performance penalties associated with the reduced
    number of transparent huge pages that could be used by the guest workload.

    This patch-set follows the main idea discussed at 2012 LSFMMS session:
    "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
    to introduce the required changes to the virtio_balloon driver, as well as
    the changes to the core compaction & migration bits, in order to make
    those subsystems aware of ballooned pages and allow memory balloon pages
    become movable within a guest, thus avoiding the aforementioned
    fragmentation issue.

    The following numbers show how this patch set helps compaction be more
    effective on memory-ballooned guests.

    Results for STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
    running on a 4GB RAM KVM guest which was ballooning 512MB RAM in 64MB
    chunks, at every minute (inflating/deflating), while test was running:

    ===BEGIN stress-highalloc

    STRESS-HIGHALLOC
                           highalloc-3.7     highalloc-3.7
                               rc4-clean         rc4-patch
    Pass 1                55.00 ( 0.00%)    62.00 ( 7.00%)
    Pass 2                54.00 ( 0.00%)    62.00 ( 8.00%)
    while Rested          75.00 ( 0.00%)    80.00 ( 5.00%)

    MMTests Statistics: duration
                                     3.7          3.7
                               rc4-clean    rc4-patch
    User                         1207.59      1207.46
    System                       1300.55      1299.61
    Elapsed                      2273.72      2157.06

    MMTests Statistics: vmstat
                                     3.7          3.7
                               rc4-clean    rc4-patch
    Page Ins                     3581516      2374368
    Page Outs                   11148692     10410332
    Swap Ins                          80           47
    Swap Outs                       3641          476
    Direct pages scanned           37978        33826
    Kswapd pages scanned         1828245      1342869
    Kswapd pages reclaimed       1710236      1304099
    Direct pages reclaimed         32207        31005
    Kswapd efficiency                93%          97%
    Kswapd velocity              804.077      622.546
    Direct efficiency                84%          91%
    Direct velocity               16.703       15.682
    Percentage direct scans           2%           2%
    Page writes by reclaim         79252         9704
    Page writes file               75611         9228
    Page writes anon                3641          476
    Page reclaim immediate         16764        11014
    Page rescued immediate             0            0
    Slabs scanned                2171904      2152448
    Direct inode steals              385         2261
    Kswapd inode steals           659137       609670
    Kswapd skipped wait                1           69
    THP fault alloc                  546          631
    THP collapse alloc               361          339
    THP splits                       259          263
    THP fault fallback                98           50
    THP collapse fail                 20           17
    Compaction stalls                747          499
    Compaction success               244          145
    Compaction failures              503          354
    Compaction pages moved        370888       474837
    Compaction move failure        77378        65259

    ===END stress-highalloc

    This patch:

    Introduce MIGRATEPAGE_SUCCESS as the default return code for
    address_space_operations.migratepage() method and documents the expected
    return code for the same method in failure cases.

    Signed-off-by: Rafael Aquini
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: Konrad Rzeszutek Wilk
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

11 Dec, 2012

5 commits

  • Commit "Add THP migration for the NUMA working set scanning fault case"
    breaks the build because HPAGE_PMD_SHIFT and HPAGE_PMD_MASK are defined to
    explode without CONFIG_TRANSPARENT_HUGEPAGE:

    mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put':
    mm/migrate.c:1549: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
    mm/migrate.c:1564: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
    mm/migrate.c:1566: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
    mm/migrate.c:1573: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
    mm/migrate.c:1606: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
    mm/migrate.c:1648: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed

    CONFIG_NUMA_BALANCING allows compilation without enabling transparent
    hugepages, so define the dummy function for such a configuration and only
    define migrate_misplaced_transhuge_page_put() when transparent hugepages
    are enabled.

    Signed-off-by: David Rientjes
    Signed-off-by: Mel Gorman

    Mel Gorman
     
  • Note: This is very heavily based on a patch from Peter Zijlstra with
    fixes from Ingo Molnar, Hugh Dickins and Johannes Weiner. That patch
    put a lot of migration logic into mm/huge_memory.c where it does
    not belong. This version tries to share some of the migration
    logic with migrate_misplaced_page. However, it should be noted
    that now migrate.c is doing more with the pagetable manipulation
    than is preferred. The end result is barely recognisable so as
    before, the signed-offs had to be removed but will be re-added if
    the original authors are ok with it.

    Add THP migration for the NUMA working set scanning fault case.

    It uses the page lock to serialize. No migration pte dance is
    necessary because the pte is already unmapped when we decide
    to migrate.

    [dhillf@gmail.com: Fix memory leak on isolation failure]
    [dhillf@gmail.com: Fix transfer of last_nid information]
    Signed-off-by: Mel Gorman

    Mel Gorman
     
  • If there are a large number of NUMA hinting faults and all of them
    are resulting in migrations it may indicate that memory is just
    bouncing uselessly around. NUMA balancing cost is likely exceeding
    any benefit from locality. Rate limit the PTE updates if the node
    is migration rate-limited. As noted in the comments, this distorts
    the NUMA faulting statistics.

    Signed-off-by: Mel Gorman

    Mel Gorman
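
    A minimal sketch of the idea, assuming a fixed per-node window with a
    page budget. The names, the window length and the budget below are all
    hypothetical; the kernel's actual accounting may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning values, chosen only for illustration. */
enum {
    TOY_WINDOW_TICKS     = 100, /* length of one rate-limit window */
    TOY_PAGES_PER_WINDOW = 256  /* pages a node may receive per window */
};

/* Toy per-node migration accounting; not the kernel's pg_data_t. */
struct toy_node {
    unsigned long window_end;     /* tick at which the current window expires */
    unsigned long pages_migrated; /* pages migrated in the current window */
};

/* Start a fresh window once the old one has expired. */
static void toy_roll_window(struct toy_node *n, unsigned long now)
{
    if (now >= n->window_end) {
        n->window_end = now + TOY_WINDOW_TICKS;
        n->pages_migrated = 0;
    }
}

/* Charge a migration to the node; false means the node is rate-limited. */
bool toy_account_migration(struct toy_node *n, unsigned long now,
                           unsigned long nr_pages)
{
    assert(n != NULL);
    toy_roll_window(n, now);
    if (n->pages_migrated + nr_pages > TOY_PAGES_PER_WINDOW)
        return false;
    n->pages_migrated += nr_pages;
    return true;
}

/* The PTE scanner skips its updates while the node's budget is used up. */
bool toy_should_update_ptes(struct toy_node *n, unsigned long now)
{
    assert(n != NULL);
    toy_roll_window(n, now);
    return n->pages_migrated < TOY_PAGES_PER_WINDOW;
}
```

    Skipping the PTE updates means fewer hinting faults are taken while the
    node is saturated, which is the source of the statistics distortion
    mentioned above.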
     
  • Note: This was originally based on Peter's patch "mm/migrate: Introduce
    migrate_misplaced_page()" but borrows extremely heavily from Andrea's
    "autonuma: memory follows CPU algorithm and task/mm_autonuma stats
    collection". The end result is barely recognisable so signed-offs
    had to be dropped. If original authors are ok with it, I'll
    re-add the signed-off-bys.

    Add migrate_misplaced_page() which deals with migrating pages from
    faults.

    Based-on-work-by: Lee Schermerhorn
    Based-on-work-by: Peter Zijlstra
    Based-on-work-by: Andrea Arcangeli
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel

    Peter Zijlstra
     
  • The pgmigrate_success and pgmigrate_fail vmstat counters tell the user
    about migration activity but not the type or the reason. This patch adds
    a tracepoint to identify the type of page migration and why the page is
    being migrated.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel

    Mel Gorman
     

01 Aug, 2012

1 commit

  • Since we migrate only one hugepage, don't use linked list for passing the
    page around. Directly pass the page that needs to be migrated as an argument.
    This also removes the usage of page->lru in the migrate path.

    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: Hillf Danton
    Reviewed-by: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     

22 Mar, 2012

1 commit


24 Jan, 2012

1 commit

  • sparc64 allmodconfig:

    In file included from include/linux/compat.h:15,
    from /usr/src/25/arch/sparc/include/asm/siginfo.h:19,
    from include/linux/signal.h:5,
    from include/linux/sched.h:73,
    from arch/sparc/kernel/asm-offsets.c:13:
    include/linux/fs.h:618: warning: parameter has incomplete type

    It seems that my sparc64 compiler (gcc-3.4.5) doesn't like the forward
    declaration of enums.

    Fix this by moving the "enum migrate_mode" definition into its own header
    file.

    Acked-by: Mel Gorman
    Cc: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

13 Jan, 2012

2 commits

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
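
    The three modes and the mapping from compaction to them can be sketched
    as below. The toy_ names are hypothetical mirrors of the identifiers in
    the commit message, and the two predicates are a simplified reading of
    what each mode permits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mirror of the three migration modes named in the commit message. */
enum toy_migrate_mode {
    TOY_MIGRATE_ASYNC,      /* never block */
    TOY_MIGRATE_SYNC_LIGHT, /* may block, but never writes pages back */
    TOY_MIGRATE_SYNC        /* may block and may write pages back */
};

/* Async compaction maps to ASYNC, sync compaction to SYNC_LIGHT. */
enum toy_migrate_mode toy_compaction_mode(bool sync_compaction)
{
    return sync_compaction ? TOY_MIGRATE_SYNC_LIGHT : TOY_MIGRATE_ASYNC;
}

/* Whether a migration in this mode is allowed to sleep at all. */
bool toy_may_block(enum toy_migrate_mode mode)
{
    return mode != TOY_MIGRATE_ASYNC;
}

/* Whether a migration in this mode may write a dirty page to storage. */
bool toy_may_writeback(enum toy_migrate_mode mode)
{
    switch (mode) {
    case TOY_MIGRATE_ASYNC:
    case TOY_MIGRATE_SYNC_LIGHT:
        return false;
    case TOY_MIGRATE_SYNC:
        return true;
    }
    assert(!"unknown migrate mode");
    return false;
}
```

    Callers such as memory hotplug, which must make progress, would pass
    the full sync mode directly rather than going through the compaction
    mapping.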
     
  • Asynchronous compaction is used when allocating transparent hugepages to
    avoid blocking for long periods of time. Due to reports of stalling,
    there was a debate on disabling synchronous compaction but this severely
    impacted allocation success rates. Part of the reason was that many dirty
    pages are skipped in asynchronous compaction by the following check:

    if (PageDirty(page) && !sync &&
        mapping->a_ops->migratepage != migrate_page)
            rc = -EBUSY;

    This skips over all mapping aops using buffer_migrate_page() even though
    it is possible to migrate some of these pages without blocking. This
    patch updates the ->migratepage callback with a "sync" parameter. It is
    the responsibility of the callback to fail gracefully if migration would
    block.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
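
    "Fail gracefully if migration would block" can be illustrated with a
    trylock-style loop over a page's buffers. This is a userspace sketch
    under assumptions: struct toy_buffer and toy_lock_buffers() are
    hypothetical, and -EAGAIN is assumed as the "would block" result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer; not the kernel's struct buffer_head. */
struct toy_buffer {
    bool held_by_other; /* someone else currently holds the lock */
    bool held_by_us;    /* we took the lock during this attempt */
};

/*
 * Lock every buffer of a page before migrating it.
 * Returns 0 on success, or -EAGAIN if an async caller would have to block.
 */
int toy_lock_buffers(struct toy_buffer *bufs, size_t n, bool sync)
{
    assert(bufs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].held_by_other) {
            if (!sync) {
                /* Async: release the locks already taken and give up. */
                while (i-- > 0)
                    bufs[i].held_by_us = false;
                return -EAGAIN;
            }
            /*
             * Sync: a real implementation would sleep here until the
             * holder releases the lock; the toy model just clears it.
             */
            bufs[i].held_by_other = false;
        }
        bufs[i].held_by_us = true;
    }
    return 0;
}
```

    With this shape the async caller never sleeps, yet pages whose buffers
    happen to be unlocked can still be migrated instead of being skipped
    up front.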
     

14 Jan, 2011

2 commits

  • With the introduction of the boolean sync parameter, the API looks a
    little inconsistent as offlining is still an int. Convert offlining to a
    bool for the sake of being tidy.

    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • …ompaction in the faster path

    Migration synchronously waits for writeback if the initial passes fail.
    Callers of memory compaction do not necessarily want this behaviour if the
    caller is latency sensitive or expects that synchronous migration is not
    going to have a significantly better success rate.

    This patch adds a sync parameter to migrate_pages() allowing the caller to
    indicate if wait_on_page_writeback() is allowed within migration or not.
    For reclaim/compaction, try_to_compact_pages() is first called
    asynchronously, direct reclaim runs and then try_to_compact_pages() is
    called synchronously as there is a greater expectation that it'll succeed.

    [akpm@linux-foundation.org: build/merge fix]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Rik van Riel <riel@redhat.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Andy Whitcroft <apw@shadowen.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

08 Oct, 2010

2 commits

  • migrate_huge_page_move_mapping() is declared as "extern int ..."
    in include/linux/migrate.h for !CONFIG_MIGRATION,
    which causes the build error like below:

    mm/mprotect.o: In function `migrate_huge_page_move_mapping':
    mprotect.c:(.text+0x0): multiple definition of `migrate_huge_page_move_mapping'
    mm/shmem.o:shmem.c:(.text+0x0): first defined here
    mm/rmap.o: In function `migrate_huge_page_move_mapping':
    rmap.c:(.text+0x0): multiple definition of `migrate_huge_page_move_mapping'
    mm/shmem.o:shmem.c:(.text+0x0): first defined here

    Reported-by: Stephen Rothwell
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch extends page migration code to support hugepage migration.
    One of the potential users of this feature is soft offlining which
    is triggered by memory corrected errors (added by the next patch.)

    Todo:
    - there are other users of page migration such as memory policy,
    memory hotplug and memory compaction.
    They are not ready for hugepage support for now.

    ChangeLog since v4:
    - define migrate_huge_pages()
    - remove changes on isolation/putback_lru_page()

    ChangeLog since v2:
    - refactor isolate/putback_lru_page() to handle hugepage
    - add comment about race on unmap_and_move_huge_page()

    ChangeLog since v1:
    - divide migration code path for hugepage
    - define routine checking migration swap entry for hugetlb
    - replace "goto" with "if/else" in remove_migration_pte()

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

25 May, 2010

2 commits

  • This patch is the core of a mechanism which compacts memory in a zone by
    relocating movable pages towards the end of the zone.

    A single compaction run involves a migration scanner and a free scanner.
    Both scanners operate on pageblock-sized areas in the zone. The migration
    scanner starts at the bottom of the zone and searches for all movable
    pages within each area, isolating them onto a private list called
    migratelist. The free scanner starts at the top of the zone and searches
    for suitable areas and consumes the free pages within making them
    available for the migration scanner. The pages isolated for migration are
    then migrated to the newly isolated free pages.

    [aarcange@redhat.com: Fix unsafe optimisation]
    [mel@csn.ul.ie: do not schedule work on other CPUs for compaction]
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
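
    The two-scanner scheme can be sketched on a toy zone. Assumptions:
    the zone is an array with one character per page ('M' movable and in
    use, 'F' free, anything else unmovable), and pages are moved one at a
    time, whereas the description above works on pageblock-sized areas and
    isolates pages in batches.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compact a toy zone in place and return the number of pages moved.
 * The migration scanner walks up from the bottom looking for movable
 * pages; the free scanner walks down from the top looking for free
 * pages. Compaction stops when the two scanners meet.
 */
size_t toy_compact(char *zone, size_t n)
{
    size_t moved = 0;
    size_t migrate_pos, free_pos;

    assert(zone != NULL || n == 0);
    if (n == 0)
        return 0;

    migrate_pos = 0;
    free_pos = n - 1;
    while (migrate_pos < free_pos) {
        if (zone[migrate_pos] != 'M') {
            migrate_pos++;          /* nothing to migrate here */
            continue;
        }
        if (zone[free_pos] != 'F') {
            free_pos--;             /* not a usable target */
            continue;
        }
        /* "Migrate" the movable page into the free page near the top. */
        zone[free_pos] = 'M';
        zone[migrate_pos] = 'F';
        moved++;
        migrate_pos++;
        free_pos--;
    }
    return moved;
}
```

    For example, the zone "MFUMFFMF" becomes "FFUFFMMM" after two moves:
    free pages collect at the bottom, movable pages at the top, and the
    unmovable page stays where it was.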
     
  • putback_lru_page() can never fail, so the count of "the number of pages
    put back" does not matter.

    In addition, users of this function don't use the return value.

    Let's remove the unnecessary code.

    Signed-off-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

16 Dec, 2009

1 commit

  • The previous patch enables page migration of ksm pages, but that soon gets
    into trouble: not surprising, since we're using the ksm page lock to lock
    operations on its stable_node, but page migration switches the page whose
    lock is to be used for that. Another layer of locking would fix it, but
    do we need that yet?

    Do we actually need page migration of ksm pages? Yes, memory hotremove
    needs to offline sections of memory: and since we stopped allocating ksm
    pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
    candidates for migration.

    But KSM is currently unconscious of NUMA issues, happily merging pages
    from different NUMA nodes: at present the rule must be, not to use
    MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
    ksm pages does not make sense yet.

    So, to complete support for ksm swapping we need to make hotremove safe.
    ksm_memory_callback() takes ksm_thread_mutex when MEM_GOING_OFFLINE and
    releases it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
    are freed before migration reaches them, stable_nodes may be left still
    pointing to struct pages which have been removed from the system: the
    stable_node needs to identify a page by pfn rather than page pointer, then
    it can safely prune them when MEM_OFFLINE.

    And make NUMA migration skip PageKsm pages where it skips PageReserved.
    But it's only when we reach unmap_and_move() that the page lock is taken
    and we can be sure that raised pagecount has prevented a PageAnon from
    being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
    page when offlining (has sufficient locking) but reject it otherwise.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

07 Jan, 2009

1 commit

  • #ifdefs in *.c files decrease source readability a bit; removing them is better.

    This patch doesn't have any functional change.

    Signed-off-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

20 Oct, 2008

1 commit

  • On large memory systems, the VM can spend way too much time scanning
    through pages that it cannot (or should not) evict from memory. Not only
    does it use up CPU time, but it also provokes lock contention and can
    leave large systems under memory pressure in a catatonic state.

    This patch series improves VM scalability by:

    1) putting filesystem backed, swap backed and unevictable pages
    onto their own LRUs, so the system only scans the pages that it
    can/should evict from memory

    2) switching to two handed clock replacement for the anonymous LRUs,
    so the number of pages that need to be scanned when the system
    starts swapping is bound to a reasonable number

    3) keeping unevictable pages off the LRU completely, so the
    VM does not waste CPU time scanning them. ramfs, ramdisk,
    SHM_LOCKED shared memory segments and mlock()ed VMA pages
    are kept on the unevictable list.

    This patch:

    isolate_lru_page logically belongs in vmscan.c rather than migrate.c.

    It is tough, because we don't need that function without memory migration
    so there is a valid argument to have it in migrate.c. However a
    subsequent patch needs to make use of it in the core mm, so we can happily
    move it to vmscan.c.

    Also, make the function a little more generic by not requiring that it
    adds an isolated page to a given list. Callers can do that.

    Note that we now have '__isolate_lru_page()', that does
    something quite different, visible outside of vmscan.c
    for use with memory controller. Methinks we need to
    rationalize these names/purposes. --lts

    [akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
    Signed-off-by: Nick Piggin
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

25 Jul, 2008

1 commit

  • We'd like to support CONFIG_MEMORY_HOTREMOVE on s390, which depends on
    CONFIG_MIGRATION. So far, CONFIG_MIGRATION is only available with NUMA
    support.

    This patch makes CONFIG_MIGRATION selectable for architectures that define
    ARCH_ENABLE_MEMORY_HOTREMOVE. When MIGRATION is enabled w/o NUMA, the
    kernel won't compile because migrate_vmas() does not know about
    vm_ops->migrate() and vma_migratable() does not know about policy_zone.
    To fix this, those two functions can be restricted to '#ifdef CONFIG_NUMA'
    because they are not being used w/o NUMA. vma_migratable() is moved over
    from migrate.h to mempolicy.h.

    [kosaki.motohiro@jp.fujitsu.com: build fix]
    Acked-by: Christoph Lameter
    Signed-off-by: Gerald Schaefer
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: KOSAKI Motorhiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     

08 May, 2007

1 commit

  • Address spaces contain an allocation flag that specifies restriction on the
    zone for pages placed in the mapping. I.e. some device may require pages
    to be allocated from a DMA zone. Block devices may not be able to use
    pages from HIGHMEM.

    Memory policies and the common use of page migration work only on the
    highest zone. If the address space does not allow allocation from the
    highest zone then the pages in the address space are not migratable simply
    because we can only allocate memory for a specified node if we allow
    allocation for the highest zone on each node.

    Acked-by: Hugh Dickins
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

05 Mar, 2007

1 commit

  • Currently we do not check for vma flags if sys_move_pages is called to move
    individual pages. If sys_migrate_pages is called to move pages then we
    check for vm_flags that indicate a non migratable vma but that still
    includes VM_LOCKED and we can migrate mlocked pages.

    Extract the vma_migratable check from mm/mempolicy.c, fix it and put it
    into migrate.h so that it can be used from both locations.

    Problem was spotted by Lee Schermerhorn

    Signed-off-by: Christoph Lameter
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
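
    A sketch of what the extracted check might look like. The flag bit
    values are hypothetical, and the exact set of flags that marks a vma as
    non-migratable is an assumption; the only point taken from the message
    above is that VM_LOCKED is not part of that set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_VM_LOCKED   0x01u
#define TOY_VM_IO       0x02u
#define TOY_VM_HUGETLB  0x04u
#define TOY_VM_PFNMAP   0x08u
#define TOY_VM_RESERVED 0x10u

/* Toy stand-in for struct vm_area_struct. */
struct toy_vma {
    unsigned int vm_flags;
};

/*
 * One shared check for both the move_pages and migrate_pages paths.
 * TOY_VM_LOCKED is deliberately absent from the mask: mlocked pages
 * can be migrated.
 */
bool toy_vma_migratable(const struct toy_vma *vma)
{
    assert(vma != NULL);
    return !(vma->vm_flags & (TOY_VM_IO | TOY_VM_HUGETLB |
                              TOY_VM_PFNMAP | TOY_VM_RESERVED));
}
```

    Keeping the check in one helper is what lets both system calls agree on
    which vmas are eligible.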
     

26 Jun, 2006

1 commit

  • Hooks for calling vma specific migration functions

    With this patch a vma may define a vma->vm_ops->migrate function. That
    function may perform page migration on its own (some vmas may not contain page
    structs and therefore cannot be handled by regular page migration; pages in a
    vma may require special preparatory treatment before migration is possible,
    etc.). Only mmap_sem is held when the migration function is called. The
    migrate() function gets passed two sets of nodemasks describing the source and
    the target of the migration. The flags parameter either contains

    MPOL_MF_MOVE which means that only pages used exclusively by
    the specified mm should be moved

    or

    MPOL_MF_MOVE_ALL which means that pages shared with other processes
    should also be moved.

    The migration function returns 0 on success or an error condition. An error
    condition will prevent regular page migration from occurring.

    On its own this patch cannot be included since there are no users for this
    functionality. But it seems that the uncached allocator will need this
    functionality at some point.

    Signed-off-by: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
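
    The hook and its calling convention can be sketched as follows. This is
    a userspace model under assumptions: node masks are plain bitmasks
    rather than nodemask structures, the flag values are made up, and the
    two example hooks exist only to exercise the walker.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values standing in for MPOL_MF_MOVE(_ALL). */
#define TOY_MPOL_MF_MOVE     0x1ul
#define TOY_MPOL_MF_MOVE_ALL 0x2ul

struct toy_vma;

/* Toy vm_ops with the optional per-vma migrate hook. */
struct toy_vm_ops {
    int (*migrate)(struct toy_vma *vma, unsigned long from_nodes,
                   unsigned long to_nodes, unsigned long flags);
};

struct toy_vma {
    const struct toy_vm_ops *vm_ops;
    struct toy_vma *next;
};

/*
 * Call the migrate hook of every vma that defines one. The first error
 * stops the walk; the caller is expected to skip regular page migration
 * when a nonzero value comes back.
 */
int toy_migrate_vmas(struct toy_vma *vma, unsigned long from_nodes,
                     unsigned long to_nodes, unsigned long flags)
{
    int err = 0;

    for (; vma != NULL && err == 0; vma = vma->next) {
        if (vma->vm_ops != NULL && vma->vm_ops->migrate != NULL)
            err = vma->vm_ops->migrate(vma, from_nodes, to_nodes, flags);
    }
    return err;
}

/* Example hooks: one that succeeds and one that refuses to migrate. */
int toy_hook_calls;

int toy_hook_ok(struct toy_vma *vma, unsigned long from_nodes,
                unsigned long to_nodes, unsigned long flags)
{
    (void)vma; (void)from_nodes; (void)to_nodes; (void)flags;
    toy_hook_calls++;
    return 0;
}

int toy_hook_fail(struct toy_vma *vma, unsigned long from_nodes,
                  unsigned long to_nodes, unsigned long flags)
{
    (void)vma; (void)from_nodes; (void)to_nodes; (void)flags;
    toy_hook_calls++;
    return -EBUSY;
}
```

    A vma without vm_ops, or without a migrate hook, is simply skipped, so
    existing vmas are unaffected by the new callback.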
     

23 Jun, 2006

5 commits

  • move_pages() is used to move individual pages of a process. The function can
    be used to determine the location of pages and to move them onto the desired
    node. move_pages() returns status information for each page.

    long move_pages(pid, number_of_pages_to_move,
                    addresses_of_pages[],
                    nodes[] or NULL,
                    status[],
                    flags);

    The addresses of pages is an array of void * pointing to the
    pages to be moved.

    The nodes array contains the node numbers that the pages should be moved
    to. If a NULL is passed instead of an array then no pages are moved but
    the status array is updated. The status request may be used to determine
    the page state before issuing another move_pages() to move pages.

    The status array will contain the state of all individual page migration
    attempts when the function terminates. The status array is only valid if
    move_pages() completed successfully.

    Possible page states in status[]:

    0..MAX_NUMNODES The page is now on the indicated node.

    -ENOENT Page is not present

    -EACCES Page is mapped by multiple processes and can only
    be moved if MPOL_MF_MOVE_ALL is specified.

    -EPERM The page has been mlocked by a process/driver and
    cannot be moved.

    -EBUSY Page is busy and cannot be moved. Try again later.

    -EFAULT Invalid address (no VMA or zero page).

    -ENOMEM Unable to allocate memory on target node.

    -EIO Unable to write back page. The page must be written
    back in order to move it since the page is dirty and the
    filesystem does not provide a migration function that
    would allow the moving of dirty pages.

    -EINVAL A dirty page cannot be moved. The filesystem does not provide
    a migration function and has no ability to write back pages.

    The flags parameter indicates what types of pages to move:

    MPOL_MF_MOVE Move pages that are only mapped by the process.

    MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
    Requires sufficient capabilities.

    Possible return codes from move_pages():

    -ENOENT  No pages found that would require moving. All pages
             are either already on the target node, not present, had an
             invalid address or could not be moved because they were
             mapped by multiple processes.

    -EINVAL  Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
             to migrate pages in a kernel thread.

    -EPERM   MPOL_MF_MOVE_ALL specified without sufficient privileges,
             or an attempt to move pages of a process belonging to another
             user.

    -EACCES  One of the target nodes is not allowed by the current cpuset.

    -ENODEV  One of the target nodes is not online.

    -ESRCH   Process does not exist.

    -E2BIG   Too many pages to move.

    -ENOMEM  Not enough memory to allocate control array.

    -EFAULT  Parameters could not be accessed.

    A test program for move_pages() may be found with the patches
    on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3

    From: Christoph Lameter

    Detailed results for sys_move_pages()

    Pass a pointer to an integer to get_new_page() that may be used to
    indicate where the completion status of a migration operation should be
    placed. This allows sys_move_pages() to report back exactly what happened to
    each page.
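    The mechanism can be modelled in user space roughly as follows. This is a
    toy sketch, not the kernel code: struct toy_page, struct move_request,
    toy_new_page() and toy_migrate() are all hypothetical names, and the
    "migration" is reduced to a busy check. What it shows is the idea from the
    paragraph above: the allocation callback hands back, through an int **,
    the address where the outcome for that particular page should be stored.

```c
#include <errno.h>
#include <stddef.h>

struct toy_page { int node; int busy; };   /* stand-in, not struct page */

struct move_request {                      /* one entry per page to move */
    struct toy_page *page;                 /* NULL terminates the array */
    int target_node;
    int status;                            /* outcome is reported here */
};

typedef struct toy_page *toy_new_page_t(struct toy_page *page,
                                        unsigned long private, int **result);

/* Allocation callback: find the request for this page, tell the caller
 * where to store the result, and return a new page on the target node. */
static struct toy_page *toy_new_page(struct toy_page *page,
                                     unsigned long private, int **result)
{
    static struct toy_page pool[16];
    static size_t used;
    struct move_request *req = (struct move_request *)private;

    while (req->page != NULL && req->page != page)
        req++;
    if (req->page == NULL || used == 16)
        return NULL;
    *result = &req->status;
    pool[used].node = req->target_node;
    pool[used].busy = 0;
    return &pool[used++];
}

/* Migration loop: writes each page's outcome wherever the callback said.
 * Returns the number of pages that could not be moved. */
static int toy_migrate(struct toy_page **pages, size_t n,
                       toy_new_page_t *get_new_page, unsigned long private)
{
    int failed = 0;

    for (size_t i = 0; i < n; i++) {
        int *result = NULL;
        struct toy_page *newpage = get_new_page(pages[i], private, &result);
        int rc;

        if (newpage == NULL)
            rc = -ENOMEM;
        else if (pages[i]->busy)
            rc = -EBUSY;
        else
            rc = newpage->node;            /* success: report the new node */
        if (result != NULL)
            *result = rc;
        if (rc < 0)
            failed++;
    }
    return failed;
}
```

    The migration loop never needs to know about struct move_request; it only
    follows the pointer the callback supplied, which is why the per-page
    status ends up in the right slot even if pages are processed out of order.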

    Wish there would be a better way to do this. Looks a bit hacky.

    Signed-off-by: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Jes Sorensen
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Instead of passing a list of new pages, pass a function to allocate a new
    page. This allows the correct placement of MPOL_INTERLEAVE pages during page
    migration. It also further simplifies the callers of migrate_pages().
    migrate_pages() becomes similar to migrate_pages_to() so drop
    migrate_pages_to(). The batching of new page allocations becomes unnecessary.
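    Why a callback helps with interleaved placement can be illustrated with a
    small user-space model. Everything here is hypothetical (struct ipage,
    struct interleave_ctx, interleave_target(), place_pages()), and the sketch
    assumes that the interleave node is a simple function of the page's
    offset, which is a simplification. The point is only that a per-page
    callback can look at the page being migrated, whereas a pre-built list of
    new pages is consumed in list order regardless of which page it replaces.

```c
#include <stddef.h>

struct ipage { unsigned long index; int node; };   /* toy page: offset + node */

struct interleave_ctx {
    const int *nodes;       /* nodes taking part in the interleave */
    size_t nr_nodes;
};

typedef int pick_node_t(const struct ipage *page, unsigned long private);

/* Callback: derive the target node from the page's own offset, so the
 * interleave pattern survives no matter in which order pages arrive. */
static int interleave_target(const struct ipage *page, unsigned long private)
{
    const struct interleave_ctx *ctx = (const struct interleave_ctx *)private;

    return ctx->nodes[page->index % ctx->nr_nodes];
}

/* Stand-in for the migration loop: asks the callback once per page. */
static void place_pages(struct ipage *pages, size_t n,
                        pick_node_t *pick, unsigned long private)
{
    for (size_t i = 0; i < n; i++)
        pages[i].node = pick(&pages[i], private);
}
```

    With two nodes in the context, pages at even offsets land on the first
    node and pages at odd offsets on the second, even when the pages are
    handed over in arbitrary order.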

    Signed-off-by: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Jes Sorensen
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Do not leave pages on the lists passed to migrate_pages(). Seems that we will
    not need any postprocessing of pages. This will simplify the handling of
    pages by the callers of migrate_pages().

    Signed-off-by: Christoph Lameter
    Cc: Hugh Dickins
    Cc: Jes Sorensen
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Change handling of address spaces.

    Pass a pointer to the address space in which the page is migrated to all
    migration functions. This avoids repeatedly having to retrieve the address
    space pointer from the page and checking it for validity. The old page
    mapping will change once migration has gone to a certain step, so it is less
    confusing to have the pointer always available.

    Move the setting of the mapping and index for the new page into
    migrate_pages().

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove the export for migrate_page_remove_references() and migrate_page_copy()
    that are unlikely to be used directly by filesystems implementing migration.
    The export was useful when buffer_migrate_page() lived in fs/buffer.c but it
    has now been moved to migrate.c in the migration reorg.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

26 Apr, 2006

1 commit


01 Apr, 2006

1 commit


22 Mar, 2006

1 commit

  • Centralize the page migration functions in anticipation of additional
    tinkering. Creates a new file, mm/migrate.c.

    1. Extract buffer_migrate_page() from fs/buffer.c

    2. Extract central migration code from vmscan.c

    3. Extract some components from mempolicy.c

    4. Export pageout() and remove_from_swap() from vmscan.c

    5. Make it possible to configure NUMA systems without page migration
    and non-NUMA systems with page migration.
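    Item 5 is commonly achieved with a configuration symbol and inline stubs.
    The fragment below is a generic sketch of that pattern rather than the
    code from this commit: toy_migrate_pages() is a hypothetical name, and the
    choice of -ENOSYS for the stub is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct list_head;               /* opaque for the purposes of this sketch */

#ifdef CONFIG_MIGRATION
/* Real implementation lives in a separate file when the option is set. */
int toy_migrate_pages(struct list_head *from);
#else
/* Option disabled: callers still compile, but migration reports failure. */
static inline int toy_migrate_pages(struct list_head *from)
{
    (void)from;
    return -ENOSYS;
}
#endif
```

    Because the stub has the same signature as the real function, callers do
    not need their own #ifdef blocks; the decision is made once in the header.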

    I had to do some #ifdeffing in mempolicy.c that may need a cleanup.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter