13 Jan, 2012

3 commits

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
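
    A minimal sketch of the three migration modes described above; the value
    names and the users they map to follow this entry, while the exact header
    layout is an assumption:

    /* Sketch only: lightest to heaviest migration behaviour. */
    enum migrate_mode {
            MIGRATE_ASYNC,          /* async compaction: never blocks */
            MIGRATE_SYNC_LIGHT,     /* sync compaction: may wait, but no writeback */
            MIGRATE_SYNC,           /* e.g. memory hotplug: full sync, writeback allowed */
    };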
     
  • Asynchronous compaction is used when allocating transparent hugepages to
    avoid blocking for long periods of time. Due to reports of stalling,
    there was a debate on disabling synchronous compaction but this severely
    impacted allocation success rates. Part of the reason was that many dirty
    pages are skipped in asynchronous compaction by the following check:

    if (PageDirty(page) && !sync &&
            mapping->a_ops->migratepage != migrate_page)
        rc = -EBUSY;

    This skips over all mapping aops using buffer_migrate_page() even though
    it is possible to migrate some of these pages without blocking. This
    patch updates the ->migratepage callback with a "sync" parameter. It is
    the responsibility of the callback to fail gracefully if migration would
    block.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
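
    A hedged sketch of the callback change described in this entry; the
    parameter order and the "sync" flag are assumptions based on the text, not
    copied from the patch:

    /* Sketch of the extended callback: when sync is false the callback must
     * return -EBUSY (or similar) rather than block, e.g. when page buffers
     * cannot be locked without waiting. */
    int (*migratepage)(struct address_space *mapping,
                       struct page *newpage, struct page *page,
                       bool sync);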
     
  • This is a preparation before removing a flag PCG_ACCT_LRU in page_cgroup
    and reducing atomic ops/complexity in memcg LRU handling.

    In some cases, pages are added to the LRU before being charged to a memcg,
    and so are not classified to a memory cgroup at LRU addition. Currently,
    the LRU where the page should be added is determined by a bit in
    page_cgroup->flags and by pc->mem_cgroup. I'd like to remove the check of
    the flag.

    In the case where pages are added to the LRU before classification,
    pc->mem_cgroup may contain stale pointers. To handle this, this patch
    resets pc->mem_cgroup to root_mem_cgroup before LRU additions (see the
    sketch below).

    [akpm@linux-foundation.org: fix CONFIG_CGROUP_MEM_CONT=n build]
    [hughd@google.com: fix CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n build]
    [akpm@linux-foundation.org: ksm.c needs memcontrol.h, per Michal]
    [hughd@google.com: stop oops in mem_cgroup_reset_owner()]
    [hughd@google.com: fix page migration to reset_owner]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Miklos Szeredi
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Ying Han
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
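
    A rough sketch of the reset described above. The helper name
    mem_cgroup_reset_owner() appears in the fix-up notes; its body below is
    illustrative only:

    /* Illustrative: point a not-yet-classified page at root_mem_cgroup before
     * it is added to an LRU, so pc->mem_cgroup never holds a stale pointer. */
    void mem_cgroup_reset_owner(struct page *page)
    {
            struct page_cgroup *pc = lookup_page_cgroup(page);

            if (!PageCgroupUsed(pc))
                    pc->mem_cgroup = root_mem_cgroup;
    }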
     

11 Jan, 2012

3 commits


09 Dec, 2011

1 commit


07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

01 Nov, 2011

1 commit

  • unmap_and_move() is one big, messy function. Clean it up.

    Signed-off-by: Minchan Kim
    Reviewed-by: KOSAKI Motohiro
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Michal Hocko
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

31 Oct, 2011

1 commit


20 Oct, 2011

1 commit

  • I don't usually pay much attention to the stale "? " addresses in
    stack backtraces, but this lucky report from Pawel Sikora hints that
    mremap's move_ptes() has inadequate locking against page migration.

    3.0 BUG_ON(!PageLocked(p)) in migration_entry_to_page():
    kernel BUG at include/linux/swapops.h:105!
    RIP: 0010:[] [] migration_entry_wait+0x156/0x160
    [] handle_pte_fault+0xae1/0xaf0
    [] ? __pte_alloc+0x42/0x120
    [] ? do_huge_pmd_anonymous_page+0xab/0x310
    [] handle_mm_fault+0x181/0x310
    [] ? vma_adjust+0x537/0x570
    [] do_page_fault+0x11d/0x4e0
    [] ? do_mremap+0x2d5/0x570
    [] page_fault+0x1f/0x30

    mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
    and pagetable locks, were good enough before page migration (with its
    requirement that every migration entry be found) came in, and enough
    while migration always held mmap_sem; but not enough nowadays, when
    there's memory hotremove and compaction.

    The danger is that move_ptes() lets a migration entry dodge around
    behind remove_migration_pte()'s back, so it's in the old location when
    looking at the new, then in the new location when looking at the old.

    Either mremap's move_ptes() must additionally take anon_vma lock(), or
    migration's remove_migration_pte() must stop peeking for is_swap_entry()
    before it takes pagetable lock.

    Consensus chooses the latter: we prefer to add overhead to migration
    than to mremapping, which gets used by JVMs and by exec stack setup.

    Reported-and-tested-by: Paweł Sikora
    Signed-off-by: Hugh Dickins
    Acked-by: Andrea Arcangeli
    Acked-by: Mel Gorman
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
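
    A hedged sketch of the chosen fix: remove_migration_pte() only inspects
    the pte after the page-table lock is held. The helpers named below are the
    usual pagetable ones; the exact code is not copied from the patch:

    /* Don't peek at *ptep before holding the pte lock: a concurrent mremap
     * (move_ptes) could move the migration entry away in the meantime. */
    ptep = pte_offset_map(pmd, addr);
    ptl = pte_lockptr(mm, pmd);
    spin_lock(ptl);
    pte = *ptep;
    if (is_swap_pte(pte)) {
            /* ... it really is a migration entry: restore the pte here ... */
    }
    pte_unmap_unlock(ptep, ptl);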
     

17 Jun, 2011

1 commit

  • swapcache will reach the below code path in migrate_page_move_mapping,
    and swapcache is accounted as NR_FILE_PAGES but it's not accounted as
    NR_SHMEM.

    Hugh pointed out we must use PageSwapCache instead of comparing
    mapping to &swapper_space, to avoid build failure with CONFIG_SWAP=n.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
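
    A hedged sketch of the accounting intent described above; the statistics
    and page-flag helpers are real interfaces, but the surrounding code is
    illustrative rather than the actual diff:

    /* Transfer NR_FILE_PAGES for every page cache page being migrated, but
     * only transfer NR_SHMEM for real shmem pages, not for swapcache. */
    __dec_zone_page_state(page, NR_FILE_PAGES);
    __inc_zone_page_state(newpage, NR_FILE_PAGES);
    if (PageSwapBacked(page) && !PageSwapCache(page)) {
            __dec_zone_page_state(page, NR_SHMEM);
            __inc_zone_page_state(newpage, NR_SHMEM);
    }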
     

25 May, 2011

1 commit

  • Convert page_lock_anon_vma() over to use refcounts. This is done to
    prepare for the conversion of anon_vma from spinlock to mutex.

    Sadly this increases the cost of page_lock_anon_vma() from one to two
    atomics; a follow-up patch addresses this, let's keep things simple for now.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

31 Mar, 2011

1 commit


24 Mar, 2011

1 commit

  • Remove initialization of a variable in callers of memory cgroup functions.
    It is actually the return value of the memcg function, but it is
    initialized in the caller.

    Some memory cgroup code uses the following style to carry the result of a
    start function to the matching end function, to avoid races.

    mem_cgroup_start_A(&(*ptr))
    /* Something very complicated can happen here. */
    mem_cgroup_end_A(*ptr)

    In some calls, *ptr should be initialized to NULL by the caller, but that
    is ugly. This patch makes the _start function initialize *ptr instead (as
    sketched below).

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Johannes Weiner
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
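
    A minimal sketch of the change described above; mem_cgroup_start_A() and
    mem_cgroup_end_A() are the placeholder names used in this entry, not real
    functions:

    void mem_cgroup_start_A(struct mem_cgroup **memcgp)
    {
            *memcgp = NULL;         /* initialized here, not by every caller */
            /* ... take references and record state for the end function ... */
    }

    void mem_cgroup_end_A(struct mem_cgroup *memcg)
    {
            if (!memcg)
                    return;         /* the start function found nothing to do */
            /* ... complete the operation and drop references ... */
    }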
     

23 Mar, 2011

3 commits

  • __GFP_NO_KSWAPD allocations are usually very expensive and not mandatory
    to succeed as they have graceful fallback. Waiting for I/O in those tends
    to be overkill in terms of latency, so we can reduce their latency by
    disabling sync migration.

    Unfortunately, even with async migration it's still possible for the
    process to be blocked waiting for a request slot (e.g. get_request_wait
    in the block layer) when ->writepage is called. To prevent
    __GFP_NO_KSWAPD blocking, this patch prevents ->writepage being called on
    dirty page cache for asynchronous migration.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31142

    [mel@csn.ul.ie: Avoid writebacks for NFS, retry locked pages, use bool]
    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Mel Gorman
    Cc: Arthur Marsh
    Cc: Clemens Ladisch
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Reported-by: Alex Villacis Lasso
    Tested-by: Alex Villacis Lasso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
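
    A hedged sketch of the behaviour described above: during asynchronous
    migration, dirty page cache is never pushed into ->writepage. The fragment
    below is illustrative, not the exact patch:

    /* Only synchronous migration is allowed to write a dirty page back. */
    if (PageDirty(page) && !sync)
            rc = -EBUSY;    /* skip: async migration must not block on writeback */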
     
  • The normal code pattern used in the kernel is: get/put.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Hugh Dickins
    Reviewed-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • This function basically does:

    remove_from_page_cache(old);
    page_cache_release(old);
    add_to_page_cache_locked(new);

    Except it does this atomically, so there's no possibility for the "add" to
    fail because of a race.

    If memory cgroups are enabled, then the memory cgroup charge is also moved
    from the old page to the new.

    This function is currently used by fuse to move pages into the page cache
    on read, instead of copying the page contents.

    [minchan.kim@gmail.com: add freepage() hook to replace_page_cache_page()]
    Signed-off-by: Miklos Szeredi
    Acked-by: Rik van Riel
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
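
    A hedged sketch of how a caller like fuse might use the helper described
    above; the signature is assumed from the description and the freepage()
    note, and the error handling is illustrative:

    int replace_page_cache_page(struct page *old, struct page *new,
                                gfp_t gfp_mask);

    /* illustrative caller */
    err = replace_page_cache_page(oldpage, newpage, GFP_KERNEL);
    if (err) {
            /* fall back to copying the page contents instead of moving */
    }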
     

26 Feb, 2011

1 commit

  • The move_pages() usage of find_task_by_vpid() requires rcu_read_lock() to
    prevent free_pid() from reclaiming the pid.

    Without this patch, RCU warnings are printed in v2.6.38-rc4 move_pages()
    with:

    CONFIG_LOCKUP_DETECTOR=y
    CONFIG_PREEMPT=y
    CONFIG_LOCKDEP=y
    CONFIG_PROVE_LOCKING=y
    CONFIG_PROVE_RCU=y

    Previously, migrate_pages() went through a similar transformation
    replacing usage of tasklist_lock with rcu read lock:

    commit 55cfaa3cbdd29c4919ecb5fb8965c310f357e48c
    Author: Zeng Zhaoming
    Date: Thu Dec 2 14:31:13 2010 -0800

    mm/mempolicy.c: add rcu read lock to protect pid structure

    commit 1e50df39f6e2c3a4a3394df62baa8a213df16c54
    Author: KOSAKI Motohiro
    Date: Thu Jan 13 15:46:14 2011 -0800

    mempolicy: remove tasklist_lock from migrate_pages

    Signed-off-by: Greg Thelen
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: "Paul E. McKenney"
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: Oleg Nesterov
    Cc: Zeng Zhaoming
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
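
    A sketch of the pattern the fix applies; find_task_by_vpid() and the RCU
    read lock are the real interfaces, while the surrounding shape is
    illustrative:

    /* Hold the RCU read lock across the pid lookup so the task cannot be
     * freed underneath us, then take a proper reference before unlocking. */
    rcu_read_lock();
    task = pid ? find_task_by_vpid(pid) : current;
    if (!task) {
            rcu_read_unlock();
            return -ESRCH;
    }
    get_task_struct(task);
    rcu_read_unlock();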
     

03 Feb, 2011

2 commits

  • If migrate_huge_page() called by memory-failure fails, it calls put_page()
    itself to drop the page reference, while the caller of migrate_huge_page()
    also calls putback_lru_pages(). This can double-free the page and corrupt
    it for the page holder.

    In addition, having the caller clean up the pages is consistent with the
    migrate_pages() behaviour from cf608ac19c ("mm: compaction: fix
    COMPACTPAGEFAILED counting").

    Signed-off-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • In some cases migrate_pages() could return zero while still leaving a few
    pages in the pagelist (and some callers wouldn't notice they have to call
    putback_lru_pages() after commit cf608ac19c9 ("mm: compaction: fix
    COMPACTPAGEFAILED counting")).

    Add one missing putback_lru_pages() not added by commit cf608ac19c95 ("mm:
    compaction: fix COMPACTPAGEFAILED counting").

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Minchan Kim
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

26 Jan, 2011

1 commit

  • Callers of migrate_pages() should call putback_lru_pages() to return pages
    isolated to the LRU or to the free list. The current comment is rather
    confusing: it says the caller always has to call it.

    It is clearer to point out that the caller has to call it only if
    migrate_pages()'s return value isn't zero.

    Signed-off-by: Minchan Kim
    Cc: Christoph Lameter
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

14 Jan, 2011

8 commits

  • In the current implementation mem_cgroup_end_migration() decides whether
    the page migration has succeeded or not by checking "oldpage->mapping".

    But if we are trying to migrate a shmem swapcache, its page->mapping is
    NULL from the beginning, so the check would be invalid. As a result,
    mem_cgroup_end_migration() assumes the migration has succeeded even when
    it has not, so "newpage" would be freed while it is not uncharged.

    This patch fixes it by passing mem_cgroup_end_migration() the result of
    the page migration.

    Signed-off-by: Daisuke Nishimura
    Reviewed-by: Minchan Kim
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Reviewed-by: Johannes Weiner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
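
    A hedged sketch of the interface change described above; parameter names
    are assumptions:

    /* The caller now reports whether migration succeeded, instead of
     * end_migration guessing from oldpage->mapping. */
    void mem_cgroup_end_migration(struct mem_cgroup *memcg,
                                  struct page *oldpage, struct page *newpage,
                                  bool migration_ok);

    /* illustrative caller, after the migration attempt: */
    mem_cgroup_end_migration(memcg, page, newpage, rc == 0);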
     
  • 2.6.37 added an unmap_and_move_huge_page() for memory failure recovery,
    but its anon_vma handling was still based around the 2.6.35 conventions.
    Update it to use page_lock_anon_vma, get_anon_vma, page_unlock_anon_vma,
    drop_anon_vma in the same way as we're now changing unmap_and_move().

    I don't particularly like to propose this for stable when I've not seen
    its problems in practice nor tested the solution: but it's clearly out of
    synch at present.

    Signed-off-by: Hugh Dickins
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Naoya Horiguchi
    Cc: "Jun'ichi Nomura"
    Cc: Andi Kleen
    Cc: [2.6.37, 2.6.36]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Increased usage of page migration in mmotm reveals that the anon_vma
    locking in unmap_and_move() has been deficient since 2.6.36 (or even
    earlier). Review at the time of f18194275c39835cb84563500995e0d503a32d9a
    ("mm: fix hang on anon_vma->root->lock") missed the issue here: the
    anon_vma to which we get a reference may already have been freed back to
    its slab (it is in use when we check page_mapped, but that can change),
    and so its anon_vma->root may be switched at any moment by reuse in
    anon_vma_prepare.

    Perhaps we could fix that with a get_anon_vma_unless_zero(), but let's
    not: just rely on page_lock_anon_vma() to do all the hard thinking for us,
    then we don't need any rcu read locking over here.

    In removing the rcu_unlock label: since PageAnon is a bit in
    page->mapping, it's impossible for a !page->mapping page to be anon; but
    insert VM_BUG_ON in case the implementation ever changes.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Hugh Dickins
    Reviewed-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Naoya Horiguchi
    Cc: "Jun'ichi Nomura"
    Cc: Andi Kleen
    Cc: [2.6.37, 2.6.36]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • …lot during file page migration

    migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for
    anonymous pages, as introduced by git commit
    989f89c57e6361e7d16fbd9572b5da7d313b073d ("fix rcu_read_lock() in page
    migraton"). The point of the RCU protection there is part of getting a
    stable reference to anon_vma and is only held for anon pages as file pages
    are locked which is sufficient protection against freeing.

    However, while a file page's mapping is being migrated, the radix tree is
    double checked to ensure it is the expected page. This uses
    radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held,
    triggering the following warning.

    [ 173.674290] ===================================================
    [ 173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
    [ 173.676016] ---------------------------------------------------
    [ 173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
    [ 173.676016]
    [ 173.676016] other info that might help us debug this:
    [ 173.676016]
    [ 173.676016]
    [ 173.676016] rcu_scheduler_active = 1, debug_locks = 0
    [ 173.676016] 1 lock held by hugeadm/2899:
    [ 173.676016] #0: (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
    [ 173.676016]
    [ 173.676016] stack backtrace:
    [ 173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
    [ 173.676016] Call Trace:
    [ 173.676016] [<c128cc01>] ? printk+0x14/0x1b
    [ 173.676016] [<c1063502>] lockdep_rcu_dereference+0x7d/0x86
    [ 173.676016] [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab
    [ 173.676016] [<c10e41ad>] migrate_page+0x23/0x39
    [ 173.676016] [<c10e491b>] buffer_migrate_page+0x22/0x107
    [ 173.676016] [<c10e48f9>] ? buffer_migrate_page+0x0/0x107
    [ 173.676016] [<c10e425d>] move_to_new_page+0x9a/0x1ae
    [ 173.676016] [<c10e47e6>] migrate_pages+0x1e7/0x2fa

    This patch introduces radix_tree_deref_slot_protected() which calls
    rcu_dereference_protected(). Users of it must pass in the
    mapping->tree_lock that is protecting this dereference. Holding the tree
    lock protects against parallel updaters of the radix tree meaning that
    rcu_dereference_protected is allowable.

    [akpm@linux-foundation.org: remove unneeded casts]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: Minchan Kim <minchan.kim@gmail.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Milton Miller <miltonm@bga.com>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Cc: <stable@kernel.org> [2.6.37.early]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
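
    A sketch of the helper described above, following the description of
    rcu_dereference_protected() plus the caller-supplied tree_lock; the real
    definition lives in the radix-tree header:

    /* Dereference a radix-tree slot while holding the lock that excludes
     * parallel updaters, rather than requiring rcu_read_lock(). */
    static inline void *radix_tree_deref_slot_protected(void **pslot,
                                                        spinlock_t *treelock)
    {
            return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
    }

    /* illustrative caller in migrate_page_move_mapping(), tree_lock held: */
    if (radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
            spin_unlock_irq(&mapping->tree_lock);
            return -EAGAIN;         /* not the expected page: back out */
    }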
     
  • No pmd_trans_huge should ever materialize in migration ptes areas, because
    we split the hugepage before migration ptes are instantiated.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • With the introduction of the boolean sync parameter, the API looks a
    little inconsistent as offlining is still an int. Convert offlining to a
    bool for the sake of being tidy.

    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • …ompaction in the faster path

    Migration synchronously waits for writeback if the initial pass fails.
    Callers of memory compaction do not necessarily want this behaviour if the
    caller is latency sensitive or expects that synchronous migration is not
    going to have a significantly better success rate.

    This patch adds a sync parameter to migrate_pages() allowing the caller to
    indicate if wait_on_page_writeback() is allowed within migration or not.
    For reclaim/compaction, try_to_compact_pages() is first called
    asynchronously, direct reclaim runs and then try_to_compact_pages() is
    called synchronously as there is a greater expectation that it'll succeed.

    [akpm@linux-foundation.org: build/merge fix]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Rik van Riel <riel@redhat.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Andy Whitcroft <apw@shadowen.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
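
    A hedged sketch of the interface after this change; argument names are
    assumptions based on the description:

    /* Callers now say whether migration may wait on page writeback. */
    int migrate_pages(struct list_head *from, new_page_t get_new_page,
                      unsigned long private, bool offlining, bool sync);

    /* e.g. the first, latency-sensitive compaction attempt passes sync=false;
     * a later, more determined attempt passes sync=true. */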
     
  • Lumpy reclaim is disruptive. It reclaims a large number of pages and
    ignores the age of the pages it reclaims. This can incur significant
    stalls and potentially increase the number of major faults.

    Compaction has reached the point where it is considered reasonably stable
    (meaning it has passed a lot of testing) and is a potential candidate for
    displacing lumpy reclaim. This patch introduces an alternative to lumpy
    reclaim, available when compaction is enabled, called reclaim/compaction.
    The basic operation is very simple - instead of selecting a contiguous
    range of pages to reclaim, a number of order-0 pages are reclaimed and
    compaction is triggered later by either kswapd (compact_zone_order()) or
    direct compaction (__alloc_pages_direct_compact()).

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: use conventional task_struct naming]
    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

23 Dec, 2010

1 commit


27 Oct, 2010

3 commits

  • The vma returned by find_vma does not necessarily include the target
    address. If this happens, the code tries to follow a page outside of any
    vma and returns ENOENT instead of EFAULT (see the sketch below).

    Signed-off-by: Gleb Natapov
    Acked-by: Christoph Lameter
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gleb Natapov
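
    A sketch of the check this implies: find_vma() returns the first VMA whose
    end is above the address, which may still start above it, so the VMA start
    must be checked too. The error-path label is illustrative:

    vma = find_vma(mm, addr);
    if (!vma || addr < vma->vm_start) {
            err = -EFAULT;          /* address is not covered by any VMA */
            goto set_status;
    }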
     
  • Presently update_nr_listpages() doesn't have a role. That's because the
    list passed in is always empty just after calling migrate_pages():
    migrate_pages() cleans up the pages which failed to migrate before
    returning, as of aaa994b3:

    [PATCH] page migration: handle freeing of pages in migrate_pages()

    Do not leave pages on the lists passed to migrate_pages(). Seems that we will
    not need any postprocessing of pages. This will simplify the handling of
    pages by the callers of migrate_pages().

    At that time, we thought we wouldn't need any postprocessing of pages. But
    the situation has changed: compaction needs to know the number of pages
    that failed to migrate, for the COMPACTPAGEFAILED stat.

    This patch makes a new rule for callers of migrate_pages(): the caller
    calls putback_lru_pages(). So the caller needs to clean up the lists and
    gets a chance to postprocess the pages (see the sketch below). [suggested
    by Christoph Lameter]

    Signed-off-by: Minchan Kim
    Cc: Hugh Dickins
    Cc: Andi Kleen
    Reviewed-by: Mel Gorman
    Reviewed-by: Wu Fengguang
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
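
    A sketch of the resulting caller pattern; the function names come from the
    entry, while the surrounding code is illustrative:

    /* New rule: the caller, not migrate_pages(), puts back whatever failed. */
    nr_failed = migrate_pages(&pagelist, new_page_fn, private, 0);
    if (nr_failed) {
            /* postprocess first, e.g. account COMPACTPAGEFAILED, then ... */
            putback_lru_pages(&pagelist);
    }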
     
  • This removes more dead code that was somehow missed by commit 0d99519efef
    (writeback: remove unused nonblocking and congestion checks). There is
    no behavior change except for the removal of two entries from one of the
    ext4 tracing interfaces.

    The nonblocking checks in ->writepages are no longer used because the
    flusher now prefers to block on get_request_wait() rather than skip inodes
    on IO congestion. The latter would lead to more seeky IO.

    The nonblocking checks in ->writepage are no longer used because it's
    redundant with the WB_SYNC_NONE check.

    We no longer set ->nonblocking in VM page-out and page migration, because
    a) it's effectively redundant with WB_SYNC_NONE in the current code;
    b) its old semantic of "don't get stuck on request queues" is a misbehavior:
    it would skip some dirty inodes on congestion and page out others, which
    is unfair in terms of LRU age.

    Inspired by Christoph Hellwig. Thanks!

    Signed-off-by: Wu Fengguang
    Cc: Theodore Ts'o
    Cc: David Howells
    Cc: Sage Weil
    Cc: Steve French
    Cc: Chris Mason
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

11 Oct, 2010

1 commit

  • 31bit s390 doesn't have huge pages and failed with:

    > mm/migrate.c: In function 'remove_migration_pte':
    > mm/migrate.c:143:3: error: implicit declaration of function 'pte_mkhuge'
    > mm/migrate.c:143:7: error: incompatible types when assigning to type 'pte_t' from type 'int'

    Put that code inside an ifdef.

    Reported by Heiko Carstens

    Signed-off-by: Andi Kleen

    Andi Kleen
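
    A hedged sketch of the guard described above; the exact lines in the patch
    may differ, this just illustrates putting the hugepage pte handling under
    an ifdef:

    #ifdef CONFIG_HUGETLB_PAGE
            /* only built where huge pages (and pte_mkhuge) exist */
            if (PageHuge(new))
                    pte = pte_mkhuge(pte);
    #endif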
     

08 Oct, 2010

1 commit

  • This patch extends page migration code to support hugepage migration.
    One of the potential users of this feature is soft offlining, which
    is triggered by corrected memory errors (added by the next patch).

    Todo:
    - there are other users of page migration such as memory policy,
    memory hotplug and memory compaction.
    They are not ready for hugepage support for now.

    ChangeLog since v4:
    - define migrate_huge_pages()
    - remove changes on isolation/putback_lru_page()

    ChangeLog since v2:
    - refactor isolate/putback_lru_page() to handle hugepage
    - add comment about race on unmap_and_move_huge_page()

    ChangeLog since v1:
    - divide migration code path for hugepage
    - define routine checking migration swap entry for hugetlb
    - replace "goto" with "if/else" in remove_migration_pte()

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

10 Aug, 2010

3 commits

  • KSM reference counts can cause an anon_vma to exist after the process it
    belongs to has already exited. Because the anon_vma lock now lives in
    the root anon_vma, we need to ensure that the root anon_vma stays around
    until after all the "child" anon_vmas have been freed.

    The obvious way to do this is to have a "child" anon_vma take a reference
    to the root in anon_vma_fork. When the anon_vma is freed at munmap or
    process exit, we drop the refcount in anon_vma_unlink and possibly free
    the root anon_vma.

    The KSM anon_vma reference count function also needs to be modified to
    deal with the possibility of freeing 2 levels of anon_vma. The easiest
    way to do this is to break out the KSM magic and make it generic.

    When compiling without CONFIG_KSM, this code is compiled out.

    Signed-off-by: Rik van Riel
    Tested-by: Larry Woodman
    Acked-by: Larry Woodman
    Reviewed-by: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Acked-by: Linus Torvalds
    Tested-by: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Always (and only) lock the root (oldest) anon_vma whenever we do something
    in an anon_vma. The recently introduced anon_vma scalability is due to
    the rmap code scanning only the VMAs that need to be scanned. Many common
    operations still took the anon_vma lock on the root anon_vma, so always
    taking that lock is not expected to introduce any scalability issues.

    However, always taking the same lock does mean we only need to take one
    lock, which means rmap_walk on pages from any anon_vma in the vma is
    excluded from occurring during an munmap, expand_stack or other operation
    that needs to exclude rmap_walk and similar functions.

    Also add the proper locking to vma_adjust.

    Signed-off-by: Rik van Riel
    Tested-by: Larry Woodman
    Acked-by: Larry Woodman
    Reviewed-by: Minchan Kim
    Reviewed-by: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Substitute a direct call of spin_lock(anon_vma->lock) with an inline
    function doing exactly the same (sketched below).

    This makes it easier to do the substitution to the root anon_vma lock in a
    following patch.

    We will deal with the handful of special locks (nested, dec_and_lock, etc)
    separately.

    Signed-off-by: Rik van Riel
    Acked-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Tested-by: Larry Woodman
    Acked-by: Larry Woodman
    Reviewed-by: Minchan Kim
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
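
    A minimal sketch of the inline described above (and its unlock
    counterpart), assuming the lock field is anon_vma->lock as the entry says:

    static inline void anon_vma_lock(struct anon_vma *anon_vma)
    {
            spin_lock(&anon_vma->lock);
    }

    static inline void anon_vma_unlock(struct anon_vma *anon_vma)
    {
            spin_unlock(&anon_vma->lock);
    }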
     

28 May, 2010

1 commit

  • FILE_MAPPED per memcg of migrated file cache is not properly updated,
    because our hook in page_add_file_rmap() can't know to which memcg
    FILE_MAPPED should be counted.

    Basically, this patch is for fixing the bug but includes some big changes
    to fix up other messes.

    Now, when migrating a mapped file, events happen in the following sequence.

    1. allocate a new page.
    2. get memcg of an old page.
    3. charge against the new page before migration. But at this point,
    no changes are made to the new page's page_cgroup and the charge is not
    committed. (IOW, the PCG_USED bit is not set.)
    4. page migration replaces radix-tree, old-page and new-page.
    5. page migration remaps the new page if the old page was mapped.
    6. Here, the new page is unlocked.
    7. memcg commits the charge for the new page, marking the new page's
    page_cgroup as PCG_USED.

    Because "commit" happens after the page remap, we can't count FILE_MAPPED
    at "5", because we should avoid trusting page_cgroup->mem_cgroup if the
    PCG_USED bit is unset.
    (Note: memcg's LRU removal code does that but LRU-isolation logic is used
    for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is
    not on LRU or page_cgroup->mem_cgroup is NULL.)

    We can lose file_mapped accounting information at 5 because FILE_MAPPED
    is updated only when mapcount changes 0->1. So we should catch it.

    BTW, historically, the above implementation comes from handling migration
    failure of anonymous pages. Because we charge both the old page and the
    new page with mapcount=0, we can't catch
    - the page being really freed before remap.
    - migration failing but the page being freed before remap
    or ..... corner cases.

    New migration sequence with memcg is:

    1. allocate a new page.
    2. mark PageCgroupMigration to the old page.
    3. charge against a new page onto the old page's memcg. (here, new page's pc
    is marked as PageCgroupUsed.)
    4. page migration replaces radix-tree, page table, etc...
    5. At remapping, new page's page_cgroup is now makrked as "USED"
    We can catch 0->1 event and FILE_MAPPED will be properly updated.

    And we can catch SWAPOUT event after unlock this and freeing this
    page by unmap() can be caught.

    7. Clear PageCgroupMigration of the old page.

    So, FILE_MAPPED will be correctly updated.

    Then, what is the MIGRATION flag for?
    Without it, at migration failure, we may have to charge the old page again
    because it may be fully unmapped. "Charge" means that we have to dive into
    memory reclaim or something equally complicated. So, it's better to avoid
    charging it again. Before this patch, __commit_charge() was working on
    both the old and the new page and fixed everything up. But this technique
    had some racy conditions around FILE_MAPPED, SWAPOUT, etc...
    Now, the kernel uses the MIGRATION flag and doesn't uncharge the old page
    until the end of migration.

    I hope this change will make memcg's page migration much simpler. This
    page migration has caused several troubles; it is worth adding a flag for
    simplification.

    Reviewed-by: Daisuke Nishimura
    Tested-by: Daisuke Nishimura
    Reported-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Christoph Lameter
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org