09 Oct, 2012

2 commits

  • When a large VMA (anon or private file mapping) is first touched, which
    will populate its anon_vma field, and then split into many regions through
    the use of mprotect(), the original anon_vma ends up linking all of the
    vmas on a linked list. This can cause rmap to become inefficient, as we
    have to walk potentially thousands of irrelevant vmas before finding the
    one a given anon page might fall into.

    By replacing the same_anon_vma linked list with an interval tree (where
    each avc's interval is determined by its vma's start and last pgoffs), we
    can make rmap efficient for this use case again.

    While the change is large, all of its pieces are fairly simple.

    Most places that were walking the same_anon_vma list were looking for a
    known pgoff, so they can just use the anon_vma_interval_tree_foreach()
    interval tree iterator instead. The exception here is ksm, where the
    page's index is not known. It would probably be possible to rework ksm so
    that the index would be known, but for now I have decided to keep things
    simple and just walk the entirety of the interval tree there.

    When updating vma's that already have an anon_vma assigned, we must take
    care to re-index the corresponding avc's on their interval tree. This is
    done through the use of anon_vma_interval_tree_pre_update_vma() and
    anon_vma_interval_tree_post_update_vma(), which remove the avc's from
    their interval tree before the update and re-insert them after the update.
    The anon_vma stays locked during the update, so there is no chance that
    rmap would miss the vmas that are being updated.

    Signed-off-by: Michel Lespinasse
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Daniel Santos
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
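
The lookup cost described in the message above can be sketched in plain C. This is only an illustration, not the kernel code: `demo_vma` and the helper names are hypothetical, and a binary search over a sorted array of disjoint intervals stands in for the interval tree. Each interval is the closed range from a mapping's first page offset to its last one, as the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: only what the interval needs. */
struct demo_vma {
    unsigned long pgoff;   /* page offset of the first page */
    unsigned long npages;  /* length of the mapping in pages */
};

/* Closed interval [start, last] in page offsets. */
static unsigned long demo_start(const struct demo_vma *v) { return v->pgoff; }
static unsigned long demo_last(const struct demo_vma *v)
{
    assert(v->npages > 0);
    return v->pgoff + v->npages - 1;
}

static int demo_covers(const struct demo_vma *v, unsigned long pgoff)
{
    return demo_start(v) <= pgoff && pgoff <= demo_last(v);
}

/* List-style walk: returns how many entries were visited up to the hit. */
static size_t demo_list_visits(const struct demo_vma *vmas, size_t n,
                               unsigned long pgoff)
{
    size_t visited = 0;
    for (size_t i = 0; i < n; i++) {
        visited++;
        if (demo_covers(&vmas[i], pgoff))
            break;
    }
    return visited;
}

/*
 * Indexed lookup (binary search, assuming the array is sorted by start
 * and the intervals do not overlap). Returns the index or -1.
 */
static long demo_find(const struct demo_vma *vmas, size_t n,
                      unsigned long pgoff)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (demo_last(&vmas[mid]) < pgoff)
            lo = mid + 1;
        else if (demo_start(&vmas[mid]) > pgoff)
            hi = mid;
        else
            return (long)mid;
    }
    return -1;
}
```

With thousands of split regions the list walk grows linearly with the number of regions, while an indexed lookup stays logarithmic; that gap is what the message is concerned with.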
     
  • Implement an interval tree as a replacement for the VMA prio_tree. The
    algorithms are similar to lib/interval_tree.c; however that code can't be
    directly reused as the interval endpoints are not explicitly stored in the
    VMA. So instead, the common algorithm is moved into a template and the
    details (node type, how to get interval endpoints from the node, etc) are
    filled in using the C preprocessor.

    Once the interval tree functions are available, using them as a
    replacement to the VMA prio tree is a relatively simple, mechanical job.

    Signed-off-by: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Hillf Danton
    Cc: Peter Zijlstra
    Cc: Catalin Marinas
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
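
The "template filled in by the C preprocessor" idea can be illustrated with a much smaller example. The macro below is hypothetical and is not the kernel's template: it generates a linear overlap counter rather than tree operations, but it shows how one algorithm can be instantiated for a node type that stores its endpoints and for one that derives them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * DEMO_DEFINE_OVERLAP is a hypothetical template macro. It generates a
 * function NAME that counts the nodes whose closed interval
 * [START(node), LAST(node)] overlaps [start, last].
 */
#define DEMO_DEFINE_OVERLAP(NAME, TYPE, START, LAST)                  \
static size_t NAME(const TYPE *nodes, size_t n,                       \
                   unsigned long start, unsigned long last)           \
{                                                                     \
    size_t hits = 0;                                                  \
    for (size_t i = 0; i < n; i++)                                    \
        if (START(&nodes[i]) <= last && start <= LAST(&nodes[i]))     \
            hits++;                                                   \
    return hits;                                                      \
}

/* Node type 1: endpoints stored explicitly. */
struct demo_range { unsigned long start, last; };
#define DEMO_RANGE_START(r) ((r)->start)
#define DEMO_RANGE_LAST(r)  ((r)->last)
DEMO_DEFINE_OVERLAP(demo_range_overlaps, struct demo_range,
                    DEMO_RANGE_START, DEMO_RANGE_LAST)

/* Node type 2: endpoints derived from offset and length, as for a VMA. */
struct demo_map { unsigned long pgoff, npages; };
#define DEMO_MAP_START(m) ((m)->pgoff)
#define DEMO_MAP_LAST(m)  ((m)->pgoff + (m)->npages - 1)
DEMO_DEFINE_OVERLAP(demo_map_overlaps, struct demo_map,
                    DEMO_MAP_START, DEMO_MAP_LAST)
```

The accessor macros are the only per-type detail; the generated function bodies are otherwise identical.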
     

01 Aug, 2012

2 commits

  • Sanity:

    CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
    CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

    [mhocko@suse.cz: fix missed bits]
    Cc: Glauber Costa
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Tejun Heo
    Cc: Aneesh Kumar K.V
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Since we migrate only one hugepage, don't use a linked list for passing the
    page around. Directly pass the page that needs to be migrated as an argument.
    This also removes the usage of page->lru in the migrate path.

    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: Hillf Danton
    Reviewed-by: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     

31 Jul, 2012

1 commit

  • Commit a6bc32b89922 ("mm: compaction: introduce sync-light migration for
    use by compaction") changed the declaration of migrate_pages() and
    migrate_huge_pages().

    But it missed changing the argument of migrate_huge_pages() in
    soft_offline_huge_page(). In this case, we should call
    migrate_huge_pages() with MIGRATE_SYNC.

    Additionally, there is a mismatch between the type of the argument and the
    function declaration for migrate_pages().

    Signed-off-by: Joonsoo Kim
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Acked-by: David Rientjes
    Cc: "Aneesh Kumar K.V"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
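
Why a missed call site can go unnoticed is easy to show in isolation. The sketch below is an assumption-laden toy: the `sketch_` names are hypothetical, and the enumerator order (async, sync-light, sync) is assumed rather than taken from the message. C converts a boolean argument to an enum parameter without complaint, so a stale caller that still passes `true` silently selects whichever mode has the value 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering; the sketch_ prefix marks these as illustrative. */
enum sketch_migrate_mode {
    SKETCH_MIGRATE_ASYNC,       /* 0 */
    SKETCH_MIGRATE_SYNC_LIGHT,  /* 1 */
    SKETCH_MIGRATE_SYNC,        /* 2 */
};

/* Hypothetical callee whose parameter changed from bool to the enum. */
static enum sketch_migrate_mode sketch_migrate(enum sketch_migrate_mode mode)
{
    return mode;   /* report the mode the callee would act on */
}

/* Stale call site: still passes a bool, and it still compiles. */
static enum sketch_migrate_mode sketch_stale_caller(void)
{
    bool sync = true;
    return sketch_migrate(sync);   /* true converts to 1 */
}

/* Fixed call site: names the intended mode explicitly. */
static enum sketch_migrate_mode sketch_fixed_caller(void)
{
    return sketch_migrate(SKETCH_MIGRATE_SYNC);
}
```

Under the assumed ordering, the stale caller ends up with the sync-light mode instead of full sync, which is the kind of mismatch the message describes.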
     

12 Jul, 2012

1 commit

  • In commit dad1743e5993f1 ("x86/mce: Only restart instruction after machine
    check recovery if it is safe") we fixed mce_notify_process() to force a
    signal to the current process if it was not restartable (RIPV bit not
    set in MCG_STATUS). But doing it here means that the process doesn't
    get told the virtual address of the fault via siginfo_t->si_addr. This
    would prevent application level recovery from the fault.

    Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
    that we will provide the right information with the signal.

    Signed-off-by: Tony Luck
    Acked-by: Borislav Petkov
    Cc: stable@kernel.org # 3.4+

    Tony Luck
     

30 May, 2012

1 commit


21 May, 2012

1 commit

  • This commit changes various functions that switch the migrate type of
    pages and pageblocks between MIGRATE_ISOLATE and MIGRATE_MOVABLE so that
    they also work with the MIGRATE_CMA migrate type.

    Signed-off-by: Michal Nazarewicz
    Signed-off-by: Marek Szyprowski
    Reviewed-by: KAMEZAWA Hiroyuki
    Tested-by: Rob Clark
    Tested-by: Ohad Ben-Cohen
    Tested-by: Benjamin Gaignard
    Tested-by: Robert Nelson
    Tested-by: Barry Song

    Michal Nazarewicz
     

23 Mar, 2012

1 commit

  • Pull MCE changes from Ingo Molnar.

    * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Fix return value of mce_chrdev_read() when erst is disabled
    x86/mce: Convert static array of pointers to per-cpu variables
    x86/mce: Replace hard coded hex constants with symbolic defines
    x86/mce: Recognise machine check bank signature for data path error
    x86/mce: Handle "action required" errors
    x86/mce: Add mechanism to safely save information in MCE handler
    x86/mce: Create helper function to save addr/misc when needed
    HWPOISON: Add code to handle "action required" errors.
    HWPOISON: Clean up memory_failure() vs. __memory_failure()

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • Andrea Arcangeli pointed out to me that a check in __memory_failure()
    which was intended to prevent THP tail pages from being checked for the
    absence of the PG_lru flag (something that is always the case), was also
    preventing THP head pages from being checked.

    A THP head page could actually benefit from the call to shake_page() by
    ending up being put back to a LRU, provided it had been waiting in a
    pagevec array.

    Andrea suggested that the "!PageTransCompound(p)" in the if-statement
    should be replaced by a "!PageTransTail(p)", thus allowing THP head pages
    to be checked and possibly shaken.

    Signed-off-by: Dean Nelson
    Cc: Jin Dongming
    Reviewed-by: Andrea Arcangeli
    Cc: Andi Kleen
    Cc: Hidetoshi Seto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
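
The effect of swapping the predicate can be checked with a toy page model. Everything below is hypothetical (`demo_page`, `demo_kind` and the helpers are not kernel definitions); it only mirrors the logic of the two conditions named in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a page is a base page, a THP head, or a THP tail. */
enum demo_kind { DEMO_BASE, DEMO_THP_HEAD, DEMO_THP_TAIL };
struct demo_page { enum demo_kind kind; };

/* Stand-ins for the two predicates discussed above. */
static bool demo_trans_compound(const struct demo_page *p)
{
    return p->kind != DEMO_BASE;        /* head or tail */
}

static bool demo_trans_tail(const struct demo_page *p)
{
    return p->kind == DEMO_THP_TAIL;    /* tail only */
}

/* Old condition: every THP page, head included, skips the LRU check. */
static bool demo_lru_checked_old(const struct demo_page *p)
{
    return !demo_trans_compound(p);
}

/* New condition: only tail pages skip it, so heads can be shaken. */
static bool demo_lru_checked_new(const struct demo_page *p)
{
    return !demo_trans_tail(p);
}
```

The two conditions differ only for head pages, which is exactly the case the fix targets.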
     

26 Jan, 2012

1 commit


13 Jan, 2012

1 commit

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
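
The mapping described in the message can be written down as a small sketch. The mode names follow the message; the `demo_` prefixed types and functions are hypothetical, and the writeback rule is a simplification of "sync-light avoids writing back pages".

```c
#include <assert.h>
#include <stdbool.h>

/* Mode names follow the message; the demo_ prefix marks a sketch. */
enum demo_migrate_mode {
    DEMO_MIGRATE_ASYNC,
    DEMO_MIGRATE_SYNC_LIGHT,
    DEMO_MIGRATE_SYNC,
};

/* Compaction chooses between the two cheaper modes. */
static enum demo_migrate_mode demo_compaction_mode(bool sync)
{
    return sync ? DEMO_MIGRATE_SYNC_LIGHT : DEMO_MIGRATE_ASYNC;
}

/* Other users, such as memory hotplug, use the full sync mode. */
static enum demo_migrate_mode demo_hotplug_mode(void)
{
    return DEMO_MIGRATE_SYNC;
}

/* Simplified rule: only full sync may write pages back to storage. */
static bool demo_may_writeback(enum demo_migrate_mode mode)
{
    return mode == DEMO_MIGRATE_SYNC;
}
```

Under this rule sync compaction never triggers writeback, which is how the long stalls described above are avoided.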
     

04 Jan, 2012

2 commits

  • Add new flag bit "MF_ACTION_REQUIRED" to be used by machine check
    code to force a signal with si_code = BUS_MCEERR_AR in the case
    where the error occurs in processor execution context. Pass the
    flags argument along call chain:
    memory_failure()
    hwpoison_user_mappings()
    kill_procs()
    kill_proc()

    Drop the "_ao" suffix from kill_procs_ao() and kill_proc_ao() since
    they can now handle "action required" as well as "action optional" errors.

    Acked-by: Borislav Petkov
    Signed-off-by: Tony Luck

    Tony Luck
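
Threading a flags argument down the call chain listed in the message might look roughly like this. All names carry a `demo_` prefix because the bit value, the code values and the function bodies are assumptions made for illustration; only the shape of the chain and the choice between "action required" and "action optional" come from the message.

```c
#include <assert.h>

/* Hypothetical flag bit and signal codes; the values are placeholders. */
enum demo_mf_flags { DEMO_MF_ACTION_REQUIRED = 1 << 1 };
enum demo_si_code { DEMO_BUS_MCEERR_AR, DEMO_BUS_MCEERR_AO };

/* Leaf: picks the signal code from the flags it was handed. */
static enum demo_si_code demo_kill_proc(int flags)
{
    return (flags & DEMO_MF_ACTION_REQUIRED) ? DEMO_BUS_MCEERR_AR
                                             : DEMO_BUS_MCEERR_AO;
}

/* Each level only forwards the flags; none of them reinterprets them. */
static enum demo_si_code demo_kill_procs(int flags)
{
    return demo_kill_proc(flags);
}

static enum demo_si_code demo_hwpoison_user_mappings(int flags)
{
    return demo_kill_procs(flags);
}

static enum demo_si_code demo_memory_failure(int flags)
{
    return demo_hwpoison_user_mappings(flags);
}
```

Because one function now serves both cases, a suffix naming just one of them no longer fits, which is the reason the message gives for dropping "_ao".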
     
  • There is only one caller of memory_failure(), all other users call
    __memory_failure() and pass in the flags argument explicitly. The
    lone user of memory_failure() will soon need to pass flags too.

    Add flags argument to the callsite in mce.c. Delete the old memory_failure()
    function, and then rename __memory_failure() without the leading "__".

    Provide clearer message when action optional memory errors are ignored.

    Acked-by: Borislav Petkov
    Signed-off-by: Tony Luck

    Tony Luck
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

01 Nov, 2011

1 commit

  • Commit fb46e73520940b ("HWPOISON: Convert pr_debugs to pr_info") authored
    by Andi Kleen converted a number of pr_debug()s to pr_info()s.

    About the same time additional code with pr_debug()s was added by two
    other commits 8c6c2ecb4466 ("HWPOSION, hugetlb: recover from free hugepage
    error when !MF_COUNT_INCREASED") and d950b95882f3d ("HWPOISON, hugetlb:
    soft offlining for hugepage"). And these pr_debug()s failed to get
    converted to pr_info()s.

    This patch converts them as well, and does some minor related whitespace
    cleanup.

    Signed-off-by: Dean Nelson
    Reviewed-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     

31 Oct, 2011

1 commit


03 Aug, 2011

1 commit

  • memory_failure() is the entry point for HWPoison memory error
    recovery. It must be called in process context. But hardware memory
    errors are commonly reported via MCE or NMI, so some delayed
    execution mechanism must be used. In the MCE handler, a work queue + ring
    buffer mechanism is used.

    In addition to MCE, APEI (ACPI Platform Error Interface) GHES
    (Generic Hardware Error Source) can now be used to report memory errors
    too. To add support for APEI GHES memory recovery, a mechanism similar
    to that of MCE is implemented. memory_failure_queue() is the new
    entry point that can be called in IRQ context. The next step is to
    make the MCE handler use this interface too.

    Signed-off-by: Huang Ying
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Cc: Andrew Morton
    Signed-off-by: Len Brown

    Huang Ying
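
The "record now, handle later" pattern from the message can be sketched with a fixed-size ring buffer. This is a single-threaded illustration with hypothetical names (`demo_ring`, `demo_queue`, `demo_dequeue`); it leaves out the locking, per-CPU handling and work scheduling that real interrupt-context code would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4   /* power of two, so masking wraps the index */

struct demo_entry { unsigned long pfn; int flags; };

struct demo_ring {
    struct demo_entry slots[DEMO_RING_SIZE];
    size_t head;   /* total entries written */
    size_t tail;   /* total entries read */
};

/* Producer (the "interrupt" side): never waits, fails when full. */
static bool demo_queue(struct demo_ring *r, unsigned long pfn, int flags)
{
    if (r->head - r->tail == DEMO_RING_SIZE)
        return false;
    r->slots[r->head & (DEMO_RING_SIZE - 1)] =
        (struct demo_entry){ pfn, flags };
    r->head++;
    return true;
}

/* Consumer (the deferred "process context" side): fails when empty. */
static bool demo_dequeue(struct demo_ring *r, struct demo_entry *out)
{
    if (r->head == r->tail)
        return false;
    *out = r->slots[r->tail & (DEMO_RING_SIZE - 1)];
    r->tail++;
    return true;
}
```

The producer only copies a small record and returns, so the expensive recovery work can run later in a context that is allowed to sleep.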
     

28 Jun, 2011

1 commit


16 Jun, 2011

1 commit

  • Pages isolated for migration are accounted with the vmstat counters
    NR_ISOLATE_[ANON|FILE]. Callers of migrate_pages() are expected to
    increment these counters when pages are isolated from the LRU. Once the
    pages have been migrated, they are put back on the LRU or freed and the
    isolated count is decremented.

    Memory failure is not properly accounting for pages it isolates, causing
    the NR_ISOLATED counters to be negative. On SMP builds, this goes
    unnoticed as negative counters are treated as 0 due to expected per-cpu
    drift. On UP builds, the counter is treated by too_many_isolated() as a
    large value causing processes to enter D state during page reclaim or
    compaction. This patch accounts for pages isolated by memory failure
    correctly.

    [mel@csn.ul.ie: rewrote changelog]
    Reviewed-by: Andrea Arcangeli
    Signed-off-by: Minchan Kim
    Cc: Andi Kleen
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
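
How a negative counter turns into "a large value" can be reproduced in a few lines. The comparison below is an assumed, simplified form of the check (the function names and the "isolated greater than inactive" rule are illustrative, not the kernel's code); the point is only the signed-to-unsigned conversion.

```c
#include <assert.h>
#include <stdbool.h>

/* Raw read: a negative count reinterpreted as unsigned becomes huge. */
static bool demo_too_many_isolated_raw(long nr_isolated,
                                       unsigned long nr_inactive)
{
    unsigned long isolated = (unsigned long)nr_isolated; /* -1 wraps around */
    return isolated > nr_inactive;
}

/* Clamped read: negative values are treated as zero. */
static bool demo_too_many_isolated_clamped(long nr_isolated,
                                           unsigned long nr_inactive)
{
    unsigned long isolated = nr_isolated < 0 ? 0UL
                                             : (unsigned long)nr_isolated;
    return isolated > nr_inactive;
}
```

With the raw read, a counter of -1 always looks like too many isolated pages, so a caller that waits for the condition to clear would wait indefinitely.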
     

25 May, 2011

4 commits

  • Change each shrinker's API by consolidating the existing parameters into
    shrink_control struct. This will simplify any further features added w/o
    touching each file of shrinker.

    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix warning]
    [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
    [akpm@linux-foundation.org: fix xfs warning]
    [akpm@linux-foundation.org: update gfs2]
    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Steven Whitehouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
     
  • Consolidate the existing parameters to shrink_slab() into a new
    shrink_control struct. This is needed later to pass the same struct to
    shrinkers.

    Signed-off-by: Ying Han
    Cc: KOSAKI Motohiro
    Cc: Minchan Kim
    Acked-by: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Han
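
The parameter consolidation can be pictured with a before/after pair. The struct layout and function names below are hypothetical (`demo_shrink_control` is a stand-in, and the "remaining objects" arithmetic is invented for the example); the sketch only shows why a struct makes later additions cheaper.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;   /* stand-in for the allocation mask type */

/* After: the parameters travel together in one struct. */
struct demo_shrink_control {
    demo_gfp_t gfp_mask;
    unsigned long nr_to_scan;
};

/* Before: every parameter is a separate argument. */
static unsigned long demo_shrink_old(unsigned long nr_to_scan,
                                     demo_gfp_t gfp_mask,
                                     unsigned long cached)
{
    (void)gfp_mask;   /* unused in this toy */
    return cached - (nr_to_scan < cached ? nr_to_scan : cached);
}

/* After: a new field can be added without touching this signature. */
static unsigned long demo_shrink_new(const struct demo_shrink_control *sc,
                                     unsigned long cached)
{
    return demo_shrink_old(sc->nr_to_scan, sc->gfp_mask, cached);
}
```

Adding a field to the struct leaves every callback signature unchanged, whereas adding a positional argument would require editing each one.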
     
  • Drop the first page reference only after calling isolate_lru_page(), to keep
    a stable reference to the page while isolating it.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Straightforward conversion of i_mmap_lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

23 Mar, 2011

1 commit

  • remove_from_page_cache has been renamed to delete_from_page_cache. For
    consistency with __remove_from_swap_cache and remove_from_swap_cache,
    rename the internal page cache handling function too.

    Signed-off-by: Minchan Kim
    Cc: Christoph Hellwig
    Acked-by: Hugh Dickins
    Acked-by: Mel Gorman
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Johannes Weiner
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

18 Mar, 2011

1 commit


10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

03 Feb, 2011

5 commits

  • When a tail page of a THP is poisoned, memory-failure does nothing except
    set PG_hwpoison, while the expected behavior is that the process using
    the poisoned tail page is killed.

    The problem is caused by the lru check of the poisoned tail page.
    Because the PG_lru flag is only set on the head page of a THP, the check
    always treats the poisoned tail page as a non-lru page.

    So the lru check should be avoided for the tail page of a THP, as it is
    for hugetlb.

    This patch adds !PageTransCompound() before the lru check for THP. Because
    of the check (!PageHuge() && !PageTransCompound()), the whole branch can be
    optimized away at build time when both hugetlbfs and THP are set to "N"
    (or on archs not supporting either of those).

    [akpm@linux-foundation.org: fix unrelated typo in shake_page() comment]
    Signed-off-by: Jin Dongming
    Reviewed-by: Hidetoshi Seto
    Cc: Andrea Arcangeli
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jin Dongming
     
  • When the tail page of a THP is poisoned, the head page is poisoned too,
    and the wrong address, the address of the head page, is always sent with
    SIGBUS.

    So when the poisoned page is used by a guest OS running on KVM, after
    qemu translates the address (hva->gpa), an unexpected process in the
    guest OS is killed by SIGBUS.

    What we expect is that the process using the poisoned tail page is
    killed in the guest OS, not the process using the healthy head page.

    Since it is not good to poison a healthy page, avoid poisoning any page
    other than the one that is really poisoned.
    (While we poison all pages in a huge page in case of hugetlb,
    we can do this for THP thanks to split_huge_page().)

    Here we fix two parts:
    1. Isolate only the poisoned page, to make sure
    the reported address is the address of the poisoned page.
    2. Make the poisoned page work like a poisoned regular page.

    [akpm@linux-foundation.org: fix spello in comment]
    Signed-off-by: Jin Dongming
    Reviewed-by: Hidetoshi Seto
    Cc: Andrea Arcangeli
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jin Dongming
     
  • The poisoned THP is now split with split_huge_page() in
    collect_procs_anon(). If kmalloc() fails in collect_procs(),
    split_huge_page() is never called, and the work after
    split_huge_page() that collects the processes using the poisoned page is
    not done either. So the processes using the poisoned page cannot be
    killed.

    The situation is worse when CONFIG_DEBUG_VM == "Y": because the
    poisoned THP was not split, a system panic is caused by
    VM_BUG_ON(PageTransHuge(page)) in try_to_unmap().

    This patch does:
    1. Move split_huge_page() to before collect_procs().
    This makes sure a failure to split the THP is caused by the split itself.
    2. When splitting the THP fails, stop the operations after it.
    This avoids an unexpected system panic or pointless work.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Jin Dongming
    Reviewed-by: Hidetoshi Seto
    Cc: Andrea Arcangeli
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jin Dongming
     
  • If migrate_huge_page called by memory-failure fails, it calls put_page itself
    to drop the page reference, and the caller of migrate_huge_page also calls
    putback_lru_pages. This can free the page twice and so corrupt the page
    for its holder.

    In addition, cleaning up pages in the caller is consistent with the
    behavior of migrate_pages since cf608ac19c ("mm: compaction: fix
    COMPACTPAGEFAILED counting").

    Signed-off-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • In some cases migrate_pages could return zero while still leaving a few
    pages in the pagelist (and some caller wouldn't notice it has to call
    putback_lru_pages after commit cf608ac19c9 ("mm: compaction: fix
    COMPACTPAGEFAILED counting")).

    Add one missing putback_lru_pages not added by commit cf608ac19c95 ("mm:
    compaction: fix COMPACTPAGEFAILED counting").

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Minchan Kim
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

14 Jan, 2011

4 commits

  • Read compound_trans_order safe. Noop for CONFIG_TRANSPARENT_HUGEPAGE=n.

    Signed-off-by: Andrea Arcangeli
    Cc: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • hugetlbfs was changed to allow memory failure to migrate the hugetlbfs
    pages and that broke THP as split_huge_page was then called on hugetlbfs
    pages too.

    compound_head/order was also run unsafely on THP pages that can be split
    at any time.

    All compound_head() invocations in memory-failure.c that are run on pages
    that aren't pinned and that can be freed and reused from under us (while
    compound_head is running) are buggy because compound_head can return a
    dangling pointer, but I'm not fixing this as this is a generic
    memory-failure bug not specific to THP but it applies to hugetlbfs too, so
    I can fix it later after THP is merged upstream.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Paging logic that splits the page before it is unmapped and added to swap
    to ensure backwards compatibility with the legacy swap code. Eventually
    swap should natively pageout the hugepages to increase performance and
    decrease seeking and fragmentation of swap space. swapoff can just skip
    over huge pmd as they cannot be part of swap yet. In add_to_swap be
    careful to split the page only if we got a valid swap entry so we don't
    split hugepages with a full swap.

    In theory we could split pages before isolating them during the lru scan,
    but for khugepaged to be safe, I'm relying on either mmap_sem write mode,
    or PG_lock taken, so split_huge_page has to run either with mmap_sem
    read/write mode or PG_lock taken. Calling it from isolate_lru_page would
    make locking more complicated, in addition to that split_huge_page would
    deadlock if called by __isolate_lru_page because it has to take the lru
    lock to add the tail pages.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Mel Gorman
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • …ompaction in the faster path

    Migration synchronously waits for writeback if the initial passes fail.
    Callers of memory compaction do not necessarily want this behaviour if the
    caller is latency sensitive or expects that synchronous migration is not
    going to have a significantly better success rate.

    This patch adds a sync parameter to migrate_pages() allowing the caller to
    indicate if wait_on_page_writeback() is allowed within migration or not.
    For reclaim/compaction, try_to_compact_pages() is first called
    asynchronously, direct reclaim runs and then try_to_compact_pages() is
    called synchronously as there is a greater expectation that it'll succeed.

    [akpm@linux-foundation.org: build/merge fix]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Rik van Riel <riel@redhat.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Andy Whitcroft <apw@shadowen.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

03 Dec, 2010

1 commit

  • Presently hwpoison is using lock_system_sleep() to prevent a race with
    memory hotplug. However lock_system_sleep() is a no-op if
    CONFIG_HIBERNATION=n. Therefore we need a new lock.

    Signed-off-by: KOSAKI Motohiro
    Cc: Andi Kleen
    Cc: Kamezawa Hiroyuki
    Suggested-by: Hugh Dickins
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

27 Oct, 2010

1 commit

  • Presently update_nr_listpages() doesn't have a role. That's because the
    lists passed to it are always empty just after calling migrate_pages.
    Since aaa994b3, migrate_pages cleans up the list of pages that failed to
    migrate before returning.

    [PATCH] page migration: handle freeing of pages in migrate_pages()

    Do not leave pages on the lists passed to migrate_pages(). Seems that we will
    not need any postprocessing of pages. This will simplify the handling of
    pages by the callers of migrate_pages().

    At that time, we thought we didn't need any postprocessing of pages. But
    the situation has changed. Compaction needs to know the number of pages
    that failed to migrate for the COMPACTPAGEFAILED stat.

    This patch makes a new rule that the caller of migrate_pages calls
    putback_lru_pages. So the caller needs to clean up the lists, which gives
    it a chance to postprocess the pages. [suggested by Christoph Lameter]

    Signed-off-by: Minchan Kim
    Cc: Hugh Dickins
    Cc: Andi Kleen
    Reviewed-by: Mel Gorman
    Reviewed-by: Wu Fengguang
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim