27 Jul, 2016

1 commit

  • We have allowed migration for only LRU pages until now and it was enough
    to make high-order pages. But recently, embedded system(e.g., webOS,
    android) uses lots of non-movable pages(e.g., zram, GPU memory) so we
    have seen several reports about troubles of small high-order allocation.
    For fixing the problem, there were several efforts (e,g,. enhance
    compaction algorithm, SLUB fallback to 0-order page, reserved memory,
    vmalloc and so on) but if there are lots of non-movable pages in system,
    their solutions are void in the long run.

    So, this patch is to support facility to change non-movable pages with
    movable. For the feature, this patch introduces functions related to
    migration to address_space_operations as well as some page flags.

    If a driver want to make own pages movable, it should define three
    functions which are function pointers of struct
    address_space_operations.

    1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

    What VM expects on isolate_page function of driver is to return *true*
    if driver isolates page successfully. On returing true, VM marks the
    page as PG_isolated so concurrent isolation in several CPUs skip the
    page for isolation. If a driver cannot isolate the page, it should
    return *false*.

    Once page is successfully isolated, VM uses page.lru fields so driver
    shouldn't expect to preserve values in that fields.

    2. int (*migratepage) (struct address_space *mapping,
    struct page *newpage, struct page *oldpage, enum migrate_mode);

    After isolation, VM calls migratepage of driver with isolated page. The
    function of migratepage is to move content of the old page to new page
    and set up fields of struct page newpage. Keep in mind that you should
    indicate to the VM the oldpage is no longer movable via
    __ClearPageMovable() under page_lock if you migrated the oldpage
    successfully and returns 0. If driver cannot migrate the page at the
    moment, driver can return -EAGAIN. On -EAGAIN, VM will retry page
    migration in a short time because VM interprets -EAGAIN as "temporal
    migration failure". On returning any error except -EAGAIN, VM will give
    up the page migration without retrying in this time.

    Driver shouldn't touch page.lru field VM using in the functions.

    3. void (*putback_page)(struct page *);

    If migration fails on isolated page, VM should return the isolated page
    to the driver so VM calls driver's putback_page with migration failed
    page. In this function, driver should put the isolated page back to the
    own data structure.

    4. non-lru movable page flags

    There are two page flags for supporting non-lru movable page.

    * PG_movable

    Driver should use the below function to make page movable under
    page_lock.

    void __SetPageMovable(struct page *page, struct address_space *mapping)

    It needs argument of address_space for registering migration family
    functions which will be called by VM. Exactly speaking, PG_movable is
    not a real flag of struct page. Rather than, VM reuses page->mapping's
    lower bits to represent it.

    #define PAGE_MAPPING_MOVABLE 0x2
    page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

    so driver shouldn't access page->mapping directly. Instead, driver
    should use page_mapping which mask off the low two bits of page->mapping
    so it can get right struct address_space.

    For testing of non-lru movable page, VM supports __PageMovable function.
    However, it doesn't guarantee to identify non-lru movable page because
    page->mapping field is unified with other variables in struct page. As
    well, if driver releases the page after isolation by VM, page->mapping
    doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at
    __ClearPageMovable). But __PageMovable is cheap to catch whether page
    is LRU or non-lru movable once the page has been isolated. Because LRU
    pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
    good for just peeking to test non-lru movable pages before more
    expensive checking with lock_page in pfn scanning to select victim.

    For guaranteeing non-lru movable page, VM provides PageMovable function.
    Unlike __PageMovable, PageMovable functions validates page->mapping and
    mapping->a_ops->isolate_page under lock_page. The lock_page prevents
    sudden destroying of page->mapping.

    Driver using __SetPageMovable should clear the flag via
    __ClearMovablePage under page_lock before the releasing the page.

    * PG_isolated

    To prevent concurrent isolation among several CPUs, VM marks isolated
    page as PG_isolated under lock_page. So if a CPU encounters PG_isolated
    non-lru movable page, it can skip it. Driver doesn't need to manipulate
    the flag because VM will set/clear it automatically. Keep in mind that
    if driver sees PG_isolated page, it means the page have been isolated by
    VM so it shouldn't touch page.lru field. PG_isolated is alias with
    PG_reclaim flag so driver shouldn't use the flag for own purpose.

    [opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
    Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
    Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
    Signed-off-by: Gioh Kim
    Signed-off-by: Minchan Kim
    Signed-off-by: Ganesh Mahendran
    Acked-by: Vlastimil Babka
    Cc: Sergey Senozhatsky
    Cc: Rik van Riel
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Rafael Aquini
    Cc: Jonathan Corbet
    Cc: John Einar Reitan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

16 Apr, 2015

1 commit

  • Currently we take a naive approach to page flags on compound pages - we
    set the flag on the page without consideration if the flag makes sense
    for tail page or for compound page in general. This patchset try to
    sort this out by defining per-flag policy on what need to be done if
    page-flag helper operate on compound page.

    The last patch in the patchset also sanitizes usege of page->mapping for
    tail pages. We don't define the meaning of page->mapping for tail
    pages. Currently it's always NULL, which can be inconsistent with head
    page and potentially lead to problems.

    For now I caught one case of illegal usage of page flags or ->mapping:
    sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
    It leads to setting dirty bit on tail pages and access to tail_page's
    ->mapping. I don't see any bad behaviour caused by this, but worth
    fixing anyway.

    This patchset makes more sense if you take my THP refcounting into
    account: we will see more compound pages mapped with PTEs and we need to
    define behaviour of flags on compound pages to avoid bugs.

    This patch (of 16):

    We have page-flags helper function declarations/definitions spread over
    several header files. Let's consolidate them in .

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Acked-by: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

22 Jan, 2014

3 commits

  • Now, we have an infrastructure in rmap_walk() to handle difference from
    variants of rmap traversing functions.

    So, just use it in page_referenced().

    In this patch, I change following things.

    1. remove some variants of rmap traversing functions.
    cf> page_referenced_ksm, page_referenced_anon,
    page_referenced_file

    2. introduce new struct page_referenced_arg and pass it to
    page_referenced_one(), main function of rmap_walk, in order to count
    reference, to store vm_flags and to check finish condition.

    3. mechanical change to use rmap_walk() in page_referenced().

    [liwanp@linux.vnet.ibm.com: fix BUG at rmap_walk]
    Signed-off-by: Joonsoo Kim
    Reviewed-by: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Hillf Danton
    Signed-off-by: Wanpeng Li
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Now, we have an infrastructure in rmap_walk() to handle difference from
    variants of rmap traversing functions.

    So, just use it in try_to_munlock().

    In this patch, I change following things.

    1. remove some variants of rmap traversing functions.
    cf> try_to_unmap_ksm, try_to_unmap_anon, try_to_unmap_file
    2. mechanical change to use rmap_walk() in try_to_munlock().
    3. copy and paste comments.

    Signed-off-by: Joonsoo Kim
    Reviewed-by: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Hillf Danton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • In each rmap traverse case, there is some difference so that we need
    function pointers and arguments to them in order to handle these

    For this purpose, struct rmap_walk_control is introduced in this patch,
    and will be extended in following patch. Introducing and extending are
    separate, because it clarify changes.

    Signed-off-by: Joonsoo Kim
    Reviewed-by: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Hillf Danton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

24 Feb, 2013

1 commit

  • Switching merge_across_nodes after running KSM is liable to oops on stale
    nodes still left over from the previous stable tree. It's not something
    that people will often want to do, but it would be lame to demand a reboot
    when they're trying to determine which merge_across_nodes setting is best.

    How can this happen? We only permit switching merge_across_nodes when
    pages_shared is 0, and usually set run 2 to force that beforehand, which
    ought to unmerge everything: yet oopses still occur when you then run 1.

    Three causes:

    1. The old stable tree (built according to the inverse
    merge_across_nodes) has not been fully torn down. A stable node
    lingers until get_ksm_page() notices that the page it references no
    longer references it: but the page is not necessarily freed as soon as
    expected, particularly when swapcache.

    Fix this with a pass through the old stable tree, applying
    get_ksm_page() to each of the remaining nodes (most found stale and
    removed immediately), with forced removal of any left over. Unless the
    page is still mapped: I've not seen that case, it shouldn't occur, but
    better to WARN_ON_ONCE and EBUSY than BUG.

    2. __ksm_enter() has a nice little optimization, to insert the new mm
    just behind ksmd's cursor, so there's a full pass for it to stabilize
    (or be removed) before ksmd addresses it. Nice when ksmd is running,
    but not so nice when we're trying to unmerge all mms: we were missing
    those mms forked and inserted behind the unmerge cursor. Easily fixed
    by inserting at the end when KSM_RUN_UNMERGE.

    3. It is possible for a KSM page to be faulted back from swapcache
    into an mm, just after unmerge_and_remove_all_rmap_items() scanned past
    it. Fix this by copying on fault when KSM_RUN_UNMERGE: but that is
    private to ksm.c, so dissolve the distinction between
    ksm_might_need_to_copy() and ksm_does_need_to_copy(), doing it all in
    the one call into ksm.c.

    A long outstanding, unrelated bugfix sneaks in with that third fix:
    ksm_does_need_to_copy() would copy from a !PageUptodate page (implying I/O
    error when read in from swap) to a page which it then marks Uptodate. Fix
    this case by not copying, letting do_swap_page() discover the error.

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Cc: Petr Holasek
    Cc: Andrea Arcangeli
    Cc: Izik Eidus
    Cc: Gerald Schaefer
    Cc: KOSAKI Motohiro
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

10 Sep, 2010

1 commit

  • The pte_same check is reliable only if the swap entry remains pinned (by
    the page lock on swapcache). We've also to ensure the swapcache isn't
    removed before we take the lock as try_to_free_swap won't care about the
    page pin.

    One of the possible impacts of this patch is that a KSM-shared page can
    point to the anon_vma of another process, which could exit before the page
    is freed.

    This can leave a page with a pointer to a recycled anon_vma object, or
    worse, a pointer to something that is no longer an anon_vma.

    [riel@redhat.com: changelog help]
    Signed-off-by: Andrea Arcangeli
    Acked-by: Hugh Dickins
    Reviewed-by: Rik van Riel
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

10 Aug, 2010

1 commit

  • The new anon-vma code, was suboptimal and it lead to erratic invocation of
    ksm_does_need_to_copy. That leads to host hangs or guest vnc lockup, or
    weird behavior. It's unclear why ksm_does_need_to_copy is unstable but
    the point is that when KSM is not in use, ksm_does_need_to_copy must never
    run or we bounce pages for no good reason. I suspect the same hangs will
    happen with KVM swaps. But this at least fixes the regression in the
    new-anon-vma code and it only let KSM bugs triggers when KSM is in use.

    The code in do_swap_page likely doesn't cope well with a not-swapcache,
    especially the memcg code.

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Rik van Riel
    Cc: Izik Eidus
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

16 Dec, 2009

5 commits

  • Commit 5ad6468801d28c4d4ac9f48ec19297817c915f6a "ksm: let shared pages
    be swappable" breaks the build on m68knommu and I suspect on any nommu:

    In file included from kernel/fork.c:52:
    include/linux/ksm.h:129: warning: 'enum ttu_flags' declared inside parameter list
    include/linux/ksm.h:129: warning: its scope is only this definition or declaration, which is probably not what you want
    include/linux/ksm.h:129: error: parameter 2 ('flags') has incomplete type
    make[1]: *** [kernel/fork.o] Error 1
    make[1]: *** Waiting for unfinished jobs....

    Let's fix that with CONFIG_MMU around most of the !CONFIG_KSM declarations.

    Reported-by: Steven King
    Signed-off-by: Hugh Dickins
    Tested-by: Steven King
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • A side-effect of making ksm pages swappable is that they have to be placed
    on the LRUs: which then exposes them to isolate_lru_page() and hence to
    page migration.

    Add rmap_walk() for remove_migration_ptes() to use: rmap_walk_anon() and
    rmap_walk_file() in rmap.c, but rmap_walk_ksm() in ksm.c. Perhaps some
    consolidation with existing code is possible, but don't attempt that yet
    (try_to_unmap needs to handle nonlinears, but migration pte removal does
    not).

    rmap_walk() is sadly less general than it appears: rmap_walk_anon(), like
    remove_anon_migration_ptes() which it replaces, avoids calling
    page_lock_anon_vma(), because that includes a page_mapped() test which
    fails when all migration ptes are in place. That was valid when NUMA page
    migration was introduced (holding mmap_sem provided the missing guarantee
    that anon_vma's slab had not already been destroyed), but I believe not
    valid in the memory hotremove case added since.

    For now do the same as before, and consider the best way to fix that
    unlikely race later on. When fixed, we can probably use rmap_walk() on
    hwpoisoned ksm pages too: for now, they remain among hwpoison's various
    exceptions (its PageKsm test comes before the page is locked, but its
    page_lock_anon_vma fails safely if an anon gets upgraded).

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Initial implementation for swapping out KSM's shared pages: add
    page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when
    faced with a PageKsm page.

    Most of what's needed can be got from the rmap_items listed from the
    stable_node of the ksm page, without discovering the actual vma: so in
    this patch just fake up a struct vma for page_referenced_one() or
    try_to_unmap_one(), then refine that in the next patch.

    Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been
    implicit there (being only set with VM_SHARED, already excluded), but
    let's make it explicit, to help justify the lack of nonlinear unmap.

    Rely on the page lock to protect against concurrent modifications to that
    page's node of the stable tree.

    The awkward part is not swapout but swapin: do_swap_page() and
    page_add_anon_rmap() now have to allow for new possibilities - perhaps a
    ksm page still in swapcache, perhaps a swapcache page associated with one
    location in one anon_vma now needed for another location or anon_vma.
    (And the vma might even be no longer VM_MERGEABLE when that happens.)

    ksm_might_need_to_copy() checks for that case, and supplies a duplicate
    page when necessary, simply leaving it to a subsequent pass of ksmd to
    rediscover the identity and merge them back into one ksm page.
    Disappointingly primitive: but the alternative would have to accumulate
    unswappable info about the swapped out ksm pages, limiting swappability.

    Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the
    particular case it was handling, so just use it instead.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Add a pointer to the ksm page into struct stable_node, holding a reference
    to the page while the node exists. Put a pointer to the stable_node into
    the ksm page's ->mapping.

    Then we don't need get_ksm_page() while traversing the stable tree: the
    page to compare against is sure to be present and correct, even if it's no
    longer visible through any of its existing rmap_items.

    And we can handle the forked ksm page case more efficiently: no need to
    memcmp our way through the tree to find its match.

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set
    in page->mapping, with the higher bits a pointer to the anon_vma; and have
    defined PageKsm(page) as that with NULL anon_vma.

    But KSM swapping will need to store a pointer there: so in preparation for
    that, now define PAGE_MAPPING_FLAGS as the low two bits, including
    PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
    other use for the bit emerges).

    Declare page_rmapping(page) to return the pointer part of page->mapping,
    and page_anon_vma(page) to return the anon_vma pointer when that's what it
    is. Use these in a few appropriate places: notably, unuse_vma() has been
    testing page->mapping, but is better to be testing page_anon_vma() (cases
    may be added in which flag bits are set without any pointer).

    Signed-off-by: Hugh Dickins
    Cc: Izik Eidus
    Cc: Andrea Arcangeli
    Cc: Nick Piggin
    Cc: KOSAKI Motohiro
    Reviewed-by: Rik van Riel
    Cc: Lee Schermerhorn
    Cc: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

22 Sep, 2009

5 commits

  • A few cleanups, given the munlock fix: the comment on ksm_test_exit() no
    longer applies, and it can be made private to ksm.c; there's no more
    reference to mmu_gather or tlb.h, and mmap.c doesn't need ksm.h.

    Signed-off-by: Hugh Dickins
    Acked-by: Izik Eidus
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Rawhide users have reported hang at startup when cryptsetup is run: the
    same problem can be simply reproduced by running a program int main() {
    mlockall(MCL_CURRENT | MCL_FUTURE); return 0; }

    The problem is that exit_mmap() applies munlock_vma_pages_all() to
    clean up VM_LOCKED areas, and its current implementation (stupidly)
    tries to fault in absent pages, for example where PROT_NONE prevented
    them being faulted in when mlocking. Whereas the "ksm: fix oom
    deadlock" patch, knowing there's a race by which KSM might try to fault
    in pages after exit_mmap() had finally zapped the range, backs out of
    such faults doing nothing when its ksm_test_exit() notices mm_users 0.

    So revert that part of "ksm: fix oom deadlock" which moved the
    ksm_exit() call from before exit_mmap() to the middle of exit_mmap();
    and remove those ksm_test_exit() checks from the page fault paths, so
    allowing the munlocking to proceed without interference.

    ksm_exit, if there are rmap_items still chained on this mm slot, takes
    mmap_sem write side: so preventing KSM from working on an mm while
    exit_mmap runs. And KSM will bail out as soon as it notices that
    mm_users is already zero, thanks to its internal ksm_test_exit checks.
    So that when a task is killed by OOM killer or the user, KSM will not
    indefinitely prevent it from running exit_mmap to release its memory.

    This does break a part of what "ksm: fix oom deadlock" was trying to
    achieve. When unmerging KSM (echo 2 >/sys/kernel/mm/ksm), and even
    when ksmd itself has to cancel a KSM page, it is possible that the
    first OOM-kill victim would be the KSM process being faulted: then its
    memory won't be freed until a second victim has been selected (freeing
    memory for the unmerging fault to complete).

    But the OOM killer is already liable to kill a second victim once the
    intended victim's p->mm goes to NULL: so there's not much point in
    rejecting this KSM patch before fixing that OOM behaviour. It is very
    much more important to allow KSM users to boot up, than to haggle over
    an unlikely and poorly supported OOM case.

    We also intend to fix munlocking to not fault pages: at which point
    this patch _could_ be reverted; though that would be controversial, so
    we hope to find a better solution.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Justin M. Forbes
    Acked-for-now-by: Hugh Dickins
    Cc: Izik Eidus
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • There's a now-obvious deadlock in KSM's out-of-memory handling:
    imagine ksmd or KSM_RUN_UNMERGE handling, holding ksm_thread_mutex,
    trying to allocate a page to break KSM in an mm which becomes the
    OOM victim (quite likely in the unmerge case): it's killed and goes
    to exit, and hangs there waiting to acquire ksm_thread_mutex.

    Clearly we must not require ksm_thread_mutex in __ksm_exit, simple
    though that made everything else: perhaps use mmap_sem somehow?
    And part of the answer lies in the comments on unmerge_ksm_pages:
    __ksm_exit should also leave all the rmap_item removal to ksmd.

    But there's a fundamental problem, that KSM relies upon mmap_sem to
    guarantee the consistency of the mm it's dealing with, yet exit_mmap
    tears down an mm without taking mmap_sem. And bumping mm_users won't
    help at all, that just ensures that the pages the OOM killer assumes
    are on their way to being freed will not be freed.

    The best answer seems to be, to move the ksm_exit callout from just
    before exit_mmap, to the middle of exit_mmap: after the mm's pages
    have been freed (if the mmu_gather is flushed), but before its page
    tables and vma structures have been freed; and down_write,up_write
    mmap_sem there to serialize with KSM's own reliance on mmap_sem.

    But KSM then needs to be careful, whenever it downs mmap_sem, to
    check that the mm is not already exiting: there's a danger of using
    find_vma on a layout that's being torn apart, or writing into page
    tables which have been freed for reuse; and even do_anonymous_page
    and __do_fault need to check they're not being called by break_ksm
    to reinstate a pte after zap_pte_range has zapped that page table.

    Though it might be clearer to add an exiting flag, set while holding
    mmap_sem in __ksm_exit, that wouldn't cover the issue of reinstating
    a zapped pte. All we need is to check whether mm_users is 0 - but
    must remember that ksmd may detect that before __ksm_exit is reached.
    So, ksm_test_exit(mm) added to comment such checks on mm->mm_users.

    __ksm_exit now has to leave clearing up the rmap_items to ksmd,
    that needs ksm_thread_mutex; but shift the exiting mm just after the
    ksm_scan cursor so that it will soon be dealt with. __ksm_enter raise
    mm_count to hold the mm_struct, ksmd's exit processing (exactly like
    its processing when it finds all VM_MERGEABLEs unmapped) mmdrop it,
    similar procedure for KSM_RUN_UNMERGE (which has stopped ksmd).

    But also give __ksm_exit a fast path: when there's no complication
    (no rmap_items attached to mm and it's not at the ksm_scan cursor),
    it can safely do all the exiting work itself. This is not just an
    optimization: when ksmd is not running, the raised mm_count would
    otherwise leak mm_structs.

    Signed-off-by: Hugh Dickins
    Acked-by: Izik Eidus
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • KSM will need to identify its kernel merged pages unambiguously, and
    /proc/kpageflags will probably like to do so too.

    Since KSM will only be substituting anonymous pages, statistics are best
    preserved by making a PageKsm page a special PageAnon page: one with no
    anon_vma.

    But KSM then needs its own page_add_ksm_rmap() - keep it in ksm.h near
    PageKsm; and do_wp_page() must COW them, unlike singly mapped PageAnons.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Chris Wright
    Signed-off-by: Izik Eidus
    Cc: Wu Fengguang
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Wu Fengguang
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Avi Kivity
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • This patch presents the mm interface to a dummy version of ksm.c, for
    better scrutiny of that interface: the real ksm.c follows later.

    When CONFIG_KSM is not set, madvise(2) reject MADV_MERGEABLE and
    MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
    pretending that they can be serviced. But when CONFIG_KSM=y, accept them
    even if KSM is not currently running, and even on areas which KSM will not
    touch (e.g. hugetlb or shared file or special driver mappings).

    Like other madvices, report ENOMEM despite success if any area in the
    range is unmapped, and use EAGAIN to report out of memory.

    Define vma flag VM_MERGEABLE to identify an area on which KSM may try
    merging pages: leave it to ksm_madvise() to decide whether to set it.
    Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
    VM_MERGEABLE areas, to minimize callouts when forking or exiting.

    Based upon earlier patches by Chris Wright and Izik Eidus.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Chris Wright
    Signed-off-by: Izik Eidus
    Cc: Michael Kerrisk
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Wu Fengguang
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Avi Kivity
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins