26 Sep, 2014

2 commits

  • commit 2329d3751b082b4fd354f334a88662d72abac52d upstream.

    In mm/swap.c, __lru_cache_add() is exported, but actually there are no
    users outside this file.

    This patch unexports __lru_cache_add(), and makes it static. It also
    exports lru_cache_add_file(), as it is use by cifs and fuse, which can
    loaded as modules.

    Signed-off-by: Jianyu Zhan
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Cc: Shaohua Li
    Cc: Bob Liu
    Cc: Seth Jennings
    Cc: Joonsoo Kim
    Cc: Rafael Aquini
    Cc: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Khalid Aziz
    Cc: Christoph Hellwig
    Reviewed-by: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Jianyu Zhan
     
  • commit 0cd6144aadd2afd19d1aca880153530c52957604 upstream.

    shmem mappings already contain exceptional entries where swap slot
    information is remembered.

    To be able to store eviction information for regular page cache, prepare
    every site dealing with the radix trees directly to handle entries other
    than pages.

    The common lookup functions will filter out non-page entries and return
    NULL for page cache holes, just as before. But provide a raw version of
    the API which returns non-page entries as well, and switch shmem over to
    use it.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Johannes Weiner
     

03 Apr, 2014

1 commit

  • commit 668f9abbd4334e6c29fa8acd71635c4f9101caa7 upstream.

    Commit bf6bddf1924e ("mm: introduce compaction and migration for
    ballooned pages") introduces page_count(page) into memory compaction
    which dereferences page->first_page if PageTail(page).

    This results in a very rare NULL pointer dereference on the
    aforementioned page_count(page). Indeed, anything that does
    compound_head(), including page_count() is susceptible to racing with
    prep_compound_page() and seeing a NULL or dangling page->first_page
    pointer.

    This patch uses Andrea's implementation of compound_trans_head() that
    deals with such a race and makes it the default compound_head()
    implementation. This includes a read memory barrier that ensures that
    if PageTail(head) is true that we return a head page that is neither
    NULL nor dangling. The patch then adds a store memory barrier to
    prep_compound_page() to ensure page->first_page is set.

    This is the safest way to ensure we see the head page that we are
    expecting, PageTail(page) is already in the unlikely() path and the
    memory barriers are unfortunately required.

    Hugetlbfs is the exception, we don't enforce a store memory barrier
    during init since no race is possible.

    Signed-off-by: David Rientjes
    Cc: Holger Kiehl
    Cc: Christoph Lameter
    Cc: Rafael Aquini
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Jiri Slaby

    David Rientjes
     

07 Feb, 2014

1 commit

  • commit 27c73ae759774e63313c1fbfeb17ba076cea64c5 upstream.

    Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database
    caused by THP") can cause dereference of a dangling pointer if
    split_huge_page runs during PageHuge() if there are updates to the
    tail_page->private field.

    Also it is repeating compound_head twice for hugetlbfs and it is running
    compound_head+compound_trans_head for THP when a single one is needed in
    both cases.

    The new code within the PageSlab() check doesn't need to verify that the
    THP page size is never bigger than the smallest hugetlbfs page size, to
    avoid memory corruption.

    A longstanding theoretical race condition was found while fixing the
    above (see the change right after the skip_unlock label, that is
    relevant for the compound_lock path too).

    By re-establishing the _mapcount tail refcounting for all compound
    pages, this also fixes the below problem:

    echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

    BUG: Bad page state in process bash pfn:59a01
    page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0
    page flags: 0x1c00000000008000(tail)
    Modules linked in:
    CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    Call Trace:
    dump_stack+0x55/0x76
    bad_page+0xd5/0x130
    free_pages_prepare+0x213/0x280
    __free_pages+0x36/0x80
    update_and_free_page+0xc1/0xd0
    free_pool_huge_page+0xc2/0xe0
    set_max_huge_pages.part.58+0x14c/0x220
    nr_hugepages_store_common.isra.60+0xd0/0xf0
    nr_hugepages_store+0x13/0x20
    kobj_attr_store+0xf/0x20
    sysfs_write_file+0x189/0x1e0
    vfs_write+0xc5/0x1f0
    SyS_write+0x55/0xb0
    system_call_fastpath+0x16/0x1b

    Signed-off-by: Khalid Aziz
    Signed-off-by: Andrea Arcangeli
    Tested-by: Khalid Aziz
    Cc: Pravin Shelar
    Cc: Greg Kroah-Hartman
    Cc: Ben Hutchings
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Andi Kleen
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Cc: Guillaume Morin
    Signed-off-by: Greg Kroah-Hartman

    Andrea Arcangeli
     

13 Sep, 2013

1 commit

  • make lru_add_drain_all() only selectively interrupt the cpus that have
    per-cpu free pages that can be drained.

    This is important in nohz mode where calling mlockall(), for example,
    otherwise will interrupt every core unnecessarily.

    This is important on workloads where nohz cores are handling 10 Gb traffic
    in userspace. Those CPUs do not enter the kernel and place pages into LRU
    pagevecs and they really, really don't want to be interrupted, or they
    drop packets on the floor.

    Signed-off-by: Chris Metcalf
    Reviewed-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     

12 Sep, 2013

1 commit

  • I am working with a tool that simulates oracle database I/O workload.
    This tool (orion to be specific -
    )
    allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
    does aio into these pages from flash disks using various common block
    sizes used by database. I am looking at performance with two of the most
    common block sizes - 1M and 64K. aio performance with these two block
    sizes plunged after Transparent HugePages was introduced in the kernel.
    Here are performance numbers:

    pre-THP 2.6.39 3.11-rc5
    1M read 8384 MB/s 5629 MB/s 6501 MB/s
    64K read 7867 MB/s 4576 MB/s 4251 MB/s

    I have narrowed the performance impact down to the overheads introduced by
    THP in __get_page_tail() and put_compound_page() routines. perf top shows
    >40% of cycles being spent in these two routines. Every time direct I/O
    to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
    the pages and calls put_page() when I/O completes to put the reference
    away. THP introduced significant amount of locking overhead to get_page()
    and put_page() when dealing with compound pages because hugepages can be
    split underneath get_page() and put_page(). It added this overhead
    irrespective of whether it is dealing with hugetlbfs pages or transparent
    hugepages. This resulted in 20%-45% drop in aio performance when using
    hugetlbfs pages.

    Since hugetlbfs pages can not be split, there is no reason to go through
    all the locking overhead for these pages from what I can see. I added
    code to __get_page_tail() and put_compound_page() to bypass all the
    locking code when working with hugetlbfs pages. This improved performance
    significantly. Performance numbers with this patch:

    pre-THP 3.11-rc5 3.11-rc5 + Patch
    1M read 8384 MB/s 6501 MB/s 8371 MB/s
    64K read 7867 MB/s 4251 MB/s 6510 MB/s

    Performance with 64K read is still lower than what it was before THP, but
    still a 53% improvement. It does mean there is more work to be done but I
    will take a 53% improvement for now.

    Please take a look at the following patch and let me know if it looks
    reasonable.

    [akpm@linux-foundation.org: tweak comments]
    Signed-off-by: Khalid Aziz
    Cc: Pravin B Shelar
    Cc: Christoph Lameter
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Andi Kleen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Khalid Aziz
     

01 Aug, 2013

2 commits

  • active/inactive lru lists can contain unevicable pages (i.e. ramfs pages
    that have been placed on the LRU lists when first allocated), but these
    pages must not have PageUnevictable set - otherwise shrink_[in]active_list
    goes crazy:

    kernel BUG at /home/space/kas/git/public/linux-next/mm/vmscan.c:1122!

    1090 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
    1091 struct lruvec *lruvec, struct list_head *dst,
    1092 unsigned long *nr_scanned, struct scan_control *sc,
    1093 isolate_mode_t mode, enum lru_list lru)
    1094 {
    ...
    1108 switch (__isolate_lru_page(page, mode)) {
    1109 case 0:
    ...
    1116 case -EBUSY:
    ...
    1121 default:
    1122 BUG();
    1123 }
    1124 }
    ...
    1130 }

    __isolate_lru_page() returns EINVAL for PageUnevictable(page).

    For lru_add_page_tail(), it means we should not set PageUnevictable()
    for tail pages unless we're sure that it will go to LRU_UNEVICTABLE.
    Let's just copy PG_active and PG_unevictable from head page in
    __split_huge_page_refcount(), it will simplify lru_add_page_tail().

    This will fix one more bug in lru_add_page_tail(): if
    page_evictable(page_tail) is false and PageLRU(page) is true, page_tail
    will go to the same lru as page, but nobody cares to sync page_tail
    active/inactive state with page. So we can end up with inactive page on
    active lru. The patch will fix it as well since we copy PG_active from
    head page.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Dave Hansen
    Cc: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • As a result of commit 13f7f78981e4 ("mm: pagevec: defer deciding which
    LRU to add a page to until pagevec drain time"), pages on unevictable
    lists can have both of PageActive and PageUnevictable set. This is not
    only confusing, but also corrupts page migration and
    shrink_[in]active_list.

    This patch fixes the problem by adding ClearPageActive before adding
    pages into unevictable list. It also cleans up VM_BUG_ONs.

    Signed-off-by: Naoya Horiguchi
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     

04 Jul, 2013

5 commits

  • Similar to __pagevec_lru_add, this patch removes the LRU parameter from
    __lru_cache_add and lru_cache_add_lru as the caller does not control the
    exact LRU the page gets added to. lru_cache_add_lru gets renamed to
    lru_cache_add the name is silly without the lru parameter. With the
    parameter removed, it is required that the caller indicate if they want
    the page added to the active or inactive list by setting or clearing
    PageActive respectively.

    [akpm@linux-foundation.org: Suggested the patch]
    [gang.chen@asianux.com: fix used-unintialized warning]
    Signed-off-by: Mel Gorman
    Signed-off-by: Chen Gang
    Cc: Jan Kara
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Now that the LRU to add a page to is decided at LRU-add time, remove the
    misleading lru parameter from __pagevec_lru_add. A consequence of this
    is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar
    helpers are misleading as the caller no longer has direct control over
    what LRU the page is added to. Unused helpers are removed by this patch
    and existing users of pagevec_lru_add_file() are converted to use
    lru_cache_add_file() directly and use the per-cpu pagevecs instead of
    creating their own pagevec.

    Signed-off-by: Mel Gorman
    Reviewed-by: Jan Kara
    Reviewed-by: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • If a page is on a pagevec then it is !PageLRU and mark_page_accessed()
    may fail to move a page to the active list as expected. Now that the
    LRU is selected at LRU drain time, mark pages PageActive if they are on
    the local pagevec so it gets moved to the correct list at LRU drain
    time. Using a debugging patch it was found that for a simple git
    checkout based workload that pages were never added to the active file
    list in practice but with this patch applied they are.

    before after
    LRU Add Active File 0 750583
    LRU Add Active Anon 2640587 2702818
    LRU Add Inactive File 8833662 8068353
    LRU Add Inactive Anon 207 200

    Note that only pages on the local pagevec are considered on purpose. A
    !PageLRU page could be in the process of being released, reclaimed,
    migrated or on a remote pagevec that is currently being drained.
    Marking it PageActive is vunerable to races where PageLRU and Active
    bits are checked at the wrong time. Page reclaim will trigger
    VM_BUG_ONs but depending on when the race hits, it could also free a
    PageActive page to the page allocator and trigger a bad_page warning.
    Similarly a potential race exists between a per-cpu drain on a pagevec
    list and an activation on a remote CPU.

    lru_add_drain_cpu
    __pagevec_lru_add
    lru = page_lru(page);
    mark_page_accessed
    if (PageLRU(page))
    activate_page
    else
    SetPageActive
    SetPageLRU(page);
    add_page_to_lru_list(page, lruvec, lru);

    In this case a PageActive page is added to the inactivate list and later
    the inactive/active stats will get skewed. While the PageActive checks
    in vmscan could be removed and potentially dealt with, a skew in the
    statistics would be very difficult to detect. Hence this patch deals
    just with the common case where a page being marked accessed has just
    been added to the local pagevec.

    Signed-off-by: Mel Gorman
    Cc: Jan Kara
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • mark_page_accessed() cannot activate an inactive page that is located on
    an inactive LRU pagevec. Hints from filesystems may be ignored as a
    result. In preparation for fixing that problem, this patch removes the
    per-LRU pagevecs and leaves just one pagevec. The final LRU the page is
    added to is deferred until the pagevec is drained.

    This means that fewer pagevecs are available and potentially there is
    greater contention on the LRU lock. However, this only applies in the
    case where there is an almost perfect mix of file, anon, active and
    inactive pages being added to the LRU. In practice I expect that we are
    adding stream of pages of a particular time and that the changes in
    contention will barely be measurable.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Jan Kara
    Acked-by: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Andrew Perepechko reported a problem whereby pages are being prematurely
    evicted as the mark_page_accessed() hint is ignored for pages that are
    currently on a pagevec --
    http://www.spinics.net/lists/linux-ext4/msg37340.html .

    Alexey Lyahkov and Robin Dong have also reported problems recently that
    could be due to hot pages reaching the end of the inactive list too
    quickly and be reclaimed.

    Rather than addressing this on a per-filesystem basis, this series aims
    to fix the mark_page_accessed() interface by deferring what LRU a page
    is added to pagevec drain time and allowing mark_page_accessed() to call
    SetPageActive on a pagevec page.

    Patch 1 adds two tracepoints for LRU page activation and insertion. Using
    these processes it's possible to build a model of pages in the
    LRU that can be processed offline.

    Patch 2 defers making the decision on what LRU to add a page to until when
    the pagevec is drained.

    Patch 3 searches the local pagevec for pages to mark PageActive on
    mark_page_accessed. The changelog explains why only the local
    pagevec is examined.

    Patches 4 and 5 tidy up the API.

    postmark, a dd-based test and fs-mark both single and threaded mode were
    run but none of them showed any performance degradation or gain as a
    result of the patch.

    Using patch 1, I built a *very* basic model of the LRU to examine
    offline what the average age of different page types on the LRU were in
    milliseconds. Of course, capturing the trace distorts the test as it's
    written to local disk but it does not matter for the purposes of this
    test. The average age of pages in milliseconds were

    vanilla deferdrain
    Average age mapped anon: 1454 1250
    Average age mapped file: 127841 155552
    Average age unmapped anon: 85 235
    Average age unmapped file: 73633 38884
    Average age unmapped buffers: 74054 116155

    The LRU activity was mostly files which you'd expect for a dd-based
    workload. Note that the average age of buffer pages is increased by the
    series and it is expected this is due to the fact that the buffer pages
    are now getting added to the active list when drained from the pagevecs.
    Note that the average age of the unmapped file data is decreased as they
    are still added to the inactive list and are reclaimed before the
    buffers.

    There is no guarantee this is a universal win for all workloads and it
    would be nice if the filesystem people gave some thought as to whether
    this decision is generally a win or a loss.

    This patch:

    Using these tracepoints it is possible to model LRU activity and the
    average residency of pages of different types. This can be used to
    debug problems related to premature reclaim of pages of particular
    types.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Jan Kara
    Cc: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

30 Apr, 2013

1 commit

  • In page reclaim, huge page is split. split_huge_page() adds tail pages
    to LRU list. Since we are reclaiming a huge page, it's better we
    reclaim all subpages of the huge page instead of just the head page.
    This patch adds split tail pages to shrink page list so the tail pages
    can be reclaimed soon.

    Before this patch, run a swap workload:
    thp_fault_alloc 3492
    thp_fault_fallback 608
    thp_collapse_alloc 6
    thp_collapse_alloc_failed 0
    thp_split 916

    With this patch:
    thp_fault_alloc 4085
    thp_fault_fallback 16
    thp_collapse_alloc 90
    thp_collapse_alloc_failed 0
    thp_split 1272

    fallback allocation is reduced a lot.

    [akpm@linux-foundation.org: fix CONFIG_SWAP=n build]
    Signed-off-by: Shaohua Li
    Acked-by: Rik van Riel
    Acked-by: Minchan Kim
    Acked-by: Johannes Weiner
    Reviewed-by: Wanpeng Li
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

24 Feb, 2013

1 commit

  • When I use several fast SSD to do swap, swapper_space.tree_lock is
    heavily contended. This makes each swap partition have one
    address_space to reduce the lock contention. There is an array of
    address_space for swap. The swap entry type is the index to the array.

    In my test with 3 SSD, this increases the swapout throughput 20%.

    [akpm@linux-foundation.org: revert unneeded change to __add_to_swap_cache]
    Signed-off-by: Shaohua Li
    Cc: Hugh Dickins
    Acked-by: Rik van Riel
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

09 Oct, 2012

2 commits

  • page_evictable(page, vma) is an irritant: almost all its callers pass
    NULL for vma. Remove the vma arg and use mlocked_vma_newpage(vma, page)
    explicitly in the couple of places it's needed. But in those places we
    don't even need page_evictable() itself! They're dealing with a freshly
    allocated anonymous page, which has no "mapping" and cannot be mlocked yet.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Michel Lespinasse
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When writing a new file with 2048 bytes buffer, such as write(fd, buffer,
    2048), it will call generic_perform_write() twice for every page:

    write_begin
    mark_page_accessed(page)
    write_end

    write_begin
    mark_page_accessed(page)
    write_end

    Pages 1-13 will be added to lru-pvecs in write_begin() and will *NOT* be
    added to active_list even they have be accessed twice because they are not
    PageLRU(page). But when page 14th comes, all pages in lru-pvecs will be
    moved to inactive_list (by __lru_cache_add() ) in first write_begin(), now
    page 14th *is* PageLRU(page). And after second write_end() only page 14th
    will be in active_list.

    In Hadoop environment, we do comes to this situation: after writing a
    file, we find out that only 14th, 28th, 42th... page are in active_list
    and others in inactive_list. Now kswapd works, shrinks the inactive_list,
    the file only have 14th, 28th...pages in memory, the readahead request
    size will be broken to only 52k (13*4k), system's performance falls
    dramatically.

    This problem can also replay by below steps (the machine has 8G memory):

    1. dd if=/dev/zero of=/test/file.out bs=1024 count=1048576
    2. cat another 7.5G file to /dev/null
    3. vmtouch -m 1G -v /test/file.out, it will show:

    /test/file.out
    [oooooooooooooooooooOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 187847/262144

    the 'o' means same pages are in memory but same are not.

    The solution for this problem is simple: the 14th page should be added to
    lru_add_pvecs before mark_page_accessed() just as other pages.

    [akpm@linux-foundation.org: tweak comment]
    [akpm@linux-foundation.org: grab better comment from the v3 patch]
    Signed-off-by: Robin Dong
    Reviewed-by: Minchan Kim
    Cc: KOSAKI Motohiro
    Reviewed-by: Johannes Weiner
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Dong
     

01 Aug, 2012

2 commits

  • The patch "mm: add support for a filesystem to activate swap files and use
    direct_IO for writing swap pages" added support for using direct_IO to
    write swap pages but it is insufficient for highmem pages.

    To support highmem pages, this patch kmaps() the page before calling the
    direct_IO() handler. As direct_IO deals with virtual addresses an
    additional helper is necessary for get_kernel_pages() to lookup the struct
    page for a kmap virtual address.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This patch adds two new APIs get_kernel_pages() and get_kernel_page() that
    may be used to pin a vector of kernel addresses for IO. The initial user
    is expected to be NFS for allowing pages to be written to swap using
    aops->direct_IO(). Strictly speaking, swap-over-NFS only needs to pin one
    page for IO but it makes sense to express the API in terms of a vector and
    add a helper for pinning single pages.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Cc: Mark Salter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

30 May, 2012

3 commits

  • Take lruvec further: pass it instead of zone to add_page_to_lru_list() and
    del_page_from_lru_list(); and pagevec_lru_move_fn() pass lruvec down to
    its target functions.

    This cleanup eliminates a swathe of cruft in memcontrol.c, including
    mem_cgroup_lru_add_list(), mem_cgroup_lru_del_list() and
    mem_cgroup_lru_move_lists() - which never actually touched the lists.

    In their place, mem_cgroup_page_lruvec() to decide the lruvec, previously
    a side-effect of add, and mem_cgroup_update_lru_size() to maintain the
    lru_size stats.

    Whilst these are simplifications in their own right, the goal is to bring
    the evaluation of lruvec next to the spin_locking of the lrus, in
    preparation for a future patch.

    Signed-off-by: Hugh Dickins
    Cc: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Acked-by: Konstantin Khlebnikov
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • With mem_cgroup_disabled() now explicit, it becomes clear that the
    zone_reclaim_stat structure actually belongs in lruvec, per-zone when
    memcg is disabled but per-memcg per-zone when it's enabled.

    We can delete mem_cgroup_get_reclaim_stat(), and change
    update_page_reclaim_stat() to update just the one set of stats, the one
    which get_scan_count() will actually use.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Konstantin Khlebnikov
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Reviewed-by: Minchan Kim
    Reviewed-by: Michal Hocko
    Cc: Glauber Costa
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Transparent huge pages can change page->flags (PG_compound_lock) without
    taking Slab lock. Since THP can not break slab pages we can safely access
    compound page without taking compound lock.

    Specifically this patch fixes a race between compound_unlock() and slab
    functions which perform page-flags updates. This can occur when
    get_page()/put_page() is called on a page from slab.

    [akpm@linux-foundation.org: tweak comment text, fix comment layout, fix label indenting]
    Reported-by: Amey Bhide
    Signed-off-by: Pravin B Shelar
    Reviewed-by: Christoph Lameter
    Acked-by: Andrea Arcangeli
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pravin B Shelar
     

22 Mar, 2012

1 commit

  • This cpu hotplug hook was accidentally removed in commit 00a62ce91e55
    ("mm: fix Committed_AS underflow on large NR_CPUS environment")

    The visible effect of this accident: some pages are borrowed in per-cpu
    page-vectors. Truncate can deal with it, but these pages cannot be
    reused while this cpu is offline. So this is like a temporary memory
    leak.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Dave Hansen
    Cc: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

06 Mar, 2012

1 commit

  • When moving tasks from old memcg (with move_charge_at_immigrate on new
    memcg), followed by removal of old memcg, hit General Protection Fault in
    mem_cgroup_lru_del_list() (called from release_pages called from
    free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
    exit_mmap from mmput from exit_mm from do_exit).

    Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
    been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
    still trying to update its stats, and take page off lru before freeing.

    A task, or a charge, or a page on lru: each secures a memcg against
    removal. In this case, the last task has been moved out of the old memcg,
    and it is exiting: anonymous pages are uncharged one by one from the
    memcg, as they are zapped from its pagetables, so the charge gets down to
    0; but the pages themselves are queued in an mmu_gather for freeing.

    Most of those pages will be on lru (and force_empty is careful to
    lru_add_drain_all, to add pages from pagevec to lru first), but not
    necessarily all: perhaps some have been isolated for page reclaim, perhaps
    some isolated for other reasons. So, force_empty may find no task, no
    charge and no page on lru, and let the removal proceed.

    There would still be no problem if these pages were immediately freed; but
    typically (and the put_page_testzero protocol demands it) they have to be
    added back to lru before they are found freeable, then removed from lru
    and freed. We don't see the issue when adding, because the
    mem_cgroup_iter() loops keep their own reference to the memcg being
    scanned; but when it comes to mem_cgroup_lru_del_list().

    I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
    PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
    of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
    38c5d72f3ebe ("memcg: simplify LRU handling by new rule") mercifully
    removed those convolutions, but left this General Protection Fault.

    But it's surprisingly easy to restore the old behaviour: just check
    PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
    to add), and reset pc to root_mem_cgroup if page is uncharged. A risky
    change? just going back to how it worked before; testing, and an audit of
    uses of pc->mem_cgroup, show no problem.

    And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
    sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
    longer has any purpose, and we can safely revert 4e5f01c2b9b9 ("memcg:
    clear pc->mem_cgroup if necessary").

    Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
    is not strictly necessary: the lru_lock there, with RCU before memcg
    structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
    without that; but it seems cleaner to rely on one dependency less.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

09 Feb, 2012

1 commit

  • Fix CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_SMP=n CONFIG_DEBUG_VM=y
    CONFIG_DEBUG_SPINLOCK=n kernel: spin_is_locked() is then always false,
    and so triggers some BUGs in Transparent HugePage codepaths.

    asm-generic/bug.h mentions this problem, and provides a WARN_ON_SMP(x);
    but being too lazy to add VM_BUG_ON_SMP, BUG_ON_SMP, WARN_ON_SMP_ONCE,
    VM_WARN_ON_SMP_ONCE, just test NR_CPUS != 1 in the existing VM_BUG_ONs.

    Signed-off-by: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

13 Jan, 2012

8 commits

  • del_page_from_lru() repeats del_page_from_lru_list(), also working out
    which LRU the page was on, clearing the relevant bits. Decouple those
    functions: remove del_page_from_lru() and add page_off_lru().

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • checkpatch rightly protests

    WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable

    so fix the five offenders in mm/swap.c.

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • What's so special about ____pagevec_lru_add() that it needs four leading
    underscores? Nothing, it just helped to distinguish from
    __pagevec_lru_add() in 2.6.28 development. Cut two leading underscores.

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Replace pagevecs in putback_lru_pages() and move_active_pages_to_lru()
    by lists of pages_to_free: then apply Konstantin Khlebnikov's
    free_hot_cold_page_list() to them instead of pagevec_release().

    Which simplifies the flow (no need to drop and retake lock whenever
    pagevec fills up) and reduces stale addresses in stack backtraces
    (which often showed through the pagevecs); but more importantly,
    removes another 120 bytes from the deepest stacks in page reclaim.
    Although I've not recently seen an actual stack overflow here with
    a vanilla kernel, move_active_pages_to_lru() has often featured in
    deep backtraces.

    However, free_hot_cold_page_list() does not handle compound pages
    (nor need it: a Transparent HugePage would have been split by the
    time it reaches the call in shrink_page_list()), but it is possible
    for putback_lru_pages() or move_active_pages_to_lru() to be left
    holding the last reference on a THP, so must exclude the unlikely
    compound case before putting on pages_to_free.

    Remove pagevec_strip(), its work now done in move_active_pages_to_lru().
    The pagevec in scan_mapping_unevictable_pages() remains in mm/vmscan.c,
    but that is never on the reclaim path, and cannot be replaced by a list.

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: Konstantin Khlebnikov
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • This patch started off as a cleanup: __split_huge_page_refcounts() has to
    cope with two scenarios, when the hugepage being split is already on LRU,
    and when it is not; but why does it have to split that accounting across
    three different sites? Consolidate it in lru_add_page_tail(), handling
    evictable and unevictable alike, and use standard add_page_to_lru_list()
    when accounting is needed (when the head is not yet on LRU).

    But a recent regression in -next, I guess the removal of PageCgroupAcctLRU
    test from mem_cgroup_split_huge_fixup(), makes this now a necessary fix:
    under load, the MEM_CGROUP_ZSTAT count was wrapping to a huge number,
    messing up reclaim calculations and causing a freeze at rmdir of cgroup.

    Add a VM_BUG_ON to mem_cgroup_lru_del_list() when we're about to wrap that
    count - this has not been the only such incident. Document that
    lru_add_page_tail() is for Transparent HugePages by #ifdef around it.

    Signed-off-by: Hugh Dickins
    Cc: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Put the tail subpages of an isolated hugepage under splitting in the lru
    reclaim head as they supposedly should be isolated too next.

    Queues the subpages in physical order in the lru for non isolated
    hugepages under splitting. That might provide some theoretical cache
    benefit to the buddy allocator later.

    Signed-off-by: Shaohua Li
    Signed-off-by: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • Now that all code that operated on global per-zone LRU lists is
    converted to operate on per-memory cgroup LRU lists instead, there is no
    reason to keep the double-LRU scheme around any longer.

    The pc->lru member is removed and page->lru is linked directly to the
    per-memory cgroup LRU lists, which removes two pointers from a
    descriptor that exists for every page frame in the system.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Hugh Dickins
    Signed-off-by: Ying Han
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Reviewed-by: Kirill A. Shutemov
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Having a unified structure with a LRU list set for both global zones and
    per-memcg zones allows to keep that code simple which deals with LRU
    lists and does not care about the container itself.

    Once the per-memcg LRU lists directly link struct pages, the isolation
    function and all other list manipulations are shared between the memcg
    case and the global LRU case.

    Signed-off-by: Johannes Weiner
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Reviewed-by: Kirill A. Shutemov
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Ying Han
    Cc: Greg Thelen
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Cc: Minchan Kim
    Cc: Christoph Hellwig
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

11 Jan, 2012

1 commit

  • This patch adds helper free_hot_cold_page_list() to free list of 0-order
    pages. It frees pages directly from list without temporary page-vector.
    It also calls trace_mm_pagevec_free() to simulate pagevec_free()
    behaviour.

    bloat-o-meter:

    add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28)
    function old new delta
    free_hot_cold_page_list - 264 +264
    get_page_from_freelist 2129 2132 +3
    __pagevec_free 243 239 -4
    split_free_page 380 373 -7
    release_pages 606 510 -96
    free_page_list 188 - -188

    Signed-off-by: Konstantin Khlebnikov
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Acked-by: Minchan Kim
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

03 Nov, 2011

1 commit

  • Michel while working on the working set estimation code, noticed that
    calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
    wasn't safe, if the pfn ended up being a tail page of a transparent
    hugepage under splitting by __split_huge_page_refcount().

    He then found the problem could also theoretically materialize with
    page_cache_get_speculative() during the speculative radix tree lookups
    that uses get_page_unless_zero() in SMP if the radix tree page is freed
    and reallocated and get_user_pages is called on it before
    page_cache_get_speculative has a chance to call get_page_unless_zero().

    So the best way to fix the problem is to keep page_tail->_count zero at
    all times. This will guarantee that get_page_unless_zero() can never
    succeed on any tail page. page_tail->_mapcount is guaranteed zero and
    is unused for all tail pages of a compound page, so we can simply
    account the tail page references there and transfer them to
    tail_page->_count in __split_huge_page_refcount() (in addition to the
    head_page->_mapcount).

    While debugging this s/_count/_mapcount/ change I also noticed get_page is
    called by direct-io.c on pages returned by get_user_pages. That wasn't
    entirely safe because the two atomic_inc in get_page weren't atomic. As
    opposed to other get_user_page users like secondary-MMU page fault to
    establish the shadow pagetables would never call any superflous get_page
    after get_user_page returns. It's safer to make get_page universally safe
    for tail pages and to use get_page_foll() within follow_page (inside
    get_user_pages()). get_page_foll() is safe to do the refcounting for tail
    pages without taking any locks because it is run within PT lock protected
    critical sections (PT lock for pte and page_table_lock for
    pmd_trans_huge).

    The standard get_page() as invoked by direct-io instead will now take
    the compound_lock but still only for tail pages. The direct-io paths
    are usually I/O bound and the compound_lock is per THP so very
    finegrined, so there's no risk of scalability issues with it. A simple
    direct-io benchmarks with all lockdep prove locking and spinlock
    debugging infrastructure enabled shows identical performance and no
    overhead. So it's worth it. Ideally direct-io should stop calling
    get_page() on pages returned by get_user_pages(). The spinlock in
    get_page() is already optimized away for no-THP builds but doing
    get_page() on tail pages returned by GUP is generally a rare operation
    and usually only run in I/O paths.

    This new refcounting on page_tail->_mapcount in addition to avoiding new
    RCU critical sections will also allow the working set estimation code to
    work without any further complexity associated to the tail page
    refcounting with THP.

    Signed-off-by: Andrea Arcangeli
    Reported-by: Michel Lespinasse
    Reviewed-by: Michel Lespinasse
    Reviewed-by: Minchan Kim
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

31 Oct, 2011

1 commit


25 May, 2011

2 commits

  • The zone->lru_lock is heavily contented in workload where activate_page()
    is frequently used. We could do batch activate_page() to reduce the lock
    contention. The batched pages will be added into zone list when the pool
    is full or page reclaim is trying to drain them.

    For example, in a 4 socket 64 CPU system, create a sparse file and 64
    processes, processes shared map to the file. Each process read access the
    whole file and then exit. The process exit will do unmap_vmas() and cause
    a lot of activate_page() call. In such workload, we saw about 58% total
    time reduction with below patch. Other workloads with a lot of
    activate_page also benefits a lot too.

    Andrew Morton suggested activate_page() and putback_lru_pages() should
    follow the same path to active pages, but this is hard to implement (see
    commit 7a608572a282a ("Revert "mm: batch activate_page() to reduce lock
    contention")). On the other hand, do we really need putback_lru_pages()
    to follow the same path? I tested several FIO/FFSB benchmark (about 20
    scripts for each benchmark) in 3 machines here from 2 sockets to 4
    sockets. My test doesn't show anything significant with/without below
    patch (there is slight difference but mostly some noise which we found
    even without below patch before). Below patch basically returns to the
    same as my first post.

    I tested some microbenchmarks:
    case-anon-cow-rand-mt 0.58%
    case-anon-cow-rand -3.30%
    case-anon-cow-seq-mt -0.51%
    case-anon-cow-seq -5.68%
    case-anon-r-rand-mt 0.23%
    case-anon-r-rand 0.81%
    case-anon-r-seq-mt -0.71%
    case-anon-r-seq -1.99%
    case-anon-rx-rand-mt 2.11%
    case-anon-rx-seq-mt 3.46%
    case-anon-w-rand-mt -0.03%
    case-anon-w-rand -0.50%
    case-anon-w-seq-mt -1.08%
    case-anon-w-seq -0.12%
    case-anon-wx-rand-mt -5.02%
    case-anon-wx-seq-mt -1.43%
    case-fork 1.65%
    case-fork-sleep -0.07%
    case-fork-withmem 1.39%
    case-hugetlb -0.59%
    case-lru-file-mmap-read-mt -0.54%
    case-lru-file-mmap-read 0.61%
    case-lru-file-mmap-read-rand -2.24%
    case-lru-file-readonce -0.64%
    case-lru-file-readtwice -11.69%
    case-lru-memcg -1.35%
    case-mmap-pread-rand-mt 1.88%
    case-mmap-pread-rand -15.26%
    case-mmap-pread-seq-mt 0.89%
    case-mmap-pread-seq -69.72%
    case-mmap-xread-rand-mt 0.71%
    case-mmap-xread-seq-mt 0.38%

    The most significent are:
    case-lru-file-readtwice -11.69%
    case-mmap-pread-rand -15.26%
    case-mmap-pread-seq -69.72%

    which use activate_page a lot. others are basically variations because
    each run has slightly difference.

    In UP case, 'size mm/swap.o'
    before the two patches:
    text data bss dec hex filename
    6466 896 4 7366 1cc6 mm/swap.o
    after the two patches:
    text data bss dec hex filename
    6343 896 4 7243 1c4b mm/swap.o

    Signed-off-by: Shaohua Li
    Cc: KOSAKI Motohiro
    Cc: Hiroyuki Kamezawa
    Cc: Andi Kleen
    Cc: Minchan Kim
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • It's pointless that deactive_page's operates on unevictable pages. This
    patch removes unnecessary overhead which might be a bit problem in case
    that there are many unevictable page in system(ex, mprotect workload)

    [akpm@linux-foundation.org: tidy up comment]
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim