28 Oct, 2010

10 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: (44 commits)
    MN10300: Save frame pointer in thread_info struct rather than global var
    MN10300: Change "Matsushita" to "Panasonic".
    MN10300: Create a defconfig for the ASB2364 board
    MN10300: Update the ASB2303 defconfig
    MN10300: ASB2364: Add support for SMSC911X and SMC911X
    MN10300: ASB2364: Handle the IRQ multiplexer in the FPGA
    MN10300: Generic time support
    MN10300: Specify an ELF HWCAP flag for MN10300 Atomic Operations Unit support
    MN10300: Map userspace atomic op regs as a vmalloc page
    MN10300: Add Panasonic AM34 subarch and implement SMP
    MN10300: Delete idle_timestamp from irq_cpustat_t
    MN10300: Make various interrupt priority settings configurable
    MN10300: Optimise do_csum()
    MN10300: Implement atomic ops using atomic ops unit
    MN10300: Make the FPU operate in non-lazy mode under SMP
    MN10300: SMP TLB flushing
    MN10300: Use the [ID]PTEL2 registers rather than [ID]PTEL for TLB control
    MN10300: Make the use of PIDR to mark TLB entries controllable
    MN10300: Rename __flush_tlb*() to local_flush_tlb*()
    MN10300: AM34 erratum requires MMUCTR read and write on exception entry
    ...

    Linus Torvalds
     
  • Replace iterated page_cache_release() with release_pages(), which is
    faster and shorter.

    Needs release_pages() to be exported to modules.
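    As a rough sketch (assuming an array "pages" of "nr" page pointers; the
    names are illustrative), the transformation is:

        /* before: drop each page reference one at a time */
        for (i = 0; i < nr; i++)
                page_cache_release(pages[i]);

        /* after: one batched call; release_pages() is declared in
         * <linux/swap.h> and, with this patch, exported to modules */
        release_pages(pages, nr, 0);    /* 0: pages are not cache-cold */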

    Suggested-by: Andrew Morton
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch extracts the core logic from mem_cgroup_update_file_mapped() as
    mem_cgroup_update_file_stat() and adds a wrapper.

    As a planned future update, memory cgroup will have to count dirty pages
    to implement dirty_ratio/limit. Moreover, the number of dirty pages is
    required to kick the flusher thread to start writeback. (At present, no
    such kick happens.)

    This patch is preparation for that and makes the other statistics
    implementation clearer. It is just a clean-up.
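    Sketched, the extract-and-wrap looks like this (the stat index name
    follows memcontrol.c; treat the bodies as illustrative):

        /* core logic, now reusable for any per-file statistic */
        static void mem_cgroup_update_file_stat(struct page *page,
                                                int idx, int val)
        {
                /* ... former body of mem_cgroup_update_file_mapped() ... */
        }

        /* thin wrapper preserving the old entry point */
        void mem_cgroup_update_file_mapped(struct page *page, int val)
        {
                mem_cgroup_update_file_stat(page,
                                MEM_CGROUP_STAT_FILE_MAPPED, val);
        }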

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Reviewed-by: Greg Thelen
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • An event counter, MEM_CGROUP_ON_MOVE, is used as a quick check of whether
    a file stat update can be done asynchronously or not. It currently uses a
    per-cpu counter and for_each_possible_cpu() to compute the value.

    This patch replaces for_each_possible_cpu() with for_each_online_cpu()
    and adds the necessary synchronization logic for CPU hotplug.
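    A minimal sketch of the read side under this scheme (the fold-in slot
    "nocpu_base" and the field names are assumptions, not the verbatim
    patch):

        static long mem_cgroup_read_on_move(struct mem_cgroup *mem)
        {
                long val = mem->nocpu_base;     /* counts folded in when a
                                                 * CPU went offline */
                int cpu;

                get_online_cpus();      /* keep the online mask stable */
                for_each_online_cpu(cpu)
                        val += per_cpu(mem->stat->on_move, cpu);
                put_online_cpus();
                return val;
        }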

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Now, memcg's per-cpu counter uses for_each_possible_cpu() to get the
    value. It's better to use for_each_online_cpu() and a CPU hotplug
    handler.

    This patch only handles statistics counter. MEM_CGROUP_ON_MOVE will be
    handled in another patch.
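    The hotplug side could look roughly like this (the callback and helper
    names are illustrative; the real patch wires this up in memcontrol.c):

        static int __cpuinit memcg_cpu_callback(struct notifier_block *nb,
                                        unsigned long action, void *hcpu)
        {
                int cpu = (unsigned long)hcpu;

                if (action == CPU_DEAD || action == CPU_DEAD_FROZEN)
                        /* fold the dead CPU's counts into a stable slot so
                         * for_each_online_cpu() sums stay correct */
                        synchronize_mem_cgroup_on_cpu(cpu);
                return NOTIFY_OK;
        }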

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In memory cgroup management, we sometimes have to walk through a
    subhierarchy of cgroups to gather information, lock something, etc.

    Currently, the mem_cgroup_walk_tree() function is provided for that. It
    calls a given callback function per cgroup found. The bad thing is that
    it has to be passed a fixed-style function and a "void *" argument, which
    adds a lot of type casting to memcontrol.c.

    To make the code clean, this patch replaces walk_tree() with

    for_each_mem_cgroup_tree(iter, root)

    an iterator-style call. The good point is that the iterator call doesn't
    have to assume what kind of function is called under it. A bad point is
    that it may cause a reference-count leak if a caller uses "break" to
    leave the loop by mistake.

    I think the benefit is larger. The modified code seems straightforward
    and easy to read because we don't have mysterious callbacks and pointer
    casts.
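    Sketched (the callback and accessor names are illustrative, not from the
    patch):

        /* before: callback style forces a void * and casts */
        mem_cgroup_walk_tree(root, &total, count_usage_cb);

        /* after: the loop body is ordinary code */
        for_each_mem_cgroup_tree(iter, root)
                total += mem_cgroup_usage(iter);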

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • When accounting file events per memory cgroup, we need to find the memory
    cgroup via page_cgroup->mem_cgroup. Now, we use lock_page_cgroup() to
    guarantee that pc->mem_cgroup is not overwritten while we make use of it.

    But, considering the context in which page_cgroups for files are
    accessed, we can use an alternative lightweight mutual exclusion in most
    cases.

    When handling file caches, the only race we have to take care of is
    "moving" an account, IOW, overwriting page_cgroup->mem_cgroup. (See the
    comment in the patch.)

    Unlike charge/uncharge, "move" does not happen frequently. It happens
    only on rmdir() and on task moving (with special settings).

    This patch adds a race checker for file-cache-status accounting
    vs. account moving. A new per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE
    is added. The routine for account moving is:
    1. Increment it before starting the move.
    2. Call synchronize_rcu().
    3. Decrement it after the end of the move.
    By this, the file-status-counting routine can check whether it needs to
    call lock_page_cgroup(). In most cases, it doesn't need to call it.
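    A sketch of both sides (the predicate and field names are assumptions):

        /* mover (rare path) */
        mem_cgroup_start_move(mem);     /* bump per-cpu ON_MOVE counters */
        synchronize_rcu();              /* wait out in-flight updaters */
        /* ... rewrite pc->mem_cgroup ... */
        mem_cgroup_end_move(mem);

        /* stat updater (hot path) */
        rcu_read_lock();
        locked = mem_cgroup_on_move(mem);       /* hypothetical predicate */
        if (unlikely(locked))
                lock_page_cgroup(pc);           /* slow, fully safe path */
        __this_cpu_inc(mem->stat->count[idx]);  /* lockless fast path */
        if (unlikely(locked))
                unlock_page_cgroup(pc);
        rcu_read_unlock();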

    Below is perf data for a process which mmap()s/munmap()s 32MB of file
    cache over one minute.

    Before patch:
    28.25% mmap mmap [.] main
    22.64% mmap [kernel.kallsyms] [k] page_fault
    9.96% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped
    3.67% mmap [kernel.kallsyms] [k] filemap_fault
    3.50% mmap [kernel.kallsyms] [k] unmap_vmas
    2.99% mmap [kernel.kallsyms] [k] __do_fault
    2.76% mmap [kernel.kallsyms] [k] find_get_page

    After patch:
    30.00% mmap mmap [.] main
    23.78% mmap [kernel.kallsyms] [k] page_fault
    5.52% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped
    3.81% mmap [kernel.kallsyms] [k] unmap_vmas
    3.26% mmap [kernel.kallsyms] [k] find_get_page
    3.18% mmap [kernel.kallsyms] [k] __do_fault
    3.03% mmap [kernel.kallsyms] [k] filemap_fault
    2.40% mmap [kernel.kallsyms] [k] handle_mm_fault
    2.40% mmap [kernel.kallsyms] [k] do_page_fault

    This patch reduces memcg's cost to some extent.
    (mem_cgroup_update_file_mapped() is called on both map and unmap.)

    Note: it seems some more improvement is still possible, but it's not yet
    clear how; maybe removing the set/unset of the flag is required.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, memory cgroup accounts file-mapped pages with a counter and a
    flag. The counter works in the same way as zone_stat, but the FileMapped
    flag exists only in memcg (to help move_account).

    This flag can be updated wrongly in one case. Assume CPU0 and CPU1, with
    one thread mapping a page on CPU0 and another thread unmapping it on
    CPU1:

    CPU0                                CPU1
                                        rmv rmap (mapcount 1->0)
    add rmap (mapcount 0->1)
    lock_page_cgroup()
    memcg counter+1                     (some delay)
    set MAPPED FLAG.
    unlock_page_cgroup()
                                        lock_page_cgroup()
                                        memcg counter-1
                                        clear MAPPED flag

    In the above sequence the counter is properly updated but the FLAG is
    not. This means that representing a state by a flag which is maintained
    by a counter needs some special care.

    To handle this, when clearing the flag, this patch checks the mapcount
    directly and clears the flag only when mapcount == 0. (If mapcount > 0,
    someone will bring it to zero later and the flag will be cleared then.)

    The reverse case, dec-after-inc, cannot be a problem because the page
    table lock covers it. (IOW, to produce the above sequence, two processes
    would have to touch the same page at once with map/unmap.)
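    The clearing side, sketched (the flag accessors follow the page_cgroup
    naming convention; treat the snippet as illustrative):

        if (val > 0) {
                SetPageCgroupFileMapped(pc);
        } else if (!page_mapped(page)) {
                /* mapcount > 0 here means another unmapper will run this
                 * path again once the count really reaches zero */
                ClearPageCgroupFileMapped(pc);
        }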

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • It appears that i386 uses the kmap_atomic infrastructure regardless of
    CONFIG_HIGHMEM, which results in a compile error when highmem is
    disabled.

    Cure this by providing the few bits needed for both CONFIG_HIGHMEM and
    CONFIG_X86_32.

    Signed-off-by: Peter Zijlstra
    Reported-by: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Save the current exception frame pointer in the thread_info struct rather
    than in a global variable, as the latter makes SMP tricky, especially
    when preemption is also enabled.

    This also replaces __frame with current_frame() and rearranges header file
    inclusions to make it all compile.

    Signed-off-by: David Howells
    Acked-by: Akira Takeuchi

    David Howells
     

27 Oct, 2010

30 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • PF_FLUSHER is only ever set, not tested, remove it.

    Signed-off-by: Peter Zijlstra
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • After all, that's what they are intended for.

    Signed-off-by: Jan Beulich
    Cc: Miklos Szeredi
    Cc: "Eric W. Biederman"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • Use the new {max,min}3 macros to save some cycles and bytes on the stack.
    This patch substitutes trivially nested macros with their three-argument
    counterparts.
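    For example (a representative substitution, not a specific call site):

        /* before: nested expansion, extra temporaries */
        len = min(min(a, b), c);

        /* after: one macro from <linux/kernel.h> */
        len = min3(a, b, c);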

    Signed-off-by: Hagen Paul Pfeifer
    Cc: Joe Perches
    Cc: Ingo Molnar
    Cc: Hartley Sweeten
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hagen Paul Pfeifer
     
  • A simple code change to reduce the list_empty(&source) checks.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • If not_managed is true, all pages will be put back to the LRU, so break
    out of the loop earlier to skip isolating the remaining pages.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • __test_page_isolated_in_pageblock() returns 1 if all pages in the range
    are isolated, so fix the comment. Variable `pfn' will be initialised in
    the following loop so remove it.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • page_order() is called by memory hotplug's user interface to check
    whether a section is removable or not (is_mem_section_removable()).

    It calls page_order() without holding zone->lock. So, even if the caller
    does

        if (PageBuddy(page))
                ret = page_order(page) ...

    the caller may hit the BUG_ON() in page_order().

    For fixing this, there are 2 choices:
    1. add zone->lock, or
    2. remove the BUG_ON().

    is_mem_section_removable() is used for "advice" and doesn't need to be
    100% accurate. Since it can be called via a user program, we don't want
    to hold this important lock for long at a user's request. So, this patch
    removes the BUG_ON().

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Acked-by: Michal Hocko
    Acked-by: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add a missing spin_lock() of the page_table_lock before an error return in
    hugetlb_cow(). Callers of hugetlb_cow() expect it to be held upon return.

    Signed-off-by: Dean Nelson
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     
  • The vma returned by find_vma does not necessarily include the target
    address. If this happens the code tries to follow a page outside of any
    vma and returns ENOENT instead of EFAULT.
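    The usual guard, sketched (the exact call site is in the patch; the
    snippet is illustrative):

        struct vm_area_struct *vma = find_vma(mm, addr);

        /* find_vma() returns the first vma with vm_end > addr, which may
         * begin above addr; that gap must be treated as a bad address */
        if (!vma || vma->vm_start > addr)
                return -EFAULT;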

    Signed-off-by: Gleb Natapov
    Acked-by: Christoph Lameter
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gleb Natapov
     
  • System management wants to subscribe to changes in swap configuration.
    Make /proc/swaps pollable like /proc/mounts.
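    From userspace, waiting for a change could look like this (assuming the
    same POLLERR|POLLPRI convention that /proc/mounts uses):

        #include <fcntl.h>
        #include <poll.h>
        #include <unistd.h>

        /* block until the swap table changes, then return (sketch) */
        static void wait_for_swap_change(void)
        {
                struct pollfd pfd;

                pfd.fd = open("/proc/swaps", O_RDONLY);
                pfd.events = POLLERR | POLLPRI;
                poll(&pfd, 1, -1);      /* wakes on swapon/swapoff */
                close(pfd.fd);          /* reopen or rewind to re-read */
        }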

    [akpm@linux-foundation.org: document proc_poll_event]
    Signed-off-by: Kay Sievers
    Acked-by: Greg KH
    Cc: Jonathan Corbet
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kay Sievers
     
  • Add vzalloc() and vzalloc_node() to encapsulate the
    vmalloc-then-memset-zero operation.

    Use __GFP_ZERO to zero fill the allocated memory.
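    Callers then shrink to a single line (sketch):

        /* before */
        buf = vmalloc(size);
        if (buf)
                memset(buf, 0, size);

        /* after: allocation and zeroing in one call */
        buf = vzalloc(size);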

    Signed-off-by: Dave Young
    Cc: Christoph Lameter
    Acked-by: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • Reported-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This removes the following warning from sparse:

    mm/vmstat.c:466:5: warning: symbol 'fragmentation_index' was not declared. Should it be static?

    [akpm@linux-foundation.org: move the include to top-of-file]
    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • s_start() and s_stop() grab/release vmlist_lock but were missing proper
    annotations. Add them.
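    The annotations take this shape (bodies abridged; the vmlist walking is
    unchanged by the patch):

        static void *s_start(struct seq_file *m, loff_t *pos)
                __acquires(&vmlist_lock)
        {
                read_lock(&vmlist_lock);
                /* ... advance to *pos ... */
                return NULL;
        }

        static void s_stop(struct seq_file *m, void *p)
                __releases(&vmlist_lock)
        {
                read_unlock(&vmlist_lock);
        }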

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Rename redundant 'tmp' to fix following sparse warnings:

    mm/vmalloc.c:296:34: warning: symbol 'tmp' shadows an earlier one
    mm/vmalloc.c:293:24: originally declared here

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Make anon_vma_chain_free() static. It is called only in rmap.c and the
    corresponding alloc function is already static.

    Signed-off-by: Namhyung Kim
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • page_check_address() conditionally grabs *@ptlp when it returns non-NULL.
    Renaming it and wrapping it using __cond_lock() removes the following
    warnings from sparse:

    mm/rmap.c:472:9: warning: context imbalance in 'page_mapped_in_vma' - unexpected unlock
    mm/rmap.c:524:9: warning: context imbalance in 'page_referenced_one' - unexpected unlock
    mm/rmap.c:706:9: warning: context imbalance in 'page_mkclean_one' - unexpected unlock
    mm/rmap.c:1066:9: warning: context imbalance in 'try_to_unmap_one' - unexpected unlock
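    The wrapper pattern looks roughly like this (prototype abridged):

        pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
                        unsigned long address, spinlock_t **ptlp, int sync);

        static inline pte_t *page_check_address(struct page *page,
                        struct mm_struct *mm, unsigned long address,
                        spinlock_t **ptlp, int sync)
        {
                pte_t *ptep;

                /* tell sparse the lock is taken iff the call succeeds */
                __cond_lock(*ptlp, ptep = __page_check_address(page, mm,
                                                address, ptlp, sync));
                return ptep;
        }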

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • page_lock_anon_vma() conditionally grabs the RCU read lock and the
    anon_vma lock, but page_unlock_anon_vma() releases them unconditionally.
    This leads sparse to complain about a context imbalance. Annotate them.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • follow_pte() conditionally grabs *@ptlp when returning 0. Renaming it and
    wrapping it using __cond_lock() removes the following warnings:

    mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock
    mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • do_wp_page() releases @ptl but was missing a proper annotation. Add it.
    This removes the following warnings from sparse:

    mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock
    mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • get_locked_pte() conditionally grabs 'ptl' when returning non-NULL. This
    leads sparse to complain about a context imbalance. Rename and wrap it
    using __cond_lock() to make sparse happy.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • This removes the following warning from sparse:

    mm/page_alloc.c:1934:9: warning: restricted gfp_t degrades to integer

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • 'end' shadows an earlier declaration and is not necessary at all. Remove
    it and use 'pos' instead. This removes the following sparse warnings:

    mm/filemap.c:2180:24: warning: symbol 'end' shadows an earlier one
    mm/filemap.c:2132:25: originally declared here

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • This change reduces mmap_sem hold times that are caused by waiting for
    disk transfers when accessing file mapped VMAs.

    It introduces the FAULT_FLAG_ALLOW_RETRY flag, which indicates that the
    call site wants mmap_sem to be released if blocking on a pending disk
    transfer.
    In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
    do_page_fault() will then re-acquire mmap_sem and retry the page fault.

    It is expected that the retry will hit the same page which will now be
    cached, and thus it will complete with a low mmap_sem hold time.
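    The arch fault-handler side then follows this pattern (a sketch; vma
    revalidation and the write flag are elided):

        unsigned int flags = FAULT_FLAG_ALLOW_RETRY;
        int fault;

    retry:
        down_read(&mm->mmap_sem);
        vma = find_vma(mm, address);
        /* ... validate vma and access ... */
        fault = handle_mm_fault(mm, vma, address, flags);
        if (fault & VM_FAULT_RETRY) {
                /* filemap_fault() dropped mmap_sem and queued the read;
                 * clear ALLOW_RETRY so we cannot loop forever */
                flags &= ~FAULT_FLAG_ALLOW_RETRY;
                goto retry;
        }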

    Tests:

    - microbenchmark: thread A mmaps a large file and does random read accesses
    to the mmapped area - achieves about 55 iterations/s. Thread B does
    mmap/munmap in a loop at a separate location - achieves 55 iterations/s
    before, 15000 iterations/s after.

    - We are seeing related effects in some applications in house, which show
    significant performance regressions when running without this change.

    [akpm@linux-foundation.org: fix warning & crash]
    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Acked-by: Linus Torvalds
    Cc: Nick Piggin
    Reviewed-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Acked-by: "H. Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Introduce a single location where filemap_fault() locks the desired page.
    There used to be two such places, depending if the initial find_get_page()
    was successful or not.

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Acked-by: Linus Torvalds
    Cc: Nick Piggin
    Reviewed-by: Wu Fengguang
    Cc: Ying Han
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     
  • Buggy drivers (e.g. fsl_udc) could call dma_pool_alloc from atomic
    context with GFP_KERNEL. In most instances, the first pool_alloc_page
    call would succeed and the sleeping functions would never be called. This
    allowed the buggy drivers to slip through the cracks.

    Add a might_sleep_if() checking for __GFP_WAIT in flags.
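    The check itself is one line at the top of dma_pool_alloc() (sketch):

        might_sleep_if(mem_flags & __GFP_WAIT); /* warn atomic callers even
                                                 * when nothing blocks */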

    Signed-off-by: Dima Zavin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dima Zavin
     
  • Since we no longer need to provide KM_type, the whole pte_*map_nested()
    API is now redundant, remove it.

    Signed-off-by: Peter Zijlstra
    Acked-by: Chris Metcalf
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: David Miller
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Keep the current interface, but ignore the KM_type argument and use a
    stack-based approach.

    The advantage is that we get rid of crappy code like:

        #define __KM_PTE                        \
                (in_nmi() ? KM_NMI_PTE :        \
                 in_irq() ? KM_IRQ_PTE :        \
                 KM_PTE0)

    and in general can stop worrying about what context we're in and what kmap
    slots might be appropriate for that.

    The downside is that FRV kmap_atomic() gets more expensive.

    For now we use a CPP trick suggested by Andrew:

    #define kmap_atomic(page, args...) __kmap_atomic(page)

    to avoid having to touch all kmap_atomic() users in a single patch.

    [ not compiled on:
    - mn10300: the arch doesn't actually build with highmem to begin with ]

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c]
    Acked-by: Rik van Riel
    Signed-off-by: Peter Zijlstra
    Acked-by: Chris Metcalf
    Cc: David Howells
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Steven Rostedt
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: David Miller
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Dave Airlie
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • When a page has PG_referenced, shrink_page_list() discards it only if it
    is not dirty. This rule works fine if the backing filesystem is a regular
    one. PG_dirty is a good signal that the page was used recently, because
    the flusher threads clean pages periodically. In addition, page writeback
    is costlier than a simple page discard.

    However, when a page is on tmpfs this heuristic doesn't work because
    flusher threads don't write back tmpfs pages. Consequently, tmpfs pages
    always rotate around the LRU at least twice, adding unnecessary LRU
    churn. Simple tmpfs streaming IO shouldn't cause large anonymous page
    swap-out.

    Remove this unnecessary reclaim bonus for tmpfs pages.
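    The fix plausibly amounts to qualifying the "defer dirty pages" test in
    page_check_references() (a sketch; the exact condition is in the patch):

        /* reclaim a clean referenced page only if a flusher thread could
         * have cleaned it, i.e. it is not swap-backed (tmpfs/anon) */
        if (referenced_page && !PageSwapBacked(page))
                return PAGEREF_RECLAIM_CLEAN;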

    Signed-off-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Reviewed-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro