13 Jan, 2012

1 commit

  • This is preparation for removing the PCG_ACCT_LRU flag in page_cgroup
    and reducing atomic ops/complexity in memcg LRU handling.

    In some cases, pages are added to the lru before being charged to memcg,
    and such pages are not classified to a memory cgroup at lru addition.
    Currently, the lru to which a page should be added is determined by a bit
    in page_cgroup->flags and by pc->mem_cgroup. I'd like to remove the check
    of that flag.

    To handle that case: pc->mem_cgroup may contain a stale pointer if pages
    are added to the LRU before classification, so this patch resets
    pc->mem_cgroup to root_mem_cgroup before lru additions.
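
    As an illustration only, a minimal sketch of the reset helper added by
    this series (names follow the memcg internals of the time; treat the body
    as a sketch, not the exact hunk):

        void mem_cgroup_reset_owner(struct page *page)
        {
                struct page_cgroup *pc;

                if (mem_cgroup_disabled())
                        return;

                pc = lookup_page_cgroup(page);
                /* an already-charged page keeps its real owner */
                if (PageCgroupUsed(pc))
                        return;
                /* otherwise make sure pc->mem_cgroup is never a stale pointer */
                pc->mem_cgroup = root_mem_cgroup;
        }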

    [akpm@linux-foundation.org: fix CONFIG_CGROUP_MEM_CONT=n build]
    [hughd@google.com: fix CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n build]
    [akpm@linux-foundation.org: ksm.c needs memcontrol.h, per Michal]
    [hughd@google.com: stop oops in mem_cgroup_reset_owner()]
    [hughd@google.com: fix page migration to reset_owner]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Miklos Szeredi
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: Ying Han
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

31 Oct, 2011

1 commit


10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Jan, 2011

1 commit

  • Add paging logic that splits the page before it is unmapped and added to
    swap, to ensure backwards compatibility with the legacy swap code.
    Eventually swap should natively page out hugepages to increase performance
    and decrease seeking and fragmentation of swap space. swapoff can just
    skip over huge pmds as they cannot be part of swap yet. In add_to_swap, be
    careful to split the page only if we got a valid swap entry, so we don't
    split hugepages when swap is full.
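
    A rough sketch of that ordering in add_to_swap(), simplified and with the
    actual swap-cache insertion elided (treat it as illustrative, not the
    exact mainline code):

        int add_to_swap(struct page *page)
        {
                swp_entry_t entry;

                VM_BUG_ON(!PageLocked(page));
                VM_BUG_ON(!PageUptodate(page));

                entry = get_swap_page();
                if (!entry.val)
                        return 0;       /* swap full: leave the hugepage intact */

                if (unlikely(PageTransHuge(page)))
                        if (unlikely(split_huge_page(page))) {
                                /* split failed: give the entry back */
                                swapcache_free(entry, NULL);
                                return 0;
                        }

                /* ... add_to_swap_cache(page, entry, gfp) and error handling ... */
                return 1;
        }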

    In theory we could split pages before isolating them during the lru scan,
    but for khugepaged to be safe I'm relying on either mmap_sem write mode or
    PG_lock being taken, so split_huge_page has to run with either mmap_sem
    held (read or write) or PG_lock taken. Calling it from isolate_lru_page
    would make the locking more complicated; in addition, split_huge_page
    would deadlock if called by __isolate_lru_page, because it has to take the
    lru lock to add the tail pages.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Mel Gorman
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming their availability. As this
    conversion needs to touch a large number of source files, the following
    script was used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, while for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files, but without automatically
    editing them, as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on the arch to make things
    build (like ipr on powerpc/64, which failed due to a missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from the tests on
    step 6, I'm fairly confident about the coverage of this conversion
    patch. If there is a breakage, it's likely to be something in one of
    the arch headers, which should be easily discoverable on most builds
    of the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

22 Sep, 2009

2 commits

  • After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"),
    only a context which has set the SWAP_HAS_CACHE flag by swapcache_prepare()
    or get_swap_page() will call add_to_swap_cache(). So add_to_swap_cache()
    doesn't return -EEXIST any more.

    Even though it doesn't return -EEXIST, it's not good behavior conceptually
    to call swapcache_prepare() in the -EEXIST case, because it means clearing
    SWAP_HAS_CACHE flag while the entry is on swap cache.

    This patch removes the now-redundant code and comments from its callers,
    and adds a VM_BUG_ON() in the error path of add_to_swap_cache(), plus some
    comments.

    Signed-off-by: Daisuke Nishimura
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"),
    read_swap_cache_async() will busy-wait while an entry doesn't exist in the
    swap cache but has the SWAP_HAS_CACHE flag set.

    Such entries can exist on the add/delete paths of the swap cache. On the
    add path, add_to_swap_cache() is called soon after the SWAP_HAS_CACHE flag
    is set, and on the delete path, swapcache_free() is called (clearing the
    SWAP_HAS_CACHE flag) soon after __delete_from_swap_cache() is called. So
    the busy-wait works well in most cases.

    But this mechanism can cause a soft lockup if add_to_swap_cache() sleeps
    and read_swap_cache_async() tries to swap in the same entry on the same CPU.

    This patch calls radix_tree_preload() before swapcache_prepare() and
    divides add_to_swap_cache() into two parts: the radix_tree_preload() part
    and the radix_tree_insert() part (defined as __add_to_swap_cache()).
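
    A simplified sketch of the resulting split (error handling and statistics
    trimmed; treat it as illustrative):

        static int __add_to_swap_cache(struct page *page, swp_entry_t entry)
        {
                int error;

                page_cache_get(page);
                SetPageSwapCache(page);
                set_page_private(page, entry.val);

                spin_lock_irq(&swapper_space.tree_lock);
                error = radix_tree_insert(&swapper_space.page_tree, entry.val, page);
                spin_unlock_irq(&swapper_space.tree_lock);

                if (unlikely(error)) {
                        set_page_private(page, 0UL);
                        ClearPageSwapCache(page);
                        page_cache_release(page);
                }
                return error;
        }

        int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask)
        {
                int error;

                error = radix_tree_preload(gfp_mask);
                if (!error) {
                        error = __add_to_swap_cache(page, entry);
                        radix_tree_preload_end();
                }
                return error;
        }

    read_swap_cache_async() can then do the preload itself, before
    swapcache_prepare(), and call __add_to_swap_cache() directly.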

    Signed-off-by: Daisuke Nishimura
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     

11 Sep, 2009

1 commit


17 Jun, 2009

4 commits

  • The file argument resulted from address_space's readpage a long time ago.
    We don't use it any more, so let's remove the unnecessary argument.

    Signed-off-by: Minchan Kim
    Acked-by: Hugh Dickins
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Hugh removed add_to_swap's gfp_mask argument ("mm: remove gfp_mask from
    add_to_swap"), so we have to remove the gfp_mask annotation from the
    function as well.

    Signed-off-by: Minchan Kim
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This is part of the patches for fixing memcg's swap accounting leak.
    But, IMHO, it's not a bad patch even without memcg.

    There are 2 kinds of references to swap:
    - reference from a swap entry
    - reference from the swap cache

    Then,

    - If there is swap cache && swap's refcnt is 1, there is only swap cache.
    (*) swapcount(entry) == 1 && find_get_page(swapper_space, entry) != NULL

    This counting logic has worked well for a long time. But considering that
    we cannot tell from swap_map[] whether there is a _real_ reference or not,
    the current usage of the counter is not very good.

    This patch adds a SWAP_HAS_CACHE flag and records whether a swap entry has
    a cache or not. This removes the -1 magic used in swapfile.c and helps
    avoid unnecessary find_get_page() calls.
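
    Conceptually (the bit values here are only illustrative, not the actual
    encoding), each swap_map[] element carries the flag alongside the count:

        #define SWAP_HAS_CACHE  0x8000          /* entry has a swap-cache page */
        #define SWAP_COUNT_MASK 0x7fff          /* remaining bits: reference count */

        static inline int swap_has_cache(unsigned short ent)
        {
                return ent & SWAP_HAS_CACHE;
        }

        static inline unsigned short swap_count(unsigned short ent)
        {
                return ent & SWAP_COUNT_MASK;
        }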

    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In a following patch, the usage of the swap cache is recorded in swap_map.
    This patch makes the necessary interface changes for that.

    2 interfaces:

    - swapcache_prepare()
    - swapcache_free()

    are added for allocating/freeing the refcnt from the swap cache on
    existing swap entries. The implementation itself is not changed by this
    patch. When adding swapcache_free(), memcg's hook code is moved under
    swapcache_free(), which is better than using scattered hooks.
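
    An interface sketch of the two helpers (behaviour is unchanged by this
    patch; the memcg hook shown is the one being consolidated here):

        int swapcache_prepare(swp_entry_t entry)
        {
                /* for now just another reference on the entry */
                return swap_duplicate(entry);
        }

        void swapcache_free(swp_entry_t entry, struct page *page)
        {
                /* memcg hook lives here instead of being scattered at call sites */
                if (page)
                        mem_cgroup_uncharge_swapcache(page, entry);
                swap_free(entry);
        }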

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Li Zefan
    Cc: Dhaval Giani
    Cc: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

29 May, 2009

1 commit

  • mapping->tree_lock can be acquired from interrupt context. Then the
    following deadlock can occur.

    Assume "A" as a page.

    CPU0:
    lock_page_cgroup(A)
    interrupted
    -> take mapping->tree_lock.
    CPU1:
    take mapping->tree_lock
    -> lock_page_cgroup(A)

    This patch tries to fix the above deadlock by moving memcg's hook out of
    mapping->tree_lock. charge/uncharge of pagecache/swapcache is protected by
    the page lock, not by tree_lock.

    After this patch, lock_page_cgroup() is not called under mapping->tree_lock.
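
    The shape of the fix, sketched for delete_from_swap_cache() (simplified):

        void delete_from_swap_cache(struct page *page)
        {
                swp_entry_t entry;

                entry.val = page_private(page);

                spin_lock_irq(&swapper_space.tree_lock);
                __delete_from_swap_cache(page);         /* no memcg call under tree_lock */
                spin_unlock_irq(&swapper_space.tree_lock);

                /* the hook now runs here, protected by the page lock only */
                mem_cgroup_uncharge_swapcache(page, entry);
                swap_free(entry);
                page_cache_release(page);
        }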

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     

09 Jan, 2009

2 commits

  • This patch implements a per-cgroup limit for usage of memory+swap. Where
    SwapCache is involved, double counting of swap-cache and swap-entry is
    avoided.

    Mem+Swap controller works as following.
    - memory usage is limited by memory.limit_in_bytes.
    - memory + swap usage is limited by memory.memsw_limit_in_bytes.

    This has following benefits.
    - A user can limit total resource usage of mem+swap.

    Without this, because the memory resource controller doesn't take care of
    swap usage, a process can exhaust all the swap (via a memory leak). We can
    avoid that case.

    And swap is a shared resource, but it cannot be reclaimed (go back to
    memory) until it's used. This characteristic can be trouble when the
    memory is divided into parts by cpuset or memcg.
    Assume group A and group B. After some applications execute, the system
    can be:

    Group A -- very large free memory space, but occupying 99% of swap.
    Group B -- under memory shortage, but cannot use swap... it's nearly full.

    The ability to set an appropriate swap limit for each group is required.

    Maybe someone wonders, "why mem+swap rather than just swap?"

    - The global LRU (kswapd) can swap out arbitrary pages. Swap-out means
    moving the account from memory to swap... there is no change in the usage
    of mem+swap.

    In other words, when we want to limit the usage of swap without affecting
    the global LRU, a mem+swap limit is better than just limiting swap.

    Accounting target information is stored in swap_cgroup, which is a
    per-swap-entry record.
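
    Illustrative sketch of that record (one such entry exists per swap slot;
    the exact layout is a sketch):

        struct swap_cgroup {
                struct mem_cgroup *val;         /* owner at swap-out time */
        };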

    Charging is done as follows.

    map
    - charge page and memsw.

    unmap
    - uncharge page/memsw if not SwapCache.

    swap-out (__delete_from_swap_cache)
    - uncharge page
    - record mem_cgroup information to swap_cgroup.

    swap-in (do_swap_page)
    - charge as page and memsw.
    - the record in swap_cgroup is cleared.
    - memsw accounting is decremented.

    swap-free (swap_free())
    - if the swap entry is freed, memsw is uncharged by PAGE_SIZE.

    There are people who work in never-swap environments and consider swap to
    be something bad. For such people, this mem+swap controller extension is
    just overhead, and that overhead can be avoided by a config or boot option.
    (See Kconfig; the details are not in this patch.)

    TODO:
    - maybe more optimization can be done in the swap-in path (but it's not
    very safe). We just do simple accounting at this stage.

    [nishimura@mxp.nes.nec.co.jp: make resize limit hold mutex]
    [hugh@veritas.com: memswap controller core swapcache fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • SwapCache support for memory resource controller (memcg)

    Before the mem+swap controller, memcg itself should handle SwapCache in a
    proper way. This is cut out from that work.

    In the current memcg, SwapCache is just leaked and a user can create tons
    of SwapCache. This is an accounting leak and should be handled.

    SwapCache accounting is done as follows.

    charge (anon)
    - charged when it's mapped.
    (because of readahead, charging at add_to_swap_cache() is not sane)

    uncharge (anon)
    - uncharged when it's dropped from the swapcache and fully unmapped;
    that means it's not uncharged at unmap.
    Note: deletion from the swap cache at swap-in is done after rmap
    information is established.

    charge (shmem)
    - charged at swap-in. This prevents charging at add_to_page_cache().

    uncharge (shmem)
    - uncharged when it's dropped from the swapcache and not on shmem's
    radix-tree.

    At migration, the check against the 'old page' is modified to handle shmem.

    Compared to the old version discussed (which caused trouble), we have the
    advantages of
    - the PCG_USED bit.
    - simple migration handling.

    So the situation is much easier than several months ago, maybe.

    [hugh@veritas.com: memcg: handle swap caches build fix]
    Reviewed-by: Daisuke Nishimura
    Tested-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

07 Jan, 2009

3 commits

  • Remove gfp_mask argument from add_to_swap(): it's misleading because its
    only caller, shrink_page_list(), is not atomic at that point; and in due
    course (implementing discard) we'll sometimes want to allocate some memory
    with GFP_NOIO (as is used in swap_writepage) when allocating swap.

    No change to the gfp_mask passed down to add_to_swap_cache(): still use
    __GFP_HIGH without __GFP_WAIT (with nomemalloc and nowarn as before):
    though it's not obvious if that's the best combination to ask for here.

    Signed-off-by: Hugh Dickins
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • remove_exclusive_swap_page(): its problem is in living up to its name.

    It doesn't matter if someone else has a reference to the page (raised
    page_count); it doesn't matter if the page is mapped into userspace
    (raised page_mapcount - though that hints it may be worth keeping the
    swap): all that matters is that there be no more references to the swap
    (and no writeback in progress).

    swapoff (try_to_unuse) has been removing pages from swapcache for years,
    with no concern for page count or page mapcount, and we used to have a
    comment in lookup_swap_cache() recognizing that: if you go for a page of
    swapcache, you'll get the right page, but it could have been removed from
    swapcache by the time you get page lock.

    So, give up asking for exclusivity: get rid of
    remove_exclusive_swap_page(), and remove_exclusive_swap_page_ref() and
    remove_exclusive_swap_page_count() which were spawned for the recent LRU
    work: replace them by the simpler try_to_free_swap() which just checks
    page_swapcount().
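
    A sketch of the replacement helper described above: it ignores page_count
    and page_mapcount entirely and only asks whether the swap entry still has
    users or writeback in flight.

        int try_to_free_swap(struct page *page)
        {
                VM_BUG_ON(!PageLocked(page));

                if (!PageSwapCache(page))
                        return 0;
                if (PageWriteback(page))
                        return 0;
                if (page_swapcount(page))
                        return 0;

                delete_from_swap_cache(page);
                SetPageDirty(page);
                return 1;
        }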

    Similarly, remove the page_count limitation from free_swap_and_cache(),
    but assume that it's worth holding on to the swap if the page is mapped
    and swap is nowhere near full. Add a vm_swap_full() test in
    free_swap_cache()? It would be consistent, but I think we probably have
    enough for now.

    Signed-off-by: Hugh Dickins
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The swap code is over-provisioned with BUG_ONs on assorted page flags,
    mostly dating back to 2.3. They're good documentation, and guard against
    developer error, but a waste of space on most systems: change them to
    VM_BUG_ONs, conditional on CONFIG_DEBUG_VM. Just delete the PagePrivate
    ones: they're later, from 2.5.69, but even less interesting now.

    Signed-off-by: Hugh Dickins
    Reviewed-by: Christoph Lameter
    Cc: Nick Piggin
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

20 Oct, 2008

4 commits

  • Setting and clearing the page locked flag when inserting a page into the
    swapcache / pagecache, when it has no other references, can use non-atomic
    page flag operations, because no other CPU may be operating on it at that
    time.

    This saves one atomic operation when inserting a page into pagecache.
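
    The idea, sketched with the non-atomic helper convention (treat the exact
    hunk as illustrative):

        static inline void __set_page_locked(struct page *page)
        {
                /* non-atomic __set_bit: safe only while no one else sees the page */
                __set_bit(PG_locked, &page->flags);
        }

        static inline int add_to_page_cache(struct page *page,
                        struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
        {
                int error;

                __set_page_locked(page);        /* freshly allocated, no other references */
                error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
                if (unlikely(error))
                        __clear_bit(PG_locked, &page->flags);
                return error;
        }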

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Swapin_readahead can read in a lot of data that the processes in memory
    never need. Adding swap cache pages to the inactive list prevents them
    from putting too much pressure on the working set.

    This has the potential to help the programs that are already in memory,
    but it could also be a disadvantage to processes that are trying to get
    swapped in.

    Signed-off-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Split the LRU lists in two, one set for pages that are backed by real file
    systems ("file") and one for pages that are backed by memory and swap
    ("anon"). The latter includes tmpfs.

    The advantage of doing this is that the VM will not have to scan over lots
    of anonymous pages (which we generally do not want to swap out), just to
    find the page cache pages that it should evict.

    This patch has the infrastructure and a basic policy to balance how much
    we scan the anon lists and how much we scan the file lists. The big
    policy changes are in separate patches.

    [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
    [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
    [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
    [hugh@veritas.com: memcg swapbacked pages active]
    [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
    [akpm@linux-foundation.org: fix /proc/vmstat units]
    [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
    [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
    [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Hugh Dickins
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Define a page_file_cache() function to answer the question:
    is this page backed by a file?
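
    A minimal sketch of the helper (the return convention in the final code
    may differ; the point is that "file backed" is simply "not swap backed"):

        static inline int page_file_cache(struct page *page)
        {
                return !PageSwapBacked(page);
        }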

    Originally part of Rik van Riel's split-lru patch. Extracted to make
    available for other, independent reclaim patches.

    Moved inline function to linux/mm_inline.h where it will be needed by
    subsequent "split LRU" and "noreclaim" patches.

    Unfortunately this needs to use a page flag, since the PG_swapbacked state
    needs to be preserved all the way to the point where the page is last
    removed from the LRU. Trying to derive the status from other info in the
    page resulted in wrong VM statistics in earlier split VM patchsets.

    The total number of page flags in use on a 32 bit machine after this patch
    is 19.

    [akpm@linux-foundation.org: fix up out-of-order merge fallout]
    [hugh@veritas.com: splitlru: shmem_getpage SetPageSwapBacked sooner]
    Signed-off-by: Rik van Riel
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: MinChan Kim
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

21 Aug, 2008

1 commit

  • Adjust show_swap_cache_info() to show "Free swap" as a signed long: the
    signed format is preferable, because during swapoff nr_swap_pages can
    legitimately go negative, so it makes more sense shown that way
    (it used to be shown redundantly, once as signed and once as unsigned).
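
    The adjustment amounts to printing the value with a signed format, e.g.
    (a sketch of the format change only, not the exact hunk):

        printk("Free swap  = %ldkB\n", nr_swap_pages << (PAGE_SHIFT - 10));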

    Signed-off-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

05 Aug, 2008

1 commit

  • Converting page lock to new locking bitops requires a change of page flag
    operation naming, so we might as well convert it to something nicer
    (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).

    This also facilitates lockdeping of page lock.

    Signed-off-by: Nick Piggin
    Acked-by: KOSAKI Motohiro
    Acked-by: Peter Zijlstra
    Acked-by: Andrew Morton
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

27 Jul, 2008

3 commits

  • Every arch implements its own show_mem() function. Most of them share
    quite some code, some of them are completely identical.

    This series implements a generic version of this function and migrates
    almost all architectures to it.

    This patch:

    Most show_mem() implementations calculate the number of pages within the
    swapcache every time. Move the output to a more appropriate place and use
    the already-available total_swapcache_pages variable.

    Signed-off-by: Johannes Weiner
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Haavard Skinnemoen
    Cc: Bryan Wu
    Cc: Chris Zankel
    Cc: Ingo Molnar
    Cc: Jeff Dike
    Cc: David S. Miller
    Cc: Paul Mundt
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: David Howells
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Yoshinori Sato
    Cc: Ralf Baechle
    Cc: Greg Ungerer
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Hirokazu Takata
    Cc: Mikael Starvik
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • mapping->tree_lock has no read lockers. Convert the lock from an rwlock
    to a spinlock.

    Signed-off-by: Nick Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • If we can be sure that elevating the page_count on a pagecache page will
    pin it, we can speculatively run this operation, and subsequently check to
    see if we hit the right page rather than relying on holding a lock or
    otherwise pinning a reference to the page.

    This can be done if get_page/put_page behaves consistently throughout the
    whole tree (ie. if we "get" the page after it has been used for something
    else, we must be able to free it with a put_page).

    Actually, there is a period where the count behaves differently: when the
    page is free or if it is a constituent page of a compound page. We need
    an atomic_inc_not_zero operation to ensure we don't try to grab the page
    in either case.

    This patch introduces the core locking protocol to the pagecache (ie.
    adds page_cache_get_speculative, and tweaks some update-side code to make
    it work).
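
    A simplified sketch of the lockless core of the new helper: take the
    reference only if the count was not already zero, then let the caller
    re-check that it still holds the page it looked up.

        static inline int page_cache_get_speculative(struct page *page)
        {
                /* refuse to resurrect a free (count == 0) page from here */
                if (unlikely(!get_page_unless_zero(page)))
                        return 0;

                /*
                 * Success: the caller re-checks (e.g. that the radix tree slot
                 * still points at this page) and does put_page() if it raced.
                 */
                return 1;
        }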

    Thanks to Hugh for pointing out an improvement to the algorithm setting
    page_count to zero when we have control of all references, in order to
    hold off speculative getters.

    [kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()]
    [hugh@veritas.com: fix add_to_page_cache]
    [akpm@linux-foundation.org: repair a comment]
    Signed-off-by: Nick Piggin
    Cc: Jeff Garzik
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Daisuke Nishimura
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Hugh Dickins
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin