26 Sep, 2019
1 commit
-
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.
- Background
The Android terminology used for forking a new process and starting an app
from scratch is a cold start, while resuming an existing app is a hot
start. While we continually try to improve the performance of cold
starts, hot starts will always be significantly less power hungry as well
as faster, so we are trying to make hot starts more likely than cold starts.

To increase hot starts, Android userspace manages the order in which apps
should be killed in a process called ActivityManagerService.
ActivityManagerService tracks every Android app or service that the user
could be interacting with at any time and translates that into a ranked
list for lmkd (the low memory killer daemon). They are likely to be killed by
lmkd if the system has to reclaim memory. In that sense they are similar
to entries in any other cache. Those apps are kept alive for
opportunistic performance improvements but those performance improvements
will vary based on the memory requirements of individual workloads.

- Problem
Naturally, cached apps were dominant consumers of memory on the system.
However, they were not significant consumers of swap even though they are
good candidates for swap. On investigation we found that swapping out only
begins once the low zone watermark is hit and kswapd wakes up, but the
overall allocation rate in the system can trip lmkd thresholds and cause a
cached process to be killed. (We measured the performance of swapping out
vs. zapping the memory by killing a process: unsurprisingly, zapping is
about 10x faster, even though we use zram, which is much faster than real
storage.) So a kill from lmkd will often satisfy the high zone watermark,
resulting in very few pages actually being moved to swap.

- Approach
The approach we chose was to use a new interface to allow userspace to
proactively reclaim entire processes by leveraging platform information.
This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
that are known to be cold from userspace and to avoid races with lmkd by
reclaiming apps as soon as they entered the cached state. Additionally,
it gives the platform many opportunities to use its own information to
optimize memory efficiency.

To achieve this goal, the patchset introduces two new options for madvise.
One is MADV_COLD, which deactivates active pages, and the other is
MADV_PAGEOUT, which reclaims private pages immediately. These new
options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
ways to gain some free memory space. MADV_PAGEOUT is similar to
MADV_DONTNEED in that it hints to the kernel that the memory region is not
currently needed and should be reclaimed immediately; MADV_COLD is similar
to MADV_FREE in that it hints to the kernel that the memory region is not
currently needed and should be reclaimed when memory pressure rises.
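For illustration, here is a minimal userspace sketch of how the two hints
compose (hypothetical buffer; the MADV_* fallback values below match the
uapi constants added by this series and are defined only for older
headers):

    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MADV_COLD
    #define MADV_COLD    20         /* deactivate these pages */
    #endif
    #ifndef MADV_PAGEOUT
    #define MADV_PAGEOUT 21         /* reclaim these pages now */
    #endif

    int main(void)
    {
            size_t len = 64 * 4096;
            char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (buf == MAP_FAILED)
                    return 1;
            buf[0] = 1;     /* fault in at least one page */

            /* non-destructive: contents kept, pages just age out sooner */
            if (madvise(buf, len, MADV_COLD))
                    perror("madvise(MADV_COLD)");

            /* non-destructive: contents kept, but reclaimed right away */
            if (madvise(buf, len, MADV_PAGEOUT))
                    perror("madvise(MADV_PAGEOUT)");

            return 0;
    }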
This patch (of 5):

When a process expects no accesses to a certain memory range, it can give
a hint to the kernel that the pages can be reclaimed when memory pressure
happens, but that the data should be preserved for future use. This can
reduce workingset eviction and so improve performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not expected
to be used in the near future. The hint can help the kernel decide which
pages to evict early during memory pressure.

It works on every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves

    active file page -> inactive file LRU
    active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive
file LRU's head, because MADV_COLD has slightly different semantics.
MADV_FREE means it is okay to discard the page under memory pressure
because its contents are *garbage*: freeing such pages has almost zero
overhead, since we don't need to swap them out, and a later access causes
only a minor fault. Thus it makes sense to put those freeable pages on
the inactive file LRU to compete with other used-once pages. It also
makes sense from an implementation point of view, because such a page is
no longer swap-backed until it is re-dirtied, and as a bonus it can still
be reclaimed on a swapless system. MADV_COLD, however, doesn't mean the
contents are garbage, so reclaiming such pages ultimately requires
swap-out/in, which is more expensive. Since the VM's LRU aging is
designed around a cost model, anonymous cold pages are better positioned
on the inactive anon LRU list, not the file LRU. Furthermore, this helps
avoid unnecessary scanning when the system doesn't have a swap device.
Let's start with the simpler way, without adding complexity at this
point. Keep in mind the caveat, though, that workloads with a lot of
page cache are likely to have MADV_COLD on anonymous memory effectively
ignored, because we rarely age the anonymous LRU lists.

* man-page material
MADV_COLD (since Linux x.x)
Pages in the specified regions will be treated as less-recently-accessed
compared to pages in the system with similar access frequencies. In
contrast to MADV_FREE, the contents of the region are preserved regardless
of subsequent writes to pages.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
pages.

[akpm@linux-foundation.org: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org
Signed-off-by: Minchan Kim
Reported-by: kbuild test robot
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Cc: James E.J. Bottomley
Cc: Richard Henderson
Cc: Ralf Baechle
Cc: Chris Zankel
Cc: Johannes Weiner
Cc: Daniel Colascione
Cc: Dave Hansen
Cc: Hillf Danton
Cc: Joel Fernandes (Google)
Cc: Kirill A. Shutemov
Cc: Oleksandr Natalenko
Cc: Shakeel Butt
Cc: Sonny Rao
Cc: Suren Baghdasaryan
Cc: Tim Murray
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Sep, 2019
2 commits
-
A later patch makes THP deferred split shrinker memcg aware, but it needs
page->mem_cgroup information in THP destructor, which is called after
mem_cgroup_uncharge() now.

So move mem_cgroup_uncharge() from __page_cache_release() to the compound
page destructor, which is called by both THP and other compound pages
except HugeTLB, and call it in __put_single_page() for order-0 pages.
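As a rough sketch of the resulting ordering (simplified; the real call
sites live in mm/swap.c and mm/page_alloc.c):

    static void __put_single_page(struct page *page)
    {
            __page_cache_release(page);
            mem_cgroup_uncharge(page);      /* moved out of __page_cache_release() */
            free_unref_page(page);
    }

    void free_compound_page(struct page *page)
    {
            /* uncharge now happens here, after any THP-specific
             * destructor work that still needs page->mem_cgroup */
            mem_cgroup_uncharge(page);
            __free_pages_ok(page, compound_order(page));
    }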
Link: http://lkml.kernel.org/r/1565144277-36240-3-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi
Suggested-by: "Kirill A . Shutemov"
Acked-by: Kirill A. Shutemov
Reviewed-by: Kirill Tkhai
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Shakeel Butt
Cc: David Rientjes
Cc: Qian Cai
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This is a cleanup patch that replaces two historical uses of
list_move_tail() with the relatively recent add_page_to_lru_list_tail().

Link: http://lkml.kernel.org/r/20190716212436.7137-1-yuzhao@google.com
Signed-off-by: Yu Zhao
Cc: Vlastimil Babka
Cc: Michal Hocko
Cc: Jason Gunthorpe
Cc: Dan Williams
Cc: Matthew Wilcox
Cc: Ira Weiny
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Jul, 2019
2 commits
-
The stuff under sysctl describes the /proc/sys interface from the
userspace point of view. So, add it to the admin-guide and remove the
:orphan: from its index file.

Signed-off-by: Mauro Carvalho Chehab
-
Rename the /proc/sys/ documentation files to ReST, using the
README file as a template for an index.rst, adding the other
files there via TOC markup.

Despite being written at different times with different styles, try to
make them somewhat coherent, with a similar look and feel, ensuring that
they'll look nice both as raw text files and via the HTML output produced
by the Sphinx build system.

At its new index.rst, let's add an :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab
02 Jul, 2019
1 commit
-
release_pages() is an optimized version of a loop around put_page().
Unfortunately for devmap pages the logic is not entirely correct in
release_pages(). This is because device pages can be more than type
MEMORY_DEVICE_PUBLIC. There are in fact 4 types, private, public, FS DAX,
and PCI P2PDMA. Some of these have specific needs to "put" the page while
others do not.

The logic to handle any special needs is contained in
put_devmap_managed_page(). Therefore all devmap pages should be processed
by this function, where we can contain the correct logic for a page put.

Handle all device type pages within release_pages() by calling
put_devmap_managed_page() on all devmap pages. If
put_devmap_managed_page() returns true the page has been put and we
continue with the next page. A false return of put_devmap_managed_page()
means the page did not require special processing and should fall to
"normal" processing.This was found via code inspection while determining if release_pages()
and the new put_user_pages() could be interchangeable.[1][1] https://lkml.kernel.org/r/20190523172852.GA27175@iweiny-DESK2.sc.intel.com
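A sketch of the shape of the change inside the release_pages() loop
(simplified; the lru-lock juggling is omitted):

    for (i = 0; i < nr; i++) {
            struct page *page = pages[i];

            if (is_zone_device_page(page) &&
                put_devmap_managed_page(page))
                    continue;       /* devmap-specific put handled it */

            if (!put_page_testzero(page))
                    continue;       /* other references remain */

            list_add(&page->lru, &pages_to_free);   /* "normal" batched free */
    }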
[1] https://lkml.kernel.org/r/20190523172852.GA27175@iweiny-DESK2.sc.intel.com
Link: https://lkml.kernel.org/r/20190605214922.17684-1-ira.weiny@intel.com
Cc: Jérôme Glisse
Cc: Michal Hocko
Reviewed-by: Dan Williams
Reviewed-by: John Hubbard
Signed-off-by: Ira Weiny
Signed-off-by: Jason Gunthorpe
21 May, 2019
1 commit
-
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting
SPDX license identifier is:

    GPL-2.0-only
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
15 May, 2019
1 commit
-
There is no function named munlock_vma_pages(). Correct it to
munlock_vma_page().

Link: http://lkml.kernel.org/r/20190402095609.27181-1-peng.fan@nxp.com
Signed-off-by: Peng Fan
Reviewed-by: Andrew Morton
Reviewed-by: Mukesh Ojha
Acked-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Mar, 2019
1 commit
-
We have a common pattern to access lru_lock from a page pointer:

    zone_lru_lock(page_zone(page))

which is silly, because it unfolds to this:

    &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)]->zone_pgdat->lru_lock

while we can simply do:

    &NODE_DATA(page_to_nid(page))->lru_lock

Remove the zone_lru_lock() function, since it only complicates things.
Use the 'page_pgdat(page)->lru_lock' pattern instead.
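For callers the conversion is mechanical; a sketch of a typical
before/after:

    /* before */
    spin_lock_irq(zone_lru_lock(page_zone(page)));

    /* after */
    spin_lock_irq(&page_pgdat(page)->lru_lock);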
[aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()]
Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin
Acked-by: Vlastimil Babka
Acked-by: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: William Kucharski
Cc: John Hubbard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Feb, 2019
1 commit
-
Since for_each_cpu(cpu, mask) added by commit 2d3854a37e8b767a
("cpumask: introduce new API, without changing anything") did not
evaluate the mask argument if NR_CPUS == 1 due to CONFIG_SMP=n,
lru_add_drain_all() is hitting WARN_ON() at __flush_work() added by
commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without
INIT_WORK().") by unconditionally calling flush_work() [1].Workaround this issue by using CONFIG_SMP=n specific lru_add_drain_all
implementation. There is no real need to defer the implementation to
the workqueue as the draining is going to happen on the local cpu. So
alias lru_add_drain_all to lru_add_drain which does all the necessary
work.
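A sketch of the resulting shape (the SMP variant keeps its existing
workqueue-based body):

    #ifdef CONFIG_SMP
    /* the existing lru_add_drain_all() implementation, unchanged */
    #else
    void lru_add_drain_all(void)
    {
            /* no other CPUs to drain: do all the work locally */
            lru_add_drain();
    }
    #endif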
[akpm@linux-foundation.org: fix various build warnings]
[1] https://lkml.kernel.org/r/18a30387-6aa5-6123-e67c-57579ecc3f38@roeck-us.net
Link: http://lkml.kernel.org/r/20190213124334.GH4525@dhcp22.suse.cz
Signed-off-by: Michal Hocko
Reported-by: Guenter Roeck
Debugged-by: Tetsuo Handa
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Jan, 2019
1 commit
-
Multiple filesystems open code lru_to_page(). Rectify this by moving
the macro from mm_inline (which is specific to lru stuff) to the more
generic mm.h header and start using the macro where appropriate.

No functional changes.
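For reference, the macro in question is a one-liner (as defined at the
time of the move):

    #define lru_to_page(head) (list_entry((head)->prev, struct page, lru))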
Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com
Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov
Acked-by: Michal Hocko
Reviewed-by: David Hildenbrand
Reviewed-by: Mike Rapoport
Acked-by: Pankaj gupta
Acked-by: "Yan, Zheng" [ceph]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Dec, 2018
1 commit
-
totalram_pages and totalhigh_pages are made static inline functions.
The main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems
better to remove the lock and convert the variables to atomics, preventing
potential store-to-read tearing as a bonus.
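A sketch of the accessor shape this implies (assuming the counter becomes
an atomic_long_t):

    extern atomic_long_t _totalram_pages;

    static inline unsigned long totalram_pages(void)
    {
            /* single atomic read: no managed_page_count_lock, no tearing */
            return (unsigned long)atomic_long_read(&_totalram_pages);
    }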
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS
Suggested-by: Michal Hocko
Suggested-by: Vlastimil Babka
Reviewed-by: Konstantin Khlebnikov
Reviewed-by: Pavel Tatashin
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: David Hildenbrand
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Nov, 2018
1 commit
-
lockdep_assert_held() is better suited to checking locking requirements,
since it only checks if the current thread holds the lock regardless of
whether someone else does. This is also a step towards possibly removing
spin_is_locked().
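The conversion pattern is roughly the following (the lock name here is
illustrative, not the exact call site):

    /* before: also passes if some *other* CPU holds the lock */
    VM_BUG_ON(!spin_is_locked(&pgdat->lru_lock));

    /* after: asserts this thread holds it; compiles away without lockdep */
    lockdep_assert_held(&pgdat->lru_lock);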
Signed-off-by: Lance Roy
Cc: Andrew Morton
Cc: "Kirill A. Shutemov"
Cc: Yang Shi
Cc: Matthew Wilcox
Cc: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jan Kara
Cc: Shakeel Butt
Signed-off-by: Paul E. McKenney
29 Oct, 2018
1 commit
-
Pull XArray conversion from Matthew Wilcox:
"The XArray provides an improved interface to the radix tree data
structure, providing locking as part of the API, specifying GFP flags
at allocation time, eliminating preloading, less re-walking the tree,
more efficient iterations and not exposing RCU-protected pointers to
its users.

This patch set
1. Introduces the XArray implementation
2. Converts the pagecache to use it
3. Converts memremap to use it
The page cache is the most complex and important user of the radix
tree, so converting it was most important. Converting the memremap
code removes the only other user of the multiorder code, which allows
us to remove the radix tree code that supported it.

I have 40+ followup patches to convert many other users of the radix
tree over to the XArray, but I'd like to get this part in first. The
other conversions haven't been in linux-next and aren't suitable for
applying yet, but you can see them in the xarray-conv branch if you're
interested"* 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
radix tree: Remove multiorder support
radix tree test: Convert multiorder tests to XArray
radix tree tests: Convert item_delete_rcu to XArray
radix tree tests: Convert item_kill_tree to XArray
radix tree tests: Move item_insert_order
radix tree test suite: Remove multiorder benchmarking
radix tree test suite: Remove __item_insert
memremap: Convert to XArray
xarray: Add range store functionality
xarray: Move multiorder_check to in-kernel tests
xarray: Move multiorder_shrink to kernel tests
xarray: Move multiorder account test in-kernel
radix tree test suite: Convert iteration test to XArray
radix tree test suite: Convert tag_tagged_items to XArray
radix tree: Remove radix_tree_clear_tags
radix tree: Remove radix_tree_maybe_preload_order
radix tree: Remove split/join code
radix tree: Remove radix_tree_update_node_t
page cache: Finish XArray conversion
dax: Convert page fault handlers to XArray
...
27 Oct, 2018
1 commit
-
Remove duplicated include linux/memremap.h
Link: http://lkml.kernel.org/r/20180917131308.16420-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 Oct, 2018
1 commit
-
Removes sparse warnings.
Signed-off-by: Matthew Wilcox
30 Sep, 2018
1 commit
-
Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries. This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry). It is also a change in emphasis; exceptional entries are
intimidating and different. As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.
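A sketch of the new helpers (a value entry is (v << 1) | 1, so the low
bit distinguishes it from a kernel pointer):

    #include <linux/xarray.h>

    static void value_entry_demo(void)
    {
            void *entry = xa_mk_value(1234);        /* tag an integer */

            if (xa_is_value(entry))                 /* low bit is set */
                    pr_info("value: %lu\n", xa_to_value(entry));
    }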
Signed-off-by: Matthew Wilcox
Reviewed-by: Josef Bacik
22 May, 2018
1 commit
-
In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
be able to rely on the fact that they will get wakeups on dev_pagemap
page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
generic_dax_page_free() as common indicator / infrastructure for dax
filesystems to require. With this change there are no users of the
MEMORY_DEVICE_HOST designation, so remove it.

The HMM sub-system extended dev_pagemap to arrange a callback when a
dev_pagemap managed page is freed. Since a dev_pagemap page is free /
idle when its reference count is 1 it requires an additional branch to
check the page-type at put_page() time. Given put_page() is a hot-path
we do not want to incur that check if HMM is not in use, so a static
branch is used to avoid that overhead when not necessary.

Now, the FS_DAX implementation wants to reuse this mechanism for
receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
static-key into a generic mechanism that either HMM or FS_DAX code paths
can enable.

For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
However, we still need to support FS_DAX in the FS_DAX_LIMITED case
implemented by the s390/dcssblk driver.
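A sketch of the static-branch gate on the put_page() path (simplified;
the page-type dispatch is elided):

    DEFINE_STATIC_KEY_FALSE(devmap_managed_key);

    static inline bool put_devmap_managed_page(struct page *page)
    {
            if (!static_branch_unlikely(&devmap_managed_key))
                    return false;   /* no HMM/FS_DAX user enabled the key */
            if (!is_zone_device_page(page))
                    return false;
            __put_devmap_managed_page(page);        /* may call ->page_free() */
            return true;
    }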
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Michal Hocko
Reported-by: kbuild test robot
Reported-by: Thomas Meyer
Reported-by: Dave Jiang
Cc: "Jérôme Glisse"
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Signed-off-by: Dan Williams
06 Apr, 2018
1 commit
-
The 'cold' parameter was removed from release_pages function by commit
c6f92f9fbe7d ("mm: remove cold parameter for release_pages").Update the description to match the code.
Link: http://lkml.kernel.org/r/1519585191-10180-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Feb, 2018
2 commits
-
There was a conflict between the commit e02a9f048ef7 ("mm/swap.c: make
functions and their kernel-doc agree") and the commit f144c390f905 ("mm:
docs: fix parameter names mismatch") that both tried to fix mismatch
between pagevec_lookup_entries() parameter names and their description.

Since nr_entries is a better name for the parameter, fix the description
again.

Link: http://lkml.kernel.org/r/1518116946-20947-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Acked-by: Randy Dunlap
Reviewed-by: Andrew Morton
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When a thread mlocks an address space backed either by file pages which
are currently not present in memory or swapped out anon pages (not in
swapcache), a new page is allocated and added to the local pagevec
(lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
On I/O completion, the thread can wake on a different CPU, the mlock
syscall will then set the PageMlocked() bit of the page but will not be
able to put that page in unevictable LRU as the page is on the pagevec
of a different CPU. Even on drain, that page will go to evictable LRU
because the PageMlocked() bit is not checked on pagevec drain.

The page will eventually go to the right LRU on reclaim, but the LRU
stats will remain skewed for a long time.

This patch puts all the pages, even unevictable ones, on the pagevecs,
and on
the drain, the pages will be added on their LRUs correctly by checking
their evictability. This resolves the mlocked pages on pagevec of other
CPUs issue because when those pagevecs will be drained, the mlocked file
pages will go to unevictable LRU. Also this makes the race with munlock
easier to resolve because the pagevec drains happen under the LRU lock.

However there is still one place which makes a page evictable and does a
PageLRU check on that page without the LRU lock, and it needs special
attention: TestClearPageMlocked() and isolate_lru_page() in
clear_page_mlock().

    #0: __pagevec_lru_add_fn        #1: clear_page_mlock

    SetPageLRU()                    if (!TestClearPageMlocked())
                                        return
    smp_mb() // <-- required
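A sketch of the drain-time check this patch adds (simplified from the
new __pagevec_lru_add_fn() logic):

    SetPageLRU(page);
    smp_mb();   /* order SetPageLRU vs. the check in clear_page_mlock() */

    if (page_evictable(page)) {
            add_page_to_lru_list(page, lruvec, page_lru(page));
    } else {
            ClearPageActive(page);
            SetPageUnevictable(page);
            add_page_to_lru_list(page, lruvec, LRU_UNEVICTABLE);
    }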
Acked-by: Vlastimil Babka
Cc: Jérôme Glisse
Cc: Huang Ying
Cc: Tim Chen
Cc: Michal Hocko
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: Balbir Singh
Cc: Minchan Kim
Cc: Shaohua Li
Cc: Jan Kara
Cc: Nicholas Piggin
Cc: Dan Williams
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Feb, 2018
1 commit
-
There are several places where parameter descriptions do not match the
actual code. Fix it.

Link: http://lkml.kernel.org/r/1516700871-22279-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Cc: Jonathan Corbet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Feb, 2018
2 commits
-
Fix some basic kernel-doc notation in mm/swap.c:
- for function lru_cache_add_anon(), make its kernel-doc function name
match its function name and change colon to hyphen following the
function name

- for function pagevec_lookup_entries(), change the function parameter
name from nr_pages to nr_entries since that is more descriptive of
what the parameter actually is and then it matches the kernel-doc
comments also

Fix function kernel-doc to match the change in commit 67fd707f4681:
- drop the kernel-doc notation for @nr_pages from
pagevec_lookup_range() and correct the function description for that
change

Link: http://lkml.kernel.org/r/3b42ee3e-04a9-a6ca-6be4-f00752a114fe@infradead.org
Fixes: 67fd707f4681 ("mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()")
Signed-off-by: Randy Dunlap
Reviewed-by: Andrew Morton
Cc: Jan Kara
Cc: Matthew Wilcox
Cc: Hugh Dickins
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Pulling cpu hotplug locks inside the mm core function like
lru_add_drain_all just asks for problems and the recent lockdep splat
[1] just proves this. While the usage in that particular case might be
wrong we should avoid the locking as lru_add_drain_all() is used in many
places. It seems that this is not all that hard to achieve, actually.

We have done the same thing for drain_all_pages, which is analogous, in
commit a459eeb7b852 ("mm, page_alloc: do not depend on cpu hotplug locks
inside the allocator"). All we have to care about is to handle- the work item might be executed on a different cpu in worker from
unbound pool so it doesn't run on pinned on the cpu- we have to make sure that we do not race with page_alloc_cpu_dead
calling lru_add_drain_cputhe first part is already handled because the worker calls lru_add_drain
which disables preemption when calling lru_add_drain_cpu on the local
cpu it is draining. The latter is true because page_alloc_cpu_dead is
called on the controlling CPU after the hotplugged CPU vanished
completely.

[1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com
[add a cpu hotplug locking interaction as per tglx]
Link: http://lkml.kernel.org/r/20171116120535.23765-1-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Thomas Gleixner
Cc: Tejun Heo
Cc: Peter Zijlstra
Cc: Johannes Weiner
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Nov, 2017
8 commits
-
According to Vlastimil Babka, the drained field in pagevec is
potentially misleading because it might be interpreted as draining this
pagevec instead of the percpu lru pagevecs. Rename the field for
clarity.

Link: http://lkml.kernel.org/r/20171019093346.ylahzdpzmoriyf4v@techsingularity.net
Signed-off-by: Mel Gorman
Suggested-by: Vlastimil Babka
Cc: Andi Kleen
Cc: Dave Chinner
Cc: Dave Hansen
Cc: Jan Kara
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Most callers of free_hot_cold_page claim the pages being released
are cache hot. The exception is the page reclaim paths where it is
likely that enough pages will be freed in the near future that the
per-cpu lists are going to be recycled and the cache hotness information
is lost. As no one really cares about the hotness of pages being
released to the allocator, just ditch the parameter.

The APIs are renamed to indicate that it's no longer about hot/cold
pages. It should also be less confusing as there are subtle differences
between them. __free_pages drops a reference and frees a page when the
refcount reaches zero. free_hot_cold_page handled pages whose refcount
was already zero which is non-obvious from the name. free_unref_page
should be more obvious.

No performance impact is expected as the overhead is marginal. The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.
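For callers the rename is mechanical; roughly:

    /* before: the "cold" flag no longer carried useful information */
    free_hot_cold_page(page, false);

    /* after */
    free_unref_page(page);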
[mgorman@techsingularity.net: add pages to head, not tail]
Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net
Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Andi Kleen
Cc: Dave Chinner
Cc: Dave Hansen
Cc: Jan Kara
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
All callers of release_pages claim the pages being released are cache
hot. As no one cares about the hotness of pages being released to the
allocator, just ditch the parameter.

No performance impact is expected as the overhead is marginal. The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Andi Kleen
Cc: Dave Chinner
Cc: Dave Hansen
Cc: Jan Kara
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot. As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal. The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Andi Kleen
Cc: Dave Chinner
Cc: Dave Hansen
Cc: Jan Kara
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When a pagevec is initialised on the stack, it is generally used
multiple times over a range of pages, looking up entries and then
releasing them. On each pagevec_release, the per-cpu deferred LRU
pagevecs are drained on the grounds the page being released may be on
those queues and the pages may be cache hot. In many cases only the
first drain is necessary as it's unlikely that the range of pages being
walked is racing against LRU addition. Even if there is such a race,
the impact is marginal, whereas constantly redraining the lru pagevecs
has a cost.

This patch ensures that a pagevec is only drained once in a given
lifecycle without increasing the cache footprint of the pagevec
structure. Only sparsetruncate tiny is shown here as large files have
many exceptional entries and call pagecache_release less frequently.

sparsetruncate (tiny)
4.14.0-rc4 4.14.0-rc4
batchshadow-v1r1 onedrain-v1r1
Min Time 141.00 ( 0.00%) 141.00 ( 0.00%)
1st-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%)
2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%)
3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%)
Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%)
Max-95% Time 146.00 ( 0.00%) 145.00 ( 0.68%)
Max-99% Time 198.00 ( 0.00%) 194.00 ( 2.02%)
Max Time 254.00 ( 0.00%) 208.00 ( 18.11%)
Amean Time 145.12 ( 0.00%) 144.30 ( 0.56%)
Stddev Time 12.74 ( 0.00%) 9.62 ( 24.49%)
Coeff Time 8.78 ( 0.00%) 6.67 ( 24.06%)
Best99%Amean Time 144.29 ( 0.00%) 143.82 ( 0.32%)
Best95%Amean Time 142.68 ( 0.00%) 142.31 ( 0.26%)
Best90%Amean Time 142.52 ( 0.00%) 142.19 ( 0.24%)
Best75%Amean Time 142.26 ( 0.00%) 141.98 ( 0.20%)
Best50%Amean Time 141.90 ( 0.00%) 141.71 ( 0.13%)
Best25%Amean Time 141.80 ( 0.00%) 141.43 ( 0.26%)

The impact on bonnie is marginal and within the noise because a
significant percentage of the file being truncated has been reclaimed
and consists of shadow entries which reduce the hotness of the
pagevec_release path.

Link: http://lkml.kernel.org/r/20171018075952.10627-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman
Cc: Andi Kleen
Cc: Dave Chinner
Cc: Dave Hansen
Cc: Jan Kara
Cc: Johannes Weiner
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as the desired number of pages. Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Daniel Jordan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently pagevec_lookup_range_tag() takes the number of pages to look up
but most users don't need this. Create a new function
pagevec_lookup_range_nr_tag() that takes the maximum number of pages to
look up, for Ceph which wants this functionality, so that we can drop the
nr_pages argument from pagevec_lookup_range_tag().

Link: http://lkml.kernel.org/r/20171009151359.31984-13-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Daniel Jordan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Patch series "Ranged pagevec tagged lookup", v3.
In this series I provide a ranged variant of pagevec_lookup_tag() and
use it in places where it makes sense. This series removes some common
code and it also has the potential to speed up some operations,
similarly to pagevec_lookup_range() (but for now I can think of only
artificial cases where this happens).

This patch (of 16):
Implement a variant of find_get_pages_tag() that stops iterating at
a given index. Lots of users of this function (through pagevec_lookup())
actually want a range lookup and all of them are currently open-coding
this.

Also create the corresponding pagevec_lookup_range_tag() function.
Link: http://lkml.kernel.org/r/20171009151359.31984-2-jack@suse.cz
Signed-off-by: Jan Kara
Reviewed-by: Daniel Jordan
Cc: Bob Peterson
Cc: Chao Yu
Cc: David Howells
Cc: David Sterba
Cc: Ilya Dryomov
Cc: Jaegeuk Kim
Cc: Ryusuke Konishi
Cc: Steve French
Cc: "Theodore Ts'o"
Cc: "Yan, Zheng"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Oct, 2017
1 commit
-
MADV_FREE clears the pte dirty bit and then marks the page lazyfree
(clears SwapBacked). There is no lock to prevent the page from being
added to the swap
cache between these two steps by page reclaim. Page reclaim could add
the page to swap cache and unmap the page. After page reclaim, the page
is added back to lru. At that time, we probably start draining per-cpu
pagevec and mark the page lazyfree. So the page could be in a state
with SwapBacked cleared and PG_swapcache set. Next time there is a
refault in the virtual address, do_swap_page can find the page from swap
cache but the page has PageSwapCache false because SwapBacked isn't set,
so do_swap_page will bail out and do nothing. The task will keep
running into the fault handler.
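A sketch of the kind of check the fix adds on the lazyfree path (the
helper name is hypothetical; the real check sits in the pagevec drain
function):

    static bool can_mark_lazyfree(struct page *page)
    {
            return PageLRU(page) && PageAnon(page) &&
                   PageSwapBacked(page) &&
                   !PageSwapCache(page) &&  /* reclaim won the race: skip */
                   !PageUnevictable(page);
    }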
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Link: http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com
Signed-off-by: Shaohua Li
Reported-by: Artem Savkov
Tested-by: Artem Savkov
Reviewed-by: Rik van Riel
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Acked-by: Minchan Kim
Cc: Hillf Danton
Cc: Hugh Dickins
Cc: Mel Gorman
Cc: [4.12+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Sep, 2017
1 commit
-
Platforms with an advanced system bus (like CAPI or CCIX) allow device memory
to be accessible from CPU in a cache coherent fashion. Add a new type of
ZONE_DEVICE to represent such memory. The use cases are the same as for
un-addressable device memory, but without all the corner cases.

Link: http://lkml.kernel.org/r/20170817000548.32038-19-jglisse@redhat.com
Signed-off-by: Jérôme Glisse
Cc: Aneesh Kumar
Cc: Paul E. McKenney
Cc: Benjamin Herrenschmidt
Cc: Dan Williams
Cc: Ross Zwisler
Cc: Balbir Singh
Cc: David Nellans
Cc: Evgeny Baskakov
Cc: Johannes Weiner
Cc: John Hubbard
Cc: Kirill A. Shutemov
Cc: Mark Hairgrove
Cc: Michal Hocko
Cc: Sherry Cheung
Cc: Subhash Gutti
Cc: Vladimir Davydov
Cc: Bob Liu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Sep, 2017
3 commits
-
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as the desired number of pages. Just drop the argument.
Link: http://lkml.kernel.org/r/20170726114704.7626-11-jack@suse.cz
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Implement a variant of find_get_pages() that stops iterating at a given
index. This may be a substantial performance gain if the mapping is
sparse. See the following commit for details. Furthermore, lots of users
of this function (through pagevec_lookup()) actually want a range lookup
and all of them are currently open-coding this.

Also create the corresponding pagevec_lookup_range() function.
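A sketch of the intended calling pattern (signatures as of this series,
when pagevec_init() still took a cold flag; mapping/start/end come from
the caller):

    struct pagevec pvec;
    pgoff_t index = start;
    int i;

    pagevec_init(&pvec, 0);
    while (pagevec_lookup_range(&pvec, mapping, &index, end)) {
            for (i = 0; i < pagevec_count(&pvec); i++) {
                    struct page *page = pvec.pages[i];
                    /* process page; iteration never walks past 'end' */
            }
            pagevec_release(&pvec);
    }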
Link: http://lkml.kernel.org/r/20170726114704.7626-4-jack@suse.cz
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make pagevec_lookup() (and underlying find_get_pages()) update index to
the next page where iteration should continue. Most callers want this
and also pagevec_lookup_tag() already does this.

Link: http://lkml.kernel.org/r/20170726114704.7626-3-jack@suse.cz
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jul, 2017
1 commit
-
The rework of the cpu hotplug locking unearthed potential deadlocks with
the memory hotplug locking code.

The solution for these is to rework the memory hotplug locking code as
well and take the cpu hotplug lock before the memory hotplug lock in
mem_hotplug_begin(), but this will cause a recursive locking of the cpu
hotplug lock when the memory hotplug code calls lru_add_drain_all().

Split out the inner workings of lru_add_drain_all() into
lru_add_drain_all_cpuslocked() so this function can be invoked from the
memory hotplug code with the cpu hotplug lock held.
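A sketch of the resulting split (using the cpu-hotplug helpers of this
era):

    void lru_add_drain_all_cpuslocked(void)
    {
            /* the previous body of lru_add_drain_all(): schedule and
             * flush the per-cpu drain work; the caller must hold the
             * cpu hotplug lock */
    }

    void lru_add_drain_all(void)
    {
            get_online_cpus();
            lru_add_drain_all_cpuslocked();
            put_online_cpus();
    }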
Link: http://lkml.kernel.org/r/20170704093421.419329357@linutronix.de
Signed-off-by: Thomas Gleixner
Reported-by: Andrey Ryabinin
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Vladimir Davydov
Cc: Peter Zijlstra
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jul, 2017
1 commit
-
Track the following reclaim counters for every memory cgroup: PGREFILL,
PGSCAN, PGSTEAL, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE and PGLAZYFREED.

These values are exposed using the memory.stat interface of cgroup v2.
The meaning of each value is the same as for global counters, available
using /proc/vmstat.

Also, for consistency, rename mem_cgroup_count_vm_event() to
count_memcg_event_mm().

Link: http://lkml.kernel.org/r/1494530183-30808-1-git-send-email-guro@fb.com
Signed-off-by: Roman Gushchin
Suggested-by: Johannes Weiner
Acked-by: Michal Hocko
Acked-by: Vladimir Davydov
Acked-by: Johannes Weiner
Cc: Tejun Heo
Cc: Li Zefan
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 May, 2017
1 commit
-
madvise()'s MADV_FREE indicates pages are 'lazyfree'. They are still
anonymous pages, but they can be freed without pageout. To distinguish
them from normal anonymous pages, we clear their SwapBacked flag.

MADV_FREE pages can be freed without pageout, so they are pretty much
like
used once file pages. For such pages, we'd like to reclaim them once
there is memory pressure. Also, it might be unfair to always reclaim
MADV_FREE pages before used-once file pages, yet we definitely want to
reclaim them before other anonymous and file pages.

To speed up MADV_FREE page reclaim, we put the pages into
LRU_INACTIVE_FILE list. The rationale is LRU_INACTIVE_FILE list is tiny
nowadays and should be full of used-once file pages. Reclaiming
MADV_FREE pages will not interfere much with anonymous and active file
pages. And the inactive file pages and MADV_FREE pages will be
reclaimed according to their age, so we don't reclaim too many MADV_FREE
pages either. Putting the MADV_FREE pages into the LRU_INACTIVE_FILE
list also means we can reclaim the pages without swap support. This idea
was suggested by Johannes.

This patch doesn't move MADV_FREE pages to the LRU_INACTIVE_FILE list
yet, to avoid bisect failure; the next patch will do it.

The patch is based on Minchan's original patch.
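A sketch of the per-page demotion the follow-up patch performs at
pagevec-drain time (simplified, using this era's LRU helpers):

    /* simplified core of the lazyfree drain handler */
    if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
        !PageUnevictable(page)) {
            bool active = PageActive(page);

            del_page_from_lru_list(page, lruvec, LRU_INACTIVE_ANON + active);
            ClearPageActive(page);
            ClearPageReferenced(page);
            ClearPageSwapBacked(page);  /* now ages like a clean file page */
            add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE);
    }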
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/2f87063c1e9354677b7618c647abde77b07561e5.1487965799.git.shli@fb.com
Signed-off-by: Shaohua Li
Suggested-by: Johannes Weiner
Acked-by: Johannes Weiner
Acked-by: Minchan Kim
Acked-by: Michal Hocko
Acked-by: Hillf Danton
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds