18 Oct, 2020
1 commit
-
For the case where read-ahead is disabled on the file, or if the cgroup
is congested, ensure that we can at least do 1 page of read-ahead to
make progress on the read in an async fashion. This could potentially be
larger, but it's not needed in terms of functionality, so let's err on
the side of caution as larger counts of pages may run into reclaim
issues (particularly if we're congested).
This makes sure we're not hitting the potentially sync ->readpage() path
for IO that is marked IOCB_WAITQ, which could cause us to block. It also
means we'll use the same path for IO, regardless of whether or not
read-ahead happens to be disabled on the lower level device.
Acked-by: Johannes Weiner
Reported-by: Matthew Wilcox (Oracle)
Reported-by: Hao_Xu
[axboe: updated for new ractl API]
Signed-off-by: Jens Axboe
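The fallback described in this commit can be pictured with a small userspace sketch. It is not the kernel code: the struct and function names are hypothetical, and the "normal" path is reduced to returning the requested count, which is an assumption made only to keep the example short.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state consulted on this path. */
struct ra_request {
    unsigned long ra_pages;   /* read-ahead window limit; 0 means disabled */
    bool cgroup_congested;    /* true when the issuing cgroup is congested */
    unsigned long req_count;  /* pages the caller asked for */
};

/*
 * Returns how many pages to read ahead. When read-ahead is disabled or the
 * cgroup is congested, fall back to exactly one page so the read still makes
 * progress asynchronously. Otherwise (simplification) use the requested count.
 */
unsigned long pages_to_read_ahead(const struct ra_request *req)
{
    if (req->ra_pages == 0 || req->cgroup_congested)
        return 1;
    return req->req_count;
}
```

Keeping the fallback at a single page matches the caution expressed in the message: a larger count would work functionally, but is more likely to run into reclaim trouble under congestion.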
17 Oct, 2020
7 commits
-
The file_ra_state being passed into page_cache_sync_readahead() was being
ignored in favour of using the one embedded in the struct file. The only
caller for which this makes a difference is the fsverity code if the file
has been marked as POSIX_FADV_RANDOM, but it's confusing and worth fixing.
Signed-off-by: David Howells
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Eric Biggers
Link: https://lkml.kernel.org/r/20200903140844.14194-10-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Reimplement page_cache_sync_readahead() and page_cache_async_readahead()
as wrappers around versions of the function which take a readahead_control
in preparation for making do_sync_mmap_readahead() pass down an RAC
struct.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: David Howells
Cc: Eric Biggers
Link: https://lkml.kernel.org/r/20200903140844.14194-8-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Reimplement force_page_cache_readahead() as a wrapper around
force_page_cache_ra(). Pass the existing readahead_control from
page_cache_sync_readahead().
Signed-off-by: David Howells
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Eric Biggers
Link: https://lkml.kernel.org/r/20200903140844.14194-7-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Make ondemand_readahead() take a readahead_control struct in preparation
for making do_sync_mmap_readahead() pass down an RAC struct.
Signed-off-by: David Howells
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Eric Biggers
Link: https://lkml.kernel.org/r/20200903140844.14194-6-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Rename __do_page_cache_readahead() to do_page_cache_ra() and call it
directly from ondemand_readahead() instead of indirecting via ra_submit().
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: David Howells
Cc: Eric Biggers
Link: https://lkml.kernel.org/r/20200903140844.14194-5-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Define it in the callers instead of in page_cache_ra_unbounded().
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: David Howells
Cc: Eric Biggers
Link: https://lkml.kernel.org/r/20200903140844.14194-4-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Patch series "Readahead patches for 5.9/5.10".
These are infrastructure for both the THP patchset and for the fscache
rewrite.
For both pieces of infrastructure being built on top of this patchset, we
want the ractl to be available higher in the call-stack.
For David's work, he wants to add the 'critical page' to the ractl so that
he knows which page NEEDS to be brought in from storage, and which ones
are nice-to-have. We might want something similar in block storage too.
It used to be simple -- the first page was the critical one, but then mmap
added fault-around and so for that usecase, the middle page is the
critical one. Anyway, I don't have any code to show that yet, we just
know that the lowest point in the callchain where we have that information
is do_sync_mmap_readahead() and so the ractl needs to start its life
there.
For THP, we have the code that needs it. It's actually the apex patch to
the series; the one which finally starts to allocate THPs and present them
to consenting filesystems:
http://git.infradead.org/users/willy/pagecache.git/commitdiff/798bcf30ab2eff278caad03a9edca74d2f8ae760
This patch (of 8):
Allow for a more concise definition of a struct readahead_control.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Cc: Eric Biggers
Cc: David Howells
Link: https://lkml.kernel.org/r/20200903140844.14194-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200903140844.14194-3-willy@infradead.org
Signed-off-by: Linus Torvalds
03 Jun, 2020
13 commits
-
Ensure that memory allocations in the readahead path do not attempt to
reclaim file-backed pages, which could lead to a deadlock. It is
possible, though unlikely, that this is the root cause of a problem observed
by Cong Wang.
Reported-by: Cong Wang
Suggested-by: Michal Hocko
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Christoph Hellwig
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: John Hubbard
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-16-willy@infradead.org
Signed-off-by: Linus Torvalds
-
If the page is already in cache, we don't set PageReadahead on it.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: John Hubbard
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-15-willy@infradead.org
Signed-off-by: Linus Torvalds
-
ext4 and f2fs have duplicated the guts of the readahead code so they can
read past i_size. Instead, separate out the guts of the readahead code
so they can call it directly.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Tested-by: Eric Biggers
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Reviewed-by: Eric Biggers
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: John Hubbard
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org
Signed-off-by: Linus Torvalds
-
By reducing nr_to_read, we can eliminate this check from inside the loop.
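A minimal userspace sketch of the idea, assuming the check in question is the "past end of file" test and assuming 4 KiB pages. The function name and the SKETCH_PAGE_SHIFT constant are hypothetical and only serve the illustration.

```c
/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Clamp the number of pages to read so that the range [index, index + nr)
 * never extends past the last page of a file of isize bytes. With the count
 * clamped up front, a loop over the pages needs no per-iteration EOF check.
 */
unsigned long clamp_nr_to_read(unsigned long long isize,
                               unsigned long index,
                               unsigned long nr_to_read)
{
    unsigned long end_index;

    if (isize == 0)
        return 0;                       /* empty file: nothing to read */

    end_index = (unsigned long)((isize - 1) >> SKETCH_PAGE_SHIFT);
    if (index > end_index)
        return 0;                       /* starting beyond the last page */

    /* Compare against the distance first to avoid overflowing index + nr. */
    if (nr_to_read > end_index - index)
        nr_to_read = end_index - index + 1;

    return nr_to_read;
}
```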
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: John Hubbard
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Christoph Hellwig
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-13-willy@infradead.org
Signed-off-by: Linus Torvalds
-
This replaces ->readpages with a saner interface:
- Return void instead of an ignored error code.
- Page cache is already populated with locked pages when ->readahead
is called.
- New arguments can be passed to the implementation without changing
all the filesystems that use a common helper function like
mpage_readahead().
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-12-willy@infradead.org
Signed-off-by: Linus Torvalds
-
When populating the page cache for readahead, mappings that use
->readpages must populate the page cache themselves as the pages are
passed on a linked list which would normally be used for the page
cache's LRU. For mappings that use ->readpage or the upcoming
->readahead method, we can put the pages into the page cache as soon as
they're allocated, which solves a race between readahead and direct IO.
It also lets us remove the gfp argument from read_pages().
Use the new readahead_page() API to implement the repeated calls to
->readpage(), just like most filesystems will.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: John Hubbard
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-11-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Replace the page_offset variable with 'index + i'.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-10-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Change the type of page_idx to unsigned long, and rename it -- it's just
a loop counter, not a page index.
Suggested-by: John Hubbard
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: Dave Chinner
Reviewed-by: William Kucharski
Reviewed-by: Johannes Thumshirn
Cc: Chao Yu
Cc: Christoph Hellwig
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-9-willy@infradead.org
Signed-off-by: Linus Torvalds
-
The word 'offset' is used ambiguously to mean 'byte offset within a
page', 'byte offset from the start of the file' and 'page offset from
the start of the file'.
Use 'index' to mean 'page offset from the start of the file' throughout
the readahead code.
[ We should probably rename the 'pgoff_t' type to 'pgidx_t' too - Linus ]
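To make the three meanings concrete, here is a small userspace sketch that splits a byte position in a file into a page index and a byte offset within that page. It assumes 4 KiB pages; the constant and function names are hypothetical.

```c
/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* 'index': which page of the file a byte position falls in. */
unsigned long page_index_of(unsigned long long pos)
{
    return (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
}

/* Byte offset within that page. */
unsigned long offset_in_page_of(unsigned long long pos)
{
    return (unsigned long)(pos & (SKETCH_PAGE_SIZE - 1));
}
```

For a byte position of 10000, the page index is 2 and the offset within the page is 1808, since 2 * 4096 + 1808 = 10000.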
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: Zi Yan
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Christoph Hellwig
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: John Hubbard
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-8-willy@infradead.org
Signed-off-by: Linus Torvalds
-
In this patch, only between __do_page_cache_readahead() and
read_pages(), but it will be extended in upcoming patches. The
read_pages() function becomes aops centric, as this makes the most sense
by the end of the patchset.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Reviewed-by: Johannes Thumshirn
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-7-willy@infradead.org
Signed-off-by: Linus Torvalds
-
Simplify the callers by moving the check for nr_pages and the BUG_ON
into read_pages().
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: Zi Yan
Reviewed-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Reviewed-by: Johannes Thumshirn
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-5-willy@infradead.org
Signed-off-by: Linus Torvalds
-
We used to assign the return value to a variable, which we then ignored.
Remove the pretence of caring.
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: John Hubbard
Reviewed-by: William Kucharski
Reviewed-by: Johannes Thumshirn
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-4-willy@infradead.org
Signed-off-by: Linus Torvalds
-
ondemand_readahead has two callers, neither of which use the return
value. That means that both ra_submit and __do_page_cache_readahead()
can return void, and we don't need to worry that a present page in the
readahead window causes us to return a smaller nr_pages than we ought to
have.
Similarly, no caller uses the return value from
force_page_cache_readahead().
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
Reviewed-by: Dave Chinner
Reviewed-by: John Hubbard
Reviewed-by: Christoph Hellwig
Reviewed-by: William Kucharski
Cc: Chao Yu
Cc: Cong Wang
Cc: Darrick J. Wong
Cc: Eric Biggers
Cc: Gao Xiang
Cc: Jaegeuk Kim
Cc: Joseph Qi
Cc: Junxiao Bi
Cc: Michal Hocko
Cc: Zi Yan
Cc: Johannes Thumshirn
Cc: Miklos Szeredi
Link: http://lkml.kernel.org/r/20200414150233.24495-3-willy@infradead.org
Signed-off-by: Linus Torvalds
21 May, 2019
1 commit
-
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
06 Mar, 2019
1 commit
-
Many kernel-doc comments in mm/ have the return value descriptions
either misformatted or omitted altogether, which makes the kernel-doc script
unhappy:
$ make V=1 htmldocs
...
./mm/util.c:36: info: Scanning doc for kstrdup
./mm/util.c:41: warning: No description found for return value of 'kstrdup'
./mm/util.c:57: info: Scanning doc for kstrdup_const
./mm/util.c:66: warning: No description found for return value of 'kstrdup_const'
./mm/util.c:75: info: Scanning doc for kstrndup
./mm/util.c:83: warning: No description found for return value of 'kstrndup'
...
Fixing the formatting and adding the missing return value descriptions
eliminates ~100 such warnings.
Link: http://lkml.kernel.org/r/1549549644-4903-4-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport
Reviewed-by: Andrew Morton
Cc: Jonathan Corbet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Dec, 2018
1 commit
-
It's a trivial simplification for get_next_ra_size() and clear enough for
humans to understand.
It also fixes potential overflow if ra->size(< ra_pages) is too large.
Link: http://lkml.kernel.org/r/1540707206-19649-1-git-send-email-hsiangkao@aol.com
Signed-off-by: Gao Xiang
Reviewed-by: Fengguang Wu
Reviewed-by: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
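As a hedged illustration of how a ramp-up function can avoid overflow, the sketch below compares the current size against fractions of the maximum instead of multiplying first and clamping afterwards. The thresholds (max / 16 and max / 2), the growth factors, and the function name are assumptions for this example; the commit message itself does not spell them out.

```c
/*
 * Userspace sketch of an overflow-safe window ramp-up.
 * cur: current window size in pages; max: upper bound in pages.
 * Dividing max keeps every intermediate value within range, whereas
 * computing 4 * cur or 2 * cur first could wrap for a very large cur.
 */
unsigned long next_ra_size(unsigned long cur, unsigned long max)
{
    if (cur < max / 16)
        return 4 * cur;     /* small window: grow quickly */
    if (cur <= max / 2)
        return 2 * cur;     /* medium window: double */
    return max;             /* large window: cap at max */
}
```

With this shape, a multiplication only happens once the comparison has already shown that the result cannot exceed max.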
21 Oct, 2018
2 commits
-
This one is trivial.
Signed-off-by: Matthew Wilcox
-
The page cache offers the ability to search for a miss in the previous or
next N locations. Rather than teach the XArray about the page cache's
definition of a miss, use xas_prev() and xas_next() to search the page
array. This should be more efficient as it does not have to start the
lookup from the top for each index.
Signed-off-by: Matthew Wilcox
30 Sep, 2018
1 commit
-
Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries. This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry). It is also a change in emphasis; exceptional entries are
intimidating and different. As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.
Signed-off-by: Matthew Wilcox
Reviewed-by: Josef Bacik
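The encoding can be illustrated in userspace. The sketch below assumes a value entry is tagged by setting the low bit of a pointer-sized word, which leaves BITS_PER_LONG - 1 bits for the payload. The helper names are hypothetical stand-ins, not the kernel's own helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack an integer into a pointer-sized entry; the low bit marks it as a value. */
void *mk_value(unsigned long v)
{
    return (void *)(uintptr_t)((v << 1) | 1);
}

/* Recover the integer from a value entry. */
unsigned long to_value(const void *entry)
{
    return (unsigned long)((uintptr_t)entry >> 1);
}

/* Ordinary pointers to aligned objects have a clear low bit. */
bool is_value(const void *entry)
{
    return ((uintptr_t)entry & 1) != 0;
}
```

Because normal object pointers are at least 2-byte aligned, the low bit is enough to tell a stored value apart from a stored pointer, which is what lets both be treated as first-class entries.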
31 Aug, 2018
1 commit
-
The implementation of readahead(2) syscall is identical to that of
fadvise64(POSIX_FADV_WILLNEED) with a few exceptions:
1. readahead(2) returns -EINVAL for !mapping->a_ops and fadvise64()
ignores the request and returns 0.
2. fadvise64() checks for integer overflow corner case
3. fadvise64() calls the optional filesystem fadvise() file operation
Unite the two implementations by calling vfs_fadvise() from readahead(2)
syscall. Check the !mapping->a_ops in readahead(2) syscall to preserve
documented syscall ABI behaviour.
Suggested-by: Miklos Szeredi
Fixes: d1d04ef8572b ("ovl: stack file ops")
Signed-off-by: Amir Goldstein
Signed-off-by: Miklos Szeredi
27 Jul, 2018
1 commit
-
ondemand_readahead() checks bdi->io_pages to cap the maximum pages
that need to be processed. This works until the readit section. If
we would do an async only readahead (async size = sync size) and
target is at beginning of window we expand the pages by another
get_next_ra_size() pages. Btrace for large reads shows that kernel
always issues a doubled size read at the beginning of processing.
Add an additional check for io_pages in the lower part of the func.
The fix helps devices that hard limit bio pages and rely on proper
handling of max_hw_read_sectors (e.g. older FusionIO cards). For
that reason it could qualify for stable.
Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Stockhausen stockhausen@collogia.de
Signed-off-by: Jens Axboe
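A rough userspace sketch of the kind of check described above. The struct, the helper, and the exact fallback values (window set to the cap, async part set to half of it) are assumptions for illustration, not a copy of the kernel function.

```c
/* Hypothetical, simplified read-ahead window state (all sizes in pages). */
struct ra_window {
    unsigned long start;       /* first page of the window */
    unsigned long size;        /* total window size */
    unsigned long async_size;  /* part of the window read asynchronously */
};

/* Assumed ramp-up helper: grow 4x, then 2x, then cap at max. */
static unsigned long sketch_next_size(unsigned long cur, unsigned long max)
{
    if (cur < max / 16)
        return 4 * cur;
    if (cur <= max / 2)
        return 2 * cur;
    return max;
}

/*
 * When the whole window is async and the access hits its first page, the
 * window is extended. The extra check keeps the extended window within
 * max_pages instead of blindly adding the next ramp-up step.
 */
void expand_window(struct ra_window *ra, unsigned long index,
                   unsigned long max_pages)
{
    if (index == ra->start && ra->size == ra->async_size) {
        unsigned long add_pages = sketch_next_size(ra->size, max_pages);

        if (ra->size + add_pages <= max_pages) {
            ra->async_size = add_pages;
            ra->size += add_pages;
        } else {
            ra->size = max_pages;
            ra->async_size = max_pages >> 1;
        }
    }
}
```

Without the inner comparison, a 32-page window with a 32-page cap would grow to 64 pages, which corresponds to the doubled read seen in the trace described above.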
09 Jul, 2018
1 commit
-
We noticed in testing we'd get pretty bad latency stalls under heavy
pressure because read ahead would try to do its thing while the cgroup
was under severe pressure. If we're under this much pressure we want to
do as little IO as possible so we can still make progress on real work
if we're a throttled cgroup, so just skip readahead if our group is
under pressure.
Signed-off-by: Josef Bacik
Acked-by: Tejun Heo
Acked-by: Andrew Morton
Signed-off-by: Jens Axboe
02 Jun, 2018
3 commits
-
That way file systems don't have to go spotting for non-contiguous pages
and work around them. It also kicks off I/O earlier, allowing it to
finish earlier and reduce latency.
Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
We never return an error, so switch to returning an unsigned int. Most
callers already did implicit casts to an unsigned type, and the one that
didn't can be simplified now.
Suggested-by: Matthew Wilcox
Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
-
It counts the number of pages acted on, so name it nr_pages to make that
obvious.
Signed-off-by: Christoph Hellwig
Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
12 Apr, 2018
1 commit
-
Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root. Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.
[willy@infradead.org: fix nds32, fs/dax.c]
Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox
Acked-by: Jeff Layton
Cc: Darrick J. Wong
Cc: Dave Chinner
Cc: Ryusuke Konishi
Cc: Will Deacon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Apr, 2018
1 commit
-
Using this helper allows us to avoid the in-kernel calls to the
sys_readahead() syscall. The ksys_ prefix denotes that this function is
meant as a drop-in replacement for the syscall. In particular, it uses the
same calling convention as sys_readahead().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Andrew Morton
Cc: linux-mm@kvack.org
Signed-off-by: Dominik Brodowski
13 Dec, 2016
1 commit
-
We ran into a funky issue, where someone doing 256K buffered reads saw
128K requests at the device level. Turns out it is read-ahead capping
the request size, since we use 128K as the default setting. This
doesn't make a lot of sense - if someone is issuing 256K reads, they
should see 256K reads, regardless of the read-ahead setting, if the
underlying device can support a 256K read in a single command.
This patch introduces a bdi hint, io_pages. This is the soft max IO
size for the lower level, I've hooked it up to the bdev settings here.
Read-ahead is modified to issue the maximum of the user request size,
and the read-ahead max size, but capped to the max request size on the
device side. The latter is done to avoid reading ahead too much, if the
application asks for a huge read. With this patch, the kernel behaves
like the application expects.
Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.com
Signed-off-by: Jens Axboe
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
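The sizing rule in this commit message can be sketched in userspace as follows. This is a literal reading of the sentence "the maximum of the user request size, and the read-ahead max size, but capped to the max request size on the device side"; the function and parameter names are hypothetical, and the real kernel code may treat edge cases (such as a device limit smaller than the read-ahead setting) differently.

```c
/*
 * All sizes are in pages.
 * ra_pages:  read-ahead setting (e.g. 32 pages for 128K with 4 KiB pages)
 * req_pages: size of the read the application issued
 * io_pages:  soft maximum IO size hinted by the lower level device
 */
unsigned long readahead_max_pages(unsigned long ra_pages,
                                  unsigned long req_pages,
                                  unsigned long io_pages)
{
    /* Take the larger of the request and the read-ahead setting... */
    unsigned long max_pages = req_pages > ra_pages ? req_pages : ra_pages;

    /* ...but never exceed what the device side says it can handle. */
    if (max_pages > io_pages)
        max_pages = io_pages;

    return max_pages;
}
```

With 4 KiB pages, the 256K read from the example is 64 pages; with a 32-page read-ahead setting and a device hint of 256 pages the result is 64, so the application sees the request size it issued.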
27 Aug, 2016
1 commit
-
For DAX inodes we need to be careful to never have page cache pages in
the mapping->page_tree. This radix tree should be composed only of DAX
exceptional entries and zero pages.
ltp's readahead02 test was triggering a warning because we were trying
to insert a DAX exceptional entry but found that a page cache page had
already been inserted into the tree. This page was being inserted into
the radix tree in response to a readahead(2) call.
Readahead doesn't make sense for DAX inodes, but we don't want it to
report a failure either. Instead, we just return success and don't do
any work.
Link: http://lkml.kernel.org/r/20160824221429.21158-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler
Reported-by: Jeff Moyer
Cc: Dan Williams
Cc: Dave Chinner
Cc: Dave Hansen
Cc: Jan Kara
Cc: [4.5+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jul, 2016
1 commit
-
Vladimir has noticed that we might declare memcg oom even during
readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
restriction) while __do_page_cache_readahead uses
page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
OOMs. This gfp mask discrepancy is really unfortunate and easily
fixable. Drop page_cache_alloc_readahead() which only has one user and
outsource the gfp_mask logic into readahead_gfp_mask and propagate this
mask from __do_page_cache_readahead down to read_pages.
This alone would have only very limited impact as most filesystems are
implementing ->readpages and the common implementation mpage_readpages
does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
use readahead_gfp_mask instead as this function is called only during
readahead as well. The same applies to read_cache_pages.
ext4 has its own ext4_mpage_readpages but the path which has pages !=
NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
doing a very similar pattern to mpage_readpages so the same can be
applied to them as well.
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@suse.com: restrict gfp mask in mpage_alloc]
Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko
Cc: Vladimir Davydov
Cc: Chris Mason
Cc: Steve French
Cc: Theodore Ts'o
Cc: Jan Kara
Cc: Mike Marshall
Cc: Jaegeuk Kim
Cc: Changman Lee
Cc: Chao Yu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Apr, 2016
1 commit
-
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
the script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
15 Jan, 2016
1 commit
-
Move lru_to_page() from internal.h to mm_inline.h.
Signed-off-by: Geliang Tang
Acked-by: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds