18 Oct, 2020

1 commit

  • For the case where read-ahead is disabled on the file, or if the cgroup
    is congested, ensure that we can at least do 1 page of read-ahead to
    make progress on the read in an async fashion. This could potentially be
    larger, but it's not needed in terms of functionality, so let's err on
    the side of caution as larger counts of pages may run into reclaim
    issues (particularly if we're congested).

    This makes sure we're not hitting the potentially sync ->readpage() path
    for IO that is marked IOCB_WAITQ, which could cause us to block. It also
    means we'll use the same path for IO, regardless of whether or not
    read-ahead happens to be disabled on the lower level device.
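
    A minimal sketch of the resulting logic, simplified from the 5.10-era
    page_cache_sync_ra() (names and details are illustrative, not the
    verbatim kernel code):

    static void sync_ra_sketch(struct readahead_control *ractl,
                               struct file_ra_state *ra,
                               unsigned long req_count)
    {
            if (!ra->ra_pages || blk_cgroup_congested()) {
                    if (!ractl->file)
                            return;
                    /* One page is enough to make async progress and
                     * avoid the potentially sync ->readpage() path. */
                    force_page_cache_ra(ractl, ra, 1);
                    return;
            }
            ondemand_readahead(ractl, ra, false, req_count);
    }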

    Acked-by: Johannes Weiner
    Reported-by: Matthew Wilcox (Oracle)
    Reported-by: Hao Xu
    [axboe: updated for new ractl API]
    Signed-off-by: Jens Axboe

    Jens Axboe
     

17 Oct, 2020

7 commits

  • The file_ra_state being passed into page_cache_sync_readahead() was being
    ignored in favour of using the one embedded in the struct file. The only
    caller for which this makes a difference is the fsverity code if the file
    has been marked as POSIX_FADV_RANDOM, but it's confusing and worth fixing.

    Signed-off-by: David Howells
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-10-willy@infradead.org
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Reimplement page_cache_sync_readahead() and page_cache_async_readahead()
    as wrappers around versions of the function which take a readahead_control
    in preparation for making do_sync_mmap_readahead() pass down an RAC
    struct.
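
    The wrapper shape is roughly as follows (a sketch based on the
    5.10-era include/linux/pagemap.h; details may differ):

    static inline void
    page_cache_sync_readahead(struct address_space *mapping,
                    struct file_ra_state *ra, struct file *file,
                    pgoff_t index, unsigned long req_count)
    {
            /* Build a readahead_control on the stack and hand it to
             * the ractl-taking implementation. */
            DEFINE_READAHEAD(ractl, file, mapping, index);
            page_cache_sync_ra(&ractl, ra, req_count);
    }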

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: David Howells
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-8-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Reimplement force_page_cache_readahead() as a wrapper around
    force_page_cache_ra(). Pass the existing readahead_control from
    page_cache_sync_readahead().
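
    A sketch of what force_page_cache_ra() does with the passed-in ractl
    (illustrative; the chunk size and helpers follow the era's
    mm/readahead.c):

    /* Submit readahead in 2MB chunks so one huge request doesn't
     * pin too many pages at once. */
    while (nr_to_read) {
            unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;

            if (this_chunk > nr_to_read)
                    this_chunk = nr_to_read;
            ractl->_index = index;
            do_page_cache_ra(ractl, this_chunk, 0);

            index += this_chunk;
            nr_to_read -= this_chunk;
    }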

    Signed-off-by: David Howells
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-7-willy@infradead.org
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Make ondemand_readahead() take a readahead_control struct in preparation
    for making do_sync_mmap_readahead() pass down an RAC struct.

    Signed-off-by: David Howells
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-6-willy@infradead.org
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Rename __do_page_cache_readahead() to do_page_cache_ra() and call it
    directly from ondemand_readahead() instead of indirecting via ra_submit().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: David Howells
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-5-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Define the readahead_control in the callers instead of in
    page_cache_ra_unbounded().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: David Howells
    Cc: Eric Biggers
    Link: https://lkml.kernel.org/r/20200903140844.14194-4-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Patch series "Readahead patches for 5.9/5.10".

    These are infrastructure for both the THP patchset and for the fscache
    rewrite.

    For both pieces of infrastructure being built on top of this patchset, we
    want the ractl to be available higher in the call stack.

    For David's work, he wants to add the 'critical page' to the ractl so that
    he knows which page NEEDS to be brought in from storage, and which ones
    are nice-to-have. We might want something similar in block storage too.
    It used to be simple -- the first page was the critical one, but then mmap
    added fault-around and so for that usecase, the middle page is the
    critical one. Anyway, I don't have any code to show that yet, we just
    know that the lowest point in the callchain where we have that information
    is do_sync_mmap_readahead() and so the ractl needs to start its life
    there.

    For THP, we have the code that needs it. It's actually the apex patch to
    the series; the one which finally starts to allocate THPs and present them
    to consenting filesystems:
    http://git.infradead.org/users/willy/pagecache.git/commitdiff/798bcf30ab2eff278caad03a9edca74d2f8ae760

    This patch (of 8):

    Allow for a more concise definition of a struct readahead_control.
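
    The convenience macro looks roughly like this (field names as in the
    5.9-era struct; a sketch, not necessarily the final form):

    #define DEFINE_READAHEAD(rac, f, m, i)                          \
            struct readahead_control rac = {                        \
                    .file = f,                                      \
                    .mapping = m,                                   \
                    ._index = i,                                    \
            }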

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Cc: Eric Biggers
    Cc: David Howells
    Link: https://lkml.kernel.org/r/20200903140844.14194-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20200903140844.14194-3-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

03 Jun, 2020

13 commits

  • Ensure that memory allocations in the readahead path do not attempt to
    reclaim file-backed pages, which could lead to a deadlock. It is
    possible, though unlikely, that this is the root cause of a problem
    observed by Cong Wang.
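
    The mechanism is the scoped-NOFS API; a sketch of the pattern as
    applied to the allocation section (placement is illustrative):

    unsigned int nofs = memalloc_nofs_save();

    /* Allocations in this scope implicitly behave as GFP_NOFS, so
     * reclaim cannot recurse into the filesystem and deadlock. */
    page = __page_cache_alloc(readahead_gfp_mask(mapping));

    memalloc_nofs_restore(nofs);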

    Reported-by: Cong Wang
    Suggested-by: Michal Hocko
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Christoph Hellwig
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: John Hubbard
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-16-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • If the page is already in cache, we don't set PageReadahead on it.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: John Hubbard
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-15-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • ext4 and f2fs have duplicated the guts of the readahead code so they can
    read past i_size. Instead, separate out the guts of the readahead code
    so they can call it directly.
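
    The separated-out helper can then be called directly; a sketch of the
    fsverity-style call (argument names are illustrative):

    /* Read Merkle tree pages that live past i_size; no struct file
     * is needed for this internal read. */
    page_cache_readahead_unbounded(inode->i_mapping, NULL,
                                   index, num_ra_pages, 0);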

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Tested-by: Eric Biggers
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Reviewed-by: Eric Biggers
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: John Hubbard
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • By reducing nr_to_read, we can eliminate the end_index check from inside
    the loop.
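
    The hoisted check amounts to something like this (a sketch; the exact
    arithmetic lives in __do_page_cache_readahead()):

    pgoff_t end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;

    if (index > end_index)
            return;
    /* Clamp once up front instead of testing every loop iteration;
     * never read past the page containing the last byte of the file. */
    if (nr_to_read > end_index - index)
            nr_to_read = end_index - index + 1;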

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-13-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • This replaces ->readpages with a saner interface:

    - Return void instead of an ignored error code.
    - Page cache is already populated with locked pages when ->readahead
      is called.
    - New arguments can be passed to the implementation without changing
      all the filesystems that use a common helper function like
      mpage_readahead().
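
    A sketch of what a filesystem implementation can look like under the
    new interface (the myfs_* names are hypothetical):

    static void myfs_readahead(struct readahead_control *rac)
    {
            struct page *page;

            /* Pages are already locked and in the page cache; start
             * I/O on each one and drop our reference. The I/O
             * completion path unlocks the page. */
            while ((page = readahead_page(rac))) {
                    myfs_read_page_async(page);  /* hypothetical helper */
                    put_page(page);
            }
    }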

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-12-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • When populating the page cache for readahead, mappings that use
    ->readpages must populate the page cache themselves as the pages are
    passed on a linked list which would normally be used for the page
    cache's LRU. For mappings that use ->readpage or the upcoming
    ->readahead method, we can put the pages into the page cache as soon as
    they're allocated, which solves a race between readahead and direct IO.
    It also lets us remove the gfp argument from read_pages().

    Use the new readahead_page() API to implement the repeated calls to
    ->readpage(), just like most filesystems will.
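
    The ->readpage() fallback inside read_pages() then becomes a simple
    loop (sketch; aops is rac->mapping->a_ops):

    while ((page = readahead_page(rac))) {
            aops->readpage(rac->file, page);
            put_page(page);
    }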

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: John Hubbard
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-11-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Replace the page_offset variable with 'index + i'.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-10-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Change the type of page_idx to unsigned long, and rename it -- it's just
    a loop counter, not a page index.

    Suggested-by: John Hubbard
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Dave Chinner
    Reviewed-by: William Kucharski
    Reviewed-by: Johannes Thumshirn
    Cc: Chao Yu
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-9-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • The word 'offset' is used ambiguously to mean 'byte offset within a
    page', 'byte offset from the start of the file' and 'page offset from
    the start of the file'.

    Use 'index' to mean 'page offset from the start of the file' throughout
    the readahead code.

    [ We should probably rename the 'pgoff_t' type to 'pgidx_t' too - Linus ]

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Zi Yan
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Christoph Hellwig
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: John Hubbard
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-8-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Use a readahead_control struct to pass arguments; in this patch, only
    between __do_page_cache_readahead() and read_pages(), but it will be
    extended in upcoming patches. The read_pages() function becomes aops
    centric, as this makes the most sense by the end of the patchset.
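
    The struct being threaded through looks roughly like this (5.7-era
    fields; the underscored members are private to the readahead code):

    struct readahead_control {
            struct file *file;
            struct address_space *mapping;
    /* private: use the readahead_* accessors to read these */
            pgoff_t _index;
            unsigned int _nr_pages;
            unsigned int _batch_count;
    };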

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Reviewed-by: Johannes Thumshirn
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-7-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • Simplify the callers by moving the check for nr_pages and the BUG_ON
    into read_pages().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Zi Yan
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Reviewed-by: Johannes Thumshirn
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-5-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • We used to assign the return value to a variable, which we then ignored.
    Remove the pretence of caring.

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: John Hubbard
    Reviewed-by: William Kucharski
    Reviewed-by: Johannes Thumshirn
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-4-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     
  • ondemand_readahead has two callers, neither of which use the return
    value. That means that both ra_submit and __do_page_cache_readahead()
    can return void, and we don't need to worry that a present page in the
    readahead window causes us to return a smaller nr_pages than we ought to
    have.

    Similarly, no caller uses the return value from
    force_page_cache_readahead().

    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Andrew Morton
    Reviewed-by: Dave Chinner
    Reviewed-by: John Hubbard
    Reviewed-by: Christoph Hellwig
    Reviewed-by: William Kucharski
    Cc: Chao Yu
    Cc: Cong Wang
    Cc: Darrick J. Wong
    Cc: Eric Biggers
    Cc: Gao Xiang
    Cc: Jaegeuk Kim
    Cc: Joseph Qi
    Cc: Junxiao Bi
    Cc: Michal Hocko
    Cc: Zi Yan
    Cc: Johannes Thumshirn
    Cc: Miklos Szeredi
    Link: http://lkml.kernel.org/r/20200414150233.24495-3-willy@infradead.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only
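
    For a C source file such as mm/readahead.c, the tag is a single new
    first line:

    // SPDX-License-Identifier: GPL-2.0-only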

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

06 Mar, 2019

1 commit

  • Many kernel-doc comments in mm/ have the return value descriptions
    either misformatted or omitted entirely, which makes the kernel-doc
    script unhappy:

    $ make V=1 htmldocs
    ...
    ./mm/util.c:36: info: Scanning doc for kstrdup
    ./mm/util.c:41: warning: No description found for return value of 'kstrdup'
    ./mm/util.c:57: info: Scanning doc for kstrdup_const
    ./mm/util.c:66: warning: No description found for return value of 'kstrdup_const'
    ./mm/util.c:75: info: Scanning doc for kstrndup
    ./mm/util.c:83: warning: No description found for return value of 'kstrndup'
    ...

    Fixing the formatting and adding the missing return value descriptions
    eliminates ~100 such warnings.
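
    The fix is the kernel-doc "Return:" section; a sketch based on the
    repaired kstrdup() comment:

    /**
     * kstrdup - allocate space for and copy an existing string
     * @s: the string to duplicate
     * @gfp: the GFP mask used in the kmalloc() call when allocating memory
     *
     * Return: newly allocated copy of @s or %NULL in case of error
     */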

    Link: http://lkml.kernel.org/r/1549549644-4903-4-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

29 Dec, 2018

1 commit

  • It's a trivial simplification for get_next_ra_size() and clear enough for
    humans to understand.

    It also fixes a potential overflow if ra->size (< ra_pages) is too large.
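
    The simplified helper reads roughly as follows (a sketch of the
    post-patch form):

    static unsigned long get_next_ra_size(struct file_ra_state *ra,
                                          unsigned long max)
    {
            unsigned long cur = ra->size;

            if (cur < max / 16)
                    return 4 * cur;
            if (cur <= max / 2)
                    return 2 * cur;
            /* Returning max directly cannot overflow, unlike the old
             * code that kept growing a newsize variable. */
            return max;
    }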

    Link: http://lkml.kernel.org/r/1540707206-19649-1-git-send-email-hsiangkao@aol.com
    Signed-off-by: Gao Xiang
    Reviewed-by: Fengguang Wu
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gao Xiang
     

30 Sep, 2018

1 commit

  • Introduce xarray value entries and tagged pointers to replace radix
    tree exceptional entries. This is a slight change in encoding to allow
    the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
    value entry). It is also a change in emphasis; exceptional entries are
    intimidating and different. As the comment explains, you can choose
    to store values or pointers in the xarray and they are both first-class
    citizens.
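
    A usage sketch of value entries (the xa_* helpers are the API this
    change introduces):

    void *entry = xa_mk_value(42);  /* encode an integer as an entry */

    if (xa_is_value(entry))
            pr_info("stored value %lu\n", xa_to_value(entry));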

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Josef Bacik

    Matthew Wilcox
     

31 Aug, 2018

1 commit

  • The implementation of the readahead(2) syscall is identical to that of
    fadvise64(POSIX_FADV_WILLNEED) with a few exceptions:
    1. readahead(2) returns -EINVAL for !mapping->a_ops while fadvise64()
       ignores the request and returns 0.
    2. fadvise64() checks for an integer overflow corner case.
    3. fadvise64() calls the optional filesystem fadvise() file operation.

    Unite the two implementations by calling vfs_fadvise() from the
    readahead(2) syscall. Check !mapping->a_ops in the readahead(2) syscall
    to preserve the documented syscall ABI behaviour.
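
    A sketch of the united body after the fdget()/FMODE_READ checks (f is
    the struct fd; error handling elided):

    if (!f.file->f_mapping || !f.file->f_mapping->a_ops)
            return -EINVAL;         /* preserve the readahead(2) ABI */

    /* Everything else, including the optional filesystem ->fadvise()
     * file operation, is handled by the fadvise64() implementation. */
    return vfs_fadvise(f.file, offset, count, POSIX_FADV_WILLNEED);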

    Suggested-by: Miklos Szeredi
    Fixes: d1d04ef8572b ("ovl: stack file ops")
    Signed-off-by: Amir Goldstein
    Signed-off-by: Miklos Szeredi

    Amir Goldstein
     

27 Jul, 2018

1 commit

  • ondemand_readahead() checks bdi->io_pages to cap the maximum pages
    that need to be processed. This works until the readit section. If
    we do an async-only readahead (async size = sync size) and the
    target is at the beginning of the window, we expand the pages by
    another get_next_ra_size() pages. blktrace for large reads shows
    that the kernel always issues a doubled-size read at the beginning
    of processing. Add an additional check for io_pages in the lower
    part of the function. The fix helps devices that hard-limit bio
    pages and rely on proper handling of max_hw_read_sectors (e.g.
    older FusionIO cards). For that reason it could qualify for stable.
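
    The added clamp near the readit label looks roughly like this (a
    sketch of the fix; variable names follow ondemand_readahead()):

    if (offset == ra->start && ra->size == ra->async_size) {
            add_pages = get_next_ra_size(ra, max_pages);
            if (ra->size + add_pages <= max_pages) {
                    /* The window can still grow within the cap. */
                    ra->async_size = add_pages;
                    ra->size += add_pages;
            } else {
                    /* Cap the doubled read at what io_pages allows. */
                    ra->size = max_pages;
                    ra->async_size = max_pages >> 1;
            }
    }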

    Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
    Cc: stable@vger.kernel.org
    Signed-off-by: Markus Stockhausen
    Signed-off-by: Jens Axboe

    Markus Stockhausen
     

09 Jul, 2018

1 commit

  • We noticed in testing we'd get pretty bad latency stalls under heavy
    pressure because read ahead would try to do its thing while the cgroup
    was under severe pressure. If we're under this much pressure we want to
    do as little IO as possible so we can still make progress on real work
    if we're a throttled cgroup, so just skip readahead if our group is
    under pressure.
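
    The skip is a one-line early-out in the readahead entry points
    (sketch):

    if (blk_cgroup_congested())
            return;   /* under pressure: do as little IO as possible */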

    Signed-off-by: Josef Bacik
    Acked-by: Tejun Heo
    Acked-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Josef Bacik
     

12 Apr, 2018

1 commit

  • Remove the address_space ->tree_lock and use the xa_lock newly added to
    the radix_tree_root. Rename the address_space ->page_tree to ->i_pages,
    since we don't really care that it's a tree.
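
    After the rename, call sites take the lock through the xarray, e.g.:

    xa_lock_irq(&mapping->i_pages);
    /* ... modify the page cache ... */
    xa_unlock_irq(&mapping->i_pages);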

    [willy@infradead.org: fix nds32, fs/dax.c]
    Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.org
    Link: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Acked-by: Jeff Layton
    Cc: Darrick J. Wong
    Cc: Dave Chinner
    Cc: Ryusuke Konishi
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

03 Apr, 2018

1 commit

  • Using this helper allows us to avoid the in-kernel calls to the
    sys_readahead() syscall. The ksys_ prefix denotes that this function is
    meant as a drop-in replacement for the syscall. In particular, it uses the
    same calling convention as sys_readahead().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
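
    The syscall itself then reduces to a thin wrapper, which is the
    pattern this series applies:

    SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
    {
            return ksys_readahead(fd, offset, count);
    }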

    Cc: Andrew Morton
    Cc: linux-mm@kvack.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

13 Dec, 2016

1 commit

  • We ran into a funky issue, where someone doing 256K buffered reads saw
    128K requests at the device level. Turns out it is read-ahead capping
    the request size, since we use 128K as the default setting. This
    doesn't make a lot of sense - if someone is issuing 256K reads, they
    should see 256K reads, regardless of the read-ahead setting, if the
    underlying device can support a 256K read in a single command.

    This patch introduces a bdi hint, io_pages. This is the soft max IO
    size for the lower level, I've hooked it up to the bdev settings here.
    Read-ahead is modified to issue the maximum of the user request size,
    and the read-ahead max size, but capped to the max request size on the
    device side. The latter is done to avoid reading ahead too much, if the
    application asks for a huge read. With this patch, the kernel behaves
    like the application expects.
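
    The sizing logic described above amounts to the following (a sketch
    of the ondemand_readahead() hunk):

    unsigned long max_pages = ra->ra_pages;

    /* Allow a bigger read if the user asked for it and the device
     * (bdi->io_pages) can take it in one request. */
    if (req_size > max_pages && bdi->io_pages > max_pages)
            max_pages = min(req_size, bdi->io_pages);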

    Link: http://lkml.kernel.org/r/1479498073-8657-1-git-send-email-axboe@fb.com
    Signed-off-by: Jens Axboe
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

27 Aug, 2016

1 commit

  • For DAX inodes we need to be careful to never have page cache pages in
    the mapping->page_tree. This radix tree should be composed only of DAX
    exceptional entries and zero pages.

    ltp's readahead02 test was triggering a warning because we were trying
    to insert a DAX exceptional entry but found that a page cache page had
    already been inserted into the tree. This page was being inserted into
    the radix tree in response to a readahead(2) call.

    Readahead doesn't make sense for DAX inodes, but we don't want it to
    report a failure either. Instead, we just return success and don't do
    any work.
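
    The no-op is a simple early return (a sketch; the exact function it
    lands in is elided here):

    if (dax_mapping(mapping))
            return 0;   /* report success without doing any work */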

    Link: http://lkml.kernel.org/r/20160824221429.21158-1-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler
    Reported-by: Jeff Moyer
    Cc: Dan Williams
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Jan Kara
    Cc: stable@vger.kernel.org [4.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ross Zwisler
     

27 Jul, 2016

1 commit

  • Vladimir has noticed that we might declare memcg oom even during
    readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
    restriction) while __do_page_cache_readahead uses
    page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
    OOMs. This gfp mask discrepancy is really unfortunate and easily
    fixable. Drop page_cache_alloc_readahead() which only has one user and
    outsource the gfp_mask logic into readahead_gfp_mask and propagate this
    mask from __do_page_cache_readahead down to read_pages.

    This alone would have only very limited impact as most filesystems are
    implementing ->readpages and the common implementation mpage_readpages
    does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
    use readahead_gfp_mask instead as this function is called only during
    readahead as well. The same applies to read_cache_pages.

    ext4 has its own ext4_mpage_readpages but the path which has pages !=
    NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
    doing a very similar pattern to mpage_readpages so the same can be
    applied to them as well.
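
    The helper centralising the mask looks like this (a 4.7-era sketch;
    some flags, e.g. __GFP_COLD, were removed from the kernel later):

    static inline gfp_t readahead_gfp_mask(struct address_space *x)
    {
            /* Readahead is speculative: don't retry hard, don't warn,
             * and never trigger the OOM killer for it. */
            return mapping_gfp_mask(x) |
                   __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN;
    }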

    [akpm@linux-foundation.org: coding-style fixes]
    [mhocko@suse.com: restrict gfp mask in mpage_alloc]
    Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Chris Mason
    Cc: Steve French
    Cc: Theodore Ts'o
    Cc: Jan Kara
    Cc: Mike Marshall
    Cc: Jaegeuk Kim
    Cc: Changman Lee
    Cc: Chao Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

05 Apr, 2016

1 commit

  • The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to implement
    the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether PAGE_CACHE_*
    or PAGE_* constants should be used in a particular case, especially on
    the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straightforward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using the
    script below. For some reason, coccinelle doesn't patch header files;
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

15 Jan, 2016

1 commit