Eric Lee / smarc-fsl-linux-kernel

04 Mar, 2013

1 commit

4a0fd5bf0 teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long ... Browse Code »

... and convert a bunch of SYSCALL_DEFINE ones to SYSCALL_DEFINE,
killing the boilerplate crap around them.

Signed-off-by: Al Viro

Al Viro
2013-03-04 11:46:22 +0800

27 Sep, 2012

2 commits

2903ff019 switch simple cases of fget_light to fdget ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-09-27 10:20:08 +0800
132ea2479 switch readahead(2) to fget_light() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:10:04 +0800

30 May, 2012

1 commit

782182e53 mm: move readahead syscall to mm/readahead.c ... Browse Code »

It is better to define readahead(2) in mm/readahead.c than in
mm/filemap.c.

Signed-off-by: Cong Wang
Cc: Fengguang Wu
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cong Wang
2012-05-30 07:22:23 +0800

31 Oct, 2011

1 commit

b95f1b31b mm: Map most files to use export.h instead of module.h ... Browse Code »

The files changed within are only using the EXPORT_SYMBOL
macro variants. They are not using core modular infrastructure
and hence don't need module.h but only the export.h header.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-10-31 21:20:12 +0800

25 May, 2011

1 commit

7b1de5868 readahead: readahead page allocations are OK to fail ... Browse Code »

Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.

readahead page allocations are completely optional. They are OK to fail
and in particular shall not trigger OOM on themselves.

Reported-by: Dave Young
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Wu Fengguang
Reviewed-by: Minchan Kim
Reviewed-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2011-05-25 23:39:25 +0800

10 Mar, 2011

2 commits

5b417b187 read-ahead: use plugging ... Browse Code »

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:26 +0800
7eaceacca block: remove per-queue plugging ... Browse Code »
86

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:07 +0800

25 May, 2010

1 commit

bf8abe8b9 readahead.c: fix comment ... Browse Code »

Fix a wrong comment over page_cache_async_readahead().

Signed-off-by: Huang Shijie
Acked-by: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Huang Shijie
2010-05-25 23:07:00 +0800

07 Apr, 2010

1 commit

70655c06b readahead: fix NULL filp dereference ... Browse Code »

btrfs relocate_file_extent_cluster() calls us with NULL filp:

[ 4005.426805] BUG: unable to handle kernel NULL pointer dereference at 00000021
[ 4005.426818] IP: [] page_cache_sync_readahead+0x18/0x3e

Signed-off-by: Wu Fengguang
Cc: Yan Zheng
Reported-by: Kirill A. Shutemov
Tested-by: Kirill A. Shutemov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-04-07 23:38:03 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

07 Mar, 2010

1 commit

0141450f6 readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM ... Browse Code »

This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random reads performance
noticeably. And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

Tested-by: Quentin Barnes
Signed-off-by: Wu Fengguang
Cc: Nick Piggin
Cc: Andi Kleen
Cc: Steven Whitehouse
Cc: David Howells
Cc: Jonathan Corbet
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: [2.6.33.x]
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-03-07 03:26:25 +0800

18 Dec, 2009

1 commit

65a80b4c6 readahead: add blk_run_backing_dev ... Browse Code »

I added blk_run_backing_dev on page_cache_async_readahead so readahead I/O
is unpluged to improve throughput on especially RAID environment.

The normal case is, if page N become uptodate at time T(N), then T(N)
Acked-by: Wu Fengguang
Cc: Jens Axboe
Cc: KOSAKI Motohiro
Tested-by: Ronald
Cc: Bart Van Assche
Cc: Vladislav Bolkhovitin
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hisashi Hifumi
2009-12-18 07:45:32 +0800

17 Jun, 2009

8 commits

10be0b372 readahead: introduce context readahead algorithm ... Browse Code »

Introduce page cache context based readahead algorithm.
This is to better support concurrent read streams in general.

RATIONALE
---------
The current readahead algorithm detects interleaved reads in a _passive_ way.
Given a sequence of interleaved streams 1,1001,2,1002,3,4,1003,5,1004,1005,6,...
By checking for (offset == prev_offset + 1), it will discover the sequentialness
between 3,4 and between 1004,1005, and start doing sequential readahead for the
individual streams since page 4 and page 1005.

The context readahead algorithm guarantees to discover the sequentialness no
matter how the streams are interleaved. For the above example, it will start
sequential readahead since page 2 and 1002.

The trick is to poke for page @offset-1 in the page cache when it has no other
clues on the sequentialness of request @offset: if the current requenst belongs
to a sequential stream, that stream must have accessed page @offset-1 recently,
and the page will still be cached now. So if page @offset-1 is there, we can
take request @offset as a sequential access.

BENEFICIARIES
-------------
- strictly interleaved reads i.e. 1,1001,2,1002,3,1003,...
the current readahead will take them as silly random reads;
the context readahead will take them as two sequential streams.

- cooperative IO processes i.e. NFS and SCST
They create a thread pool, farming off (sequential) IO requests to different
threads which will be performing interleaved IO.

It was not easy(or possible) to reliably tell from file->f_ra all those
cooperative processes working on the same sequential stream, since they will
have different file->f_ra instances. And NFSD's file->f_ra is particularly
unusable, since their file objects are dynamically created for each request.
The nfsd does have code trying to restore the f_ra bits, but not satisfactory.

The new scheme is to detect the sequential pattern via looking up the page
cache, which provides one single and consistent view of the pages recently
accessed. That makes sequential detection for cooperative processes possible.

USER REPORT
-----------
Vladislav recommends the addition of context readahead as a result of his SCST
benchmarks. It leads to 6%~40% performance gains in various cases and achieves
equal performance in others. http://lkml.org/lkml/2009/3/19/239

OVERHEADS
---------
In theory, it introduces one extra page cache lookup per random read. However
the below benchmark shows context readahead to be slightly faster, wondering..

Randomly reading 200MB amount of data on a sparse file, repeat 20 times for
each block size. The average throughputs are:

original ra context ra gain
4K random reads: 65.561MB/s 65.648MB/s +0.1%
16K random reads: 124.767MB/s 124.951MB/s +0.1%
64K random reads: 162.123MB/s 162.278MB/s +0.1%

Cc: Jens Axboe
Cc: Jeff Moyer
Tested-by: Vladislav Bolkhovitin
Signed-off-by: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:30 +0800
045a2529a readahead: move the random read case to bottom ... Browse Code »

Split all readahead cases, and move the random one to bottom.

No behavior changes.

This is to prepare for the introduction of context readahead, and make it
easy for inserting accounting/tracing points for each case.

Signed-off-by: Wu Fengguang
Cc: Vladislav Bolkhovitin
Cc: Jens Axboe
Cc: Jeff Moyer
Cc: Nick Piggin
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:30 +0800
d30a11004 readahead: record mmap read-around states in file_ra_state ... Browse Code »

Mmap read-around now shares the same code style and data structure with
readahead code.

This also removes do_page_cache_readahead(). Its last user, mmap
read-around, has been changed to call ra_submit().

The no-readahead-if-congested logic is dumped by the way. Users will be
pretty sensitive about the slow loading of executables. So it's
unfavorable to disabled mmap read-around on a congested queue.

[akpm@linux-foundation.org: coding-style fixes]
Cc: Nick Piggin
Signed-off-by: Fengguang Wu
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:29 +0800
51daa88eb readahead: remove sync/async readahead call dependency ... Browse Code »

The readahead call scheme is error-prone in that it expects the call sites
to check for async readahead after doing a sync one. I.e.

if (!page)
page_cache_sync_readahead();
page = find_get_page();
if (page && PageReadahead(page))
page_cache_async_readahead();

This is because PG_readahead could be set by a sync readahead for the
_current_ newly faulted in page, and the readahead code simply expects one
more callback on the same page to start the async readahead. If the
caller fails to do so, it will miss the PG_readahead bits and never able
to start an async readahead.

Eliminate this insane constraint by piggy-backing the async part into the
current readahead window.

Now if an async readahead should be started immediately after a sync one,
the readahead logic itself will do it. So the following code becomes
valid: (the 'else' in particular)

if (!page)
page_cache_sync_readahead();
else if (PageReadahead(page))
page_cache_async_readahead();

Cc: Nick Piggin
Signed-off-by: Wu Fengguang
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:29 +0800
160334a0c readahead: increase interleaved readahead size ... Browse Code »

Make sure interleaved readahead size is larger than request size. This
also makes the readahead window grow up more quickly.

Reported-by: Xu Chenfeng
Signed-off-by: Wu Fengguang
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:29 +0800
caca7cb74 readahead: remove one unnecessary radix tree lookup ... Browse Code »

(hit_readahead_marker != 0) means the page at @offset is present, so we
can search for non-present page starting from @offset+1.

Reported-by: Xu Chenfeng
Signed-off-by: Wu Fengguang
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:28 +0800
fc31d16ad readahead: apply max_sane_readahead() limit in ondemand_readahead() ... Browse Code »

Just in case someone aggressively sets a huge readahead size.

Cc: Nick Piggin
Signed-off-by: Wu Fengguang
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:28 +0800
f7e839dd3 readahead: move max_sane_readahead() calls into force_page_cache_readahead() ... Browse Code »

Impact: code simplification.

Cc: Nick Piggin
Signed-off-by: Wu Fengguang
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2009-06-17 10:47:28 +0800

03 Apr, 2009

2 commits

266cf658e FS-Cache: Recruit a page flags for cache management ... Browse Code »

Recruit a page flag to aid in cache management. The following extra flag is
defined:

(1) PG_fscache (PG_private_2)

The marked page is backed by a local cache and is pinning resources in the
cache driver.

If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() had been added to make the checks for both
PG_private and PG_private_2 at the same time.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:36 +0800
03fb3d2af FS-Cache: Release page->private after failed readahead ... Browse Code »

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails. If the filler function fails, then
the problematic page is left attached to the pagecache (with appropriate flags
set, one presumes) and the remaining to-be-attached pages are invalidated and
discarded. This permits pages with caching references associated with them to
be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:35 +0800

26 Mar, 2009

1 commit

26160158d Move the default_backing_dev_info out of readahead.c and into backing-dev.c ... Browse Code »

It really makes no sense to have it in readahead.c, so move it where
it belongs.

Signed-off-by: Jens Axboe

Jens Axboe
2009-03-26 18:01:33 +0800

20 Oct, 2008

1 commit

4f98a2fee vmscan: split LRU lists into anon & file sets ... Browse Code »

Split the LRU lists in two, one set for pages that are backed by real file
systems ("file") and one for pages that are backed by memory and swap
("anon"). The latter includes tmpfs.

The advantage of doing this is that the VM will not have to scan over lots
of anonymous pages (which we generally do not want to swap out), just to
find the page cache pages that it should evict.

This patch has the infrastructure and a basic policy to balance how much
we scan the anon lists and how much we scan the file lists. The big
policy changes are in separate patches.

[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
[hugh@veritas.com: memcg swapbacked pages active]
[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
[akpm@linux-foundation.org: fix /proc/vmstat units]
[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
Signed-off-by: Rik van Riel
Signed-off-by: Lee Schermerhorn
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Hugh Dickins
Signed-off-by: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2008-10-20 23:50:25 +0800

17 Oct, 2008

1 commit

e1f8e8744 Remove Andrew Morton's old email accounts ... Browse Code »

People can use the real name an an index into MAINTAINERS to find the
current email address.

Signed-off-by: Francois Cami
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Francois Cami
2008-10-17 02:21:32 +0800

27 Jul, 2008

1 commit

30002ed2e mm: readahead scan lockless ... Browse Code »

radix_tree_next_hole() is implemented as a series of radix_tree_lookup()s.
So it can be called locklessly, under rcu_read_lock().

Signed-off-by: Nick Piggin
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Hugh Dickins
Cc: "Paul E. McKenney"
Reviewed-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2008-07-27 03:00:06 +0800

30 Apr, 2008

1 commit

cf0ca9fe5 mm: bdi: export BDI attributes in sysfs ... Browse Code »

Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.

In particular this properly exposes the read-ahead window for all relevant
users and /sys/block//queue/read_ahead_kb should be deprecated.

With patient help from Kay Sievers and Greg KH

[mszeredi@suse.cz]

- split off NFS and FUSE changes into separate patches
- document new sysfs attributes under Documentation/ABI
- do bdi_class_init as a core_initcall, otherwise the "default" BDI
won't be initialized
- remove bdi_init_fmt macro, it's not used very much

[akpm@linux-foundation.org: fix ia64 warning]
Signed-off-by: Peter Zijlstra
Cc: Kay Sievers
Acked-by: Greg KH
Cc: Trond Myklebust
Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2008-04-30 23:29:49 +0800

20 Mar, 2008

1 commit

f7850d932 mm/readahead: fix kernel-doc notation ... Browse Code »

Fix kernel-doc notation in mm/readahead.c.

Change ":" to ";" so that it doesn't get treated as a doc section heading.
Move the comment block ending "*/" to a line by itself so that the text on
that last line is not lost (dropped).

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2008-03-20 09:53:37 +0800

17 Oct, 2007

7 commits

e0bf68dde mm: bdi init hooks ... Browse Code »

provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2007-10-17 23:42:45 +0800
eb2be1893 mm: buffered write cleanup ... Browse Code »

Quite a bit of code is used in maintaining these "cached pages" that are
probably pretty unlikely to get used. It would require a narrow race where
the page is inserted concurrently while this process is allocating a page
in order to create the spare page. Then a multi-page write into an uncached
part of the file, to make use of it.

Next, the buffered write path (and others) uses its own LRU pagevec when it
should be just using the per-CPU LRU pagevec (which will cut down on both data
and code size cacheline footprint). Also, these private LRU pagevecs are
emptied after just a very short time, in contrast with the per-CPU pagevecs
that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
to add the pages to pagecache for a bulk write (in 4K chunks).

[this gets rid of some cond_resched() calls in readahead.c and mpage.c due
to clashes in -mm. What put them there, and why? ]

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-10-17 00:42:54 +0800
001281881 mm: use lockless radix-tree probe ... Browse Code »

Probing pages and radix_tree_tagged are lockless operations with the lockless
radix-tree. Convert these users to RCU locking rather than using tree_lock.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-10-17 00:42:53 +0800
535443f51 readahead: remove several readahead macros ... Browse Code »

Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-10-17 00:42:52 +0800
6b10c6c9f readahead: basic support of interleaved reads ... Browse Code »

This is a simplified version of the pagecache context based readahead. It
handles the case of multiple threads reading on the same fd and invalidating
each others' readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in a opportunistic way, in that it does not try to detect
interleaved reads _actively_, which requires a probe into the page cache
(which means a little more overhead for random reads). It only tries to
handle a previously started sequential readahead whose state was overwritten
by another concurrent stream, and it can do this job pretty well.

Negative and positive examples(or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads
right:
time stream 1 stream 2
0 1
1 1001
2 2
3 1002
4 3
5 1003
6 4
7 1004
8 5
9 1005

Here no single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
initial sequential readahead be started is huge. Once the first sequential
readahead is started for a stream, this patch will ensure that the readahead
window continues to rampup and won't be disturbed by other streams.

time stream 1 stream 2
0 1
1 2
2 1001
3 3
4 1002
5 1003
6 4
7 5
8 1004
9 6
10 1005
11 7
12 1006
13 1007

Here stream 1 will start a readahead at page 2, and stream 2 will start its
first readahead at page 1003. From then on the two streams will be served
right.

Cc: Rusty Russell
Signed-off-by: Fengguang Wu
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-10-17 00:42:52 +0800
f4e6b498d readahead: combine file_ra_state.prev_index/prev_offset into prev_pos ... Browse Code »

Combine the file_ra_state members
unsigned long prev_index
unsigned int prev_offset
into
loff_t prev_pos

It is more consistent and better supports huge files.

Thanks to Peter for the nice proposal!

[akpm@linux-foundation.org: fix shift overflow]
Cc: Peter Zijlstra
Signed-off-by: Fengguang Wu
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-10-17 00:42:52 +0800
937085aa3 readahead: compacting file_ra_state ... Browse Code »

Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64bit CPU when a lot of files are
opened.

CC: Andi Kleen
Signed-off-by: Fengguang Wu
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-10-17 00:42:52 +0800

10 Oct, 2007

1 commit

f5ff8422b Fix warnings with !CONFIG_BLOCK ... Browse Code »

Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup
the (few) files that fail to build because they were relying on blkdev.h
pulling in extra includes for them.

Signed-off-by: Jens Axboe

Jens Axboe
2007-10-10 15:25:57 +0800

20 Jul, 2007

3 commits

f9acc8c7b readahead: sanify file_ra_state names ... Browse Code »

Rename some file_ra_state variables and remove some accessors.

It results in much simpler code.
Kudos to Rusty!

Signed-off-by: Fengguang Wu
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-07-20 01:04:44 +0800
cf914a7d6 readahead: split ondemand readahead interface into two functions ... Browse Code »

Split ondemand readahead interface into two functions. I think this makes it
a little clearer for non-readahead experts (like Rusty).

Internally they both call ondemand_readahead(), but the page argument is
changed to an obvious boolean flag.

Signed-off-by: Rusty Russell
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rusty Russell
2007-07-20 01:04:44 +0800
fe3cba17c mm: share PG_readahead and PG_reclaim ... Browse Code »

Share the same page flag bit for PG_readahead and PG_reclaim.

One is used only on file reads, another is only for emergency writes. One
is used mostly for fresh/young pages, another is for old pages.

Combinations of possible interactions are:

a) clear PG_reclaim => implicit clear of PG_readahead
it will delay an asynchronous readahead into a synchronous one
it actually does _good_ for readahead:
the pages will be reclaimed soon, it's readahead thrashing!
in this case, synchronous readahead makes more sense.

b) clear PG_readahead => implicit clear of PG_reclaim
one(and only one) page will not be reclaimed in time
it can be avoided by checking PageWriteback(page) in readahead first

c) set PG_reclaim => implicit set of PG_readahead
will confuse readahead and make it restart the size rampup process
it's a trivial problem, and can mostly be avoided by checking
PageWriteback(page) first in readahead

d) set PG_readahead => implicit set of PG_reclaim
PG_readahead will never be set on already cached pages.
PG_reclaim will always be cleared on dirtying a page.
so not a problem.

In summary,
a) we get better behavior
b,d) possible interactions can be avoided
c) racy condition exists that might affect readahead, but the chance
is _really_ low, and the hurt on readahead is trivial.

Compound pages also use PG_reclaim, but for now they do not interact with
reclaim/readahead code.

Signed-off-by: Fengguang Wu
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-07-20 01:04:44 +0800