Eric Lee / smarc-fsl-linux-kernel

20 Jul, 2007

4 commits

cf914a7d6 readahead: split ondemand readahead interface into two functions

Split ondemand readahead interface into two functions. I think this makes it
a little clearer for non-readahead experts (like Rusty).

Internally they both call ondemand_readahead(), but the page argument is
changed to an obvious boolean flag.

Signed-off-by: Rusty Russell
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rusty Russell
2007-07-20 01:04:44 +0800
d8983910a readahead: pass real splice size

Pass real splice size to page_cache_readahead_ondemand().

The splice code works in chunks of 16 pages internally. The readahead code
should be told of the overall splice size, instead of the internal chunk size.
Otherwise bad things may happen. Imagine some 17-page random splice reads.
The code before this patch will result in two readahead calls: readahead(16);
readahead(1); That leads to one 16-page I/O and one 32-page I/O: one extra I/O
and 31 readahead miss pages.
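
A minimal sketch of the chunking described above, not kernel code. The 16-page chunk size comes from the message; `per_chunk_requests()` and `SKETCH_CHUNK_PAGES` are hypothetical names introduced for illustration. It only lists the request sizes the readahead code would be told about when each internal chunk is reported separately, and does not model the readahead window itself.

```c
#include <assert.h>
#include <stddef.h>

/* Internal splice chunk size in pages, as stated in the commit message. */
#define SKETCH_CHUNK_PAGES 16u

/*
 * Hypothetical helper: record the size of each readahead request that
 * results when a splice of `total` pages is reported chunk by chunk.
 * Returns the number of requests written to sizes[] (at most `max`).
 */
static size_t per_chunk_requests(unsigned int total, unsigned int *sizes,
                                 size_t max)
{
    size_t n = 0;

    assert(sizes != NULL);

    while (total > 0 && n < max) {
        unsigned int this_chunk =
            total < SKETCH_CHUNK_PAGES ? total : SKETCH_CHUNK_PAGES;

        sizes[n++] = this_chunk;
        total -= this_chunk;
    }
    return n;
}
```

For a 17-page splice this produces two requests (16 pages, then 1 page), whereas passing the real splice size hands the readahead code a single 17-page request.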

Signed-off-by: Fengguang Wu
Cc: Jens Axboe
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-07-20 01:04:44 +0800
431a4820b readahead: move synchronous readahead call out of splice loop

Move synchronous page_cache_readahead_ondemand() call out of splice loop.

This avoids one pointless page allocation/insertion in case of non-zero
ra_pages, or many pointless readahead calls in case of zero ra_pages.

Note that if a user sets ra_pages to less than PIPE_BUFFERS=16 pages, he will
not get expected readahead behavior anyway. The splice code works in batches
of 16 pages, which can be taken as another form of synchronous readahead.

Signed-off-by: Fengguang Wu
Cc: Jens Axboe
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-07-20 01:04:44 +0800
a08a166fe readahead: convert splice invocations

Convert splice reads to use on-demand readahead.

Signed-off-by: Fengguang Wu
Cc: Steven Pratt
Cc: Ram Pai
Cc: Jens Axboe
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fengguang Wu
2007-07-20 01:04:44 +0800

16 Jul, 2007

1 commit

bcd4f3acb splice: direct splicing updates ppos twice

OGAWA Hirofumi reported that he's noticed
nfsd read corruption in recent kernels, and did the hard work of
discovering that it's due to splice updating the file position twice.
This means that the next operation would start further ahead than it
should.

nfsd_vfs_read()
    splice_direct_to_actor()
        while(len) {
            do_splice_to()                     [update sd->pos]
            -> generic_file_splice_read()      [read from sd->pos]
            nfsd_direct_splice_actor()
            -> __splice_from_pipe()            [update sd->pos]

There's nothing wrong with the core splice code, but the direct
splicing is an addon that calls both input and output paths.
So it has to take care in locally caching offset so it remains correct.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-16 21:02:48 +0800

13 Jul, 2007

2 commits

51a92c0f6 splice: fix offset mangling with direct splicing (sendfile)

If the output actor doesn't transfer the full amount of data, we will
increment ppos too much. Two related bugs in there:

- We need to break out and return actor() retval if it is shorter than
what we spliced into the pipe.

- Adjust ppos only according to actor() return.

Also fix loop problem in generic_file_splice_read(), it should not keep
going when data has already been transferred.
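
The two fixes listed above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel loop: `direct_splice_sketch()` and `limited_actor()` are hypothetical names, and the "pipe" is reduced to a per-iteration byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output actor: consumes at most `limit` bytes per call. */
static size_t limited_actor(size_t avail, size_t limit)
{
    return avail < limit ? avail : limit;
}

/*
 * Sketch of a direct-splice loop. Each iteration "reads" up to `chunk`
 * bytes into the pipe and hands them to the actor. The file position is
 * advanced only by what the actor reports, and a short actor return
 * ends the loop.
 */
static size_t direct_splice_sketch(size_t len, size_t chunk,
                                   size_t actor_limit, long long *ppos)
{
    size_t total = 0;

    assert(ppos != NULL && chunk > 0);

    while (len > 0) {
        size_t read_len = len < chunk ? len : chunk;
        size_t written = limited_actor(read_len, actor_limit);

        *ppos += (long long)written;  /* adjust only by the actor's return */
        total += written;
        len -= written;

        if (written < read_len)       /* short transfer: stop and report */
            break;
    }
    return total;
}
```

With a 100-byte request, 64-byte chunks and an actor limited to 10 bytes, the sketch stops after the first iteration and leaves the position at 10 rather than 64.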

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-13 20:14:31 +0800
29ce20586 security: revalidate rw permissions for sys_splice and sys_vmsplice

Revalidate read/write permissions for splice(2) and vmsplice(2), in case
security policy has changed since the files were opened.

Acked-by: Stephen Smalley
Signed-off-by: James Morris
Signed-off-by: Jens Axboe

James Morris
2007-07-13 20:14:29 +0800

10 Jul, 2007

7 commits

0845718da pipe: add documentation and comments

As per Andrew Morton's request, here's a set of documentation for
the generic pipe_buf_operations hooks, the pipe, and pipe_buffer
structures.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:16 +0800
cac36bb06 pipe: change the ->pin() operation to ->confirm()

The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.

A good return from ->confirm() means that the buffer is really
there, and that the contents are good.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:15 +0800
932cc6d4f splice: completely document external interface with kerneldoc

Also add fs/splice.c as a kerneldoc target with a smaller blurb that
should be expanded to better explain the overview of splice.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:15 +0800
497f9625c pipe: allow passing around of ops private pointer

relay needs this for proper consumption handling, and the network
receive support needs it as well to lookup the sk_buff on pipe
release.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:14 +0800
d6b29d7ce splice: divorce the splice structure/function definitions from the pipe header

We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:14 +0800
6a14b90bb vmsplice: add vmsplice-to-user support

A bit of a cheat, it actually just copies the data to userspace. But
this makes the interface nice and symmetric and enables people to build
on splice, with room for future improvement in performance.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:12 +0800
c66ab6fa7 splice: abstract out actor data

For direct splicing (or private splicing), the output may not be a file.
So abstract out the handling into a specified actor function and put
the data in the splice_desc structure earlier, so we can build on top
of that.

This is the first step in better splice handling for drivers, and also
for implementing vmsplice _to_ user memory.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:12 +0800

15 Jun, 2007

3 commits

02676e5ae splice: only check do_wakeup in splice_to_pipe() for a real pipe

We only ever set do_wakeup to non-zero if the pipe has an inode
backing, so it's pointless to check outside the pipe->inode
check.

Signed-off-by: Jens Axboe

Jens Axboe
2007-06-15 19:16:13 +0800
00de00bda splice: fix leak of pages on short splice to pipe

If the destination pipe is full and we already transferred
data, we break out instead of waiting for more pipe room.
The exit logic looks at spd->nr_pages to see if we moved
everything inside the spd container, but we decrement that
variable in the loop to decide when spd has emptied.

Instead we want to compare to the original page count in
the spd, so cache that in a local variable.
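
A simplified user-space sketch of the fix described above. The structure and function names (`spd_sketch`, `splice_to_pipe_sketch()`) are hypothetical, and the integer `refs[]` array merely stands in for page references; the point is only that the cleanup loop compares against the cached original count.

```c
#include <assert.h>

#define SKETCH_MAX_PAGES 8

struct spd_sketch {
    int nr_pages;                /* pages still to be moved into the pipe */
    int refs[SKETCH_MAX_PAGES];  /* stand-in for per-page reference counts */
};

/*
 * Move pages into a pipe with `pipe_room` free slots. Pages that do not
 * fit are released. Returns the number of pages moved.
 */
static int splice_to_pipe_sketch(struct spd_sketch *spd, int pipe_room)
{
    int spd_pages = spd->nr_pages;  /* cache the original page count */
    int page_nr = 0;
    int moved;

    assert(spd_pages >= 0 && spd_pages <= SKETCH_MAX_PAGES);

    while (spd->nr_pages > 0 && pipe_room > 0) {
        /* The pipe takes over this page's reference. */
        page_nr++;
        spd->nr_pages--;   /* decremented to detect when spd is empty */
        pipe_room--;
    }
    moved = page_nr;

    /*
     * Comparing page_nr with spd->nr_pages here would skip the leftover
     * pages, because nr_pages was decremented above. Compare with the
     * cached original count instead.
     */
    while (page_nr < spd_pages)
        spd->refs[page_nr++]--;

    return moved;
}
```

With four pages and room for two, the sketch moves two pages and drops the references on the remaining two.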

Signed-off-by: Jens Axboe

Jens Axboe
2007-06-15 19:14:22 +0800
17ee4f49a splice: adjust balance_dirty_pages_ratelimited() call

As we have potentially dirtied more than 1 page, we should indicate as
such to the dirty page balancing. So call
balance_dirty_pages_ratelimited_nr() and pass in the approximate number
of pages we dirtied.

Signed-off-by: Jens Axboe

Jens Axboe
2007-06-15 19:10:37 +0800

08 Jun, 2007

5 commits

620a324b7 splice: __generic_file_splice_read: fix read/truncate race

Original patch and description from Neil Brown,
merged and adapted to splice branch by me. Neil's text follows:

__generic_file_splice_read() currently samples the i_size at the start
and doesn't do so again unless it needs to call ->readpage to load
a page. After ->readpage it has to re-sample i_size as a truncate
may have caused that page to be filled with zeros, and the read()
call should not see these.

However there are other activities that might cause ->readpage to be
called on a page between the time that __generic_file_splice_read()
samples i_size and when it finds that it has an uptodate page. These
include at least read-ahead and possibly another thread performing a
read.

So we must sample i_size *after* it has an uptodate page. Thus the
current sampling at the start and after a read can be replaced with a
sampling before page addition into spd.

Signed-off-by: Jens Axboe

Jens Axboe
2007-06-08 14:34:11 +0800
475ecade6 splice: __generic_file_splice_read: fix i_size_read() length checks

__generic_file_splice_read's partial page check, at eof after readpage,
not only got its calculations wrong, but also reused the loff variable:
causing data corruption when splicing from a non-0 offset in the file's
last page (revealed by ext2 -b 1024 testing on a loop of a tmpfs file).
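
As an illustration of the kind of check involved, here is a hedged user-space sketch of clamping a read to the file size inside the last page. It does not reproduce the patch: the 4096-byte page size, `bytes_visible_in_page()` and the `SKETCH_` constants are assumptions made for the example. The valid length of the page is kept in its own variable (`plen`) so the in-page offset `loff` is never overwritten.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * How many bytes of page `index` may be exposed, starting at in-page
 * offset `loff` and wanting `want` bytes, for a file of `isize` bytes.
 */
static unsigned long bytes_visible_in_page(unsigned long long isize,
                                           unsigned long long index,
                                           unsigned long loff,
                                           unsigned long want)
{
    unsigned long long end_index;
    unsigned long plen;  /* valid bytes in this page, separate from loff */

    assert(loff < SKETCH_PAGE_SIZE);

    if (isize == 0)
        return 0;

    end_index = (isize - 1) >> SKETCH_PAGE_SHIFT;
    if (index > end_index)
        return 0;                      /* page lies entirely past EOF */

    if (index == end_index)
        plen = (unsigned long)((isize - 1) & (SKETCH_PAGE_SIZE - 1)) + 1;
    else
        plen = SKETCH_PAGE_SIZE;       /* interior page is fully valid */

    if (plen <= loff)
        return 0;                      /* offset starts at or past EOF */

    return want < plen - loff ? want : plen - loff;
}
```

For a 5000-byte file, the last page (index 1) holds 904 valid bytes, so a request starting at in-page offset 100 is clamped to 804 bytes.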

Signed-off-by: Hugh Dickins
Signed-off-by: Jens Axboe

Hugh Dickins
2007-06-08 14:34:05 +0800
20d698db6 splice: move balance_dirty_pages_ratelimited() outside of splice actor

I've seen inode related deadlocks, so move this call outside of the
actor itself, which may hold the inode lock.

Signed-off-by: Jens Axboe

Jens Axboe
2007-06-08 14:33:59 +0800
267adc3e6 splice: remove do_splice_direct() symbol export

It's only supposed to be used by do_sendfile(), which is never
modular. So kill the export.

Signed-off-by: Jens Axboe

Jens Axboe
2007-06-08 14:33:41 +0800
d366d3988 splice: move inode size check into generic_file_splice_read()

Signed-off-by: Jens Axboe

Jens Axboe
2007-06-08 14:32:38 +0800

08 May, 2007

2 commits

86aa5ac53 [PATCH] splice: always call into page_cache_readahead()

Don't try to guess what the read-ahead logic will do, allow it
to make its own decisions.

Signed-off-by: Jens Axboe

Jens Axboe
2007-05-08 14:46:19 +0800
9ae9d68cb [PATCH] splice(): fix interaction with readahead

Eric Dumazet, thank you for disclosing this bug.

Readahead logic somehow fails to populate the page range with data.
It can be because

1) the readahead routine is not always called in the following lines of
fs/splice.c:

    if (!loff || nr_pages > 1)
        page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);

2) even when called, page_cache_readahead() won't guarantee the pages are there.
It won't submit readahead I/O for pages already in the radix tree, or when
(ra_pages == 0), or after 256 cache hits.

In your case, it should be because of the retried reads, which lead to
excessive cache hits and disable readahead at some point.

And that _one_ failure of readahead blocks the whole read process.
The application receives EAGAIN and retries the read, but
__generic_file_splice_read() refuses to make progress:

- in the previous invocation, it has allocated a blank page and inserted it
into the radix tree, but never has the chance to start I/O for it: the test
of SPLICE_F_NONBLOCK goes before that.

- in the retried invocation, the readahead code will neither get out of the
cache hit mode, nor will it submit I/O for an already existing page.

Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Fengguang Wu
2007-05-08 14:44:36 +0800

29 Mar, 2007

1 commit

d9993c37e [PATCH] splice: partial write fix

Currently, if a partial write happens in ->commit_write(), the page is
not marked as accessed and rebalanced.

Signed-off-by: Monakhov Dmitriy
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Dmitriy Monakhov
2007-03-29 20:26:42 +0800

27 Mar, 2007

3 commits

40bee44ea Export __splice_from_pipe()

Ocfs2 wants to implement its own splice write actor so that it can better
manage cluster / page locks. This lets us re-use the rest of splice write
while only providing our own code where it's actually important.

Signed-off-by: Mark Fasheh
Signed-off-by: Jens Axboe

Mark Fasheh
2007-03-27 14:55:47 +0800
08c725916 2/2 splice: dont readpage

Splice does not need to readpage to bring the page uptodate before writing
to it, because prepare_write will take care of that for us.

Splice is also wrong to SetPageUptodate before the page is actually uptodate.
This results in the old uninitialised memory leak. This gets fixed as a
matter of course when removing the readpage logic.

Signed-off-by: Nick Piggin
Signed-off-by: Jens Axboe

Nick Piggin
2007-03-27 14:55:39 +0800
485ddb4b9 1/2 splice: dont steal

Stealing pages with splice is problematic because we cannot just insert
an uptodate page into the pagecache and hope the filesystem can take care
of it later.

We also cannot just ClearPageUptodate, then hope prepare_write does not
write anything into the page, because I don't think prepare_write gives
that guarantee.

Remove support for SPLICE_F_MOVE for now. If we really want to bring it
back, we might be able to do so with the new filesystem buffered write
aops APIs I'm working on. If we really don't want to bring it back, then
we should decide that sooner rather than later, and remove the flag and
all the stealing infrastructure before anybody starts using it.

Signed-off-by: Nick Piggin
Signed-off-by: Jens Axboe

Nick Piggin
2007-03-27 14:55:08 +0800

14 Dec, 2006

1 commit

d4c3cca94 [PATCH] constify pipe_buf_operations

- pipe/splice should use const pipe_buf_operations and file_operations

- struct pipe_inode_info has an unused field "start" : get rid of it.

Signed-off-by: Eric Dumazet
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2006-12-14 01:05:47 +0800

09 Dec, 2006

1 commit

0f7fc9e4d [PATCH] VFS: change struct file to use struct path

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800

05 Nov, 2006

1 commit

ddac0d39c [PATCH] splice: fix problem introduced with inode diet

After the inode slimming patch that unionised i_pipe/i_bdev/i_cdev, it's
no longer enough to check for existence of ->i_pipe to verify that this
is a pipe.

Original patch from Eric Dumazet
Final solution suggested by Linus.

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds

Jens Axboe
2006-11-05 00:45:39 +0800

29 Oct, 2006

1 commit

2ae88149a [PATCH] mm: clean up pagecache allocation

- Consolidate page_cache_alloc

- Fix splice: only the pagecache pages and filesystem data need to use
mapping_gfp_mask.

- Fix grab_cache_page_nowait: same as splice, also honour NUMA placement.

Signed-off-by: Nick Piggin
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2006-10-29 02:30:50 +0800

20 Oct, 2006

3 commits

8c34e2d63 [PATCH] Remove SUID when splicing into an inode

Originally from Mark Fasheh

generic_file_splice_write() does not remove S_ISUID or S_ISGID. This is
inconsistent with the way we generally write to files.

Signed-off-by: Mark Fasheh
Signed-off-by: Jens Axboe

Jens Axboe
2006-10-20 02:53:09 +0800
6da618098 [PATCH] Introduce generic_file_splice_write_nolock()

This allows file systems to manage their own i_mutex locking while
still re-using the generic_file_splice_write() logic.

OCFS2 in particular wants this so that it can order cluster locks within
i_mutex.

Signed-off-by: Mark Fasheh
Signed-off-by: Jens Axboe

Mark Fasheh
2006-10-20 02:53:08 +0800
62752ee19 [PATCH] Take i_mutex in splice_from_pipe()

The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.

The double locking behavior is done elsewhere in splice.c, and if we
eventually want _nolock variants of generic_file_splice_write(), fs modules
might have to replicate the nasty locking code. We introduce
inode_double_lock() and inode_double_unlock() to consolidate the locking
rules into one set of functions.

Signed-off-by: Mark Fasheh
Signed-off-by: Jens Axboe

Mark Fasheh
2006-10-20 02:53:08 +0800

12 Oct, 2006

1 commit

e6e80f294 [PATCH] splice: fix pipe_to_file() ->prepare_write() error path

Don't jump to the unlock+release path, we already did that.

Signed-off-by: Jens Axboe

Jens Axboe
2006-10-12 21:08:51 +0800

01 Oct, 2006

1 commit

0fe234795 [PATCH] Update axboe@suse.de email address

As people often look for the copyright in files to see who to mail,
update the link to a neutral one.

Signed-off-by: Jens Axboe

Jens Axboe
2006-10-01 02:52:34 +0800

10 Jul, 2006

1 commit

aadd06e5c [PATCH] splice: fix problems with sys_tee()

Several issues noticed/fixed:

- We cannot reliably block in link_pipe() while holding both input and output
mutexes. So do preparatory checks before locking down both mutexes and doing
the link.

- The ipipe->nrbufs vs i check was bad, because we could have dropped the
ipipe lock in-between. This causes us to potentially look at unknown
buffers if we were racing with someone else reading this pipe.

Signed-off-by: Jens Axboe

Jens Axboe
2006-07-10 17:00:01 +0800

23 Jun, 2006

1 commit

9e94cd4fd [PATCH] splice: retrieve mapping after locking the page

Otherwise we could be racing with truncate/mapping removal.

Problem found/fixed by Nick Piggin, logic rewritten
by me.

Signed-off-by: Jens Axboe

Jens Axboe
2006-06-23 23:10:39 +0800

04 May, 2006

1 commit

a0548871e [PATCH] splice: redo page lookup if add_to_page_cache() returns -EEXIST

This can happen quite easily, if several processes are trying to splice
the same file at the same time. It's not a failure, it just means someone
raced with us in allocating this file page. So just dump the allocated
page and relookup the original.
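
The retry pattern described above can be sketched in user space roughly as follows. Everything here is a stand-in: `cache_sketch`, `cache_add()` and `lookup_or_add()` are hypothetical names, the "page cache" is a small array, and the race is simulated by an optional `racer` page inserted just before our own insertion attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_SLOTS 4

struct page_sketch {
    int owner;  /* 0 = allocated by us, nonzero = inserted by someone else */
};

struct cache_sketch {
    struct page_sketch *slot[SKETCH_SLOTS];
};

/* Insert a page; fail with -EEXIST if the index is already populated. */
static int cache_add(struct cache_sketch *c, unsigned int idx,
                     struct page_sketch *p)
{
    if (c->slot[idx])
        return -EEXIST;
    c->slot[idx] = p;
    return 0;
}

/*
 * Look up the page at `idx`, allocating and inserting one on a miss.
 * If the insert loses a race, drop our page and use the winner's.
 */
static struct page_sketch *lookup_or_add(struct cache_sketch *c,
                                         unsigned int idx,
                                         struct page_sketch *racer)
{
    struct page_sketch *page;

    assert(c != NULL && idx < SKETCH_SLOTS);

    page = c->slot[idx];
    if (page)
        return page;

    page = malloc(sizeof(*page));
    if (!page)
        return NULL;
    page->owner = 0;

    /* Simulated race: another task inserts its page before we do. */
    if (racer)
        (void)cache_add(c, idx, racer);

    if (cache_add(c, idx, page) == -EEXIST) {
        free(page);            /* not a failure: dump our allocation */
        page = c->slot[idx];   /* and look up the page that won */
    }
    return page;
}
```

When a racer is supplied, the function returns the racer's page instead of reporting an error; without one, it returns the freshly inserted page.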

Signed-off-by: Jens Axboe

Jens Axboe
2006-05-04 12:55:12 +0800