10 Jul, 2006
1 commit
-
Several issues noticed/fixed:
- We cannot reliably block in link_pipe() while holding both input and output
mutexes. So do preparatory checks before locking down both mutexes and doing
the link.
- The ipipe->nrbufs vs i check was bad, because we could have dropped the
ipipe lock in between. This causes us to potentially look at unknown
buffers if we were racing with someone else reading this pipe.
Signed-off-by: Jens Axboe
23 Jun, 2006
1 commit
-
Otherwise we could be racing with truncate/mapping removal.
Problem found/fixed by Nick Piggin, logic rewritten
by me.
Signed-off-by: Jens Axboe
04 May, 2006
4 commits
-
This can happen quite easily if several processes are trying to splice
the same file at the same time. It's not a failure; it just means someone
raced with us in allocating this file page. So just dump the allocated
page and look up the original again.
Signed-off-by: Jens Axboe
-
Same thing was done in fs/pipe.c and most of fs/splice.c, but we had
a few still missing.
Signed-off-by: Jens Axboe
-
Nick says that the current construct isn't safe. This goes back to the
original, but sets PIPE_BUF_FLAG_LRU on user pages as well, as they all
seem to be on the LRU in the first place.
Signed-off-by: Jens Axboe
-
Looking at generic_file_buffered_write(), we need to unlock_page() if
->prepare_write() fails and it isn't due to racing with truncate().
Also trim the size if ->prepare_write() fails, if we have to.
Signed-off-by: Jens Axboe
02 May, 2006
8 commits
-
Apply the same rules as for the anon pipe pages: only allow stealing
if no one else is using the page.
Signed-off-by: Jens Axboe
-
Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
to know whether we need to fiddle with page LRU state after stealing it.
However, for some origins we just don't know if the page is on the LRU
list or not.
So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
instead.
Signed-off-by: Jens Axboe
-
We need to use the minimum of {len, PAGE_SIZE-off}, not {len, PAGE_SIZE}-off.
The latter doesn't make any sense, and could cause us to attempt negative
length transfers...
Signed-off-by: Jens Axboe
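The following is a small user-space illustration of the two formulas. It assumes a 4 KiB page and that `off` is the byte offset inside the page (so `off < PAGE_SIZE`). The helper names and `SKETCH_PAGE_SIZE` are hypothetical. The old form is evaluated in signed arithmetic so its bad results can be seen directly.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB page size for illustration; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Fixed form: never copy past the end of the page, starting at 'off'. */
static size_t chunk_len(size_t len, size_t off)
{
    return min_size(len, SKETCH_PAGE_SIZE - off);
}

/* Old form, min(len, PAGE_SIZE) - off, in signed arithmetic: it undercounts
 * short requests and goes negative whenever len < off. */
static long buggy_chunk_len(size_t len, size_t off)
{
    return (long)min_size(len, SKETCH_PAGE_SIZE) - (long)off;
}
```

Take `off = 4000` and a 100-byte request. The fixed form allows 96 bytes, which is exactly the rest of the page. The old form yields -3900.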
-
If SPLICE_F_GIFT is set, the user is basically giving these pages away to
the kernel. That means we can steal them for e.g. page cache use instead
of copying them.
The data must be properly page aligned and also a multiple of the page size
in length.
Signed-off-by: Jens Axboe
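The alignment rule for gifted pages reduces to two mask tests. This is a hypothetical sketch. It assumes a 4 KiB page, and the name `gift_is_stealable` is illustrative only, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uintptr_t)4096)   /* assumed page size */

/* Hypothetical helper: true if a gifted user range could be stolen page by
 * page, i.e. it starts on a page boundary and covers only whole pages. */
static bool gift_is_stealable(uintptr_t addr, size_t len)
{
    bool aligned = (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
    bool whole_pages = ((uintptr_t)len & (SKETCH_PAGE_SIZE - 1)) == 0;
    return aligned && whole_pages;
}
```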
-
The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages by pre-faulting data and using kmap_atomic()
instead.
lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, then an SMP kernel, and finally a patched SMP kernel.
Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds
Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds
Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds
Signed-off-by: Jens Axboe
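Averaging the three runs for each configuration makes the comparison concrete. Mean bandwidth on the patched SMP kernel is roughly 17% above vanilla SMP. Mean latency is roughly 5% lower, though still above the UP kernel. The program below only redoes that arithmetic on the figures quoted above; the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Percentage change going from 'before' to 'after'. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* lmbench figures copied from the commit message. */
static const double bw_vanilla_smp[]  = { 1382.19, 1317.27, 1355.61 };
static const double bw_patched_smp[]  = { 1578.70, 1579.95, 1578.63 };
static const double lat_vanilla_smp[] = { 9.6402, 9.6696, 9.6153 };
static const double lat_patched_smp[] = { 9.1654, 9.2266, 9.1527 };
```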
-
Notify the readahead logic of the missing page. Suggested by
Oleg Nesterov.
Signed-off-by: Jens Axboe
-
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.
Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share most of the pipe buffer ops between pipe.c and splice.c.
Signed-off-by: Jens Axboe
-
Found by Oleg Nesterov, fixed by me.
- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.
And as a cleanup on that:
- Make the bottom fall-through logic a little less convoluted. Also make
the steal path hold an extra reference to the page, so we don't have
to differentiate between stolen and non-stolen at the end.
Signed-off-by: Jens Axboe
30 Apr, 2006
1 commit
-
- Check that page has suitable count for stealing in the regular pipes.
- pipe_to_file() assumes that the page is locked on successful steal, so
do that in the pipe steal hook.
- Missing unlock_page() in add_to_page_cache() failure.
Signed-off-by: Jens Axboe
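Here is a toy model of the steal hook's contract. It assumes that "suitable count" means the pipe buffer holds the only reference to the page. `struct sketch_page` and `try_steal` are hypothetical stand-ins, not kernel types. On success the page is returned locked, because the consumer relies on that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page: a reference count and a lock bit. */
struct sketch_page {
    int refcount;
    bool locked;
};

/* Steal only if we hold the sole reference; on success, return with the
 * page locked, as the consumer (pipe_to_file() in the text) expects. */
static int try_steal(struct sketch_page *page)
{
    if (page->refcount != 1)
        return -1;              /* someone else still uses the page */
    page->locked = true;
    return 0;
}
```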
27 Apr, 2006
2 commits
-
Use the new find_get_pages_contig() to potentially look up the entire
splice range in one single call. This speeds up generic_file_splice_read()
quite a bit.
Signed-off-by: Jens Axboe
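The contiguous-lookup idea is to gather the run of cached pages starting at an index in one pass, stopping at the first hole. The toy model below shows it. It uses a plain array in place of the page cache, and `find_contig` is a hypothetical name, not the kernel's find_get_pages_contig() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page cache: cache[i] != NULL means page index i is present.
 * Hypothetical model only, not the kernel's radix-tree code. */
static size_t find_contig(void **cache, size_t cache_len,
                          size_t start, size_t nr, void **out)
{
    size_t got = 0;

    /* Collect pages from 'start' onwards, stopping at the first hole,
     * at the end of the cache, or once 'nr' pages have been found. */
    while (got < nr && start + got < cache_len && cache[start + got] != NULL) {
        out[got] = cache[start + got];
        got++;
    }
    return got;
}
```

Pages past the first hole are left for the slower per-page path.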
-
Avoids doing useless work when the file is fully cached.
Signed-off-by: Jens Axboe
26 Apr, 2006
4 commits
-
We need these for people writing their own ->splice_read/write hooks.
Signed-off-by: Jens Axboe
-
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe
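From user space, vmsplice() takes an iovec describing a user address range and feeds it into a pipe. The sketch below uses glibc's `vmsplice()` wrapper on Linux. `vmsplice_roundtrip` is a hypothetical helper that pushes a string in and reads it back out. SPLICE_F_GIFT is deliberately not passed, so the pages are not given away to the kernel. The caller must leave the buffer untouched until the data has been read.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Push 'msg' into a fresh pipe with vmsplice(2), then read it back.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t vmsplice_roundtrip(const char *msg, char *buf, size_t buflen)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    struct iovec iov = { .iov_base = (void *)msg, .iov_len = strlen(msg) };
    /* flags = 0: no SPLICE_F_GIFT, we keep ownership of msg. */
    ssize_t n = vmsplice(p[1], &iov, 1, 0);
    if (n < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ssize_t r = read(p[0], buf, buflen);
    close(p[0]);
    close(p[1]);
    return r;
}
```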
-
Make the move_from_pipe() actors return number of bytes processed, then
move_from_pipe() can decide more cleverly when to move on to the next
buffer.
This fixes problems with pipe offset and differing file offset.
Signed-off-by: Jens Axboe
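The actor change can be modeled as follows. The actor reports how many bytes it handled. The driver loop then advances inside the current buffer by that amount, and moves to the next buffer only once the current one is empty. This is a hypothetical model: `struct sketch_buf`, `copy_actor` and `move_from_bufs` are illustrative names, not move_from_pipe() itself. The `chunk` limit stands in for a consumer whose boundaries (for instance file pages) don't line up with the pipe buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pipe buffer: 'len' valid bytes starting at data + offset. */
struct sketch_buf {
    const char *data;
    size_t offset;
    size_t len;
};

/* Actor: handles up to 'room' bytes of 'buf' into 'dst' and returns how
 * many bytes it actually processed. */
typedef size_t (*actor_fn)(const struct sketch_buf *buf, char *dst, size_t room);

static size_t copy_actor(const struct sketch_buf *buf, char *dst, size_t room)
{
    size_t n = buf->len < room ? buf->len : room;
    memcpy(dst, buf->data + buf->offset, n);
    return n;
}

/* Feed buffers to 'actor' in pieces of at most 'chunk' bytes; advance by
 * what the actor reports, and only move on once a buffer is drained. */
static size_t move_from_bufs(struct sketch_buf *bufs, size_t nbufs, char *dst,
                             size_t total, size_t chunk, actor_fn actor)
{
    size_t done = 0;
    size_t i = 0;

    while (i < nbufs && done < total) {
        if (bufs[i].len == 0) {         /* skip empty buffers */
            i++;
            continue;
        }
        size_t room = total - done;
        if (room > chunk)
            room = chunk;

        size_t n = actor(&bufs[i], dst + done, room);
        if (n == 0)                     /* no progress: stop */
            break;

        bufs[i].offset += n;
        bufs[i].len -= n;
        done += n;
        if (bufs[i].len == 0)
            i++;
    }
    return done;
}
```

With `chunk = 4`, a 6-byte first buffer is consumed in two actor calls (4 then 2 bytes) before the loop moves on.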
-
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe
20 Apr, 2006
1 commit
-
Signed-off-by: Jens Axboe
19 Apr, 2006
5 commits
-
Since ->map() no longer locks the page, we need to adjust the handling
of those pages (and stealing) a little. This now passes full regressions
again.
Signed-off-by: Jens Axboe
-
- We need to adjust *ppos for writes as well.
- Copy back modified offset value if one was passed in, similar to
what sendfile does.
Signed-off-by: Jens Axboe
-
We need to ensure that we only drop a lock that is ordered last, to avoid
ABBA deadlocks with competing processes.
Signed-off-by: Jens Axboe
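The underlying rule is a single global lock order. If every task takes the same pair of locks in the same order, two tasks cannot each hold one lock while waiting for the other. A common way to get that order for two locks of the same kind is to compare their addresses. The pthread sketch below is a generic illustration of that technique, not the splice code; `lock_pair` and `unlock_pair` are hypothetical helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Take two mutexes in a fixed global order (lower address first), so two
 * threads locking the same pair from opposite ends cannot deadlock (ABBA). */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```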
-
- Make generic_file_splice_read() more readable and correct.
- Don't bail on page allocation with NONBLOCK set; just don't allow
direct blocking on IO (e.g. lock_page).
Signed-off-by: Jens Axboe
-
We need to check i_size after doing a blocking readpage.
Signed-off-by: Jens Axboe
11 Apr, 2006
8 commits
-
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.
Whereas the user-space tee consumes the input and produces both stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied; the output just grabs a reference to the
input pipe data.
Signed-off-by: Jens Axboe
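A short Linux sketch using glibc's `tee()` wrapper demonstrates the key property: after tee(2), the data can be read from both pipes, because nothing was consumed from the source. `tee_roundtrip` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write msg into pipe A, tee it into pipe B, then read both.
 * Returns 0 if both pipes yielded msg, -1 otherwise. */
static int tee_roundtrip(const char *msg)
{
    int a[2], b[2];
    size_t len = strlen(msg);
    char buf_a[256], buf_b[256];
    int ret = -1;

    if (len > sizeof buf_a)
        return -1;
    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    if (write(a[1], msg, len) != (ssize_t)len)
        goto out;
    /* B gets references to A's buffers; nothing is consumed from A. */
    if (tee(a[0], b[1], len, 0) != (ssize_t)len)
        goto out;
    if (read(b[0], buf_b, len) != (ssize_t)len)
        goto out;
    if (read(a[0], buf_a, len) != (ssize_t)len)
        goto out;
    if (memcmp(buf_a, msg, len) == 0 && memcmp(buf_b, msg, len) == 0)
        ret = 0;
out:
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ret;
}
```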
-
We need not use ->f_pos as the offset for the file input/output. If the
user passed an offset pointer in through sys_splice(), just use that and
leave ->f_pos alone.
Signed-off-by: Jens Axboe
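The user-visible offset behavior can be checked from user space with splice(2) on Linux. The sketch assumes a regular file opened read/write and containing at least 5 bytes. It exercises three behaviors:
- With an explicit `off_in`, the data comes from that offset.
- The advanced offset is written back through the pointer, while the file position is left alone.
- An offset supplied for a pipe is rejected with ESPIPE.

`check_offset_semantics` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if splice(2) honoured an explicit input offset without moving
 * the file position, and rejected an offset on a pipe with ESPIPE. */
static int check_offset_semantics(int file_fd)
{
    int p[2];
    loff_t off = 2;
    loff_t bogus = 0;
    int ok = -1;

    if (pipe(p) < 0)
        return -1;
    if (lseek(file_fd, 0, SEEK_SET) != 0)
        goto out;

    /* Explicit offset: move 3 bytes starting at offset 2; off becomes 5. */
    if (splice(file_fd, &off, p[1], NULL, 3, 0) != 3 || off != 5)
        goto out;
    /* The file position itself was neither used nor updated. */
    if (lseek(file_fd, 0, SEEK_CUR) != 0)
        goto out;

    /* An offset on the pipe side is invalid. */
    errno = 0;
    if (splice(p[0], &bogus, file_fd, NULL, 3, 0) != -1 || errno != ESPIPE)
        goto out;
    ok = 0;
out:
    close(p[0]);
    close(p[1]);
    return ok;
}
```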
-
- capitalize consistently
- end sentences in one way or another
- update comment text to match the implementation
Signed-off-by: Ingo Molnar
Signed-off-by: Jens Axboe
-
The comment is also somewhat out of date; correct that as well.
Signed-off-by: Jens Axboe
-
Also corrects a few comments. Patch mainly from Ingo, changes by me.
Signed-off-by: Ingo Molnar
Signed-off-by: Jens Axboe
-
- Kill the local variables that cache ->nrbufs; they just take up space.
- Only set do_wakeup for a real pipe. This is a big win for direct splicing.
- Kill the i_mutex lock around the ->f_pos update; regular I/O paths don't do this
either.
Signed-off-by: Jens Axboe
-
Using find_get_page() is a lot faster than find_or_create_page(). This
gets splice a lot closer to sendfile() for fd -> socket transfers.
Signed-off-by: Jens Axboe
-
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(); it is only
meant to be used for sendfile() emulation.
Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
Signed-off-by: Jens Axboe
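Direct splicing uses a pipe purely as an intermediate staging area between source and destination. The closest user-space analogue is to route a file through an ordinary pipe with two splice(2) calls, as sketched below. This only illustrates the file -> pipe -> destination data path, not the kernel's cached private pipe. `copy_via_pipe` is a hypothetical helper, and SPLICE_F_MOVE is only a hint.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy up to 'len' bytes from in_fd to out_fd by splicing through a pipe
 * used as the intermediate buffer. Returns bytes copied, or -1 on error. */
static ssize_t copy_via_pipe(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;
    while ((size_t)total < len) {
        /* Fill the (empty) pipe from the source; 0 means end of file. */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, len - (size_t)total,
                            SPLICE_F_MOVE);
        if (in <= 0) {
            if (in < 0)
                total = -1;
            break;
        }
        /* Drain everything just queued in the pipe to the destination. */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in,
                                 SPLICE_F_MOVE);
            if (out <= 0) {
                total = -1;
                goto done;
            }
            in -= out;
            total += out;
        }
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```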
10 Apr, 2006
5 commits
-
Add optional input and output offsets to sys_splice(), for seekable file
descriptors:
    asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
                               int fd_out, loff_t __user *off_out,
                               size_t len, unsigned int flags);
Semantics are straightforward: f_pos will be updated with the offset
provided by user-space before the splice transfer begins.
Providing a NULL offset pointer means the existing f_pos will be used
(and updated in situ). Providing an offset for a pipe results in
-ESPIPE. Providing an invalid offset pointer results in -EFAULT.
Signed-off-by: Ingo Molnar
Signed-off-by: Jens Axboe
-
Separate out the 'internal pipe object' abstraction, and make it
usable by splice. This cleans up and fixes several aspects of the
internal splice APIs and the pipe code:
- pipes: the allocation and freeing of pipe_inode_info is now more symmetric
and more streamlined with existing kernel practices.
- splice: small micro-optimization: fewer pointer dereferences in splice
methods.
Signed-off-by: Ingo Molnar
Update XFS for the ->splice_read/->splice_write changes.
Signed-off-by: Jens Axboe
-
We don't want to call into the read-ahead logic unless we are at the
start of a page, _or_ we have multiple pages to read.
Signed-off-by: Jens Axboe
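The read-ahead condition can be written as a small predicate over the file position and request length. This sketch assumes 4 KiB pages, and `want_readahead` is a hypothetical name. It counts how many pages the request touches, starting from its offset inside the first page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12                          /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical predicate: read ahead only if the request starts on a page
 * boundary or spans more than one page. */
static bool want_readahead(unsigned long long pos, size_t len)
{
    size_t off = (size_t)(pos & (SKETCH_PAGE_SIZE - 1));   /* offset in first page */
    size_t nr_pages = (off + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
    return off == 0 || nr_pages > 1;
}
```

A small read in the middle of one page (pos 100, len 50) skips read-ahead. A read at offset 4000 of 200 bytes crosses into a second page and triggers it.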
-
We don't really need to lock down the pages, just make sure they
are uptodate.
Signed-off-by: Jens Axboe
-
The whole shadow/pages logic got overly complex, and this simpler
approach is actually faster in testing.
Signed-off-by: Jens Axboe