06 Jan, 2017
1 commit
-
commit 52bce91165e5f2db422b2b972e83d389e5e4725c upstream.
Commit 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()")
caused a regression when there were no more readers left on a pipe that
was being spliced into: rather than the expected SIGPIPE and -EPIPE
return value, the writer would end up waiting forever for space to free
up (which obviously was not going to happen with no readers around).Fixes: 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()")
Reported-and-tested-by: Andreas Schwab
Debugged-by: Al Viro
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
27 Nov, 2016
1 commit
-
Botched calculation of number of pages. As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.Reported-by: Alexei Starovoitov
Tested-by: Alexei Starovoitov
Signed-off-by: Al Viro
11 Nov, 2016
1 commit
-
i_size check is a leftover from the horrors that used to play with
the page cache in that function. With the switch to ->read_iter(),
it's neither needed nor correct - for gfs2 it ends up being buggy,
since i_size is not guaranteed to be correct until later (inside
->read_iter()).Spotted-by: Abhi Das
Signed-off-by: Al Viro
11 Oct, 2016
1 commit
-
by making sure we call iov_iter_advance() on original
iov_iter even if direct_IO (done on its copy) has returned 0.
It's a no-op for old iov_iter flavours and does the right thing
(== truncation of the stuff we'd allocated, but not filled) in
ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with
by cleanup in generic_file_read_iter().Signed-off-by: Al Viro
06 Oct, 2016
6 commits
-
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro -
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro -
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro -
we only use iov_iter_get_pages_alloc() and iov_iter_advance() -
pages are filled by kernel_readv() via a kvec array (as we used
to do all along), so iov_iter here is used only as a way of
arranging for those pages to be in pipe.Signed-off-by: Al Viro
-
... and kill the ->splice_read() instances that can be switched to it
Signed-off-by: Al Viro
-
iov_iter variant for passing data into pipe. copy_to_iter()
copies data into page(s) it has allocated and stuffs them into
the pipe; copy_page_to_iter() stuffs there a reference to the
page given to it. Both will try to coalesce if possible.
iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
and friends will do as copy_to_iter() would have and return the
pages where the data would've been copied. iov_iter_advance()
will truncate everything past the spot it has advanced to.New primitive: iov_iter_pipe(), used for initializing those.
pipe should be locked all along.Running out of space acts as fault would for iovec-backed ones;
in other words, giving it to ->read_iter() may result in short
read if the pipe overflows, or -EFAULT if it happens with nothing
copied there.In other words, ->read_iter() on those acts pretty much like
->splice_read(). Moreover, all generic_file_splice_read() users,
as well as many other ->splice_read() instances can be switched
to that scheme - that'll happen in the next commit.Signed-off-by: Al Viro
04 Oct, 2016
4 commits
-
single-buffer analogue of splice_to_pipe(); vmsplice_to_pipe() switched
to that, leaving splice_to_pipe() only for ->splice_read() instances
(and that only until they are converted as well).Signed-off-by: Al Viro
-
* splice_to_pipe() stops at pipe overflow and does *not* take pipe_lock
* ->splice_read() instances do the same
* vmsplice_to_pipe() and do_splice() (ultimate callers of splice_to_pipe())
arrange for waiting, looping, etc. themselves.That should make pipe_lock the outermost one.
Unfortunately, existing rules for the amount passed by vmsplice_to_pipe()
and do_splice() are quite ugly _and_ userland code can be easily broken
by changing those. It's not even "no more than the maximal capacity of
this pipe" - it's "once we'd fed pipe->nr_buffers pages into the pipe,
leave instead of waiting".Considering how poorly these rules are documented, let's try "wait for some
space to appear, unless given SPLICE_F_NONBLOCK, then push into pipe
and if we run into overflow, we are done".Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
11 May, 2016
1 commit
05 Apr, 2016
1 commit
-
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.Let's stop pretending that pages in page cache are special. They are
not.The changes are pretty straight-forward:
- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
04 Apr, 2016
1 commit
-
pipe capacity won't exceed 2G anyway.
Signed-off-by: Al Viro
19 Mar, 2016
2 commits
-
Running the following command:
busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null
with any tracing enabled pretty very quickly leads to various NULL
pointer dereferences and VM BUG_ON()s, such as these:BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [] generic_pipe_buf_release+0xc/0x40
Call Trace:
[] splice_direct_to_actor+0x143/0x1e0
[] ? generic_pipe_buf_nosteal+0x10/0x10
[] do_splice_direct+0x8f/0xb0
[] do_sendfile+0x199/0x380
[] SyS_sendfile64+0x90/0xa0
[] entry_SYSCALL_64_fastpath+0x12/0x6dpage dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
kernel BUG at include/linux/mm.h:367!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
RIP: [] generic_pipe_buf_release+0x3c/0x40
Call Trace:
[] splice_direct_to_actor+0x143/0x1e0
[] ? generic_pipe_buf_nosteal+0x10/0x10
[] do_splice_direct+0x8f/0xb0
[] do_sendfile+0x199/0x380
[] SyS_sendfile64+0x90/0xa0
[] tracesys_phase2+0x84/0x89(busybox's cat uses sendfile(2), unlike the coreutils version)
This is because tracing_splice_read_pipe() can call splice_to_pipe()
with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and
we fill the page pointers and the other fields of the pipe_buffers with
garbage.All other callers of splice_to_pipe() avoid calling it when nr_pages ==
0, and we could make tracing_splice_read_pipe() do that too, but it
seems reasonable to have splice_to_page() handle this condition
gracefully.Cc: stable@vger.kernel.org
Signed-off-by: Rabin Vincent
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro
05 Mar, 2016
1 commit
-
This way we can set kiocb flags also from the sync read/write path for
the read_iter/write_iter operations. For now there is no way to pass
flags to plain read/write operations as there is no real need for that,
and all flags passed are explicitly rejected for these files.Signed-off-by: Milosz Tanski
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig
Reviewed-by: Stephen Bates
Tested-by: Stephen Bates
Acked-by: Jeff Moyer
Signed-off-by: Al Viro
09 Jan, 2016
1 commit
-
During testing, I discovered that __generic_file_splice_read() returns
0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
page of a single/multi-page splice read operation. This EOF return code
causes the userspace test to (correctly) report a zero-length read error
when it was expecting otherwise.The current strategy of returning a partial non-zero read when ->readpage
returns AOP_TRUNCATED_PAGE works only when the failed page is not the
first of the lot being processed.This patch attempts to retry lookup and call ->readpage again on pages
that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
tests pass and I haven't noticed any unwanted side effects.This version removes the thrice-retry loop and instead indefinitely
retries lookups on AOP_TRUNCATED_PAGE errors from ->readpage. This
behavior is now similar to do_generic_file_read().Signed-off-by: Abhi Das
Reviewed-by: Jan Kara
Cc: Bob Peterson
Cc: Al Viro
Signed-off-by: Al Viro
24 Nov, 2015
2 commits
-
The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara
Signed-off-by: Al Viro -
Commit 296291cdd162 (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<
Signed-off-by: Jan Kara
Signed-off-by: Al Viro
07 Nov, 2015
1 commit
-
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.Let's introduce a helper function which makes the restriction explicit and
easier to track. This patch doesn't introduce any functional changes.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko
Suggested-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Jun, 2015
2 commits
-
Merge first patchbomb from Andrew Morton:
- a few misc things
- ocfs2 udpates
- kernel/watchdog.c feature work (took ages to get right)
- most of MM. A few tricky bits are held up and probably won't make 4.2.
* emailed patches from Andrew Morton : (91 commits)
mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
mm, thp: respect MPOL_PREFERRED policy with non-local node
tmpfs: truncate prealloc blocks past i_size
mm/memory hotplug: print the last vmemmap region at the end of hot add memory
mm/mmap.c: optimization of do_mmap_pgoff function
mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
mm: kmemleak: fix delete_object_*() race when called on the same memory block
mm: kmemleak: allow safe memory scanning during kmemleak disabling
memcg: convert mem_cgroup->under_oom from atomic_t to int
memcg: remove unused mem_cgroup->oom_wakeups
frontswap: allow multiple backends
x86, mirror: x86 enabling - find mirrored memory ranges
mm/memblock: allocate boot time data structures from mirrored memory
mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
mm: do not ignore mapping_gfp_mask in page cache allocation paths
mm/cma.c: fix typos in comments
mm/oom_kill.c: print points as unsigned int
mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
... -
page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru which might cause recursion into fs down in the
direct reclaim path if the mapping really relies on GFP_NOFS semantic.This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from the reclaim recursion issues and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path. Anyway it
is better to obey mapping gfp mask and prevent from later breakage.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko
Cc: Dave Chinner
Cc: Neil Brown
Cc: Johannes Weiner
Cc: Al Viro
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Tetsuo Handa
Cc: Anton Altaparmakov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 May, 2015
1 commit
-
unix_stream_recvmsg is refactored to unix_stream_read_generic in this
patch and enhanced to deal with pipe splicing. The refactoring is
inneglible, we mostly have to deal with a non-existing struct msghdr
argument.Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
06 May, 2015
1 commit
-
Using sendfile with below small program to get MD5 sums of some files,
it appear that big files (over 64kbytes with 4k pages system) get a
wrong MD5 sum while small files get the correct sum.
This program uses sendfile() to send a file to an AF_ALG socket
for hashing./* md5sum2.c */
#include
#include
#include
#include
#include
#include
#include
#include
#includeint main(int argc, char **argv)
{
int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct stat st;
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "hash",
.salg_name = "md5",
};
int n;bind(sk, (struct sockaddr*)&sa, sizeof(sa));
for (n = 1; n < argc; n++) {
int size;
int offset = 0;
char buf[4096];
int fd;
int sko;
int i;fd = open(argv[n], O_RDONLY);
sko = accept(sk, NULL, 0);
fstat(fd, &st);
size = st.st_size;
sendfile(sko, fd, &offset, size);
size = read(sko, buf, sizeof(buf));
for (i = 0; i < size; i++)
printf("%2.2x", buf[i]);
printf(" %s\n", argv[n]);
close(fd);
close(sko);
}
exit(0);
}Test below is done using official linux patch files. First result is
with a software based md5sum. Second result is with the program above.root@vgoip:~# ls -l patch-3.6.*
-rw-r--r-- 1 root root 64011 Aug 24 12:01 patch-3.6.2.gz
-rw-r--r-- 1 root root 94131 Aug 24 12:01 patch-3.6.3.gzroot@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gzroot@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
5fd77b24e68bb24dcc72d6e57c64790e patch-3.6.3.gzAfter investivation, it appears that sendfile() sends the files by blocks
of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each
block, the SPLICE_F_MORE flag is missing, therefore the hashing operation
is reset as if it was the end of the file.This patch adds SPLICE_F_MORE to the flags when more data is pending.
With the patch applied, we get the correct sums:
root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gzroot@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gzSigned-off-by: Christophe Leroy
Signed-off-by: Jens Axboe
16 Apr, 2015
1 commit
-
The original dax patchset split the ext2/4_file_operations because of the
two NULL splice_read/splice_write in the dax case.In the vfs if splice_read/splice_write are NULL we then call
default_splice_read/write.What we do here is make generic_file_splice_read aware of IS_DAX() so the
original ext2/4_file_operations can be used as is.For write it appears that iter_file_splice_write is just fine. It uses
the regular f_op->write(file,..) or new_sync_write(file, ...).Signed-off-by: Boaz Harrosh
Reviewed-by: Jan Kara
Cc: Dave Chinner
Cc: Matthew Wilcox
Cc: Hugh Dickins
Cc: Mel Gorman
Cc: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Apr, 2015
1 commit
-
Signed-off-by: Al Viro
26 Mar, 2015
1 commit
-
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
29 Jan, 2015
2 commits
-
Simple helpers that pass an arbitrary iov_iter to filesystems.
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
similar to iov_iter_kvec(), for ITER_BVEC ones
Signed-off-by: Al Viro
24 Oct, 2014
1 commit
-
Export do_splice_direct() to modules. Needed by overlay filesystem.
Signed-off-by: Miklos Szeredi
12 Jun, 2014
4 commits
-
Backmerge of dcache.c changes from mainline. It's that, or complete
rebase...Conflicts:
fs/splice.cSigned-off-by: Al Viro
-
no callers left
Signed-off-by: Al Viro
-
ocfs2 was using a bunch of splice.c guts...
Signed-off-by: Al Viro
-
iter_file_splice_write() - a ->splice_write() instance that gathers the
pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
it to ->write_iter(). A bunch of simple cases coverted to that...[AV: fixed the braino spotted by Cyrill]
Signed-off-by: Al Viro
28 May, 2014
1 commit
-
Commit 6130f5315ee8 "switch vmsplice_to_user() to copy_page_to_iter()" in
v3.15-rc1 broke vmsplice(2).This patch fixes two bugs:
- count is not initialized to a proper value, which resulted in no data
being copied- if rw_copy_check_uvector() returns negative then the iov might be leaked.
Tested OK.
Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro
07 May, 2014
1 commit
-
For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment. Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.Signed-off-by: Al Viro