16 Apr, 2015
1 commit
-
Signed-off-by: David Howells
Signed-off-by: Al Viro
12 Apr, 2015
1 commit
-
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro
26 Mar, 2015
1 commit
-
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
07 May, 2014
3 commits
-
parallel to copy_page_to_iter(). pipe_write() switched to it (and became
->write_iter()).
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment. Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.
Signed-off-by: Al Viro
02 Apr, 2014
2 commits
-
Signed-off-by: Al Viro
-
all pipe_buffer_operations have the same instances of those...
Signed-off-by: Al Viro
24 Jan, 2014
1 commit
-
Pipe has no data associated with the fs, so it is not a good idea to block
pipe_write() if the FS is frozen, but we cannot update the file's time on such
a filesystem. Let's use the same idea as we use in touch_atime().

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=65701
Signed-off-by: Dmitry Monakhov
Reviewed-by: Jan Kara
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Dec, 2013
1 commit
-
The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:

    spin_lock(&inode->i_lock);
    if (!--pipe->files) {
        inode->i_pipe = NULL;
        kill = 1;
    }
    spin_unlock(&inode->i_lock);
    __pipe_unlock(pipe);
    if (kill)
        free_pipe_info(pipe);

where the final freeing is done last.

HOWEVER. The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".

This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.

Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.

Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early. And since there were two users of this
pattern, create a helper function for it.

Introduced by commit ba5bb147330a ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").
Reported-by: Simon Kirby
Reported-by: Ian Applegate
Acked-by: Al Viro
Cc: stable@kernel.org # v3.10+
Signed-off-by: Linus Torvalds
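As a rough userspace illustration of the fix described above (drop the pipe lock first, then touch the reference count only under the owner's lock), the following sketch uses pthread mutexes in place of the kernel locks. The struct layouts and the names `put_pipe_info()` and `release_pipe()` are assumptions made for this example, not the kernel code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct pipe_info {
	pthread_mutex_t mutex;	/* plays the role of the pipe lock */
	int files;		/* protected by owner->lock, NOT by mutex */
};

struct owner {
	pthread_mutex_t lock;	/* plays the role of inode->i_lock */
	struct pipe_info *pipe;	/* plays the role of inode->i_pipe */
};

/*
 * Drop one reference and free the pipe if it was the last one.
 * Returns 1 if the pipe was freed, 0 otherwise.
 * The caller must already have released pipe->mutex.
 */
static int put_pipe_info(struct owner *o, struct pipe_info *pipe)
{
	int kill = 0;

	pthread_mutex_lock(&o->lock);
	assert(pipe->files > 0);
	if (--pipe->files == 0) {
		o->pipe = NULL;
		kill = 1;
	}
	pthread_mutex_unlock(&o->lock);

	/* Nobody else can reach the pipe any more, so freeing is safe. */
	if (kill) {
		pthread_mutex_destroy(&pipe->mutex);
		free(pipe);
	}
	return kill;
}

/* Release path: finish all work under the pipe lock, unlock, then put. */
static int release_pipe(struct owner *o, struct pipe_info *pipe)
{
	pthread_mutex_lock(&pipe->mutex);
	/* ... per-file teardown that needs the pipe lock goes here ... */
	pthread_mutex_unlock(&pipe->mutex);	/* last access via the pipe lock */

	return put_pipe_info(o, pipe);
}
```

The point of the ordering is that a releaser which is not the last one never touches the pipe after its decrement, so a concurrent final release can free the structure without anyone still holding a pointer into it.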
08 May, 2013
1 commit
-
Faster kernel compiles by way of fewer unnecessary includes.
[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Apr, 2013
11 commits
-
and rename __free_pipe_info() to free_pipe_info()
Signed-off-by: Al Viro
-
not used anymore
Signed-off-by: Al Viro
-
it's used only as a flag to distinguish normal pipes/FIFOs from the
internal per-task one used by file-to-file splice. And pipe->files
would work just as well for that purpose...
Signed-off-by: Al Viro
-
fs/pipe.c file_operations methods *know* that pipe is not an internal one;
no need to check pipe->inode for those callers.
Signed-off-by: Al Viro
-
simplify get_pipe_info(), while we are at it
Signed-off-by: Al Viro
-
now it can be done - put mutex into pipe_inode_info, use it instead
of ->i_mutex
Signed-off-by: Al Viro
-
* new field - pipe->files; number of struct file over that pipe (all
sharing the same inode, of course); protected by inode->i_lock.
* pipe_release() decrements pipe->files, clears inode->i_pipe if
the counter has reached 0 (all under ->i_lock) and, in that case,
frees pipe after having done pipe_unlock()
* fifo_open() starts with grabbing ->i_lock, and either bumps pipe->files
if ->i_pipe was non-NULL or allocates a new pipe (dropping and regaining
->i_lock) and rechecks ->i_pipe; if it's still NULL, inserts new pipe
there, otherwise bumps ->i_pipe->files and frees the one we'd allocated.
At that point we know that ->i_pipe is non-NULL and won't go away, so
we can do pipe_lock() on it and proceed as we used to. If we end up
failing, decrement pipe->files and if it reaches 0 clear ->i_pipe and
free the sucker after pipe_unlock().
Signed-off-by: Al Viro
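The open-side logic in the list above (check under the lock, allocate with the lock dropped, recheck, and free the copy that lost the race) can be sketched in userspace roughly as follows. This is only an analogy using a pthread mutex; `struct owner`, `struct pipe_info` and `open_pipe_info()` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins: owner->lock ~ inode->i_lock, owner->pipe ~ inode->i_pipe. */
struct pipe_info {
	int files;		/* number of openers, protected by owner->lock */
};

struct owner {
	pthread_mutex_t lock;
	struct pipe_info *pipe;
};

/* Returns the shared pipe with its opener count bumped, or NULL on OOM. */
static struct pipe_info *open_pipe_info(struct owner *o)
{
	struct pipe_info *pipe, *fresh;

	pthread_mutex_lock(&o->lock);
	if (o->pipe) {
		/* Fast path: somebody already installed a pipe. */
		pipe = o->pipe;
		assert(pipe->files > 0);
		pipe->files++;
		pthread_mutex_unlock(&o->lock);
		return pipe;
	}
	pthread_mutex_unlock(&o->lock);	/* never allocate under the lock */

	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->files = 1;

	pthread_mutex_lock(&o->lock);
	if (o->pipe) {
		/* Lost the race: use the winner's pipe, discard ours. */
		pipe = o->pipe;
		pipe->files++;
		pthread_mutex_unlock(&o->lock);
		free(fresh);
		return pipe;
	}
	o->pipe = fresh;		/* won the race: install ours */
	pthread_mutex_unlock(&o->lock);
	return fresh;
}
```

Once this returns non-NULL, the caller holds a counted reference, so the pipe cannot go away while the caller takes the pipe lock and continues.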
-
* use the fact that file_inode(file)->i_pipe doesn't change
while the file is opened - no locks needed to access that.
* switch to pipe_lock/pipe_unlock where it's easy to do
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
12 Mar, 2013
1 commit
-
If you open a pipe for neither read nor write, the pipe code will not
add any usage counters to the pipe, causing the 'struct pipe_inode_info'
to be potentially released early.

That doesn't normally matter, since you cannot actually use the pipe,
but the pipe release code - particularly fasync handling - still expects
the actual pipe infrastructure to all be there. And rather than adding
NULL pointer checks, let's just disallow this case, the same way we
already do for the named pipe ("fifo") case.

This is ancient, going back to pre-2.4 days, and until trinity, nobody
ever noticed.
Reported-by: Dave Jones
Signed-off-by: Linus Torvalds
23 Feb, 2013
2 commits
-
Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
- not enough memory for file structures
- operation is not allowed
- user is over its limit

Currently the function returns NULL in all cases and we lose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.

Return error through pointer. Change all callers to preserve this error code.
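For readers unfamiliar with "return error through pointer", here is a minimal userspace sketch of the idiom: a small negative errno value is encoded in the pointer itself, so one return value carries either a valid object or the exact failure reason. The helpers are modelled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() but re-implemented here. `alloc_file_stub()`, its parameters and the particular error codes are hypothetical and chosen only to mirror the three failure reasons listed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095	/* the top 4095 addresses are reserved for error codes */

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer is really an encoded error. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	assert(IS_ERR(ptr));
	return (long)(intptr_t)ptr;
}

struct file_stub {
	int flags;
};

/* Hypothetical allocator that reports *why* it failed. */
static struct file_stub *alloc_file_stub(int nr_open, int limit, int allowed)
{
	struct file_stub *f;

	if (!allowed)
		return ERR_PTR(-EPERM);		/* operation is not allowed */
	if (nr_open >= limit)
		return ERR_PTR(-ENFILE);	/* over the limit */
	f = calloc(1, sizeof(*f));
	if (!f)
		return ERR_PTR(-ENOMEM);	/* not enough memory */
	return f;
}
```

A caller then tests the result with IS_ERR() instead of comparing against NULL, and propagates PTR_ERR() unchanged rather than assuming ENFILE.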
[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
Signed-off-by: Anatol Pomozov
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
27 Sep, 2012
1 commit
-
don't mess with sys_close() if copy_to_user() fails; just postpone
fd_install() until we know it hasn't.
Signed-off-by: Al Viro
02 Aug, 2012
1 commit
-
Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.

Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...
30 Jul, 2012
1 commit
-
Signed-off-by: Al Viro
24 Jul, 2012
1 commit
-
Signed-off-by: Cong Wang
02 Jun, 2012
1 commit
-
Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail. We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates. So introduce
->update_time, where we will deal with i_version and a/m/c time updates and
indicate which changes need to be made. The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,
Signed-off-by: Josef Bacik
31 May, 2012
1 commit
-
As described in commit 07d106d0a ("vfs: fix up ENOIOCTLCMD error
handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl
command which they don't understand. Doing so will result in -ENOTTY
being returned to userspace, which matches the behaviour of the compat
layer if it fails to translate an ioctl command.

This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of
-EINVAL when passed an unknown ioctl command.
Cc: Al Viro
Cc: Andrew Morton
Signed-off-by: Will Deacon
Signed-off-by: Al Viro
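From userspace, the visible effect of this change is the errno seen when a pipe is handed an ioctl it does not implement. A small probe, assuming a Linux system with this fix and using the terminal-only TCGETS request as the "unknown" command (my choice for the example; the helper name is hypothetical too):

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Issue a terminal-only ioctl on fd and report the resulting errno.
 * Returns 0 if the ioctl unexpectedly succeeded.
 */
static int unknown_ioctl_errno(int fd)
{
	struct termios t;

	assert(fd >= 0);
	if (ioctl(fd, TCGETS, &t) == -1)
		return errno;
	return 0;
}
```

On a pipe descriptor this is expected to report ENOTTY, the same error the commit message says userspace gets once -ENOIOCTLCMD is translated.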
30 Apr, 2012
1 commit
-
The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.

When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own. The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).

End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time. You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.

NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops. Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.

The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes). But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.
Tested-by: Michael Tokarev
Cc: Alan Cox
Cc: David Miller
Cc: Ian Kent
Cc: Thomas Meyer
Cc: stable@kernel.org # needed for systemd/autofs interaction fix
Signed-off-by: Linus Torvalds
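A short userspace sketch of the packetized mode described above, assuming Linux with glibc and a kernel that supports it: two small writes go into an O_DIRECT pipe, and each read() with a PIPE_BUF-sized buffer should come back with exactly one packet. The helper name `packet_pipe_demo()` and the payloads are made up for the example.

```c
#define _GNU_SOURCE		/* for pipe2() and O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/*
 * Write two packets into a packet-mode pipe and record the size each
 * read() returns. Returns 0 on success, -1 if the pipe could not be
 * created (e.g. EINVAL on kernels without packet mode) or a write failed.
 */
static int packet_pipe_demo(ssize_t sizes[2])
{
	int fds[2];
	char buf[PIPE_BUF];	/* big enough for any unsplit packet */
	int ret = -1;

	assert(sizes != NULL);
	if (pipe2(fds, O_DIRECT) != 0)
		return -1;

	/* Two separate writes: each gets a pipe buffer of its own. */
	if (write(fds[1], "hello", 5) != 5 || write(fds[1], "pipes!", 6) != 6)
		goto out;

	/* Each read stops at the packet boundary. */
	sizes[0] = read(fds[0], buf, sizeof(buf));
	sizes[1] = read(fds[0], buf, sizeof(buf));
	ret = 0;
out:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

On an ordinary pipe the first read() would instead drain both writes at once; in packet mode the return values give the individual packet sizes.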
24 Mar, 2012
1 commit
-
- Move open-coded filesystem magic numbers into magic.h
- Rearrange magic.h so that the filesystem-related constants are grouped
together.
Signed-off-by: Muthukumar R
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Mar, 2012
1 commit
-
Acked-by: Benjamin LaHaise
Signed-off-by: Cong Wang
13 Jan, 2012
1 commit
-
When a user with the CAP_SYS_RESOURCE cap tries to F_SETPIPE_SZ a pipe
with a size bigger than kmalloc() can allocate, it spits out an ugly warning:

    ------------[ cut here ]------------
    WARNING: at mm/page_alloc.c:2095 __alloc_pages_nodemask+0x5d3/0x7a0()
    Pid: 733, comm: a.out Not tainted 3.2.0-rc1+ #4
    Call Trace:
    warn_slowpath_common+0x75/0xb0
    warn_slowpath_null+0x15/0x20
    __alloc_pages_nodemask+0x5d3/0x7a0
    __get_free_pages+0x12/0x50
    __kmalloc+0x12b/0x150
    pipe_set_size+0x75/0x120
    pipe_fcntl+0xf8/0x140
    do_fcntl+0x2d4/0x410
    sys_fcntl+0x66/0xa0
    system_call_fastpath+0x16/0x1b
    ---[ end trace 432f702e6db7b5ee ]---

Instead, make kcalloc() handle the overflow case and fail quietly.
[akpm@linux-foundation.org: switch to sizeof(*bufs) for 80-column niceness]
Signed-off-by: Sasha Levin
Cc: Alexander Viro
Acked-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
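The "fail quietly" behaviour can be pictured with a userspace analogue of an overflow-checked array allocation: if the element count times the element size cannot be represented, return NULL without any noise instead of attempting the allocation. This is only a sketch of the general idea, not the kernel's kcalloc(); `alloc_array()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a zeroed array of n elements of the given size.
 * Returns NULL quietly if n * size would overflow or memory is short.
 */
static void *alloc_array(size_t n, size_t size)
{
	void *p;

	assert(size > 0);		/* element size must be non-zero */
	if (n > SIZE_MAX / size)	/* n * size would wrap around */
		return NULL;

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

The caller then simply treats NULL as an out-of-memory condition and returns an error to userspace.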
04 Jan, 2012
1 commit
-
... so no exitcalls there. Not much would work if pipe(2) would stop
working, after all...
Signed-off-by: Al Viro
01 Nov, 2011
1 commit
-
Currently a statfs on a pipe's /proc//fd/ link returns -ENOSYS. Wire
pipefs up so that the statfs succeeds.

This is required by checkpoint-restart in the userspace to make it
possible to distinguish pipes from fifos.

When we dump information about task's open files we use the /proc/pid/fd
directory's symlinks and the fact that opening any of them gives us exactly
the same dentry->inode pair as the original process has. Now if a task
we're dumping has opened pipe and fifo we need to detect this and act
accordingly. Knowing that an fd with type S_ISFIFO resides on a pipefs is
the most precise way.
Signed-off-by: Pavel Emelyanov
Reviewed-by: Tejun Heo
Acked-by: Serge Hallyn
Signed-off-by: Cyrill Gorcunov
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
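The check that checkpoint-restart tooling wants can be sketched from userspace as follows, assuming Linux with this change applied: an fd is treated as an anonymous pipe when it is a FIFO by type and fstatfs() reports the pipefs magic. The helper name `fd_is_anon_pipe()` is hypothetical. The fallback constant is the value I believe linux/magic.h uses for PIPEFS_MAGIC, so treat it as an assumption.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef PIPEFS_MAGIC
#define PIPEFS_MAGIC 0x50495045	/* "PIPE"; assumed to match linux/magic.h */
#endif

/*
 * Returns 1 if fd is a FIFO living on pipefs (an anonymous pipe),
 * 0 if it is something else (e.g. a named fifo on a real filesystem),
 * and -1 if fstat()/fstatfs() failed.
 */
static int fd_is_anon_pipe(int fd)
{
	struct stat st;
	struct statfs sfs;

	assert(fd >= 0);
	if (fstat(fd, &st) != 0 || fstatfs(fd, &sfs) != 0)
		return -1;
	if (!S_ISFIFO(st.st_mode))
		return 0;
	return sfs.f_type == PIPEFS_MAGIC;
}
```

A named fifo passes the S_ISFIFO test too, but its f_type is that of the filesystem holding the fifo node, which is what lets the two cases be told apart.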
27 Jul, 2011
1 commit
-
Workloads using pipes and sockets hit inode_sb_list_lock contention.
superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.
Signed-off-by: Eric Dumazet
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro
24 Jul, 2011
1 commit
-
For a number of file systems that don't have a mount point (e.g. sockfs
and pipefs), they are not marked as long term. Therefore in
mntput_no_expire, all locks in vfs_mount lock are taken instead of just
local cpu's lock to aggregate reference counts when we release
reference to file objects. In fact, only local lock need to have been
taken to update ref counts as these file systems are in no danger of
going away until we are ready to unregister them.

The attached patch marks file systems using kern_mount without
mount point as long term. The contentions of vfs_mount lock
is now eliminated. Before un-registering such file system,
kern_unmount should be called to remove the long term flag and
make the mount point ready to be freed.
Signed-off-by: Tim Chen
Signed-off-by: Al Viro
21 Jan, 2011
1 commit
-
Commit e462c448fdc8 ("pipe: use event aware wakeups") optimized the pipe
event wakeup calls to avoid wakeups if the events do not match the
requested set.

However, the optimization was buggy, in that it didn't actually use the
correct sets for the events: when we make room for more data to be
written, the pipe poll() routine will return both the POLLOUT _and_
POLLWRNORM bits. Similarly for read.

And most critically, when a pipe is released, that will potentially
result in POLLHUP|POLLERR (depending on whether it was the last reader
or writer), not just the regular POLLIN|POLLOUT.

This bug showed itself as a hung gnome-screensaver-dialog process, stuck
forever (or at least until it was poked by a signal or by being traced)
in a poll() system call.
Cc: Davide Libenzi
Cc: David S. Miller
Cc: Eric Dumazet
Cc: Jens Axboe
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
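The release case is easy to observe from userspace. In the sketch below (Linux assumed, helper name hypothetical), the reader asks only for POLLIN, yet after the last writer closes its end poll() reports POLLHUP. That is why a release wakeup keyed only on POLLIN|POLLOUT can leave a poller asleep.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Create a pipe, close its write end, and poll the read end for POLLIN.
 * Returns the reported revents mask, or -1 on error.
 */
static int revents_after_writer_close(void)
{
	int fds[2];
	struct pollfd pfd;
	int n;

	if (pipe(fds) != 0)
		return -1;
	close(fds[1]);			/* the last writer goes away */

	pfd.fd = fds[0];
	pfd.events = POLLIN;		/* POLLHUP is reported regardless */
	pfd.revents = 0;
	n = poll(&pfd, 1, 0);		/* zero timeout: just sample the state */
	close(fds[0]);

	if (n < 0)
		return -1;
	assert(n <= 1);			/* only one descriptor was polled */
	return pfd.revents;
}
```

Here the hang-up has already happened before poll() is called, so the call returns immediately. The bug described above concerned a poller already sleeping when the release happened.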