25 Aug, 2011
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
fuse: mark pages accessed when written to
fuse: delete dead .write_begin and .write_end aops
fuse: fix flock
fuse: fix non-ANSI void function notation
08 Aug, 2011
1 commit
-
Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a
number of issues with supporting flock locks over existing POSIX
locking infrastructure:

- it's not backward compatible, passing flock(2) calls to userspace
  unconditionally (if userspace sets FUSE_POSIX_LOCKS)
- it doesn't cater for the fact that flock locks are automatically
  unlocked on file release
- it doesn't take into account the fact that flock exclusive locks
  (write locks) don't need an fd opened for write.

The last one invalidates the original premise of the patch that flock
locks can be emulated with POSIX locks.

This patch fixes the first two issues. The last one needs to be fixed
in userspace if the filesystem assumed that a write lock will happen
only on a file opened for write (as in the case of the current fuse
library).

Reported-by: Sebastian Pipping
Signed-off-by: Miklos Szeredi
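
The third issue in the message above (an exclusive flock lock does not need a writable descriptor, while a POSIX write lock does) can be observed from userspace. The following is a minimal sketch for a local Linux filesystem, not code from the patch; the helper names and the temporary file path are made up for the illustration.

```c
#define _DEFAULT_SOURCE         /* for flock() and mkstemp() under strict -std */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Create a temporary file and reopen it read-only; returns an fd or -1. */
static int open_readonly_tmp(void)
{
    char path[] = "/tmp/lock-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);
    fd = open(path, O_RDONLY);
    unlink(path);               /* the file lives on until the fd is closed */
    return fd;
}

/* Returns 0 if an exclusive flock lock can be taken on a read-only fd. */
int flock_ex_on_readonly(void)
{
    int fd = open_readonly_tmp();
    if (fd < 0)
        return -1;
    int ret = flock(fd, LOCK_EX | LOCK_NB);
    close(fd);                  /* closing the last reference drops the lock */
    return ret;
}

/* Returns the errno produced by a POSIX write lock on a read-only fd. */
int posix_wrlck_errno_on_readonly(void)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int fd = open_readonly_tmp();
    if (fd < 0)
        return -1;
    int err = (fcntl(fd, F_SETLK, &fl) == -1) ? errno : 0;
    close(fd);
    return err;
}
```

On Linux the first helper is expected to succeed and the second to report EBADF, which is why a userspace filesystem that maps flock onto POSIX locks has to handle this case itself.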
21 Jul, 2011
1 commit
-
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara
Signed-off-by: Josef Bacik
Signed-off-by: Al Viro
21 Mar, 2011
1 commit
-
Reduce the size of struct fuse_request by removing cuse_init_out from
the request structure and allocating it dynamically instead.

CC: Tejun Heo
Signed-off-by: Miklos Szeredi
25 Feb, 2011
1 commit
-
Single threaded NTFS-3G could get stuck if a delayed RELEASE reply
triggered a DESTROY request via path_put().

Fix this by

a) making RELEASE requests synchronous, whenever possible, on fuseblk
   filesystems

b) if not possible (triggered by an asynchronous read/write) then do
   the path_put() in a separate thread with schedule_work().

Reported-by: Oliver Neukum
Cc: stable@kernel.org
Signed-off-by: Miklos Szeredi
08 Dec, 2010
2 commits
-
Terje Malmedal reports that a fuse filesystem with 32 million inodes
on a machine with lots of memory can take up to 30 minutes to process
FORGET requests when all those inodes are evicted from the icache.

To solve this, create a BATCH_FORGET request that allows up to about
8000 FORGET requests to be sent in a single message.

This request is only sent if userspace supports interface version 7.16
or later, otherwise fall back to sending individual FORGET messages.

Reported-by: Terje Malmedal
Signed-off-by: Miklos Szeredi
-
Terje Malmedal reports that a fuse filesystem with 32 million inodes
on a machine with lots of memory can go unresponsive for up to 30
minutes when all those inodes are evicted from the icache.

The reason is that FORGET messages, sent when the inode is evicted,
are queued up together with regular filesystem requests, and while the
huge queue of FORGET messages is processed no other filesystem
operation can proceed.

Since a full fuse request structure is allocated for each inode, these
take up quite a bit of memory as well.

To solve these issues, create a slim 'fuse_forget_link' structure
containing just the minimum of information required to send the FORGET
request and chain these on a separate queue.

When userspace is asking for a request make sure that FORGET and
non-FORGET requests are selected fairly: for each 8 non-FORGET allow
16 FORGET requests. This will make sure FORGETs do not pile up, yet
other requests are also allowed to proceed while the queued FORGETs
are processed.

Reported-by: Terje Malmedal
Signed-off-by: Miklos Szeredi
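
The 16:8 fairness rule described above can be modelled with a single signed budget counter. This is a simplified userspace sketch under that assumption, not the kernel code; the struct and function names are hypothetical, and the caller is assumed to invoke it only while at least one queue is non-empty.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two queues kept per connection. */
struct demo_queues {
    unsigned forgets;     /* pending FORGET entries */
    unsigned regular;     /* pending non-FORGET requests */
    int forget_batch;     /* FORGET budget left in the current round */
};

/*
 * Pick the next request to hand to userspace.
 * Returns true if a FORGET was selected, false for a regular request.
 * Precondition: at least one of the two queues is non-empty.
 */
bool demo_next_is_forget(struct demo_queues *q)
{
    if (q->forgets > 0) {
        /* Serve FORGETs while budget remains or nothing else is queued. */
        if (q->regular == 0 || q->forget_batch-- > 0) {
            q->forgets--;
            return true;
        }
        /* After 8 regular requests, refill the budget for 16 FORGETs. */
        if (q->forget_batch <= -8)
            q->forget_batch = 16;
    }
    q->regular--;
    return false;
}
```

Starting with a budget of 16 and both queues busy, the sketch hands out 16 FORGETs, then 8 regular requests, and repeats.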
12 Jul, 2010
2 commits
-
Userspace filesystem can request data to be retrieved from the inode's
mapping. This request is synchronous and the retrieved data is queued
as a new request. If the write to the fuse device returns an error
then the retrieve request was not completed and a reply will not be
sent.

Only present pages are returned in the retrieve reply. Retrieving
stops when it finds a non-present page and only data prior to that is
returned.

This request doesn't change the dirty state of pages.

Signed-off-by: Miklos Szeredi
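
The "stop at the first non-present page" rule can be sketched in plain C. The types below are a toy model of a page cache, invented for this illustration; only the stopping behaviour is taken from the message above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/* Toy page: either present with data, or a hole in the cache. */
struct demo_page {
    bool present;
    unsigned char data[DEMO_PAGE_SIZE];
};

/*
 * Copy up to 'want' bytes starting at page index 'first' into 'out'.
 * Copying stops at the first non-present page, so the return value may
 * be smaller than 'want' (and is 0 if the first page is missing).
 */
size_t demo_retrieve(const struct demo_page *pages, size_t npages,
                     size_t first, size_t want, unsigned char *out)
{
    size_t copied = 0;

    for (size_t i = first; i < npages && copied < want; i++) {
        if (!pages[i].present)
            break;                      /* only data before the hole */
        size_t chunk = want - copied;
        if (chunk > DEMO_PAGE_SIZE)
            chunk = DEMO_PAGE_SIZE;
        memcpy(out + copied, pages[i].data, chunk);
        copied += chunk;
    }
    return copied;
}
```

The pages are only read, which mirrors the statement that the request leaves the dirty state untouched.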
-
Userspace filesystem can request data to be stored in the inode's
mapping. This request is synchronous and has no reply. If the write
to the fuse device returns an error then the store request was not
fully completed (but may have updated some pages).

If the stored data overflows the current file size, then the size is
extended, similarly to a write(2) on the filesystem.

Pages which have been completely stored are marked uptodate.

Signed-off-by: Miklos Szeredi
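
The size-extension rule can be illustrated with a toy inode backed by a fixed buffer. This is a sketch under that assumption; the names and the 64-byte capacity are hypothetical, and the uptodate marking of completely stored pages is not modelled.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_CAPACITY 64

/* Toy inode: a size plus a small buffer standing in for the mapping. */
struct demo_inode {
    size_t size;
    unsigned char data[DEMO_CAPACITY];
};

/*
 * Store 'len' bytes at 'offset'. If the stored range ends past the
 * current size, the size grows to cover it, as with write(2).
 * Returns 0 on success, -1 if the range does not fit the demo buffer.
 */
int demo_store(struct demo_inode *inode, size_t offset,
               const void *buf, size_t len)
{
    if (offset > DEMO_CAPACITY || len > DEMO_CAPACITY - offset)
        return -1;
    memcpy(inode->data + offset, buf, len);
    if (offset + len > inode->size)
        inode->size = offset + len;
    return 0;
}
```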
31 May, 2010
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
mm: export generic_pipe_buf_*() to modules
fuse: support splice() reading from fuse device
fuse: allow splice to move pages
mm: export remove_from_page_cache() to modules
mm: export lru_cache_add_*() to modules
fuse: support splice() writing to fuse device
fuse: get page reference for readpages
fuse: use get_user_pages_fast()
fuse: remove unneeded variable
28 May, 2010
1 commit
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
25 May, 2010
1 commit
-
When splicing buffers to the fuse device with SPLICE_F_MOVE, try to
move pages from the pipe buffer into the page cache. This allows
populating the fuse filesystem's cache without ever touching the page
contents, i.e. zero copy read capability.

The following steps are performed when trying to move a page into the
page cache:

- buf->ops->confirm() to make sure the new page is uptodate
- buf->ops->steal() to try to remove the new page from its previous place
- remove_from_page_cache() on the old page
- add_to_page_cache_locked() on the new page

If any of the above steps fail (non fatally) then the code falls back
to copying the page. In particular ->steal() will fail if there are
external references (other than the page cache and the pipe buffer) to
the page.

Also since the remove_from_page_cache() + add_to_page_cache_locked()
are non-atomic it is possible that the page cache is repopulated in
between the two and add_to_page_cache_locked() will fail. This could
be fixed by creating a new atomic replace_page_cache_page() function.

fuse_readpages_end() needed to be reworked so it works even if
page->mapping is NULL for some or all pages which can happen if the
add_to_page_cache_locked() failed.

A number of sanity checks were added to make sure the stolen pages
don't have weird flags set, etc... These could be moved into generic
splice/steal code.

Signed-off-by: Miklos Szeredi
24 Sep, 2009
1 commit
-
Update some fs code to make use of new helper functions introduced
in the previous patch. Should be no significant change in behaviour
(except CIFS now calls send_sig under i_lock, via inode_newsize_ok).

Reviewed-by: Christoph Hellwig
Acked-by: Miklos Szeredi
Cc: linux-nfs@vger.kernel.org
Cc: Trond.Myklebust@netapp.com
Cc: linux-cifs-client@lists.samba.org
Cc: sfrench@samba.org
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
16 Sep, 2009
1 commit
-
Make the max_background and congestion_threshold parameters of a FUSE
mount tunable at runtime by adding the respective knobs to its directory
within the fusectl filesystem.

Signed-off-by: Csaba Henk
Signed-off-by: Miklos Szeredi
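
A userspace tool could read or adjust such a knob with ordinary file I/O. The helpers below are a minimal sketch with hypothetical names. They assume each knob file holds one unsigned decimal number, and that the fusectl filesystem is mounted at /sys/fs/fuse/connections, so a knob would live at a path such as /sys/fs/fuse/connections/<id>/max_background (the connection directory name is a placeholder).

```c
#include <stdio.h>

/* Read one unsigned decimal value from a control file. 0 on success. */
int demo_read_knob(const char *path, unsigned *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%u", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write one unsigned decimal value to a control file. 0 on success. */
int demo_write_knob(const char *path, unsigned value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fprintf(f, "%u\n", value) > 0);
    if (fclose(f) != 0)   /* a rejected value may only surface at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Writing to the real files typically requires appropriate privileges; the helpers work equally well on a regular file, which makes them easy to try out.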
07 Jul, 2009
1 commit
-
The practical values for these limits depend on the design of the
filesystem server so let userspace set them at initialization time.

Signed-off-by: Csaba Henk
Signed-off-by: Miklos Szeredi
01 Jul, 2009
2 commits
-
Add notification messages that allow the filesystem to invalidate VFS
caches.

Two notifications are added:

1) inode invalidation
   - invalidate cached attributes
   - invalidate a range of pages in the page cache (this is optional)

2) dentry invalidation
   - try to invalidate a subtree in the dentry cache

Care must be taken while accessing the 'struct super_block' for the
mount, as it can go away while an invalidation is in progress. To
prevent this, introduce a rw-semaphore, that is taken for read during
the invalidation and taken for write in the ->kill_sb callback.

Cc: Csaba Henk
Cc: Anand Avati
Signed-off-by: Miklos Szeredi
-
This patch lets filesystems handle masking the file mode on creation.
This is needed if filesystem is using ACLs.

- The CREATE, MKDIR and MKNOD requests are extended with a "umask"
  parameter.

- A new FUSE_DONT_MASK flag is added to the INIT request/reply. With
  this the filesystem may request that the create mode is not masked.

CC: Jean-Pierre André
Signed-off-by: Miklos Szeredi
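
The effect of the flag on the mode carried in a create request can be written as a one-line rule. The function below is an illustrative sketch with a hypothetical name; it only models the masking decision, not the request layout.

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Mode a CREATE/MKDIR/MKNOD request would carry.
 * dont_mask models a filesystem that negotiated FUSE_DONT_MASK: it then
 * receives the unmasked mode and can apply the umask (or ACLs) itself.
 */
mode_t demo_request_mode(mode_t mode, mode_t umask_val, bool dont_mask)
{
    return dont_mask ? mode : (mode & ~umask_val);
}
```

For example, a mode of 0666 with umask 022 is sent as 0644 by default and as 0666 when masking is left to the filesystem.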
28 Apr, 2009
8 commits
-
Export the following symbols for CUSE.
fuse_conn_put()
fuse_conn_get()
fuse_conn_kill()
fuse_send_init()
fuse_do_open()
fuse_sync_release()
fuse_direct_io()
fuse_do_ioctl()
fuse_file_poll()
fuse_request_alloc()
fuse_get_req()
fuse_put_request()
fuse_request_send()
fuse_abort_conn()
fuse_dev_release()
fuse_dev_operations

Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
-
Update fuse_conn_init() such that it doesn't take @sb and move bdi
registration into a separate function. Also separate out
fuse_conn_kill() from fuse_put_super().

These will be used to implement cuse.

Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
-
Make fuse_sync_release() a generic helper function that doesn't need a
struct inode pointer. This makes it suitable for use by CUSE.

Change return value of fuse_release_common() from int to void.

Signed-off-by: Miklos Szeredi
-
Create a helper for sending an OPEN request that doesn't need a struct
inode pointer.

Signed-off-by: Miklos Szeredi
-
Move setting ff->fh, ff->nodeid and file->private_data outside
fuse_finish_open(). Add ->open_flags member to struct fuse_file.

This simplifies the argument passing to fuse_finish_open() and
fuse_release_fill(), and paves the way for creating an open helper
that doesn't need an inode pointer.

Signed-off-by: Miklos Szeredi
-
Use ff->fc and ff->nodeid instead of passing down the inode.
This prepares this function for use by CUSE, where the inode is not
owned by a fuse filesystem.

Signed-off-by: Miklos Szeredi
-
Add new members ->fc and ->nodeid to struct fuse_file. This will aid
in converting functions for use by CUSE, where the inode is not owned
by a fuse filesystem.

Signed-off-by: Miklos Szeredi
-
Use struct path instead of separate dentry and vfsmount in
req->misc.release.

Signed-off-by: Miklos Szeredi
28 Mar, 2009
1 commit
-
Signed-off-by: Al Viro
26 Nov, 2008
6 commits
-
Add fuse_conn->release() so that fuse_conn can be embedded in other
structures.

Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
-
Separate out fuse_conn_init() from new_conn() and while at it
initialize fuse_conn->entry during conn initialization.

This will be used by CUSE.

Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
-
Add fuse_ prefix to request_send*() and get_root_inode() as some of
those functions will be exported for CUSE. With or without CUSE
export, having the function names scoped is a good idea for
debuggability.

Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
-
Implement poll support. Polled files are indexed using kh in an RB
tree rooted at fuse_conn->polled_files.

Client should send FUSE_NOTIFY_POLL notification once after processing
FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set. Sending
notification unconditionally after the latest poll or every time file
content might have changed is inefficient but won't cause malfunction.

fuse_file_poll() can sleep and requires patches from the following
thread which allows f_op->poll() to sleep.

http://thread.gmane.org/gmane.linux.kernel/726176

Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
-
The file handle, fuse_file->fh, is an opaque value supplied by the
userland FUSE server and uniqueness is not guaranteed. Add file kernel
handle, fuse_file->kh, which is allocated by the kernel on file
allocation and guaranteed to be unique.

This will be used by poll to match notification to the respective file
but can be used for other purposes where unique file handle is
necessary.

Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
-
Fix coding style errors reported by checkpatch and others. Update
copyright date to 2008.

Signed-off-by: Miklos Szeredi
16 Oct, 2008
1 commit
-
Add include protectors to include/linux/fuse.h and fs/fuse/fuse_i.h.
Signed-off-by: Tejun Heo
Signed-off-by: Miklos Szeredi
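
For readers unfamiliar with the idiom, an include protector makes a header safe to include more than once. The sketch below simulates a double inclusion inside one file; the guard macro and struct names are invented for the illustration and are not claimed to match the ones used in the fuse headers.

```c
/* First "inclusion": the guard macro is undefined, so the body is seen. */
#ifndef DEMO_FUSE_I_H
#define DEMO_FUSE_I_H
struct demo_conn {
    int id;
};
#endif /* DEMO_FUSE_I_H */

/* Second "inclusion": the guard is now defined, so the body is skipped.
 * Without the guard, the repeated struct definition would not compile. */
#ifndef DEMO_FUSE_I_H
#define DEMO_FUSE_I_H
struct demo_conn {
    int id;
};
#endif /* DEMO_FUSE_I_H */

/* Ordinary code can use the type regardless of how often it was included. */
int demo_conn_id(const struct demo_conn *c)
{
    return c->id;
}
```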
26 Jul, 2008
2 commits
-
Implement the get_parent export operation by sending a LOOKUP request with
".." as the name.

Implement looking up an inode by node ID after it has been evicted from
the cache. This is done by sending a LOOKUP request with "." as the name
(for all file types, not just directories).

The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to
indicate that it supports these special lookups.

Thanks to John Muir for the original implementation of this feature.

Signed-off-by: Miklos Szeredi
Cc: "J. Bruce Fields"
Cc: Trond Myklebust
Cc: Matthew Wilcox
Cc: David Teigland
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Implement export_operations, to allow fuse filesystems to be exported to
NFS. This feature has been in the out-of-tree fuse module, and is widely
used and tested.

It has not been originally merged into mainline, because doing the NFS
export in userspace was thought to be a cleaner and more efficient way of
doing it, than through the kernel.

While that is true, it would also have involved a lot of duplicated effort
at reimplementing NFS exporting (all the different versions of the
protocol). This effort was unfortunately not undertaken by anyone, so we
are left with doing it the easy but less efficient way.

If this feature goes in, the out-of-tree fuse module can go away,
which would have several advantages:

- not having to maintain two versions
- less confusion for users
- no bugs due to kernel API changes

Comment from hch:

- Use the same fh_type values as XFS, since we use the same fh encoding.

Signed-off-by: Miklos Szeredi
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 May, 2008
1 commit
-
Prior to 2.6.26 fuse only supported single page write requests. In theory all
fuse filesystems should be able to support bigger than 4k writes, as there's
nothing in the API to prevent it. Unfortunately there's a known case in
NTFS-3G where big writes cause filesystem corruption. There could also be
other filesystems, where the lack of testing with big write requests would
result in bugs.

To prevent such problems on a kernel upgrade, disable big writes by default,
but let filesystems set a flag to turn it on.

Signed-off-by: Miklos Szeredi
Cc: Szabolcs Szakacsits
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
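
The opt-in behaviour can be sketched as a limit on the size of a single WRITE request. Everything below is illustrative: the flag bit, the 4096-byte page size and the function names are assumptions made for the example, not the definitions used by the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE  4096u
#define DEMO_BIG_WRITES (1u << 0)   /* hypothetical flag bit set at INIT */

/* Largest WRITE request the kernel would send to this filesystem. */
size_t demo_write_limit(uint32_t init_flags, size_t max_write)
{
    /* Default: stay with single page writes, as before 2.6.26. */
    if (!(init_flags & DEMO_BIG_WRITES))
        return max_write < DEMO_PAGE_SIZE ? max_write : DEMO_PAGE_SIZE;
    return max_write;
}

/* Number of WRITE requests needed for 'len' bytes under a given limit. */
size_t demo_write_requests(size_t len, size_t limit)
{
    return (len + limit - 1) / limit;
}
```

With a 128 KiB max_write, a 128 KiB write takes 32 requests by default and a single request once the filesystem opts in.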
30 Apr, 2008
4 commits
-
Node ID is 64bit but it is passed as unsigned long to some functions. This
breakage wasn't noticed, because libfuse uses unsigned long too.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
If the READ request returned a short count, then either

- cached size is incorrect
- filesystem is buggy, as short reads are only allowed on EOF

So assume that the size is wrong and refresh it, so that cached read() doesn't
zero fill the missing chunk.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Quoting Linus (3 years ago, FUSE inclusion discussions):

"User-space filesystems are hard to get right. I'd claim that they
are almost impossible, unless you limit them somehow (shared
writable mappings are the nastiest part - if you don't have those,
you can reasonably limit your problems by limiting the number of
dirty pages you accept through normal "write()" calls)."

Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others). This nicely solved the biggest problem: limiting the number of pages
used for write caching.

Some small details remained, however, which this largish patch attempts to
address. It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks. Performance may not be very
good for certain usage patterns, but generally it should be acceptable.

It has been tested extensively with fsx-linux and bash-shared-mapping.

Fuse page writeback design
--------------------------

fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.

The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.

For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented. The per-bdi writeback count is not decremented until the actual
write completes.

On dirtying the page, fuse waits for a previous write to finish before
proceeding. This makes sure there can only be one temporary page used at a
time for one cached page.

This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?

The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write. It may be buggy or even
malicious, and fail to complete WRITE requests. We don't want unrelated parts
of the system to grind to a halt in such cases.

Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request. There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.

Currently there are several cases where the kernel can block on page
writeback:

- allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
- page migration
- throttle_vm_writeout (through NR_WRITEBACK)
- sync(2)

Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.

As an extra safety measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default. This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.

With appropriate privileges, this limit can be raised through
'/sys/class/bdi//max_ratio'.

Signed-off-by: Miklos Szeredi
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR"

Make the fuse control filesystem use s_dev instead of a fuse specific ID.
This makes it easier to match directories under /sys/fs/fuse/connections/ with
directories under /sys/class/bdi, and with actual mounts.

Signed-off-by: Miklos Szeredi
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds