Eric Lee / smarc-fsl-linux-kernel

13 May, 2008

1 commit

78bb6cb9a fuse: add flag to turn on big writes

Prior to 2.6.26 fuse only supported single page write requests. In theory all
fuse filesystems should be able to support writes bigger than 4k, as there's
nothing in the API to prevent it. Unfortunately there's a known case in
NTFS-3G where big writes cause filesystem corruption. There could also be
other filesystems where the lack of testing with big write requests would
result in bugs.

To prevent such problems on a kernel upgrade, disable big writes by default,
but let filesystems set a flag to turn it on.
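
The opt-in behaviour described above can be sketched roughly as follows. This is a userspace illustration only, not the kernel code: the names `SKETCH_BIG_WRITES`, `SKETCH_PAGE_SIZE` and `max_write_request()` are hypothetical, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096u      /* assumed page size */
#define SKETCH_BIG_WRITES (1u << 0)  /* hypothetical opt-in bit set by the filesystem */

/*
 * Largest write request the kernel side would send in this sketch.
 * Without the opt-in flag, requests stay capped at a single page, which
 * matches the pre-2.6.26 behaviour; with it, the filesystem's own
 * max_write limit applies.
 */
static uint32_t max_write_request(uint32_t fs_flags, uint32_t max_write)
{
	assert(max_write > 0);
	if (fs_flags & SKETCH_BIG_WRITES)
		return max_write;
	return max_write < SKETCH_PAGE_SIZE ? max_write : SKETCH_PAGE_SIZE;
}
```

Defaulting to the old single-page limit means a filesystem that was never tested with larger requests sees no change after a kernel upgrade.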

Signed-off-by: Miklos Szeredi
Cc: Szabolcs Szakacsits
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-05-13 23:02:26 +0800

01 May, 2008

1 commit

bd7309677 fuse: use clamp() rather than nested min/max

clamp() exists for this use.
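
For illustration, a small userspace sketch of the pattern being replaced. The helper functions below are hypothetical stand-ins: the kernel's `min()`, `max()` and `clamp()` are type-checked macros, not functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers standing in for the kernel's min()/max() macros. */
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Nested form: correct, but the reader has to untangle which bound is which. */
static uint32_t nested_limit(uint32_t val, uint32_t lo, uint32_t hi)
{
	return min_u32(max_u32(val, lo), hi);
}

/* clamp form: states the intent (keep val within [lo, hi]) directly. */
static uint32_t clamp_u32(uint32_t val, uint32_t lo, uint32_t hi)
{
	assert(lo <= hi);
	return val < lo ? lo : (val > hi ? hi : val);
}
```

Both forms return the same value whenever `lo <= hi`.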

Signed-off-by: Harvey Harrison
Cc: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Harvey Harrison
2008-05-01 23:04:02 +0800

30 Apr, 2008

9 commits

4dbf930ed fuse: fix sparse warnings

fs/fuse/dev.c:306:2: warning: context imbalance in 'wait_answer_interruptible' - unexpected unlock
fs/fuse/dev.c:361:2: warning: context imbalance in 'request_wait_answer' - unexpected unlock
fs/fuse/dev.c:1002:4: warning: context imbalance in 'end_io_requests' - unexpected unlock

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:51 +0800
5559b8f4d fuse: fix race in llseek

Fuse doesn't use i_mutex to protect setting i_size, and so
generic_file_llseek() can be racy: it doesn't use i_size_read().

So do a fuse specific llseek method, which does use i_size_read().

[akpm@linux-foundation.org: make `retval' loff_t]
Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:51 +0800
b48badf01 fuse: fix node ID type

Node ID is 64bit but it is passed as unsigned long to some functions. This
breakage wasn't noticed, because libfuse uses unsigned long too.
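
A minimal sketch of why the parameter type matters. `uint32_t` is used here to imitate `unsigned long` on a 32-bit machine, and all function names are hypothetical.

```c
#include <assert.h>  /* assert() is used in the usage note below */
#include <stdint.h>

/* Imitates a function taking the node ID as a 32-bit unsigned long. */
static uint64_t lookup_narrow(uint32_t nodeid) { return nodeid; }

/* Takes the node ID at its full protocol width. */
static uint64_t lookup_wide(uint64_t nodeid) { return nodeid; }

/* Passing a 64-bit ID through the narrow interface drops the upper 32 bits. */
static uint64_t through_narrow(uint64_t nodeid)
{
	return lookup_narrow((uint32_t)nodeid);
}
```

With an ID of `0x100000001`, `through_narrow()` yields `1` while `lookup_wide()` preserves the value, so `assert(through_narrow(id) != lookup_wide(id))` holds for any ID above 32 bits.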

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:51 +0800
e5d9a0df0 fuse: fix max i/o size calculation

Fix a bug that Werner Baumann reported: fuse can send a bigger write request
than the maximum specified. This only affected direct_io operation.

In addition set a sane minimum for the max_read and max_write tunables, so I/O
always makes some progress.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:51 +0800
5c5c5e51b fuse: update file size on short read

If the READ request returned a short count, then either

- cached size is incorrect
- filesystem is buggy, as short reads are only allowed on EOF

So assume that the size is wrong and refresh it, so that cached read() doesn't
zero fill the missing chunk.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:50 +0800
ea9b9907b fuse: implement perform_write

Introduce fuse_perform_write. With fusexmp (a passthrough filesystem), large
(1MB) writes into a backing tmpfs filesystem are sped up by almost 4 times
(256MB/s vs 71MB/s).

[mszeredi@suse.cz]:

- split into smaller functions
- testing
- duplicate generic_file_aio_write(), so that there's no need to add a
new ->perform_write() a_op. Comment from hch.

Signed-off-by: Nick Piggin
Signed-off-by: Miklos Szeredi
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2008-04-30 23:29:50 +0800
854512ec3 fuse: clean up setting i_size in write

Extract common code for setting i_size in write functions into a common
helper.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:50 +0800
3be5a52b3 fuse: support writable mmap

Quoting Linus (3 years ago, FUSE inclusion discussions):

"User-space filesystems are hard to get right. I'd claim that they
are almost impossible, unless you limit them somehow (shared
writable mappings are the nastiest part - if you don't have those,
you can reasonably limit your problems by limiting the number of
dirty pages you accept through normal "write()" calls)."

Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others). This nicely solved the biggest problem: limiting the number of pages
used for write caching.

Some small details remained, however, which this largish patch attempts to
address. It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks. Performance may not be very
good for certain usage patterns, but generally it should be acceptable.

It has been tested extensively with fsx-linux and bash-shared-mapping.

Fuse page writeback design
--------------------------

fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.

The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.

For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented. The per-bdi writeback count is not decremented until the actual
write completes.

On dirtying the page, fuse waits for a previous write to finish before
proceeding. This ensures that at most one temporary page is in use at a
time for each cached page.

This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?

The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write. It may be buggy or even
malicious, and fail to complete WRITE requests. We don't want unrelated parts
of the system to grind to a halt in such cases.

Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request. There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.

Currently there are several cases where the kernel can block on page
writeback:

- allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
- page migration
- throttle_vm_writeout (through NR_WRITEBACK)
- sync(2)

Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.

As an extra safety measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default. This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.

With appropriate privileges, this limit can be raised through
'/sys/class/bdi//max_ratio'.

Signed-off-by: Miklos Szeredi
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:50 +0800
b6f2fcbcf mm: bdi: expose the BDI object in sysfs for FUSE

Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR"

Make the fuse control filesystem use s_dev instead of a fuse specific ID.
This makes it easier to match directories under /sys/fs/fuse/connections/ with
directories under /sys/class/bdi, and with actual mounts.

Signed-off-by: Miklos Szeredi
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-04-30 23:29:49 +0800

25 Apr, 2008

1 commit

42faad996 [PATCH] restore sane ->umount_begin() API

Signed-off-by: Al Viro

Al Viro
2008-04-25 21:23:25 +0800

24 Feb, 2008

1 commit

1a823ac9f fuse: fix permission checking

I added a nasty local variable shadowing bug to fuse in 2.6.24, with the
result that the 'default_permissions' mount option is basically ignored.

How did this happen?

- old err declaration in inner scope
- new err getting declared in outer scope
- 'return err' from inner scope getting removed
- old declaration not being noticed

-Wshadow would have saved us, but it doesn't seem practical for
the kernel :(

More testing would have also saved us :((
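
The sequence of events listed above can be reproduced in a few lines. This is a reduced userspace sketch, not the fuse code; `check_access()` and both `permission_*()` functions are hypothetical.

```c
#include <assert.h>  /* assert() is used in the usage note below */
#include <errno.h>

/* Hypothetical stand-in for a permission check that may deny access. */
static int check_access(int allowed)
{
	return allowed ? 0 : -EACCES;
}

/* Buggy: the inner 'err' shadows the outer one, so the failure is lost. */
static int permission_buggy(int default_permissions, int allowed)
{
	int err = 0;			/* new declaration in outer scope */

	if (default_permissions) {
		int err = check_access(allowed);	/* old inner declaration */
		(void)err;		/* the inner 'return err' was removed */
	}
	return err;			/* always 0: the check is ignored */
}

/* Fixed: a single 'err' carries the result out of the inner block. */
static int permission_fixed(int default_permissions, int allowed)
{
	int err = 0;

	if (default_permissions)
		err = check_access(allowed);
	return err;
}
```

For a denied access, `permission_buggy(1, 0)` returns `0` while `permission_fixed(1, 0)` returns `-EACCES`. Compiling the buggy version with `-Wshadow` reports the inner declaration.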

Signed-off-by: Miklos Szeredi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-02-24 09:12:13 +0800

09 Feb, 2008

1 commit

d1875dbaa mount options: fix fuse

Add blksize= option to /proc/mounts for fuseblk filesystems.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-02-09 01:22:40 +0800

08 Feb, 2008

2 commits

fa300b191 iget: stop FUSE from using iget() and read_inode()

Stop the FUSE filesystem from using read_inode(), which it doesn't use anyway.

Signed-off-by: David Howells
Cc: Miklos Szeredi
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2008-02-08 00:42:28 +0800
e231c2ee6 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)

Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`
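
To show what the substitution does at a call site, here is a userspace approximation of the helpers. The definitions below are simplified re-implementations written for this example, not the kernel's `include/linux/err.h`, and `old_style()`/`new_style()` are hypothetical callers.

```c
#include <assert.h>  /* assert() is used in the usage note below */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Re-types an error pointer without decoding and re-encoding it. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

struct dentry;
struct inode;

/* Before: decode the error from one pointer type, re-encode as another. */
static struct inode *old_style(struct dentry *d)
{
	if (IS_ERR(d))
		return ERR_PTR(PTR_ERR(d));
	return NULL;
}

/* After: same result, with the intent stated directly. */
static struct inode *new_style(struct dentry *d)
{
	if (IS_ERR(d))
		return ERR_CAST(d);
	return NULL;
}
```

Given `struct dentry *d = ERR_PTR(-ENOENT);`, both functions return the same error pointer, so `assert(old_style(d) == new_style(d))` holds.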

Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2008-02-08 00:42:26 +0800

07 Feb, 2008

3 commits

d12def1bc fuse: limit queued background requests

Libfuse basically creates a new thread for each new request. This is fine for
synchronous requests, which are naturally limited. However background
requests (especially writepage) can cause a thread creation storm.

To avoid this, limit the number of background requests available to userspace.

This is done by introducing another queue for background requests, and a
counter for the number of "active" requests, which are currently available for
userspace.
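
The queue-plus-counter scheme can be sketched with plain counters. This is a simplified model under stated assumptions: real requests are list entries rather than counts, and `struct bg_state` with its field and function names is hypothetical.

```c
#include <assert.h>

/* Hypothetical model: counts stand in for the actual request lists. */
struct bg_state {
	unsigned pending;     /* background requests held back in the extra queue */
	unsigned active;      /* background requests currently visible to userspace */
	unsigned max_active;  /* limit on 'active' */
};

/* Move held-back requests to userspace while there is room under the limit. */
static void flush_bg(struct bg_state *s)
{
	while (s->active < s->max_active && s->pending > 0) {
		s->pending--;
		s->active++;
	}
}

/* A new background request always enters the extra queue first. */
static void queue_bg(struct bg_state *s)
{
	s->pending++;
	flush_bg(s);
}

/* Completing a request frees a slot, which may release a held-back one. */
static void complete_bg(struct bg_state *s)
{
	assert(s->active > 0);
	s->active--;
	flush_bg(s);
}
```

Since userspace never sees more than `max_active` background requests at once, a library that spawns one thread per request creates at most that many threads for background work.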

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-02-07 02:41:13 +0800
b57d42644 fuse: save space in struct fuse_req

Move the fields 'dentry' and 'vfsmount' into the request specific union, since
these are only used for the RELEASE request.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-02-07 02:41:13 +0800
0952b2a4a fuse: fix attribute caching after create

Invalidate attributes on create, since st_ctime is updated. Reported by
Szabolcs Szakacsits.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-02-07 02:41:13 +0800

25 Jan, 2008

4 commits

197b12d67 Kobject: convert fs/* from kobject_unregister() to kobject_put()

There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().

Cc: Kay Sievers
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-01-25 12:40:40 +0800
00d266662 kobject: convert main fs kobject to use kobject_create

This also renames fs_subsys to fs_kobj to catch all current users with a
build error instead of a build warning which can easily be missed.

Cc: Kay Sievers
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-01-25 12:40:13 +0800
5c89e17e9 kobject: convert fuse to use kobject_create

We don't need a kset here, a simple kobject will do just fine, so
dynamically create the kobject and use it.

Cc: Kay Sievers
Cc: Miklos Szeredi
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-01-25 12:40:11 +0800
3514faca1 kobject: remove struct kobj_type from struct kset

We don't need a "default" ktype for a kset. We should set this
explicitly every time for each kset. This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumptions that the kset/kobject/ktype model has.

This patch is based on a lot of help from Kay Sievers.

Nasty bug in the block code was found by Dave Young

Cc: Kay Sievers
Cc: Dave Young
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2008-01-25 12:40:10 +0800

30 Nov, 2007

6 commits

08b633070 fuse: fix attribute caching after rename

Invalidate attributes on rename, since some filesystems may update
st_ctime. Reported by Szabolcs Szakacsits

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-11-30 01:24:54 +0800
fbee36b92 fuse: fix uninitialized field in fuse_inode

I found problems accessing (executing) previously existing files, until
I did chmod on them (or setattr).

If the fi->attr_version is not initialized, then it could be
larger than fc->attr_version until a setattr is executed, and as a
result the inode attributes would never be set.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

John Muir
2007-11-30 01:24:54 +0800
d0186b25e fuse: fix FUSE_FILE_OPS sending

FUSE_FILE_OPS is meant to signal that the kernel will send the open file to
the userspace filesystem for operations on open files, so that sillyrenaming
unlinked files becomes unnecessary.

However this needs VFS changes, which won't make it into 2.6.24.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-11-30 01:24:54 +0800
a6643094e fuse: pass open flags to read and write

Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...)
after open, but fuse currently only sends the flags to userspace in open.

To make it possible to correctly handle changing flags, send the
current value to userspace in each read and write.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-11-30 01:24:54 +0800
7dca9fd39 fuse: cleanup: add fuse_get_attr_version()

Extract repeated code into helper function, as suggested by Akpm.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-11-30 01:24:54 +0800
bcb4be809 fuse: fix reading past EOF

Currently reading a fuse file will stop at cached i_size and return
EOF, even though the file might have grown since the attributes were
last updated.

So detect if trying to read past EOF, and refresh the attributes
before continuing with the read.

Thanks to mpb for the report.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-11-30 01:24:54 +0800

15 Nov, 2007

1 commit

8744969a8 fuse_file_alloc(): fix NULL dereferences

Fix obvious NULL dereferences spotted by the Coverity checker.

Signed-off-by: Adrian Bunk
Acked-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2007-11-15 10:45:42 +0800

19 Oct, 2007

10 commits

0e9663ee4 fuse: add blksize field to fuse_attr

There are cases when the filesystem will be passed the buffer from a single
read or write call, namely:

1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go
through the page cache, but go directly to the userspace fs

2) currently buffered writes are done with single page requests, but
if Nick's ->perform_write() patch goes in, it will be possible to
do larger write requests. But only if the original write() was
also bigger than a page.

In these cases the filesystem might want to give a hint to the app
about the optimal I/O size.

Allow the userspace filesystem to supply a blksize value to be returned by
stat() and friends. If the field is zero, it defaults to the old
PAGE_CACHE_SIZE value.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:31 +0800
f33321141 fuse: add support for mandatory locking

For mandatory locking the userspace filesystem needs to know the lock
ownership for read, write and truncate operations.

This patch adds the necessary fields to the protocol.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:31 +0800
b25e82e56 fuse: add helper for asynchronous writes

This patch adds a new helper function fuse_write_fill() which makes it
possible to send WRITE requests asynchronously.

A new flag for WRITE requests is also added which indicates that this is a
write from the page cache, and not a "normal" file write.

This patch is in preparation for writable mmap support.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:31 +0800
93a8c3cd9 fuse: add list of writable files to fuse_inode

Each WRITE request must carry a valid file descriptor. When a page is written
back from a memory mapping, the file through which the page was dirtied is not
available, so a new mechanism is needed to find a suitable file in
->writepage(s).

A list of fuse_files is added to fuse_inode. The file is removed from the
list in fuse_release().

This patch is in preparation for writable mmap support.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:31 +0800
a9ff4f870 fuse: support BSD locking semantics

It is trivial to add support for flock(2) semantics to the existing protocol,
by setting the lock owner field to the file pointer, and passing a new
FUSE_LK_FLOCK flag with the locking request.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:31 +0800
6ff958edb fuse: add atomic open+truncate support

This patch allows fuse filesystems to implement open(..., O_TRUNC) as a single
request, instead of separate truncate and open requests.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:31 +0800
17637cbab fuse: improve utimes support

Add two new flags for setattr: FATTR_ATIME_NOW and FATTR_MTIME_NOW. These
mean that atime or mtime should be changed to the current time.

Also it is now possible to update atime or mtime individually, not just
together.
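
A sketch of how a userspace filesystem might interpret these flags. The flag names `FATTR_ATIME_NOW` and `FATTR_MTIME_NOW` come from the message above; the bit values, `struct sketch_times` and `apply_times()` are assumptions made for this example, so the real protocol header should be consulted for the actual definitions.

```c
#include <assert.h>  /* assert() is used in the usage note below */
#include <stdint.h>

/* Bit values are assumed here for illustration only. */
#define FATTR_ATIME     (1u << 4)
#define FATTR_MTIME     (1u << 5)
#define FATTR_ATIME_NOW (1u << 7)
#define FATTR_MTIME_NOW (1u << 8)

/* Hypothetical per-file timestamps, in seconds. */
struct sketch_times {
	int64_t atime;
	int64_t mtime;
};

/*
 * Apply a setattr request. Each timestamp is handled on its own, and a
 * *_NOW flag selects the filesystem's current time over the supplied value.
 */
static void apply_times(struct sketch_times *t, uint32_t valid,
			int64_t atime, int64_t mtime, int64_t now)
{
	if (valid & FATTR_ATIME)
		t->atime = (valid & FATTR_ATIME_NOW) ? now : atime;
	if (valid & FATTR_MTIME)
		t->mtime = (valid & FATTR_MTIME_NOW) ? now : mtime;
}
```

A request carrying only `FATTR_MTIME | FATTR_MTIME_NOW` sets mtime to `now` and leaves atime untouched, so after such a call `assert(t.atime == old_atime)` holds.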

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:30 +0800
49d4914fd fuse: clean up open file passing in setattr

Clean up supplying open file to the setattr operation. In addition to being a
cleanup it prepares for the changes in the way the open file is passed to the
setattr method.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:30 +0800
c79e322f6 fuse: add file handle to getattr operation

Add necessary protocol changes for supplying a file handle with the getattr
operation. Step the API version to 7.9.

This patch doesn't actually supply the file handle, because that needs some
kind of VFS support, which we haven't yet been able to agree upon.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:30 +0800
1fb69e781 fuse: fix race between getattr and write

Getattr and lookup operations can be running in parallel to attribute changing
operations, such as write and setattr.

This means that if, for example, getattr was slower than a write, the cached
size attribute could be set to a stale value.

To prevent this race, introduce a per-filesystem attribute version counter.
This counter is incremented whenever cached attributes are modified, and the
incremented value stored in the inode.

Before storing new attributes in the cache, getattr and lookup check, using
the version number, whether the attributes have been modified during the
request's lifetime. If so, the returned attributes are not cached, because
they might be stale.
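
The version check can be modelled in a few lines. This is a simplified single-threaded sketch without locking, not the kernel implementation; `struct sketch_conn`, `struct sketch_node` and the function names are hypothetical.

```c
#include <assert.h>  /* assert() is used in the usage note below */
#include <stdint.h>

struct sketch_conn { uint64_t attr_version; };  /* per-filesystem counter */
struct sketch_node {
	uint64_t attr_version;  /* counter value at the last local change */
	uint64_t size;          /* cached size attribute */
};

/* Sampled before a GETATTR or LOOKUP request is sent. */
static uint64_t get_attr_version(const struct sketch_conn *fc)
{
	return fc->attr_version;
}

/* A write or setattr changes the cached attributes and bumps the counter. */
static void local_change(struct sketch_conn *fc, struct sketch_node *n,
			 uint64_t size)
{
	n->size = size;
	n->attr_version = ++fc->attr_version;
}

/*
 * Called when the reply arrives. Returns 1 if the reply was cached, 0 if it
 * was dropped because the node changed after the request was sent.
 */
static int cache_reply(struct sketch_node *n, uint64_t reply_size,
		       uint64_t version_at_send)
{
	if (n->attr_version > version_at_send)
		return 0;
	n->size = reply_size;
	return 1;
}
```

If a write lands between `get_attr_version()` and `cache_reply()`, the node's version exceeds the sampled one and the slower getattr reply is discarded, so the cached size keeps the value set by the write.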

Thanks to Jakub Bogusz for the bug report and test program.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Miklos Szeredi
Cc: Jakub Bogusz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-19 05:37:30 +0800