26 Jan, 2009

2 commits

  • Move fuse_copy_finish() to before calling fuse_notify_poll_wakeup().
    This is not a big issue because fuse_notify_poll_wakeup() should be
    atomic, but it's cleaner this way, and later uses of notification will
    need to be able to finish the copying before performing some actions.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • If a fuse filesystem is unmounted but the device file descriptor
    remains open and a new mount reuses the old device number, then the
    mount fails with EEXIST and the following warning is printed in the
    kernel log:

    WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x35/0x3d()
    sysfs: duplicate filename '0:15' can not be created

    The cause is that the bdi belonging to the fuse filesystem was
    destroyed only after the device file was released. Fix this by
    calling bdi_destroy() from fuse_put_super() instead.

    Signed-off-by: Miklos Szeredi
    CC: stable@kernel.org

    Miklos Szeredi
     

07 Jan, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: clean up annotations of fc->lock
    fuse: fix sparse warning in ioctl
    fuse: update interface version
    fuse: add fuse_conn->release()
    fuse: separate out fuse_conn_init() from new_conn()
    fuse: add fuse_ prefix to several functions
    fuse: implement poll support
    fuse: implement unsolicited notification
    fuse: add file kernel handle
    fuse: implement ioctl support
    fuse: don't let fuse_req->end() put the base reference
    fuse: move FUSE_MINOR to miscdevice.h
    fuse: style fixes

    Linus Torvalds
     

02 Dec, 2008

1 commit


26 Nov, 2008

5 commits

  • Add fuse_ prefix to request_send*() and get_root_inode() as some of
    those functions will be exported for CUSE. With or without CUSE
    export, having the function names scoped is a good idea for
    debuggability.

    Signed-off-by: Tejun Heo
    Signed-off-by: Miklos Szeredi

    Tejun Heo
     
  • Implement poll support. Polled files are indexed using kh in a RB
    tree rooted at fuse_conn->polled_files.

    Client should send FUSE_NOTIFY_POLL notification once after processing
    FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set. Sending
    notification unconditionally after the latest poll or every time file
    content might have changed is inefficient but won't cause malfunction.

    fuse_file_poll() can sleep and requires patches from the following
    thread which allows f_op->poll() to sleep.

    http://thread.gmane.org/gmane.linux.kernel/726176

    Signed-off-by: Tejun Heo
    Signed-off-by: Miklos Szeredi

    Tejun Heo
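
The "notify once per scheduled poll" rule above can be modelled on the filesystem daemon side roughly as follows. This is only a sketch: `struct poll_handle`, `on_poll_request()` and `should_notify()` are hypothetical names, and the value given for FUSE_POLL_SCHEDULE_NOTIFY is an assumption that should be checked against the kernel's fuse.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; verify against the fuse.h you build with. */
#define FUSE_POLL_SCHEDULE_NOTIFY (1u << 0)

/* Hypothetical per-handle state kept by the filesystem daemon. */
struct poll_handle {
	uint64_t kh;        /* kernel handle identifying the polled file */
	bool notify_armed;  /* kernel asked for one FUSE_NOTIFY_POLL */
};

/* Called when a FUSE_POLL request arrives for this handle. */
static void on_poll_request(struct poll_handle *ph, uint64_t kh, uint32_t flags)
{
	assert(ph != NULL);
	ph->kh = kh;
	if (flags & FUSE_POLL_SCHEDULE_NOTIFY)
		ph->notify_armed = true;
}

/*
 * Called when readiness may have changed. Returns true if the daemon
 * should send FUSE_NOTIFY_POLL now. The armed state is consumed, so at
 * most one notification goes out per scheduled poll.
 */
static bool should_notify(struct poll_handle *ph)
{
	if (!ph->notify_armed)
		return false;
	ph->notify_armed = false;
	return true;
}
```

A daemon that skips this bookkeeping and notifies on every change would still work, as the message notes, only less efficiently.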
     
  • Clients always used to write only in response to read requests. To
    implement poll efficiently, clients should be able to issue
    unsolicited notifications. This patch implements basic notification
    support.

    Zero fuse_out_header.unique is now accepted and considered unsolicited
    notification and the error field contains notification code. This
    patch doesn't implement any actual notification.

    Signed-off-by: Tejun Heo
    Signed-off-by: Miklos Szeredi

    Tejun Heo
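
To illustrate the rule that a zero `unique` marks an unsolicited notification, here is a small user-space sketch of how a written header could be classified. The struct layout follows the fuse_out_header fields named above; `classify_reply()` and the enum are hypothetical and are not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header that precedes every message written to the fuse device. */
struct fuse_out_header {
	uint32_t len;
	int32_t  error;
	uint64_t unique;
};

enum reply_kind { REPLY_TO_REQUEST, UNSOLICITED_NOTIFICATION };

/*
 * Hypothetical helper: decide what a written header means.
 * For a notification, *code receives the notification code;
 * for a normal reply, it receives the error value of the reply.
 */
static enum reply_kind classify_reply(const struct fuse_out_header *oh,
				      int32_t *code)
{
	assert(oh != NULL && code != NULL);
	*code = oh->error;
	return oh->unique == 0 ? UNSOLICITED_NOTIFICATION : REPLY_TO_REQUEST;
}
```

Reusing the error field keeps the header layout unchanged, so existing replies with a nonzero `unique` are unaffected.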
     
  • fuse_req->end() was supposed to put the base reference but there's
    no reason why it should. It only makes things more complex. Move it
    out of ->end() and make it the responsibility of request_end().

    Signed-off-by: Tejun Heo
    Signed-off-by: Miklos Szeredi

    Tejun Heo
     
  • Fix coding style errors reported by checkpatch and others. Update
    copyright date to 2008.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Miklos Szeredi
    Cc: fuse-devel@lists.sourceforge.net
    Signed-off-by: James Morris

    David Howells
     

02 Nov, 2008

1 commit

  • As it is, all instances of ->release() for files that have ->fasync()
    need to remember to evict file from fasync lists; forgetting that
    creates a hole and we actually have a bunch that *does* forget.

    So let's keep our lives simple - let __fput() check FASYNC in
    file->f_flags and call ->fasync() there if it's been set. And lose that
    crap in ->release() instances - leaving it there is still valid, but we
    don't have to bother anymore.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
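
The idea of doing the fasync cleanup centrally can be shown with a small user-space model. Everything here is illustrative: `model_fput()`, the struct layouts and the MODEL_FASYNC bit are stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FASYNC 0x2000u  /* stand-in for the FASYNC flag bit (assumed value) */

struct file;

struct file_ops {
	int (*fasync)(int fd, struct file *f, int on);
	int (*release)(struct file *f);
};

struct file {
	unsigned int f_flags;
	const struct file_ops *f_op;
	int on_fasync_list;  /* model of "file is on a fasync list" */
};

/* Model of the final put: evict from fasync lists, then release. */
static void model_fput(struct file *f)
{
	assert(f != NULL && f->f_op != NULL);
	if ((f->f_flags & MODEL_FASYNC) && f->f_op->fasync)
		f->f_op->fasync(-1, f, 0);
	if (f->f_op->release)
		f->f_op->release(f);
}

/* Example driver whose ->release() forgets the fasync cleanup. */
static int demo_fasync(int fd, struct file *f, int on)
{
	(void)fd;
	f->on_fasync_list = on;
	return 0;
}

static int demo_release(struct file *f)
{
	(void)f;
	return 0;
}
```

Even though `demo_release()` does nothing about fasync, the file is off the list by the time it runs, which is the property the change is after.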
     

30 Apr, 2008

2 commits

  • fs/fuse/dev.c:306:2: warning: context imbalance in 'wait_answer_interruptible' - unexpected unlock
    fs/fuse/dev.c:361:2: warning: context imbalance in 'request_wait_answer' - unexpected unlock
    fs/fuse/dev.c:1002:4: warning: context imbalance in 'end_io_requests' - unexpected unlock

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Quoting Linus (3 years ago, FUSE inclusion discussions):

    "User-space filesystems are hard to get right. I'd claim that they
    are almost impossible, unless you limit them somehow (shared
    writable mappings are the nastiest part - if you don't have those,
    you can reasonably limit your problems by limiting the number of
    dirty pages you accept through normal "write()" calls)."

    Instead of attempting the impossible, I've just waited for the dirty page
    accounting infrastructure to materialize (thanks to Peter Zijlstra and
    others). This nicely solved the biggest problem: limiting the number of pages
    used for write caching.

    Some small details remained, however, which this largish patch attempts to
    address. It provides a page writeback implementation for fuse, which is
    completely safe against VM related deadlocks. Performance may not be very
    good for certain usage patterns, but generally it should be acceptable.

    It has been tested extensively with fsx-linux and bash-shared-mapping.

    Fuse page writeback design
    --------------------------

    fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
    It copies the contents of the original page, and queues a WRITE request to the
    userspace filesystem using this temp page.

    The writeback is finished instantly from the MM's point of view: the page is
    removed from the radix trees, and the PageDirty and PageWriteback flags are
    cleared.

    For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
    incremented. The per-bdi writeback count is not decremented until the actual
    write completes.

    On dirtying the page, fuse waits for a previous write to finish before
    proceeding. This makes sure there can only be one temporary page used at a
    time for one cached page.

    This approach is wasteful in both memory and CPU bandwidth, so why is this
    complication needed?

    The basic problem is that there can be no guarantee about the time in which
    the userspace filesystem will complete a write. It may be buggy or even
    malicious, and fail to complete WRITE requests. We don't want unrelated parts
    of the system to grind to a halt in such cases.

    Also a filesystem may need additional resources (particularly memory) to
    complete a WRITE request. There's a great danger of a deadlock if that
    allocation may wait for the writepage to finish.

    Currently there are several cases where the kernel can block on page
    writeback:

    - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
    - page migration
    - throttle_vm_writeout (through NR_WRITEBACK)
    - sync(2)

    Of course in some cases (fsync, msync) we explicitly want to allow blocking.
    So for these cases new code has to be added to fuse, since the VM is not
    tracking writeback pages for us any more.

    As an extra safety measure, the maximum dirty ratio allocated to a single
    fuse filesystem is set to 1% by default. This way one (or several) buggy or
    malicious fuse filesystems cannot slow down the rest of the system by hogging
    dirty memory.

    With appropriate privileges, this limit can be raised through
    '/sys/class/bdi//max_ratio'.

    Signed-off-by: Miklos Szeredi
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
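
The temporary-page scheme from the writeback design above can be sketched in user space as follows. This is a simplified model, not the fuse implementation: the `model_*` names, `struct cached_page` and the 4096-byte page size are assumptions made for illustration, and malloc() stands in for the page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

struct cached_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool dirty;
	bool writeback;
	unsigned char *temp;  /* in-flight temporary copy, or NULL */
};

static long nr_writeback_temp;  /* models the NR_WRITEBACK_TEMP counter */

/*
 * Start writeback: copy the page, account the copy, and immediately
 * clear the dirty/writeback state of the original. Returns the copy
 * that the WRITE request would carry, or NULL if allocation failed.
 */
static unsigned char *model_writepage(struct cached_page *pg)
{
	assert(pg->temp == NULL);  /* at most one temp page per cached page */
	unsigned char *tmp = malloc(MODEL_PAGE_SIZE);
	if (!tmp)
		return NULL;
	memcpy(tmp, pg->data, MODEL_PAGE_SIZE);
	pg->temp = tmp;
	nr_writeback_temp++;
	pg->dirty = false;
	pg->writeback = false;
	return tmp;
}

/* The userspace filesystem has answered the WRITE request. */
static void model_write_done(struct cached_page *pg)
{
	assert(pg->temp != NULL);
	free(pg->temp);
	pg->temp = NULL;
	nr_writeback_temp--;
}

/* Dirtying again has to wait while a previous write is in flight. */
static bool model_must_wait_before_dirty(const struct cached_page *pg)
{
	return pg->temp != NULL;
}
```

The original page is reusable by the VM as soon as `model_writepage()` returns, regardless of how long the userspace filesystem takes to complete the write.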
     

07 Feb, 2008

1 commit

  • Libfuse basically creates a new thread for each new request. This is fine for
    synchronous requests, which are naturally limited. However background
    requests (especially writepage) can cause a thread creation storm.

    To avoid this, limit the number of background requests available to userspace.

    This is done by introducing another queue for background requests, and a
    counter for the number of "active" requests, which are currently available for
    userspace.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
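
A minimal single-threaded model of the extra queue and the "active" counter described above might look like this. The names (`struct conn`, `flush_bg_queue()`, `max_background` and so on) are chosen for illustration and need not match the kernel's, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct bg_req {
	struct bg_req *next;
};

struct conn {
	struct bg_req *bg_head, *bg_tail;  /* queued, not yet visible to userspace */
	unsigned active_background;        /* currently available to userspace */
	unsigned max_background;           /* limit on active background requests */
	unsigned sent;                     /* stand-in for the pending list */
};

/* Hand queued requests to userspace while below the limit. */
static void flush_bg_queue(struct conn *c)
{
	while (c->active_background < c->max_background && c->bg_head) {
		struct bg_req *r = c->bg_head;

		c->bg_head = r->next;
		if (!c->bg_head)
			c->bg_tail = NULL;
		r->next = NULL;
		c->active_background++;
		c->sent++;
	}
}

/* Queue a new background request, then release what the limit allows. */
static void queue_background(struct conn *c, struct bg_req *r)
{
	assert(c != NULL && r != NULL);
	r->next = NULL;
	if (c->bg_tail)
		c->bg_tail->next = r;
	else
		c->bg_head = r;
	c->bg_tail = r;
	flush_bg_queue(c);
}

/* A background request finished: free a slot and refill it. */
static void background_done(struct conn *c)
{
	assert(c->active_background > 0);
	c->active_background--;
	flush_bg_queue(c);
}
```

With a limit of N, userspace never sees more than N background requests at once, so a thread-per-request library creates at most N threads for them.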
     

17 Oct, 2007

6 commits

  • Don't return -ENOENT for a read() on the fuse device when the request was
    aborted. Instead return -ENODEV, meaning the filesystem has been
    force-umounted or aborted.

    Previously ENOENT meant that the request was interrupted, but now the
    'aborted' flag is not set in case of interrupts.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
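
From the daemon's point of view, the distinction above could be handled along these lines. This is a hedged sketch of one possible policy, not libfuse's actual loop: `classify_dev_read_errno()` and the enum are hypothetical, and treating ENOENT as "retry" is an assumption made for compatibility with kernels that predate this change.

```c
#include <assert.h>
#include <errno.h>

enum dev_read_action { DEV_RETRY, DEV_EXIT, DEV_FAIL };

/* Map the errno of a failed read() on the fuse device to an action. */
static enum dev_read_action classify_dev_read_errno(int err)
{
	assert(err > 0);  /* errno values are positive */
	switch (err) {
	case EINTR:   /* read() itself was interrupted by a signal */
	case ENOENT:  /* older kernels: the request was interrupted */
		return DEV_RETRY;
	case ENODEV:  /* filesystem force-unmounted or aborted */
		return DEV_EXIT;
	default:
		return DEV_FAIL;
	}
}
```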
     
  • Don't set 'aborted' flag on a request if it's interrupted. We have to wait
    for the answer anyway, and this would only save a very little time while copying
    the reply.

    This means that write() on the fuse device will not return -ENOENT during
    normal operation, only if the filesystem is aborted by a forced umount or
    through the fusectl interface.

    This could simplify userspace code somewhat when backward compatibility with
    earlier kernel versions is not required.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Move dput/mntput pair from request_end() to fuse_release_end(), because
    there's no other place they are used.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Make lifetime of 'struct fuse_file' independent from 'struct file' by adding a
    reference counter and destructor.

    This will enable asynchronous page writeback, where it cannot be guaranteed
    that the file is not released while a request with this file handle is being
    served.

    The actual RELEASE request is only sent when there are no more references to
    the fuse_file.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
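
The reference-counted lifetime described above can be modelled in a few lines. This is a non-atomic, single-threaded sketch with hypothetical names (`model_fuse_file`, `ff_get()`, `ff_put()`); the kernel code would need an atomic counter.

```c
#include <assert.h>
#include <stdlib.h>

struct model_fuse_file {
	int count;                               /* reference count */
	unsigned long fh;                        /* file handle from userspace */
	void (*send_release)(unsigned long fh);  /* sends the RELEASE request */
};

static struct model_fuse_file *ff_alloc(unsigned long fh,
					void (*send_release)(unsigned long))
{
	struct model_fuse_file *ff = malloc(sizeof(*ff));

	if (!ff)
		return NULL;
	ff->count = 1;  /* reference held by the open file */
	ff->fh = fh;
	ff->send_release = send_release;
	return ff;
}

/* Take an extra reference, e.g. for an in-flight writeback request. */
static struct model_fuse_file *ff_get(struct model_fuse_file *ff)
{
	assert(ff->count > 0);
	ff->count++;
	return ff;
}

/* Drop a reference. Returns 1 if this was the last one. */
static int ff_put(struct model_fuse_file *ff)
{
	assert(ff->count > 0);
	if (--ff->count == 0) {
		ff->send_release(ff->fh);  /* RELEASE only after the last user */
		free(ff);
		return 1;
	}
	return 0;
}

/* Example callback that only counts RELEASE requests. */
static int releases_sent;

static void count_release(unsigned long fh)
{
	(void)fh;
	releases_sent++;
}
```

Closing the file drops only one reference, so a request that still uses the handle keeps it valid until that request completes.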
     
  • Use wake_up_all instead of wake_up in put_reserved_req(), otherwise it is
    possible that the right task is not woken up.

    Also create a separate reserved_req_waitq in addition to the blocked_waitq,
    since they fulfill totally separate functions.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Set the read and write congestion state if the request queue is close to
    blocking, and clear it when it's not.

    This prevents unnecessary blocking in readahead and (when writable mmaps are
    allowed) writeback.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
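
As a rough illustration of "close to blocking", the congestion state could be derived from the background request count as below. The three-quarters threshold and all names here are assumptions made for the sketch; the message does not state the actual threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the two congestion bits of a backing device. */
struct model_bdi {
	bool read_congested;
	bool write_congested;
};

/* Set both bits near the limit, clear them once below the threshold. */
static void update_congestion(struct model_bdi *bdi,
			      unsigned num_background,
			      unsigned max_background)
{
	assert(max_background > 0);
	unsigned threshold = max_background * 3 / 4;  /* illustrative value */
	bool congested = num_background >= threshold;

	bdi->read_congested = congested;
	bdi->write_congested = congested;
}
```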
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

08 Dec, 2006

2 commits

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Oct, 2006

1 commit


30 Sep, 2006

1 commit


26 Jun, 2006

5 commits

  • Add synchronous request interruption. This is needed for file locking
    operations which have to be interruptible. However, a filesystem may implement
    interruptibility of other operations (e.g. the NFS 'intr' mount option).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Rename the 'interrupted' flag to 'aborted', since it indicates exactly that,
    and next patch will introduce an 'interrupted' flag for a

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • All POSIX locks owned by the current task are removed on close(). If the
    FLUSH request initiated by close() fails to reach userspace, there
    might be locks remaining, which cannot be removed.

    The only reason it could fail is if allocating the request fails. In this
    case use the request reserved for RELEASE, or if that is currently used by
    another FLUSH, wait for it to become available.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
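
The fallback to a reserved request can be sketched as follows. This is a single-threaded model with hypothetical names: `get_req_nofail()` returns NULL where the real code would sleep until the reserved request is returned, and the allocation failure is simulated by a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_req {
	bool reserved;  /* true if this is the per-file reserved request */
};

struct model_file {
	struct model_req *reserved_req;  /* NULL while borrowed */
};

/*
 * Get a request for FLUSH. Try a normal allocation first; if that
 * fails, borrow the request reserved for RELEASE. NULL means the
 * reserved request is in use and the caller would have to wait.
 */
static struct model_req *get_req_nofail(struct model_file *f,
					bool simulate_alloc_failure)
{
	struct model_req *req =
		simulate_alloc_failure ? NULL : calloc(1, sizeof(*req));

	if (req)
		return req;
	req = f->reserved_req;
	if (req) {
		f->reserved_req = NULL;
		req->reserved = true;
	}
	return req;
}

/* Return a request: give the reserved one back, free any other. */
static void put_req(struct model_file *f, struct model_req *req)
{
	if (req->reserved) {
		assert(f->reserved_req == NULL);
		f->reserved_req = req;  /* real code would wake up waiters here */
	} else {
		free(req);
	}
}
```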
     
  • Add a control filesystem to fuse, replacing the attributes currently exported
    through sysfs. An empty directory '/sys/fs/fuse/connections' is still created
    in sysfs, and mounting the control filesystem here provides backward
    compatibility.

    Advantages of the control filesystem over the previous solution:

    - allows the object directory and the attributes to be owned by the
    filesystem owner, hence letting unprivileged users abort the
    filesystem connection

    - does not suffer from module unload race

    [akpm@osdl.org: fix this fs for recent dhowells depredations]
    [akpm@osdl.org: fix 64-bit printk warnings]
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Don't put requests into the background when a fatal interrupt occurs while the
    request is in userspace. This removes a major wart from the implementation.

    Backgrounding of requests was introduced to allow breaking of deadlocks.
    However now the same can be achieved by aborting the filesystem through the
    'abort' sysfs attribute.

    This is a change in the interface, but should not cause problems, since these
    kinds of deadlocks never happen during normal operation.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

26 Apr, 2006

2 commits

  • A deadlock was possible when the last reference to the superblock was
    held due to a background request containing a file reference.

    Releasing the file would release the vfsmount which in turn would
    release the superblock. Since sbput_sem is held during the fput() and
    fuse_put_super() tries to acquire this same semaphore, a deadlock
    results.

    The solution is to move the fput() outside the region protected by
    sbput_sem.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This reverts 73ce8355c243a434524a34c05cc417dd0467996e commit.

    It was wrong, because it didn't take into account the requirement
    that iput() for background requests must be performed synchronously
    with ->put_super(), otherwise active inodes may remain after unmount.

    The right solution is to keep the sbput_sem and perform iput() within
    the locked region, but move fput() outside sbput_sem.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

12 Apr, 2006

3 commits

  • Request is already initialized in fuse_request_alloc() so no need to
    do it again in fuse_get_req().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Properly accounting the number of waiting requests was forgotten in
    "clean up request accounting" patch.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • A deadlock was possible when the last reference to the superblock was
    held due to a background request containing a file reference.

    Releasing the file would release the vfsmount which in turn would
    release the superblock. Since sbput_sem is held during the fput() and
    fuse_put_super() tries to acquire this same semaphore, a deadlock
    results.

    The chosen solution is to get rid of sbput_sem, and instead use the
    spinlock to ensure the referenced inodes/file are released only once.
    Since the actual release may sleep, defer these outside the locked
    region, but using local variables instead of the structure members.

    This is a much more robust solution.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
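
The pattern described above (detach the references under the lock, release them after dropping it) can be shown with a small model. The lock is a plain flag so that the sketch can assert the release never runs while it is held; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held. */
struct model_lock {
	bool held;
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct resource {
	int released;  /* how many times it was released */
};

struct model_req {
	struct resource *inode;
	struct resource *file;
};

static struct model_lock conn_lock;

/* May sleep in the real code, so it must not run under the lock. */
static void release_resource(struct resource *r)
{
	assert(!conn_lock.held);
	if (r)
		r->released++;
}

/* Detach the references under the lock, release them afterwards. */
static void release_background_refs(struct model_req *req)
{
	struct resource *inode, *file;

	model_lock_acquire(&conn_lock);
	inode = req->inode;
	file = req->file;
	req->inode = NULL;
	req->file = NULL;
	model_lock_release(&conn_lock);

	release_resource(inode);
	release_resource(file);
}
```

Because the structure members are cleared while the lock is held, a second caller finds NULL pointers and each reference is released exactly once.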
     

11 Apr, 2006

5 commits

  • The previous patch removed limiting the number of outstanding requests. This
    patch adds a much simpler limit that is also compatible with file locking
    operations.

    A task may have at most one synchronous request allocated. So these requests
    need not be otherwise limited.

    However the number of background requests (release, forget, asynchronous
    reads, interrupted requests) can grow indefinitely. This can be used by a
    malicious user to cause FUSE to allocate arbitrary amounts of unswappable
    kernel memory, denying service.

    For this reason add a limit for the number of background requests, and block
    allocations of new requests until the number goes below the limit.

    Also use this mechanism to block all requests until the INIT reply is
    received.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
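
One way to picture the single blocking mechanism shared by the background limit and the INIT handshake is the following model. The names and the exact transitions are assumptions for illustration; the real code sleeps on a wait queue instead of returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>

struct model_conn {
	unsigned num_background;
	unsigned max_background;
	bool blocked;  /* new request allocations must wait while set */
};

static void conn_init(struct model_conn *c, unsigned max_background)
{
	c->num_background = 0;
	c->max_background = max_background;
	c->blocked = true;  /* block everything until the INIT reply */
}

static void init_reply_received(struct model_conn *c)
{
	c->blocked = false;
}

/* A request was put into the background. */
static void background_start(struct model_conn *c)
{
	c->num_background++;
	if (c->num_background == c->max_background)
		c->blocked = true;
}

/* A background request finished. */
static void background_end(struct model_conn *c)
{
	assert(c->num_background > 0);
	if (c->num_background == c->max_background)
		c->blocked = false;  /* real code would wake up waiters */
	c->num_background--;
}

static bool may_allocate(const struct model_conn *c)
{
	return !c->blocked;
}
```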
     
  • FUSE allocated most requests from a fixed size pool filled at mount time.
    However in some cases (release/forget) non-pool requests were used. File
    locking operations aren't well served by the request pool, since they may
    block indefinitely thus exhausting the pool.

    This patch removes the request pool and always allocates requests on demand.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Return consistent error values for the case when the opened device file has no
    mount associated yet.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Remove the global spinlock in favor of a per-mount one.

    This patch is basically find & replace. The difficult part has already been
    done by the previous patch.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This is in preparation for removing the global spinlock in favor of a
    per-mount one.

    The only critical part is the interaction between fuse_dev_release() and
    fuse_fill_super(): fuse_dev_release() must see the assignment to
    file->private_data, otherwise it will leak the reference to fuse_conn.

    This is ensured by the fput() operation, which will synchronize the assignment
    with other CPUs that may do a final fput() soon after this.

    Also redundant locking is removed from fuse_fill_super(), where exclusion is
    already ensured by the BKL held for this function by the VFS.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi