16 Oct, 2008

4 commits


14 Oct, 2008

1 commit

  • This is a much better version of a previous patch to make the parser
    tables constant. Rather than changing the typedef, we put the "const" in
    all the various places where it's required, allowing the __initconst
    exception for nfsroot which was the cause of the previous trouble.

    This was posted for review some time ago and I believe it's been in -mm
    since then.

    Signed-off-by: Steven Whitehouse
    Cc: Alexander Viro
    Signed-off-by: Linus Torvalds

    Steven Whitehouse
     

27 Jul, 2008

4 commits

  • * MAY_CHDIR is redundant - it's an equivalent of MAY_ACCESS
    * MAY_ACCESS on fuse should affect only the last step of pathname resolution
    * fchdir() and chroot() should pass MAY_ACCESS, for the same reason why
    chdir() needs that.
    * now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be
    removed; it has no business being in nameidata.

    Signed-off-by: Al Viro

    Al Viro
     
  • All calls to remove_suid() are made with a file pointer, because
    (similarly to file_update_time) it is called when the file is written.

    Clean up callers by passing in a file instead of a dentry.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • * kill nameidata * argument; map the 3 bits in ->flags anybody cares
    about to new MAY_... ones and pass with the mask.
    * kill redundant gfs2_iop_permission()
    * sanitize ecryptfs_permission()
    * fix remaining places where ->permission() instances might barf on new
    MAY_... found in mask.

    The obvious next target in that direction is permission(9)

    folded fix for nfs_permission() breakage from Miklos Szeredi

    Signed-off-by: Al Viro

    Al Viro
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexers. Nobody uses this "feature", nor does anybody use
    the passed kmem cache in a non-trivial way, so pass only a pointer to the object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

5 commits

  • If a fuse filesystem doesn't define its own lock operations, then allow the
    lock manager to work with fuse.

    Adding lockd support for remote locking is also possible, but more rarely
    used, so leave it till later.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Implement the get_parent export operation by sending a LOOKUP request with
    ".." as the name.

    Implement looking up an inode by node ID after it has been evicted from
    the cache. This is done by sending a LOOKUP request with "." as the name
    (for all file types, not just directories).

    The filesystem can set the FUSE_EXPORT_SUPPORT flag in the INIT reply, to
    indicate that it supports these special lookups.

    Thanks to John Muir for the original implementation of this feature.

    Signed-off-by: Miklos Szeredi
    Cc: "J. Bruce Fields"
    Cc: Trond Myklebust
    Cc: Matthew Wilcox
    Cc: David Teigland
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add a new helper function which sends a LOOKUP request with the supplied
    name. This will be used by the next patch to send special LOOKUP requests
    with "." and ".." as the name.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Implement export_operations, to allow fuse filesystems to be exported to
    NFS. This feature has been in the out-of-tree fuse module, and is widely
    used and tested.

    It was not originally merged into mainline, because doing the NFS
    export in userspace was thought to be a cleaner and more efficient way of
    doing it than going through the kernel.

    While that is true, it would also have involved a lot of duplicated effort
    at reimplementing NFS exporting (all the different versions of the
    protocol). This effort was unfortunately not undertaken by anyone, so we
    are left with doing it the easy but less efficient way.

    If this feature goes in, the out-of-tree fuse module can go away,
    which would have several advantages:

    - not having to maintain two versions
    - less confusion for users
    - no bugs due to kernel API changes

    Comment from hch:
    - Use the same fh_type values as XFS, since we use the same fh encoding.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Use d_splice_alias() instead of d_add() in fuse lookup code, to allow NFS
    exporting.

    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

18 Jun, 2008

1 commit

  • Use max not min to enforce a lower limit on the max I/O size.

    This bug was introduced by "fuse: fix max i/o size calculation" (commit
    e5d9a0df07484d6d191756878c974e4307fb24ce).

    Thanks to Brian Wang for noticing.

    Reported-by: Brian Wang
    Signed-off-by: Miklos Szeredi
    Acked-by: Szabolcs Szakacsits
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
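
The min/max mix-up described in this commit is easy to reproduce: taking the maximum of a value and a floor raises small values up to the floor, while taking the minimum caps every value at the floor. The following is only a minimal sketch; `MIN_IO_SIZE` and both helper names are hypothetical and are not taken from the fuse source.

```c
#include <stddef.h>

/* Hypothetical floor for illustration; not the constant used by fuse. */
#define MIN_IO_SIZE ((size_t)4096)

/* Correct: behaves like max(), so the result is never below MIN_IO_SIZE. */
static size_t enforce_lower_limit(size_t requested)
{
    return requested > MIN_IO_SIZE ? requested : MIN_IO_SIZE;
}

/* Buggy variant: behaves like min(), so it caps the value at MIN_IO_SIZE. */
static size_t enforce_lower_limit_buggy(size_t requested)
{
    return requested < MIN_IO_SIZE ? requested : MIN_IO_SIZE;
}
```

With the buggy variant a requested size of 128 KiB collapses to the floor, which matches the kind of regression the commit message describes.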
     

25 May, 2008

1 commit

  • Fuse allocates a separate bdi for each filesystem, and registers them
    in sysfs with "MAJOR:MINOR" of sb->s_dev (st_dev). This works fine for
    anon devices normally used by fuse, but can conflict with an already
    registered BDI for "fuseblk" filesystems, where sb->s_dev represents a
    real block device. In particular, this happens if a non-partitioned
    device is being mounted.

    Fix by registering with a different name for "fuseblk" filesystems.

    Thanks to Ioan Ionita for the bug report.

    Signed-off-by: Miklos Szeredi
    Reported-by: Ioan Ionita
    Tested-by: Ioan Ionita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

13 May, 2008

1 commit

  • Prior to 2.6.26 fuse only supported single page write requests. In theory all
    fuse filesystems should be able to support bigger than 4k writes, as there's
    nothing in the API to prevent it. Unfortunately there's a known case in
    NTFS-3G where big writes cause filesystem corruption. There could also be
    other filesystems, where the lack of testing with big write requests would
    result in bugs.

    To prevent such problems on a kernel upgrade, disable big writes by default,
    but let filesystems set a flag to turn it on.

    Signed-off-by: Miklos Szeredi
    Cc: Szabolcs Szakacsits
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

01 May, 2008

1 commit


30 Apr, 2008

9 commits

  • fs/fuse/dev.c:306:2: warning: context imbalance in 'wait_answer_interruptible' - unexpected unlock
    fs/fuse/dev.c:361:2: warning: context imbalance in 'request_wait_answer' - unexpected unlock
    fs/fuse/dev.c:1002:4: warning: context imbalance in 'end_io_requests' - unexpected unlock

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Fuse doesn't use i_mutex to protect setting i_size, and so
    generic_file_llseek() can be racy: it doesn't use i_size_read().

    So do a fuse specific llseek method, which does use i_size_read().

    [akpm@linux-foundation.org: make `retval' loff_t]
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
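
As a rough userspace illustration of the idea, the seek calculation can be written so that the file size is sampled exactly once and passed in as a snapshot (in the kernel that snapshot would come from i_size_read()). The function name `compute_llseek` is hypothetical, and overflow checks are omitted for brevity.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/*
 * Compute the new file position from a single snapshot of the size.
 * Returns the new position, or -EINVAL for a bad whence or a negative result.
 * Illustrative only: arithmetic overflow is not checked here.
 */
static int64_t compute_llseek(int64_t pos, int64_t size_snapshot,
                              int64_t offset, int whence)
{
    int64_t newpos;

    switch (whence) {
    case SEEK_SET:
        newpos = offset;
        break;
    case SEEK_CUR:
        newpos = pos + offset;
        break;
    case SEEK_END:
        newpos = size_snapshot + offset;   /* uses the one-time snapshot */
        break;
    default:
        return -EINVAL;
    }
    return newpos < 0 ? -EINVAL : newpos;
}
```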
     
  • Node ID is 64bit but it is passed as unsigned long to some functions. This
    breakage wasn't noticed, because libfuse uses unsigned long too.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
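
The truncation only shows up where unsigned long is narrower than 64 bits. The sketch below simulates that case with a 32-bit typedef; `ulong32_t` and both function names are hypothetical stand-ins, not identifiers from fuse or libfuse.

```c
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit platform (assumption for the demo). */
typedef uint32_t ulong32_t;

/* Passing a 64-bit node ID through a 32-bit parameter drops the high bits. */
static uint64_t pass_through_narrow(ulong32_t nodeid)
{
    return nodeid;
}

/* Keeping the parameter 64 bits wide preserves the full node ID. */
static uint64_t pass_through_wide(uint64_t nodeid)
{
    return nodeid;
}
```

A node ID of 0x100000001 comes back as 1 from the narrow variant, so two distinct nodes could be confused with each other.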
     
  • Fix a bug that Werner Baumann reported: fuse can send a bigger write request
    than the maximum specified. This only affected direct_io operation.

    In addition set a sane minimum for the max_read and max_write tunables, so I/O
    always makes some progress.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
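
One way to picture the fix is a loop that never sends more than max_write bytes per request and that raises a too-small max_write to a floor first, so each request makes progress. This is a sketch under assumptions: `MIN_MAX_WRITE` and the helper names are hypothetical and do not come from the fuse source.

```c
#include <stddef.h>

/* Hypothetical floor so that every request carries a useful amount of data. */
#define MIN_MAX_WRITE ((size_t)4096)

/*
 * Size of the next request: at most max_write bytes, where max_write is
 * first raised to the floor so a tiny tunable cannot stall I/O.
 */
static size_t next_request_size(size_t remaining, size_t max_write)
{
    if (max_write < MIN_MAX_WRITE)
        max_write = MIN_MAX_WRITE;
    return remaining < max_write ? remaining : max_write;
}

/* Count how many requests are needed to send `total` bytes. */
static size_t count_requests(size_t total, size_t max_write)
{
    size_t sent = 0, requests = 0;

    while (sent < total) {
        sent += next_request_size(total - sent, max_write);
        requests++;
    }
    return requests;
}
```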
     
  • If the READ request returned a short count, then either

    - cached size is incorrect
    - filesystem is buggy, as short reads are only allowed on EOF

    So assume that the size is wrong and refresh it, so that cached read() doesn't
    zero fill the missing chunk.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Introduce fuse_perform_write. With fusexmp (a passthrough filesystem), large
    (1MB) writes into a backing tmpfs filesystem are sped up by almost 4 times
    (256MB/s vs 71MB/s).

    [mszeredi@suse.cz]:

    - split into smaller functions
    - testing
    - duplicate generic_file_aio_write(), so that there's no need to add a
    new ->perform_write() a_op. Comment from hch.

    Signed-off-by: Nick Piggin
    Signed-off-by: Miklos Szeredi
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Extract common code for setting i_size in write functions into a common
    helper.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Quoting Linus (3 years ago, FUSE inclusion discussions):

    "User-space filesystems are hard to get right. I'd claim that they
    are almost impossible, unless you limit them somehow (shared
    writable mappings are the nastiest part - if you don't have those,
    you can reasonably limit your problems by limiting the number of
    dirty pages you accept through normal "write()" calls)."

    Instead of attempting the impossible, I've just waited for the dirty page
    accounting infrastructure to materialize (thanks to Peter Zijlstra and
    others). This nicely solved the biggest problem: limiting the number of pages
    used for write caching.

    Some small details remained, however, which this largish patch attempts to
    address. It provides a page writeback implementation for fuse, which is
    completely safe against VM related deadlocks. Performance may not be very
    good for certain usage patterns, but generally it should be acceptable.

    It has been tested extensively with fsx-linux and bash-shared-mapping.

    Fuse page writeback design
    --------------------------

    fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
    It copies the contents of the original page, and queues a WRITE request to the
    userspace filesystem using this temp page.

    The writeback is finished instantly from the MM's point of view: the page is
    removed from the radix trees, and the PageDirty and PageWriteback flags are
    cleared.

    For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
    incremented. The per-bdi writeback count is not decremented until the actual
    write completes.

    On dirtying the page, fuse waits for a previous write to finish before
    proceeding. This makes sure there can only be one temporary page used at a
    time for one cached page.

    This approach is wasteful in both memory and CPU bandwidth, so why is this
    complication needed?

    The basic problem is that there can be no guarantee about the time in which
    the userspace filesystem will complete a write. It may be buggy or even
    malicious, and fail to complete WRITE requests. We don't want unrelated parts
    of the system to grind to a halt in such cases.

    Also a filesystem may need additional resources (particularly memory) to
    complete a WRITE request. There's a great danger of a deadlock if that
    allocation may wait for the writepage to finish.

    Currently there are several cases where the kernel can block on page
    writeback:

    - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
    - page migration
    - throttle_vm_writeout (through NR_WRITEBACK)
    - sync(2)

    Of course in some cases (fsync, msync) we explicitly want to allow blocking.
    So for these cases new code has to be added to fuse, since the VM is not
    tracking writeback pages for us any more.

    As an extra safety measure, the maximum dirty ratio allocated to a single
    fuse filesystem is set to 1% by default. This way one (or several) buggy or
    malicious fuse filesystems cannot slow down the rest of the system by hogging
    dirty memory.

    With appropriate privileges, this limit can be raised through
    '/sys/class/bdi//max_ratio'.

    Signed-off-by: Miklos Szeredi
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR"

    Make the fuse control filesystem use s_dev instead of a fuse specific ID.
    This makes it easier to match directories under /sys/fs/fuse/connections/ with
    directories under /sys/class/bdi, and with actual mounts.

    Signed-off-by: Miklos Szeredi
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

25 Apr, 2008

1 commit


24 Feb, 2008

1 commit

  • I added a nasty local variable shadowing bug to fuse in 2.6.24, with the
    result that the 'default_permissions' mount option is basically ignored.

    How did this happen?

    - old err declaration in inner scope
    - new err getting declared in outer scope
    - 'return err' from inner scope getting removed
    - old declaration not being noticed

    -Wshadow would have saved us, but it doesn't seem practical for
    the kernel :(

    More testing would have also saved us :((

    Signed-off-by: Miklos Szeredi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
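
The sequence of events listed in this commit can be reduced to a few lines. In the sketch below, `check_permission` and both wrapper functions are hypothetical; they only mimic the shape of the bug and are not the actual fuse code.

```c
#include <errno.h>

/* Hypothetical permission check that always refuses access. */
static int check_permission(void)
{
    return -EACCES;
}

/* Buggy: the inner `err` shadows the outer one, so the failure is lost. */
static int permission_shadowed(int use_default_permissions)
{
    int err = 0;

    if (use_default_permissions) {
        int err = check_permission();   /* old declaration left behind */
        (void)err;                      /* result never reaches the caller */
    }
    return err;                         /* always the outer err, i.e. 0 */
}

/* Fixed: assign to the outer variable instead of redeclaring it. */
static int permission_fixed(int use_default_permissions)
{
    int err = 0;

    if (use_default_permissions)
        err = check_permission();
    return err;
}
```

Compiling the buggy function with GCC's -Wshadow produces a warning about the inner declaration, which is the diagnostic the commit message refers to.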
     

09 Feb, 2008

1 commit


08 Feb, 2008

2 commits


07 Feb, 2008

3 commits

  • Libfuse basically creates a new thread for each new request. This is fine for
    synchronous requests, which are naturally limited. However background
    requests (especially writepage) can cause a thread creation storm.

    To avoid this, limit the number of background requests available to userspace.

    This is done by introducing another queue for background requests, and a
    counter for the number of "active" requests, which are currently available for
    userspace.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
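
The queue-plus-counter scheme can be modelled with two numbers: how many background requests are waiting, and how many are currently exposed to userspace. This is a simplified sketch; `MAX_ACTIVE_BACKGROUND`, `struct bg_state` and the function names are hypothetical, and the real code tracks request objects rather than counts.

```c
#include <stddef.h>

/* Hypothetical limit on background requests visible to userspace at once. */
#define MAX_ACTIVE_BACKGROUND 2

struct bg_state {
    size_t queued;  /* background requests waiting in the extra queue */
    size_t active;  /* background requests currently available to userspace */
};

/* Move queued requests to the active set while there is room. */
static void flush_background(struct bg_state *s)
{
    while (s->queued > 0 && s->active < MAX_ACTIVE_BACKGROUND) {
        s->queued--;
        s->active++;
    }
}

/* A new background request always enters the queue first. */
static void submit_background(struct bg_state *s)
{
    s->queued++;
    flush_background(s);
}

/* Completing an active request frees a slot for the next queued one. */
static void complete_background(struct bg_state *s)
{
    if (s->active > 0)
        s->active--;
    flush_background(s);
}
```

Because userspace never sees more than the limit at a time, a burst of writepage requests cannot translate into an unbounded number of new threads.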
     
  • Move the fields 'dentry' and 'vfsmount' into the request specific union, since
    these are only used for the RELEASE request.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Invalidate attributes on create, since st_ctime is updated. Reported by
    Szabolcs Szakacsits.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

25 Jan, 2008

4 commits


30 Nov, 2007

1 commit