27 Sep, 2006

2 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
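
    A minimal userspace model of the split described in this commit. It assumes that,
    as in kernels of this era, the generic helper derives st_blksize from the inode's
    i_blkbits. The names model_inode, model_kstat, model_generic_fillattr and
    myfs_getattr (with its preferred_io parameter) are hypothetical stand-ins, not
    the kernel's getattr signature.

```c
#include <stdint.h>

/* Hypothetical, pared-down stand-ins for struct inode and struct kstat. */
struct model_inode {
    uint64_t i_size;
    unsigned i_blkbits;          /* log2 of the filesystem block size */
};

struct model_kstat {
    uint64_t size;
    unsigned long blksize;       /* reported to userspace as st_blksize */
};

/* Generic path: no per-inode i_blksize any more; derive it from i_blkbits. */
void model_generic_fillattr(const struct model_inode *inode,
                            struct model_kstat *stat)
{
    stat->size = inode->i_size;
    stat->blksize = 1UL << inode->i_blkbits;
}

/* A filesystem wanting its own per-inode st_blksize provides a getattr that
 * reuses the generic fill and then overrides just that field. */
int myfs_getattr(const struct model_inode *inode, struct model_kstat *stat,
                 unsigned long preferred_io)
{
    model_generic_fillattr(inode, stat);
    stat->blksize = preferred_io;
    return 0;
}
```

    Filesystems with nothing special to report keep the generic path. Only those with
    a meaningful per-inode I/O size need a custom getattr.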
     
  • The following patches reduce the size of the VFS inode structure by 28 bytes
    on a UP x86 (more on an x86_64 system). This is a 10% reduction
    in the inode size on a UP kernel that is configured in a production mode
    (i.e., with no spinlock or other debugging functions enabled; if you want to
    save memory taken up by in-core inodes, the first thing you should do is
    disable the debugging options; they are responsible for a huge amount of bloat
    in the VFS inode structure).

    This patch:

    The filesystem or device-specific pointer in the inode is inside a union,
    which is pretty pointless given that all 30+ users of this field have been
    using the void pointer. Get rid of the union and rename it to i_private, with
    a comment to explain who is allowed to use the void pointer. This is just a
    cleanup, but it allows us to reuse the union 'u' for something where
    the union will actually be used.

    [judith@osdl.org: powerpc build fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Judith Lebzelter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     

15 Aug, 2006

1 commit


01 Aug, 2006

3 commits

  • Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • It is entirely possible (though rare) that jiffies half-wraps around while a
    dentry/inode remains in the cache. This could mean that the dentry/inode is
    not invalidated for another half-wraparound period.

    To get around this problem, use 64-bit jiffies. The only problem with this is
    that dentry->d_time is 32 bits on 32-bit archs. So use d_fsdata as the high
    32 bits. This is an ugly hack, but far simpler than having to allocate
    private data just for this purpose.

    Since 64-bit jiffies can be assumed never to wrap around, a simple comparison
    can be used, and a zero time value can represent "invalid".

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
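
    The d_time/d_fsdata trick and the half-wrap problem can be modelled in userspace.
    This is a sketch, not the fuse code. In it, model_dentry stands in for struct dentry
    on a 32-bit arch, with d_time as a 32-bit value. model_time_after32() mirrors the
    signed-difference idiom behind the kernel's time_after() macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of struct dentry on a 32-bit arch. */
struct model_dentry {
    uint32_t d_time;     /* 32 bits, like unsigned long on 32-bit archs */
    void *d_fsdata;      /* borrowed to hold the high 32 bits of the time */
};

void model_settime(struct model_dentry *d, uint64_t time)
{
    d->d_time = (uint32_t)time;
    d->d_fsdata = (void *)(uintptr_t)(uint32_t)(time >> 32);
}

uint64_t model_gettime(const struct model_dentry *d)
{
    return (uint64_t)d->d_time + ((uint64_t)(uintptr_t)d->d_fsdata << 32);
}

/* 64-bit jiffies never wrap in practice, so a plain comparison works, and a
 * stored time of 0 is always <= now, i.e. "invalid". */
bool model_entry_valid(const struct model_dentry *d, uint64_t now)
{
    return now < model_gettime(d);
}

/* The 32-bit idiom: a is after b if the signed difference b - a is negative.
 * Once a is more than 2^31 ticks past b, the sign flips and the answer is wrong. */
bool model_time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}
```

    With 32-bit times, an expiry that is already more than 2^31 ticks in the past looks
    like it is still in the future. This is the half-wrap window that the 64-bit
    representation closes.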
     
  • An attribute and entry timeout of zero should mean that the entity is
    invalidated immediately after the operation. Previously invalidation only
    happened at the next clock tick.

    Reported and tested by Craig Davies.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
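
    A sketch of one way to get immediate invalidation. It assumes a 64-bit tick counter
    and a tick rate MODEL_HZ (an assumed value). A zero timeout maps to an expiry of 0,
    which a "now >= expiry" check treats as already stale. An old-style "now > expiry"
    check on an expiry of "now + 0" keeps the entry valid until the next tick. All names
    here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HZ 250u   /* assumed tick rate for the sketch */

/* Convert a relative timeout into an absolute 64-bit expiry. Nanoseconds are
 * rounded up to whole ticks. A zero timeout maps to 0, which is never in the
 * future, so the entry is stale as soon as the operation completes. */
uint64_t model_time_to_jiffies(uint64_t now, unsigned long sec,
                               unsigned long nsec)
{
    if (sec || nsec) {
        uint64_t ticks = (uint64_t)sec * MODEL_HZ +
                         ((uint64_t)nsec * MODEL_HZ + 999999999u) / 1000000000u;
        return now + ticks;
    }
    return 0;
}

/* Check used with 64-bit expiries: stale once now >= expiry. */
bool model_expired(uint64_t now, uint64_t expiry)
{
    return now >= expiry;
}

/* Old-style check: stale only once now is strictly after the expiry, so an
 * expiry of "now + 0" stays valid until the next tick. */
bool model_old_expired(uint64_t now, uint64_t expiry)
{
    return now > expiry;
}
```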
     

29 Jun, 2006

1 commit


26 Jun, 2006

8 commits

  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (51 commits)
    nfs: remove nfs_put_link()
    nfs-build-fix-99
    git-nfs-build-fixes
    Merge branch 'odirect'
    NFS: alloc nfs_read/write_data as direct I/O is scheduled
    NFS: Eliminate nfs_get_user_pages()
    NFS: refactor nfs_direct_free_user_pages
    NFS: remove user_addr, user_count, and pos from nfs_direct_req
    NFS: "open code" the NFS direct write rescheduler
    NFS: Separate functions for counting outstanding NFS direct I/Os
    NLM: Fix reclaim races
    NLM: sem to mutex conversion
    locks.c: add the fl_owner to nlm_compare_locks
    NFS: Display the chosen RPCSEC_GSS security flavour in /proc/mounts
    NFS: Split fs/nfs/inode.c
    NFS: Fix typo in nfs_do_clone_mount()
    NFS: Fix compile errors introduced by referrals patches
    NFSv4: Ensure that referral mounts bind to a reserved port
    NFSv4: A root pathname is sent as a zero component4
    NFSv4: Follow a referral
    ...

    Linus Torvalds
     
  • VFS uses current->files pointer as lock owner ID, and it wouldn't be
    prudent to expose this value to userspace. So scramble it with XTEA using
    a per-connection random key, known only to the kernel. Only one direction
    needs to be implemented, since the ID is never sent in the reverse
    direction.

    The XTEA algorithm is implemented inline since it's simple enough to do so,
    and this adds less complexity than if the crypto API were used.

    Thanks to Jesper Juhl for the idea.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
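
    XTEA is short enough to show in full. The sketch below uses the standard 32 cycles
    and the delta constant 0x9E3779B9. scramble_lock_owner() is a hypothetical wrapper.
    It shows how a pointer-sized owner value could be split into two 32-bit halves and
    scrambled with a per-connection 128-bit key. Key generation (from a kernel random
    source) is left out.

```c
#include <stdint.h>

/* XTEA encryption of one 64-bit block (two 32-bit halves) with a 128-bit key.
 * Only this direction is needed: the scrambled ID only travels to userspace. */
void xtea_encrypt(uint32_t v[2], const uint32_t key[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    const uint32_t delta = 0x9E3779B9u;

    for (unsigned i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Hypothetical wrapper: scramble a lock owner (e.g. a files pointer value)
 * with the connection's secret key before exposing it to userspace. */
uint64_t scramble_lock_owner(uint64_t owner, const uint32_t key[4])
{
    uint32_t v[2] = { (uint32_t)owner, (uint32_t)(owner >> 32) };

    xtea_encrypt(v, key);
    return ((uint64_t)v[1] << 32) | v[0];
}
```

    Because only the kernel holds the key, userspace gets a stable per-owner value it
    can compare, but it cannot recover the pointer from that value.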
     
  • Add synchronous request interruption. This is needed for file locking
    operations, which have to be interruptible. However, a filesystem may also
    implement interruptibility of other operations (like the NFS 'intr' mount
    option).

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Rename the 'interrupted' flag to 'aborted', since it indicates exactly that,
    and the next patch will introduce an 'interrupted' flag for a different
    purpose.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • All POSIX locks owned by the current task are removed on close(). If the
    FLUSH request initiated by close() fails to reach userspace, there
    might be locks remaining, which cannot be removed.

    The only reason it could fail is if allocating the request fails. In this
    case use the request reserved for RELEASE, or if that is currently used by
    another FLUSH, wait for it to become available.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
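
    The fallback order for FLUSH can be modelled without threads. In this sketch,
    model_file, model_req and the alloc_fails switch (which simulates an allocation
    failure) are hypothetical. The kernel would sleep until the reserved request is
    free again. At that point, model_get_flush_req() instead returns NULL to signal
    that the caller must wait.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, pared-down request and per-file state. */
struct model_req {
    bool reserved;                 /* true for the preallocated request */
};

struct model_file {
    struct model_req reserved_req; /* set aside at open() for RELEASE */
    bool reserved_busy;            /* borrowed by an in-flight FLUSH */
};

/* Get a request for FLUSH that cannot be lost to an allocation failure.
 * alloc_fails simulates kernel memory pressure. Returns NULL only when the
 * caller would have to wait for the reserved request to be given back. */
struct model_req *model_get_flush_req(struct model_file *ff, bool alloc_fails)
{
    if (!alloc_fails) {
        struct model_req *req = calloc(1, sizeof *req);
        if (req)
            return req;
    }
    if (ff->reserved_busy)
        return NULL;               /* the kernel would sleep here instead */
    ff->reserved_busy = true;
    ff->reserved_req.reserved = true;
    return &ff->reserved_req;
}

/* Return a request: free a normal one, hand the reserved one back. */
void model_put_flush_req(struct model_file *ff, struct model_req *req)
{
    if (req->reserved)
        ff->reserved_busy = false; /* the kernel would also wake waiters */
    else
        free(req);
}
```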
     
  • This patch adds POSIX file locking support to the fuse interface.

    This implementation doesn't keep any locking state in kernel. Unlocking on
    close() is handled by the FLUSH message, which now contains the lock owner id.

    Mandatory locking is not supported. The filesystem may enforce mandatory
    locking in userspace if needed.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Add a control filesystem to fuse, replacing the attributes currently exported
    through sysfs. An empty directory '/sys/fs/fuse/connections' is still created
    in sysfs, and mounting the control filesystem here provides backward
    compatibility.

    Advantages of the control filesystem over the previous solution:

    - allows the object directory and the attributes to be owned by the
    filesystem owner, hence letting unprivileged users abort the
    filesystem connection

    - does not suffer from module unload race

    [akpm@osdl.org: fix this fs for recent dhowells depredations]
    [akpm@osdl.org: fix 64-bit printk warnings]
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Don't put requests into the background when a fatal interrupt occurs while the
    request is in userspace. This removes a major wart from the implementation.

    Backgrounding of requests was introduced to allow breaking of deadlocks.
    However now the same can be achieved by aborting the filesystem through the
    'abort' sysfs attribute.

    This is a change in the interface, but should not cause problems, since these
    kinds of deadlocks never happen during normal operation.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

25 Jun, 2006

1 commit


23 Jun, 2006

3 commits

  • Pass the POSIX lock owner ID to the flush operation.

    This is useful for filesystems which don't want to store any locking state
    in inode->i_flock but want to handle locking/unlocking POSIX locks
    internally. FUSE is one such filesystem, but I think it is possible that
    some network filesystems would need this also.

    Also add a flag to indicate that a POSIX locking request was generated by
    close(), so filesystems using the above feature won't send an extra locking
    request in this case.

    Signed-off-by: Miklos Szeredi
    Cc: Trond Myklebust
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Give the statfs superblock operation a dentry pointer rather than a superblock
    pointer.

    This complements the get_sb() patch. That reduced the significance of
    sb->s_root, allowing NFS to place a fake root there. However, NFS does
    require a dentry to use as a target for the statfs operation. This permits
    the root in the vfsmount to be used instead.

    linux/mount.h has been added where necessary to make allyesconfig build
    successfully.

    Interest has also been expressed for use with the FUSE and XFS filesystems.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.

    The patch also makes the following changes:

    (*) The get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb().

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees, which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted, including the additional bit of
    changing ext2_* into foo_* in the documentation.

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
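
    The relationship between get_sb(), simple_set_mnt() and a shared superblock can be
    sketched with pared-down userspace structs. These are hypothetical stand-ins, not
    the kernel definitions. The sketch also omits the reference counting that the real
    simple_set_mnt() does on the root dentry (dget()).

```c
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct model_dentry { const char *name; };
struct model_super_block { struct model_dentry *s_root; };
struct model_vfsmount {
    struct model_super_block *mnt_sb;
    struct model_dentry *mnt_root;
};

/* Models simple_set_mnt(): attach the superblock and use its s_root as the
 * mount's root (the old default). Always returns 0, so it can be tail-called. */
int model_simple_set_mnt(struct model_vfsmount *mnt,
                         struct model_super_block *sb)
{
    mnt->mnt_sb = sb;
    mnt->mnt_root = sb->s_root;
    return 0;
}

/* An ordinary filesystem: get_sb() now returns an int and fills in *mnt. */
int model_foo_get_sb(struct model_super_block *sb, struct model_vfsmount *mnt)
{
    /* superblock lookup or creation would happen here */
    return model_simple_set_mnt(mnt, sb);
}

/* NFS-style sharing: several mounts share one superblock but are rooted at
 * different dentries, so mnt_sb and mnt_root are set directly. */
int model_shared_get_sb(struct model_super_block *sb,
                        struct model_dentry *root, struct model_vfsmount *mnt)
{
    mnt->mnt_sb = sb;
    mnt->mnt_root = root;
    return 0;
}
```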
     

09 Jun, 2006

1 commit


26 Apr, 2006

3 commits

  • BKL does not protect against races if the task may sleep between
    checking and setting a value. So move the check of file->private_data
    next to where it is set in fuse_fill_super().

    Found by Al Viro.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • A deadlock was possible when the last reference to the superblock was
    held due to a background request containing a file reference.

    Releasing the file would release the vfsmount which in turn would
    release the superblock. Since sbput_sem is held during the fput() and
    fuse_put_super() tries to acquire this same semaphore, a deadlock
    results.

    The solution is to move the fput() outside the region protected by
    sbput_sem.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • This reverts commit 73ce8355c243a434524a34c05cc417dd0467996e.

    It was wrong because it didn't take into account the requirement
    that iput() for background requests must be performed synchronously
    with ->put_super(), otherwise active inodes may remain after unmount.

    The right solution is to keep the sbput_sem and perform iput() within
    the locked region, but move fput() outside sbput_sem.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

12 Apr, 2006

4 commits


11 Apr, 2006

9 commits

  • The previous patch removed limiting the number of outstanding requests. This
    patch adds a much simpler limiting, that is also compatible with file locking
    operations.

    A task may have at most one synchronous request allocated. So these requests
    need not be otherwise limited.

    However the number of background requests (release, forget, asynchronous
    reads, interrupted requests) can grow indefinitely. This can be used by a
    malicious user to cause FUSE to allocate arbitrary amounts of unswappable
    kernel memory, denying service.

    For this reason add a limit for the number of background requests, and block
    allocations of new requests until the number goes below the limit.

    Also use this mechanism to block all requests until the INIT reply is
    received.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
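
    A single-threaded model of the limiting rule. MODEL_MAX_BACKGROUND is an illustrative
    value, not necessarily the kernel's. The kernel would put the allocating task to sleep
    until the connection is unblocked, and wake it when a background request completes.
    model_must_wait() simply reports that the task would have to wait. All names are
    hypothetical.

```c
#include <stdbool.h>

#define MODEL_MAX_BACKGROUND 10  /* illustrative limit, an assumption */

struct model_conn {
    unsigned num_background;     /* outstanding background requests */
    bool initialized;            /* set once the INIT reply arrives */
    bool blocked;                /* new allocations must wait while true */
};

static void model_update_blocked(struct model_conn *fc)
{
    fc->blocked = !fc->initialized ||
                  fc->num_background >= MODEL_MAX_BACKGROUND;
}

void model_conn_init(struct model_conn *fc)
{
    fc->num_background = 0;
    fc->initialized = false;
    model_update_blocked(fc);    /* blocked until INIT completes */
}

void model_init_reply(struct model_conn *fc)
{
    fc->initialized = true;
    model_update_blocked(fc);
}

/* Would a new request allocation have to wait right now? */
bool model_must_wait(const struct model_conn *fc)
{
    return fc->blocked;
}

void model_background_start(struct model_conn *fc)
{
    fc->num_background++;
    model_update_blocked(fc);
}

void model_background_end(struct model_conn *fc)
{
    fc->num_background--;
    model_update_blocked(fc);    /* the kernel would wake waiters here */
}
```

    Synchronous requests need no counter of their own: each task holds at most one, so
    only the background count can grow without bound.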
     
  • FUSE allocated most requests from a fixed size pool filled at mount time.
    However in some cases (release/forget) non-pool requests were used. File
    locking operations aren't well served by the request pool, since they may
    block indefinitely, thus exhausting the pool.

    This patch removes the request pool and always allocates requests on demand.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Return consistent error values for the case when the opened device file has no
    mount associated yet.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • Remove the global spinlock in favor of a per-mount one.

    This patch is basically find & replace. The difficult part has already been
    done by the previous patch.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This is in preparation for removing the global spinlock in favor of a
    per-mount one.

    The only critical part is the interaction between fuse_dev_release() and
    fuse_fill_super(): fuse_dev_release() must see the assignment to
    file->private_data, otherwise it will leak the reference to fuse_conn.

    This is ensured by the fput() operation, which will synchronize the assignment
    with other CPUs that may do a final fput() soon after this.

    Also redundant locking is removed from fuse_fill_super(), where exclusion is
    already ensured by the BKL held for this function by the VFS.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • I don't like duplicating the connected and list_empty tests in fuse_dev_readv,
    but this seemed cleaner than adding the f_flags test to request_wait.

    Signed-off-by: Jeff Dike
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • This adds asynchronous notification to FUSE - a FUSE server can request
    O_ASYNC on a /dev/fuse file descriptor and receive SIGIO when there is input
    available.

    One subtlety - fuse_dev_fasync, which is called when O_ASYNC is requested,
    does no locking, unlike the other methods. I think it's unnecessary, as the
    fuse_conn.fasync list is manipulated only by fasync_helper and kill_fasync,
    which provide their own locking. It would also be wrong to use the fuse_lock,
    as it's a spin lock and fasync_helper can sleep. My one concern with this is
    the fuse_conn going away underneath fuse_dev_fasync - sys_fcntl takes a
    reference on the file struct, so this seems not to be a problem.

    Signed-off-by: Jeff Dike
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
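
    On the server side, requesting SIGIO uses the standard fcntl() calls: set the owning
    process with F_SETOWN, then add O_ASYNC with F_SETFL. The sketch installs a handler
    first, because the default action for SIGIO terminates the process. enable_sigio()
    and on_sigio() are illustrative names. A FUSE server would pass its /dev/fuse
    descriptor, but any descriptor whose driver implements fasync (a pipe, for example)
    behaves the same way.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t input_ready;

static void on_sigio(int sig)
{
    (void)sig;
    input_ready = 1;   /* just record the event; read in the main loop */
}

/* Route SIGIO to on_sigio, make this process the owner of fd's async
 * notifications and set O_ASYNC. Returns 0 on success, -1 on error. */
int enable_sigio(int fd)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```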
     
  • fuse_dev_poll() returned an error value instead of a poll mask. Luckily (or
    unluckily) -ENODEV does contain the POLLERR bit.

    There's also a race if the filesystem is unmounted between fuse_get_conn() and
    spin_lock(), in which case this event will be missed by poll().

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
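
    A sketch of the intended contract: a poll method returns an event mask, never a
    negative errno. model_dev_poll() and its parameters are hypothetical, and the sketch
    does not model the unmount race.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>

/* Compute a poll mask for a request channel (hypothetical parameters).
 * Always writable; readable when requests are queued; POLLERR once the
 * connection has gone away. The return value is a mask, never -errno. */
unsigned model_dev_poll(bool connected, bool requests_pending)
{
    unsigned mask = POLLOUT;

    if (!connected)
        return POLLERR;
    if (requests_pending)
        mask |= POLLIN;
    return mask;
}
```

    Reinterpreted as an unsigned mask, -ENODEV happens to have the POLLERR bit set. That
    is why the old code appeared to work.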
     
  • During heavy parallel filesystem activity it was possible to Oops the kernel.
    The reason is that read_cache_pages() could skip pages which have already been
    inserted into the cache by another task. Occasionally this may result in zero
    pages actually being sent, while fuse_send_readpages() relies on at least one
    page being in the request.

    So check this corner case and just free the request instead of trying to send
    it.

    Reported and tested by Konstantin Isakov.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

29 Mar, 2006

1 commit

  • This is a conversion to make the various file_operations structs in fs/
    const. Basically a regexp job, with a few manual fixups.

    The goal is both to increase correctness (harder to accidentally write to
    shared data structures) and to reduce false sharing of cachelines with
    things that get dirty in .data (while .rodata is nicely read-only and thus
    cache clean).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
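
    The idiom being applied is a const table of function pointers. Below is a userspace
    sketch with a hypothetical, pared-down model_file_operations. Declaring the table
    const lets the compiler place it in read-only data (.rodata in a non-PIC build such
    as the kernel). It also turns an accidental store such as myfs_fops.read = NULL into
    a compile error.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, pared-down operations table. */
struct model_file_operations {
    ssize_t (*read)(char *buf, size_t len);
    int (*release)(void);
};

static ssize_t myfs_read(char *buf, size_t len)
{
    static const char msg[] = "hello";
    size_t n = len < sizeof msg - 1 ? len : sizeof msg - 1;

    memcpy(buf, msg, n);
    return (ssize_t)n;
}

static int myfs_release(void)
{
    return 0;
}

/* const: shared, never written at run time, so it can live in .rodata. */
const struct model_file_operations myfs_fops = {
    .read    = myfs_read,
    .release = myfs_release,
};
```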
     

01 Mar, 2006

1 commit

  • If negative entries (nodeid == 0) were sent in reply to LOOKUP requests,
    two bugs could be triggered:

    - looking up a negative entry would return -EIO,

    - revalidate on an entry which turned negative would send a FORGET
    request with zero nodeid, which would cause an abort() in the
    library.

    The above would only happen if the 'negative_timeout=N' option was used;
    otherwise the filesystem replies to lookups with -ENOENT, which worked
    correctly.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
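
    A sketch of the corrected handling, with hypothetical names and a pared-down reply
    struct. A LOOKUP reply with nodeid 0 becomes a negative entry, just as -ENOENT does,
    instead of -EIO. A FORGET is never generated for nodeid 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down LOOKUP reply. */
struct model_entry_out {
    uint64_t nodeid;   /* 0 means the name does not exist (negative entry) */
};

enum lookup_result { LOOKUP_POSITIVE, LOOKUP_NEGATIVE };

/* Interpret a LOOKUP reply: a zero nodeid yields a negative entry (which
 * can be cached for the reply's entry timeout) rather than an error. */
int model_lookup_reply(int err, const struct model_entry_out *out,
                       enum lookup_result *res)
{
    if (err == -ENOENT || (!err && out->nodeid == 0)) {
        *res = LOOKUP_NEGATIVE;
        return 0;
    }
    if (err)
        return err;
    *res = LOOKUP_POSITIVE;
    return 0;
}

/* FORGET only makes sense for a real node; never send one for nodeid 0. */
bool model_should_send_forget(uint64_t nodeid)
{
    return nodeid != 0;
}
```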
     

18 Feb, 2006

1 commit

  • There's a rather theoretical case of the BUG triggering in
    fuse_reset_request():

    - iget() fails because of OOM after a successful CREATE_OPEN request
    - during IO on the resulting RELEASE request the connection is aborted

    Fix this, and add a warning to fuse_reset_request().

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

06 Feb, 2006

1 commit