12 Jun, 2009

40 commits

  • The only users of the i_cindex element in the inode structure are
    the firewire drivers. Removing it is part of an attempt to slim down
    the inode structure to save memory: since a typical Linux system will
    have hundreds of thousands if not millions of inodes cached, a
    reduction in the size of the inode has high leverage.

    The firewire driver does not need i_cindex in any fast path, so it's
    simple enough to calculate when it is needed, instead of wasting space
    in the inode structure.
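
    A minimal userspace sketch of the "calculate it when needed" idea,
    assuming the index is simply the offset of a minor number within a
    registered range. The struct and function names are hypothetical and
    are not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a registered character-device minor range. */
struct toy_chrdev_range {
    unsigned int first_minor;  /* first minor number in the range */
    unsigned int count;        /* number of minors in the range */
};

/*
 * Compute the index on demand instead of caching it per inode.
 * Returns the offset within the range, or -1 if minor is outside it.
 */
int toy_range_index(const struct toy_chrdev_range *r, unsigned int minor)
{
    assert(r != NULL);
    if (minor < r->first_minor || minor - r->first_minor >= r->count)
        return -1;
    return (int)(minor - r->first_minor);
}
```

    The lookup costs a couple of comparisons per call, which is
    acceptable outside a fast path and saves one field per cached inode.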

    Signed-off-by: "Theodore Ts'o"
    Cc: krh@redhat.com
    Cc: stefanr@s5r6.in-berlin.de
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Al Viro

    Theodore Ts'o
     
  • Push down lock_super into ->write_super instances and remove it from the
    caller.

    The following filesystems don't need ->s_lock in ->write_super and are
    skipped:

    * bfs, nilfs2 - no other uses of s_lock and have internal locks in
    ->write_super
    * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
    * reiserfs - no other uses of s_lock and has reiserfs_write_lock (BKL) in
    ->write_super
    * xfs - no other uses of s_lock and uses internal lock (buffer lock on
    superblock buffer) to serialize ->write_super. Also xfs_fs_write_super
    is superfluous and will go away in the next merge window
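
    The pushdown pattern can be sketched in userspace as follows. This
    is only a toy model: toy_sb, toy_lock() and toyfs_write_super() are
    hypothetical stand-ins for the superblock, lock_super() and a
    filesystem's ->write_super instance.

```c
#include <assert.h>

/* Hypothetical stand-in for a superblock with an s_lock-like flag. */
struct toy_sb {
    int lock_depth;     /* 0 = unlocked, 1 = locked */
    int dirty;          /* models s_dirt */
    int saw_lock_held;  /* recorded by the instance for inspection */
};

void toy_lock(struct toy_sb *sb)
{
    assert(sb->lock_depth == 0);  /* toy lock is not recursive */
    sb->lock_depth = 1;
}

void toy_unlock(struct toy_sb *sb)
{
    assert(sb->lock_depth == 1);
    sb->lock_depth = 0;
}

/* After the pushdown: the instance takes the lock itself. */
void toyfs_write_super(struct toy_sb *sb)
{
    toy_lock(sb);
    sb->saw_lock_held = sb->lock_depth;
    sb->dirty = 0;
    toy_unlock(sb);
}

/* After the pushdown: the generic caller no longer locks around the call. */
void toy_sync_super(struct toy_sb *sb)
{
    if (sb->dirty)
        toyfs_write_super(sb);
}
```

    A filesystem with its own internal serialization can then drop the
    toy_lock()/toy_unlock() pair from its instance without touching the
    caller.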

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • jffs2_write_super is only called from super.c and doesn't use any
    functionality from fs.c. So move it over to super.c and make it
    static there.

    [should go in through the vfs tree as it is a requirement for the
    next patch]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • [folded fix from Jiri Slaby]

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Note that since we can't run into contention between remount_fs and write_super
    (due to exclusion on s_umount), we have to care only about filesystems that
    touch lock_super() on their own. Out of those ext3, ext4, hpfs, sysv and ufs
    do need it; fat doesn't since its ->remount_fs() only accesses assign-once
    data (basically, it's "we have no atime on directories and only have atime on
    files for vfat; force nodiratime and possibly noatime into *flags").

    [folded a build fix from hch]

    Signed-off-by: Al Viro

    Al Viro
     
  • Move BKL into ->put_super from the only caller. A couple of
    filesystems had trivial enough ->put_super (only kfree and NULLing of
    s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
    hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
    of them probably don't need it, but I'd rather sort that out individually.
    Preferably after all the other BKL pushdowns in that area.

    [AV: original used to move lock_super() down as well; these changes are
    removed since we don't do lock_super() at all in generic_shutdown_super()
    now]
    [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We can't run into contention on it. All other callers of lock_super()
    either hold s_umount (and we have it exclusive) or hold an active
    reference to superblock in question, which prevents the call of
    generic_shutdown_super() while the reference is held. So we can
    replace lock_super(s) with get_fs_excl() in generic_shutdown_super()
    (and corresponding change for unlock_super(), of course).

    Since ext4 expects s_lock held for its put_super, take lock_super()
    into it. The rest of filesystems do not care at all.

    Signed-off-by: Al Viro

    Al Viro
     
  • do_remount_sb() is fs/internal.h fodder, fsync_no_super() is long gone.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Make sure a superblock really is writeable by checking MS_RDONLY
    under s_umount. sync_filesystems needed some rearrangement for
    that, but all but one sync_filesystem caller had the correct locking
    already so that we could add that check there. cachefiles grew
    s_umount locking.

    I've also added a WARN_ON to sync_filesystem to assert this for
    future callers.
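
    A rough userspace model of the check, assuming a flag that says
    whether the caller holds s_umount. TOY_RDONLY stands in for
    MS_RDONLY and the warning counter stands in for WARN_ON; all names
    here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RDONLY 1u  /* stands in for MS_RDONLY */

struct toy_sb {
    unsigned int flags;
    int umount_held;  /* models "caller holds s_umount" */
    int synced;       /* set once a writeout was started */
    int warnings;     /* counts WARN_ON-style complaints */
};

/* Returns 0 on success, including the "nothing to do" case. */
int toy_sync_filesystem(struct toy_sb *sb)
{
    assert(sb != NULL);

    /* Complain about callers that forgot the locking, but carry on. */
    if (!sb->umount_held)
        sb->warnings++;

    /* A read-only superblock has nothing to write back. */
    if (sb->flags & TOY_RDONLY)
        return 0;

    sb->synced = 1;
    return 0;
}
```

    Because the flag is read while s_umount is held, a concurrent
    remount cannot flip the superblock between read-only and writeable
    in the middle of the check.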

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Merge the write_super helper into sync_super and move the check for
    ->write_super earlier so that we can avoid grabbing a reference to
    a superblock that doesn't have it.
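
    The effect of moving the check earlier can be illustrated with a
    small userspace sketch. The types and the refs_taken counter are
    hypothetical and only model the idea of not taking a reference on
    superblocks that have no ->write_super.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sb {
    void (*write_super)(struct toy_sb *);  /* may be NULL */
    int dirty;       /* models s_dirt */
    int refs_taken;  /* counts how often a reference was grabbed */
};

void toy_write(struct toy_sb *sb)
{
    sb->dirty = 0;
}

void toy_sync_supers(struct toy_sb *sbs, size_t n)
{
    assert(sbs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct toy_sb *sb = &sbs[i];

        if (!sb->write_super)  /* checked first: no reference needed */
            continue;
        if (!sb->dirty)
            continue;

        sb->refs_taken++;      /* models grabbing a reference */
        sb->write_super(sb);
    }
}
```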

    While we're at it also add a little comment documenting sync_supers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • d_unlinked() will be used in the medium term to ban checkpointing when
    an opened but unlinked file is detected, and in the long term to detect
    such a situation and special-case it.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
     
  • We just did a full fs writeout using sync_filesystem before, and if
    that's not enough for the filesystem it can perform its own writeout
    in ->put_super, which many filesystems already do.

    Move a call to foofs_write_super into every foofs_put_super for now to
    guarantee identical behaviour until it's cleaned up by the individual
    filesystem maintainers.

    Exceptions:

    - affs already has identical copy & pasted code at the beginning of
    affs_put_super so no need to do it twice.
    - xfs does the right thing without it and I have changes pending for
    the xfs tree touching this area, so I don't really need conflicts
    here.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Acked-by: Steven Whitehouse
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Introduce this function which just writes all the quota structures but
    avoids all the syncing and cache pruning work to expose quota structures
    to userspace. Use this function from __sync_filesystem when wait == 0.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Currently the VFS calls vfs_dq_sync to sync out disk quotas for a given
    superblock. This is a small wrapper around sync_dquots which for the
    case of a non-NULL superblock is a small wrapper around quota_sync_sb.

    Just make quota_sync_sb global (rename it to sync_quota_sb) and call it
    directly. Also call it directly for those cases in quota.c that have a
    superblock and leave sync_dquots purely an iterator over sync_quota_sb and
    remove its superblock argument.

    To make this nicer move the check for the lack of a quota_sync method
    from the callers into sync_quota_sb.
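
    Moving the "no quota_sync method" check into the callee might look
    roughly like this in a userspace model. The toy_ types, the call
    counter and the return convention are assumptions made for
    illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_qops {
    int (*quota_sync)(int type);  /* may be NULL */
};

struct toy_sb {
    const struct toy_qops *qcop;  /* may be NULL in this model */
};

int toy_quota_sync_calls;  /* counts calls for demonstration */

int toy_quota_sync(int type)
{
    (void)type;
    toy_quota_sync_calls++;
    return 0;
}

/* Callers no longer test for the method; the helper does it once. */
int toy_sync_quota_sb(struct toy_sb *sb, int type)
{
    assert(sb != NULL);
    if (!sb->qcop || !sb->qcop->quota_sync)
        return 0;  /* nothing to sync */
    return sb->qcop->quota_sync(type);
}
```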

    [folded build fix from Alexander Beregalov]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Rename the function so that it better describes what it really does. Also
    remove the unnecessary include of buffer_head.h.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Move sync_filesystems(), __fsync_super(), fsync_super() from
    super.c to sync.c where it fits better.

    [build fixes folded]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • It is unnecessarily fragile to have two places (fsync_super() and do_sync())
    doing data integrity sync of the filesystem. Alter __fsync_super() to
    accommodate needs of both callers and use it. So after this patch
    __fsync_super() is the only place where we gather all the calls needed to
    properly send all data on a filesystem to disk.

    A nice bonus is that we get complete livelock avoidance and write_supers()
    is now only used for periodic writeback of superblocks.

    sync_blockdevs() introduced a couple of patches ago is gone now.

    [build fixes folded]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • __fsync_super() does the same thing as fsync_super(). So change the only
    caller to use fsync_super() and make __fsync_super() static. This removes
    unnecessarily duplicated call to sync_blockdev() and prepares ground
    for the changes to __fsync_super() in the following patches.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
    ->sync_fs() isn't called. This does not really make much sense since s_dirt is
    generally used by a filesystem to mean that ->write_super() needs to be called.
    But ->sync_fs() does different things. I even suspect that some filesystems
    (btrfs?) set s_dirt just to fool this logic.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • So far, do_sync() called:
    sync_inodes(0);
    sync_supers();
    sync_filesystems(0);
    sync_filesystems(1);
    sync_inodes(1);

    This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
    submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
    transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
    not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
    and others are hit by this) when racing e.g. with background writeback. A
    similar problem hits also other filesystems (e.g. ext2) because of
    write_supers() being called before the sync_inodes(1).

    Change the ordering of calls in do_sync() - this requires a new function
    sync_blockdevs() to preserve the property that block devices are always synced
    after write_super() / sync_fs() call.

    The same issue is fixed in __fsync_super() function used on umount /
    remount read-only.

    [AV: build fixes]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Remove the unused s_async_list in the superblock, a leftover of the
    broken async inode deletion code that leaked into mainline. Having this
    in the middle of the sync/unmount path is not helpful for the following
    cleanups.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • This function walks the s_files list, and operates primarily on the
    files in a superblock, so it better belongs here (eg. see also
    fs_may_remount_ro).

    [AV: ... and it shouldn't be static after that move]

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • This patch speeds up lmbench lat_mmap test by about another 2% after the
    first patch.

    Before:
    avg = 462.286
    std = 5.46106

    After:
    avg = 453.12
    std = 9.58257

    (50 runs of each, stddev gives a reasonable confidence)

    It does this by introducing mnt_clone_write, which avoids some heavyweight
    operations of mnt_want_write if called on a vfsmount which we know already
    has a write count; and mnt_want_write_file, which can call mnt_clone_write
    if the file is open for write.

    After these two patches, mnt_want_write and mnt_drop_write go from 7% on
    the profile down to 1.3% (including mnt_clone_write).

    [AV: mnt_want_write_file() should take file alone and derive mnt from it;
    not only all callers have that form, but that's the only mnt about which
    we know that it's already held for write if file is opened for write]
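
    The fast-path idea can be sketched in userspace like this. It is a
    simplified model: the toy_ types and the slow_calls counter are
    hypothetical, and the real functions do more work than a plain
    increment.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mnt {
    int writers;     /* write count on the mount */
    int readonly;    /* non-zero if mounted read-only */
    int slow_calls;  /* counts trips through the heavyweight path */
};

struct toy_file {
    struct toy_mnt *mnt;
    int open_for_write;
};

/* Heavyweight path: must check whether the mount is writeable. */
int toy_mnt_want_write(struct toy_mnt *m)
{
    m->slow_calls++;
    if (m->readonly)
        return -1;
    m->writers++;
    return 0;
}

/* Cheap path: valid only if a write count is already held. */
int toy_mnt_clone_write(struct toy_mnt *m)
{
    assert(m->writers > 0);
    m->writers++;
    return 0;
}

/* Derives the mount from the file, as the AV note above requires. */
int toy_mnt_want_write_file(struct toy_file *f)
{
    assert(f != NULL && f->mnt != NULL);
    if (f->open_for_write)
        return toy_mnt_clone_write(f->mnt);
    return toy_mnt_want_write(f->mnt);
}
```

    A file that is open for write already pins a write count on its own
    mount, which is why the helper takes only the file and never a
    separately supplied mount.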

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
    basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
    A microbenchmark, yes, but it exercises some important paths in the mm.

    Before:
    avg = 501.9
    std = 14.7773

    After:
    avg = 462.286
    std = 5.46106

    (50 runs of each, stddev gives a reasonable confidence, but there is quite
    a bit of variation there still)

    It does this by removing the complex per-cpu locking and counter-cache and
    replacing them with a percpu counter in struct vfsmount. This makes the code
    much simpler, and avoids spinlocks (although the msync is still pretty
    costly, unfortunately). It results in about 900 bytes smaller code too. It
    does increase the size of a vfsmount, however.

    It should also give a speedup on large systems if CPUs are frequently operating
    on different mounts (because the existing scheme has to operate on an atomic in
    the struct vfsmount when switching between mounts). But I'm most interested in
    the single threaded path performance for the moment.
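
    For readers unfamiliar with the technique, here is a minimal
    userspace model of a per-CPU writer counter. TOY_NR_CPUS and the
    toy_ names are hypothetical, and the model ignores preemption and
    memory ordering entirely.

```c
#include <assert.h>

#define TOY_NR_CPUS 4  /* hypothetical fixed CPU count */

struct toy_mnt {
    int writers[TOY_NR_CPUS];  /* one slot per CPU, no shared lock */
};

/* Each CPU touches only its own slot. */
void toy_inc_writers(struct toy_mnt *m, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    m->writers[cpu]++;
}

void toy_dec_writers(struct toy_mnt *m, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    m->writers[cpu]--;
}

/* Only the sum over all slots is meaningful. */
int toy_count_writers(const struct toy_mnt *m)
{
    int sum = 0;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += m->writers[cpu];
    return sum;
}
```

    An individual slot can go negative when a count is taken on one CPU
    and dropped on another, so readers must sum all slots. The array is
    also why the structure grows with this scheme.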

    [AV: minor cleanup]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro