12 Jun, 2009

40 commits

  • * have directory operations use mark_buffer_dirty_inode(),
    so that sync_mapping_buffers() would get those.
    * make qnx4_write_inode() honour its last argument.
    * get rid of insane copies of very ancient "walk the indirect blocks"
    in qnx4/fsync - they never matched the actual fs layout and, fortunately,
    never'd been called. Again, all this junk is not needed; ->fsync()
    should just do sync_mapping_buffers + sync_inode (and if we implement
    block allocation for qnx4, we'll need to use mark_buffer_dirty_inode()
    for extent blocks)

    Signed-off-by: Al Viro

    Al Viro
     
  • writes associated buffers, then does sync_inode() to write
    the inode itself (and to make it clean). Depends on
    ->write_inode() honouring the second argument.

    Signed-off-by: Al Viro

    Al Viro
     
  • [xfs, btrfs, capifs, shmem don't need BKL, exempt]

    Signed-off-by: Alessio Igor Bogani
    Signed-off-by: Al Viro

    Alessio Igor Bogani
     
  • I think the block_dump output in __mark_inode_dirty is missing dentry locking.
    Surely the i_dentry list can change any time, so we may not even *get* a
    dentry there. If we do get one by chance, then it would appear to be able to
    go away or get renamed at any time...

    Signed-off-by: Al Viro

    Nick Piggin
     
  • Some filesystems can call in to sync an inode that is still in the
    I_NEW state (eg. ext family, when mounted with -osync). This is OK
    because the filesystem has sole access to the new inode, so it can
    modify i_state without races (because no other thread should be
    modifying it, by definition of I_NEW). Ie. a false positive, so
    remove the warnings.

    The races are described here 7ef0d7377cb287e08f3ae94cebc919448e1f5dff,
    which is also where the warnings were introduced.

    Reported-by: Stephen Hemminger
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     
  • Signed-off-by: Mike Frysinger
    CC: Alexander Viro
    Signed-off-by: Al Viro

    Mike Frysinger
     
  • the write_super method is used for

    (1) writing back the superblock periodically from pdflush
    (2) called just before ->sync_fs for data integerity syncs

    We don't need (1) because we have our own peridoc writeout through xfssyncd,
    and we don't need (2) because xfs_fs_sync_fs performs a proper synchronous
    superblock writeout after all other data and metadata has been written out.

    Also remove ->s_dirt tracking as it's only used to decide when too call
    ->write_super.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Eric Sandeen
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • This should not trigger anymore, so kill it.

    Acked-by: Anton Altaparmakov
    Signed-off-by: Jens Axboe
    Signed-off-by: Al Viro

    Jens Axboe
     
  • Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Al Viro

    Theodore Ts'o
     
  • The only user of the i_cindex element in the inode structure is used
    is by the firewire drivers. As part of an attempt to slim down the
    inode structure to save memory --- since a typical Linux system will
    have hundreds of thousands if not millions of inodes cached, a
    reduction in the size inode has high leverage.

    The firewire driver does not need i_cindex in any fast path, so it's
    simple enough to calculate when it is needed, instead of wasting space
    in the inode structure.

    Signed-off-by: "Theodore Ts'o"
    Cc: krh@redhat.com
    Cc: stefanr@s5r6.in-berlin.de
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Al Viro

    Theodore Ts'o
     
  • Push down lock_super into ->write_super instances and remove it from the
    caller.

    Following filesystem don't need ->s_lock in ->write_super and are skipped:

    * bfs, nilfs2 - no other uses of s_lock and have internal locks in
    ->write_super
    * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
    * reiserfs - no other uses of s_lock as has reiserfs_write_lock (BKL) in
    ->write_super
    * xfs - no other uses of s_lock and uses internal lock (buffer lock on
    superblock buffer) to serialize ->write_super. Also xfs_fs_write_super
    is superflous and will go away in the next merge window

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • jffs2_write_super is only called from super.c and doesn't use any
    functionality from fs.c. So move it over to super.c and make it
    static there.

    [should go in through the vfs tree as it is a requirement for the
    next patch]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • [folded fix from Jiri Slaby]

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Note that since we can't run into contention between remount_fs and write_super
    (due to exclusion on s_umount), we have to care only about filesystems that
    touch lock_super() on their own. Out of those ext3, ext4, hpfs, sysv and ufs
    do need it; fat doesn't since its ->remount_fs() only accesses assign-once
    data (basically, it's "we have no atime on directories and only have atime on
    files for vfat; force nodiratime and possibly noatime into *flags").

    [folded a build fix from hch]

    Signed-off-by: Al Viro

    Al Viro
     
  • Move BKL into ->put_super from the only caller. A couple of
    filesystems had trivial enough ->put_super (only kfree and NULLing of
    s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
    hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
    of them probably don't need it, but I'd rather sort that out individually.
    Preferably after all the other BKL pushdowns in that area.

    [AV: original used to move lock_super() down as well; these changes are
    removed since we don't do lock_super() at all in generic_shutdown_super()
    now]
    [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We can't run into contention on it. All other callers of lock_super()
    either hold s_umount (and we have it exclusive) or hold an active
    reference to superblock in question, which prevents the call of
    generic_shutdown_super() while the reference is held. So we can
    replace lock_super(s) with get_fs_excl() in generic_shutdown_super()
    (and corresponding change for unlock_super(), of course).

    Since ext4 expects s_lock held for its put_super, take lock_super()
    into it. The rest of filesystems do not care at all.

    Signed-off-by: Al Viro

    Al Viro
     
  • do_remount_sb() is fs/internal.h fodder, fsync_no_super() is long gone.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Make sure a superblock really is writeable by checking MS_RDONLY
    under s_umount. sync_filesystems needed some re-arragement for
    that, but all but one sync_filesystem caller had the correct locking
    already so that we could add that check there. cachefiles grew
    s_umount locking.

    I've also added a WARN_ON to sync_filesystem to assert this for
    future callers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Merge the write_super helper into sync_super and move the check for
    ->write_super earlier so that we can avoid grabbing a reference to
    a superblock that doesn't have it.

    While we're at it also add a little comment documenting sync_supers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • d_unlinked() will be used in middle-term to ban checkpointing when opened
    but unlinked file is detected, and in long term, to detect such situation
    and special case on it.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Al Viro

    Alexey Dobriyan
     
  • We just did a full fs writeout using sync_filesystem before, and if
    that's not enough for the filesystem it can perform it's own writeout
    in ->put_super, which many filesystems already do.

    Move a call to foofs_write_super into every foofs_put_super for now to
    guarantee identical behaviour until it's cleaned up by the individual
    filesystem maintainers.

    Exceptions:

    - affs already has identical copy & pasted code at the beginning of
    affs_put_super so no need to do it twice.
    - xfs does the right thing without it and I have changes pending for
    the xfs tree touching this are so I don't really need conflicts
    here..

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Acked-by: Steven Whitehouse
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Introduce this function which just writes all the quota structures but
    avoids all the syncing and cache pruning work to expose quota structures
    to userspace. Use this function from __sync_filesystem when wait == 0.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Currently the VFS calls vfs_dq_sync to sync out disk quotas for a given
    superblock. This is a small wrapper around sync_dquots which for the
    case of a non-NULL superblock is a small wrapper around quota_sync_sb.

    Just make quota_sync_sb global (rename it to sync_quota_sb) and call it
    directly. Also call it directly for those cases in quota.c that have a
    superblock and leave sync_dquots purely an iterator over sync_quota_sb and
    remove it's superblock argument.

    To make this nicer move the check for the lack of a quota_sync method
    from the callers into sync_quota_sb.

    [folded build fix from Alexander Beregalov ]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Rename the function so that it better describe what it really does. Also
    remove the unnecessary include of buffer_head.h.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Move sync_filesystems(), __fsync_super(), fsync_super() from
    super.c to sync.c where it fits better.

    [build fixes folded]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • It is unnecessarily fragile to have two places (fsync_super() and do_sync())
    doing data integrity sync of the filesystem. Alter __fsync_super() to
    accommodate needs of both callers and use it. So after this patch
    __fsync_super() is the only place where we gather all the calls needed to
    properly send all data on a filesystem to disk.

    Nice bonus is that we get a complete livelock avoidance and write_supers()
    is now only used for periodic writeback of superblocks.

    sync_blockdevs() introduced a couple of patches ago is gone now.

    [build fixes folded]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • __fsync_super() does the same thing as fsync_super(). So change the only
    caller to use fsync_super() and make __fsync_super() static. This removes
    unnecessarily duplicated call to sync_blockdev() and prepares ground
    for the changes to __fsync_super() in the following patches.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
    ->sync_fs() isn't called. This does not really make much sence since s_dirt is
    generally used by a filesystem to mean that ->write_super() needs to be called.
    But ->sync_fs() does different things. I even suspect that some filesystems
    (btrfs?) sets s_dirt just to fool this logic.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • So far, do_sync() called:
    sync_inodes(0);
    sync_supers();
    sync_filesystems(0);
    sync_filesystems(1);
    sync_inodes(1);

    This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
    submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
    transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
    not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
    and others are hit by this) when racing e.g. with background writeback. A
    similar problem hits also other filesystems (e.g. ext2) because of
    write_supers() being called before the sync_inodes(1).

    Change the ordering of calls in do_sync() - this requires a new function
    sync_blockdevs() to preserve the property that block devices are always synced
    after write_super() / sync_fs() call.

    The same issue is fixed in __fsync_super() function used on umount /
    remount read-only.

    [AV: build fixes]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Remove the unused s_async_list in the superblock, a leftover of the
    broken async inode deletion code that leaked into mainline. Having this
    in the middle of the sync/unmount path is not helpful for the following
    cleanups.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • This function walks the s_files lock, and operates primarily on the
    files in a superblock, so it better belongs here (eg. see also
    fs_may_remount_ro).

    [AV: ... and it shouldn't be static after that move]

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • This patch speeds up lmbench lat_mmap test by about another 2% after the
    first patch.

    Before:
    avg = 462.286
    std = 5.46106

    After:
    avg = 453.12
    std = 9.58257

    (50 runs of each, stddev gives a reasonable confidence)

    It does this by introducing mnt_clone_write, which avoids some heavyweight
    operations of mnt_want_write if called on a vfsmount which we know already
    has a write count; and mnt_want_write_file, which can call mnt_clone_write
    if the file is open for write.

    After these two patches, mnt_want_write and mnt_drop_write go from 7% on
    the profile down to 1.3% (including mnt_clone_write).

    [AV: mnt_want_write_file() should take file alone and derive mnt from it;
    not only all callers have that form, but that's the only mnt about which
    we know that it's already held for write if file is opened for write]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de