Eric Lee / smarc-fsl-linux-kernel

12 Jun, 2009

40 commits

79d257675 Sanitize qnx4 fsync handling

* have directory operations use mark_buffer_dirty_inode(),
so that sync_mapping_buffers() would get those.
* make qnx4_write_inode() honour its last argument.
* get rid of insane copies of very ancient "walk the indirect blocks"
in qnx4/fsync - they never matched the actual fs layout and, fortunately,
had never been called. Again, all this junk is not needed; ->fsync()
should just do sync_mapping_buffers + sync_inode (and if we implement
block allocation for qnx4, we'll need to use mark_buffer_dirty_inode()
for extent blocks)

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:11 +0800
d5aacad54 New helper - simple_fsync()

writes associated buffers, then does sync_inode() to write
the inode itself (and to make it clean). Depends on
->write_inode() honouring the second argument.
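
As a rough user-space illustration of the order described above (not the
kernel code): flush the buffers tied to the inode, then write the inode
itself so that it ends up clean. Every name below (model_inode,
model_simple_fsync and the counters) is hypothetical; the second argument
of the write callback stands in for the "wait" flag that ->write_inode()
is expected to honour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct model_inode {
	int dirty_buffers;  /* buffers still waiting to be written */
	bool dirty;         /* the inode itself needs writing */
	int sync_writes;    /* number of synchronous inode writes performed */
};

/* Models sync_mapping_buffers(): write out buffers tied to the inode. */
static int model_sync_buffers(struct model_inode *inode)
{
	inode->dirty_buffers = 0;
	return 0;
}

/* Models ->write_inode(): honours its second argument (wait). */
static int model_write_inode(struct model_inode *inode, int wait)
{
	if (wait)
		inode->sync_writes++;
	inode->dirty = false;
	return 0;
}

/* Buffers first, then the inode itself; the first error is reported. */
static int model_simple_fsync(struct model_inode *inode)
{
	int err = model_sync_buffers(inode);
	int ret;

	if (!inode->dirty)
		return err;              /* nothing more to write */
	ret = model_write_inode(inode, 1);
	assert(!inode->dirty);           /* the inode must end up clean */
	return err ? err : ret;
}
```

Calling model_simple_fsync() on an inode with dirty buffers and a dirty
inode leaves both clean and performs exactly one synchronous inode write,
which is the behaviour a filesystem relies on when it honours the wait flag.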

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:11 +0800
337eb00a2 Push BKL down into ->remount_fs()

[xfs, btrfs, capifs, shmem don't need BKL, exempt]

Signed-off-by: Alessio Igor Bogani
Signed-off-by: Al Viro

Alessio Igor Bogani
2009-06-12 09:36:11 +0800
4195f73d1 fs: block_dump missing dentry locking

I think the block_dump output in __mark_inode_dirty is missing dentry locking.
Surely the i_dentry list can change any time, so we may not even *get* a
dentry there. If we do get one by chance, then it would appear to be able to
go away or get renamed at any time...

Signed-off-by: Al Viro

Nick Piggin
2009-06-12 09:36:10 +0800
545b9fd3d fs: remove incorrect I_NEW warnings

Some filesystems can call in to sync an inode that is still in the
I_NEW state (e.g. ext family, when mounted with -osync). This is OK
because the filesystem has sole access to the new inode, so it can
modify i_state without races (because no other thread should be
modifying it, by definition of I_NEW). I.e. the warnings are false
positives, so remove them.

The races are described in commit 7ef0d7377cb287e08f3ae94cebc919448e1f5dff,
which is also where the warnings were introduced.

Reported-by: Stephen Hemminger
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

Nick Piggin
2009-06-12 09:36:10 +0800
8688b8635 linux/magic.h: move cramfs magic out of cramfs_fs.h

Signed-off-by: Mike Frysinger
CC: Alexander Viro
Signed-off-by: Al Viro

Mike Frysinger
2009-06-12 09:36:10 +0800
f95022161 xfs: remove ->write_super and stop maintaining ->s_dirt

The write_super method is used for

(1) writing back the superblock periodically from pdflush
(2) being called just before ->sync_fs for data integrity syncs

We don't need (1) because we have our own periodic writeout through xfssyncd,
and we don't need (2) because xfs_fs_sync_fs performs a proper synchronous
superblock writeout after all other data and metadata has been written out.

Also remove ->s_dirt tracking as it's only used to decide when to call
->write_super.

Signed-off-by: Christoph Hellwig
Reviewed-by: Eric Sandeen
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:10 +0800
13205fb92 ntfs: remove old debug check for dirty data in ntfs_put_super()

This should not trigger anymore, so kill it.

Acked-by: Anton Altaparmakov
Signed-off-by: Jens Axboe
Signed-off-by: Al Viro

Jens Axboe
2009-06-12 09:36:10 +0800
28ad0c118 fs: Rearrange inode structure elements to avoid waste due to padding

Signed-off-by: "Theodore Ts'o"
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro

Theodore Ts'o
2009-06-12 09:36:09 +0800
9fd5746fd fs: Remove i_cindex from struct inode

The only user of the i_cindex element in the inode structure is the
firewire driver. Removing it is part of an attempt to slim down the
inode structure to save memory: since a typical Linux system will
have hundreds of thousands if not millions of inodes cached, a
reduction in the size of the inode has high leverage.

The firewire driver does not need i_cindex in any fast path, so it's
simple enough to calculate when it is needed, instead of wasting space
in the inode structure.
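
To make the "high leverage" point concrete, here is a small stand-alone
sketch of how one rarely used field can cost more than its own size once
alignment padding is counted. Both structs are invented for the example
and are not the kernel's struct inode; the exact sizes depend on the ABI
(on a typical 64-bit Linux ABI they are likely 32 and 24 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Invented layout: a small index squeezed between pointer-sized members. */
struct with_index {
	void *a;
	int   small_index;  /* typically followed by padding on 64-bit ABIs */
	void *b;
	int   flags;        /* typically followed by tail padding */
};

/* Same layout with the rarely needed index dropped (recomputed on demand). */
struct without_index {
	void *a;
	void *b;
	int   flags;
};

/* Bytes saved per cached object by dropping the field. */
static size_t bytes_saved_per_object(void)
{
	assert(sizeof(struct with_index) > sizeof(struct without_index));
	return sizeof(struct with_index) - sizeof(struct without_index);
}

/* Total saving for a hypothetical number of cached objects. */
static size_t bytes_saved_total(size_t cached_objects)
{
	return cached_objects * bytes_saved_per_object();
}
```

With a million cached objects, even a few bytes per object add up to
several megabytes, which is the argument the message makes for inodes.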

Signed-off-by: "Theodore Ts'o"
Cc: krh@redhat.com
Cc: stefanr@s5r6.in-berlin.de
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro

Theodore Ts'o
2009-06-12 09:36:09 +0800
ebc1ac164 ->write_super lock_super pushdown

Push down lock_super into ->write_super instances and remove it from the
caller.

The following filesystems don't need ->s_lock in ->write_super and are skipped:

* bfs, nilfs2 - no other uses of s_lock and have internal locks in
->write_super
* ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
* reiserfs - no other uses of s_lock as it has reiserfs_write_lock (BKL) in
->write_super
* xfs - no other uses of s_lock and uses internal lock (buffer lock on
superblock buffer) to serialize ->write_super. Also xfs_fs_write_super
is superfluous and will go away in the next merge window

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:09 +0800
01ba68757 jffs2: move jffs2_write_super to super.c

jffs2_write_super is only called from super.c and doesn't use any
functionality from fs.c. So move it over to super.c and make it
static there.

[should go in through the vfs tree as it is a requirement for the
next patch]

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:09 +0800
4aa98cf76 Push BKL down into do_remount_sb()

[folded fix from Jiri Slaby]

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:08 +0800
7f78d4cd4 Push BKL down beyond VFS-only parts of do_mount()

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:08 +0800
6fac98dd2 Push BKL into do_mount()

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:08 +0800
bbd6851a3 Push lock_super() into the ->remount_fs() of filesystems that care about it

Note that since we can't run into contention between remount_fs and write_super
(due to exclusion on s_umount), we have to care only about filesystems that
touch lock_super() on their own. Out of those ext3, ext4, hpfs, sysv and ufs
do need it; fat doesn't since its ->remount_fs() only accesses assign-once
data (basically, it's "we have no atime on directories and only have atime on
files for vfat; force nodiratime and possibly noatime into *flags").

[folded a build fix from hch]

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:08 +0800
6cfd01484 push BKL down into ->put_super

Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem. All others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually,
preferably after all the other BKL pushdowns in that area.

[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:07 +0800
a9e220f83 No need to do lock_super() for exclusion in generic_shutdown_super()

We can't run into contention on it. All other callers of lock_super()
either hold s_umount (and we have it exclusive) or hold an active
reference to superblock in question, which prevents the call of
generic_shutdown_super() while the reference is held. So we can
replace lock_super(s) with get_fs_excl() in generic_shutdown_super()
(and corresponding change for unlock_super(), of course).

Since ext4 expects s_lock held for its put_super, take lock_super()
into it. The rest of the filesystems do not care at all.

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:07 +0800
62c6943b4 Trim a bit of crap from fs.h

do_remount_sb() is fs/internal.h fodder, fsync_no_super() is long gone.

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:07 +0800
443b94baa Make sure that all callers of remount hold s_umount exclusive

Signed-off-by: Al Viro

Al Viro
2009-06-12 09:36:07 +0800
5af7926ff enforce ->sync_fs is only called for rw superblock

Make sure a superblock really is writeable by checking MS_RDONLY
under s_umount. sync_filesystems needed some rearrangement for
that, but all but one sync_filesystem caller had the correct locking
already so that we could add that check there. cachefiles grew
s_umount locking.

I've also added a WARN_ON to sync_filesystem to assert this for
future callers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:06 +0800
e50047533 cleanup sync_supers

Merge the write_super helper into sync_super and move the check for
->write_super earlier so that we can avoid grabbing a reference to
a superblock that doesn't have it.
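
The effect of moving the check earlier can be sketched in user space as
follows. This is only a model under stated assumptions: model_sb,
refs_taken and model_sync_supers are hypothetical names, and the real
function walks a locked list rather than an array.

```c
#include <assert.h>
#include <stddef.h>

struct model_sb;
typedef void (*write_super_fn)(struct model_sb *);

/* Hypothetical stand-in for a superblock. */
struct model_sb {
	write_super_fn write_super; /* NULL when the fs has no such method */
	int dirty;                  /* models s_dirt */
	int refs_taken;             /* how often the walk pinned this sb */
};

/* A trivial ->write_super: writing the superblock makes it clean. */
static void model_write_super(struct model_sb *sb)
{
	sb->dirty = 0;
}

/* Walk all superblocks; return how many were written. */
static int model_sync_supers(struct model_sb *sbs, size_t n)
{
	int written = 0;

	assert(sbs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_sb *sb = &sbs[i];

		/* Cheap checks first: no reference is needed to skip. */
		if (!sb->write_super || !sb->dirty)
			continue;
		sb->refs_taken++;        /* models grabbing a reference */
		sb->write_super(sb);
		written++;
	}
	return written;
}
```

In this model a superblock without the method, or one that is not dirty,
is skipped without ever being pinned.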

While we're at it also add a little comment documenting sync_supers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:06 +0800
f3da392e9 dcache: extract and use d_unlinked()

d_unlinked() will be used in the medium term to ban checkpointing when an
opened but unlinked file is detected and, in the long term, to detect such
a situation and special-case it.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Al Viro

Alexey Dobriyan
2009-06-12 09:36:06 +0800
8c85e1251 remove ->write_super call in generic_shutdown_super

We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform its own writeout
in ->put_super, which many filesystems already do.

Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.

Exceptions:

- affs already has identical copy & pasted code at the beginning of
affs_put_super so no need to do it twice.
- xfs does the right thing without it and I have changes pending for
the xfs tree touching this area, so I don't really need conflicts
here.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:06 +0800
517bfae28 qnx4: remove ->write_super

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:05 +0800
94cb993f2 ocfs2: remove ->write_super and stop maintaining ->s_dirt

Signed-off-by: Christoph Hellwig
Acked-by: Joel Becker
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:05 +0800
b7d245de2 gfs2: remove ->write_super and stop maintaining ->s_dirt

Signed-off-by: Christoph Hellwig
Acked-by: Steven Whitehouse
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:05 +0800
ca41f7b91 ext3: remove ->write_super and stop maintaining ->s_dirt

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:05 +0800
59d697b70 btrfs: remove ->write_super and stop maintaining ->s_dirt

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:05 +0800
c3f8a40c1 quota: Introduce writeout_quota_sb() (version 4)

Introduce this function which just writes all the quota structures but
avoids all the syncing and cache pruning work to expose quota structures
to userspace. Use this function from __sync_filesystem when wait == 0.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:04 +0800
850b201b0 quota: cleanup dquota sync functions (version 4)

Currently the VFS calls vfs_dq_sync to sync out disk quotas for a given
superblock. This is a small wrapper around sync_dquots which for the
case of a non-NULL superblock is a small wrapper around quota_sync_sb.

Just make quota_sync_sb global (rename it to sync_quota_sb) and call it
directly. Also call it directly for those cases in quota.c that have a
superblock and leave sync_dquots purely an iterator over sync_quota_sb and
remove its superblock argument.

To make this nicer move the check for the lack of a quota_sync method
from the callers into sync_quota_sb.

[folded build fix from Alexander Beregalov]

Signed-off-by: Christoph Hellwig
Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:04 +0800
60b0680fa vfs: Rename fsync_super() to sync_filesystem() (version 4)

Rename the function so that it better describes what it really does. Also
remove the unnecessary include of buffer_head.h.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:04 +0800
c15c54f5f vfs: Move syncing code from super.c to sync.c (version 4)

Move sync_filesystems(), __fsync_super(), fsync_super() from
super.c to sync.c where they fit better.

[build fixes folded]

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:04 +0800
5cee5815d vfs: Make sys_sync() use fsync_super() (version 4)

It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.

A nice bonus is that we get complete livelock avoidance and write_supers()
is now only used for periodic writeback of superblocks.

sync_blockdevs() introduced a couple of patches ago is gone now.

[build fixes folded]

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:03 +0800
429479f03 vfs: Make __fsync_super() a static function (version 4)

__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
an unnecessarily duplicated call to sync_blockdev() and prepares the ground
for the changes to __fsync_super() in the following patches.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:03 +0800
bfe881255 vfs: Call ->sync_fs() even if s_dirt is 0 (version 4)

sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
->sync_fs() isn't called. This does not really make much sense since s_dirt is
generally used by a filesystem to mean that ->write_super() needs to be called.
But ->sync_fs() does different things. I even suspect that some filesystems
(btrfs?) set s_dirt just to fool this logic.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:03 +0800
5a3e5cb8e vfs: Fix sys_sync() and fsync_super() reliability (version 4)

So far, do_sync() called:
    sync_inodes(0);
    sync_supers();
    sync_filesystems(0);
    sync_filesystems(1);
    sync_inodes(1);

This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
a transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
and others are hit by this) when racing e.g. with background writeback. A
similar problem also hits other filesystems (e.g. ext2) because of
write_supers() being called before the sync_inodes(1).

Change the ordering of calls in do_sync() - this requires a new function
sync_blockdevs() to preserve the property that block devices are always synced
after write_super() / sync_fs() call.

The same issue is fixed in __fsync_super() function used on umount /
remount read-only.
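
The ordering argument can be checked mechanically with a small model.
The old sequence below is the one quoted in this message; the new
sequence is only an ASSUMPTION chosen to satisfy the two properties the
message states (inodes are waited for before superblock and ->sync_fs()
writeout, and block devices are synced afterwards) and is not copied from
the patch. The enum and helper names are made up for the sketch.

```c
#include <assert.h>

/* Labels for the calls named in the message; the enum itself is made up. */
enum step {
	INODES_NOWAIT,  /* sync_inodes(0) */
	INODES_WAIT,    /* sync_inodes(1) */
	SUPERS,         /* sync_supers() */
	FS_NOWAIT,      /* sync_filesystems(0) */
	FS_WAIT,        /* sync_filesystems(1) */
	BLOCKDEVS       /* sync_blockdevs() */
};

/* The sequence quoted in the message. */
static const enum step old_order[] = {
	INODES_NOWAIT, SUPERS, FS_NOWAIT, FS_WAIT, INODES_WAIT
};

/* ASSUMPTION: one possible reordering, not taken from the patch. */
static const enum step new_order[] = {
	INODES_NOWAIT, INODES_WAIT, SUPERS, FS_NOWAIT, FS_WAIT, BLOCKDEVS
};

/* Index of the first occurrence of s in seq, or -1 if absent. */
static int first_index(const enum step *seq, int n, enum step s)
{
	for (int i = 0; i < n; i++)
		if (seq[i] == s)
			return i;
	return -1;
}

/* True when inode IO is waited for before superblock and fs writeout. */
static int inodes_settled_first(const enum step *seq, int n)
{
	int wait, supers, fs;

	assert(n > 0);
	wait = first_index(seq, n, INODES_WAIT);
	supers = first_index(seq, n, SUPERS);
	fs = first_index(seq, n, FS_WAIT);
	return wait >= 0 && supers >= 0 && fs >= 0 &&
	       wait < supers && wait < fs;
}

/* True when block devices are synced after everything else. */
static int blockdevs_last(const enum step *seq, int n)
{
	assert(n > 0);
	return seq[n - 1] == BLOCKDEVS;
}
```

The old sequence fails the first check because sync_inodes(1) runs last,
which is exactly the window in which ->sync_fs() can miss IO that has not
been submitted yet.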

[AV: build fixes]

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2009-06-12 09:36:03 +0800
876a9f76a remove s_async_list

Remove the unused s_async_list in the superblock, a leftover of the
broken async inode deletion code that leaked into mainline. Having this
in the middle of the sync/unmount path is not helpful for the following
cleanups.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2009-06-12 09:36:02 +0800
864d7c4c0 fs: move mark_files_ro into file_table.c

This function walks the s_files list, and operates primarily on the
files in a superblock, so it better belongs here (e.g. see also
fs_may_remount_ro).

[AV: ... and it shouldn't be static after that move]

Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

npiggin@suse.de
2009-06-12 09:36:02 +0800
96029c4e0 fs: introduce mnt_clone_write

This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.

Before:
avg = 462.286
std = 5.46106

After:
avg = 453.12
std = 9.58257

(50 runs of each, stddev gives a reasonable confidence)

It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.
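
A simplified user-space model of that fast path, for illustration only:
model_mnt, model_file and the slow_path_calls counter are hypothetical,
and the real functions also deal with per-CPU counters and read-only
transitions that are left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's vfsmount or file. */
struct model_mnt {
	bool readonly;
	int writers;          /* outstanding write counts on the mount */
	int slow_path_calls;  /* how often the heavyweight path ran */
};

struct model_file {
	struct model_mnt *mnt;
	bool opened_for_write;
};

/* Heavyweight path: must check whether the mount may be written at all. */
static int model_mnt_want_write(struct model_mnt *mnt)
{
	mnt->slow_path_calls++;
	if (mnt->readonly)
		return -EROFS;
	mnt->writers++;
	return 0;
}

/* Cheap path: the caller guarantees a write count is already held. */
static int model_mnt_clone_write(struct model_mnt *mnt)
{
	assert(mnt->writers > 0);
	mnt->writers++;
	return 0;
}

/* Takes the file alone and derives the mount from it. */
static int model_mnt_want_write_file(struct model_file *file)
{
	if (file->opened_for_write)
		return model_mnt_clone_write(file->mnt);
	return model_mnt_want_write(file->mnt);
}
```

A file opened for write already pins a write count on its mount, so the
model only bumps the counter for it; a file opened read-only still goes
through the full check.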

After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).

[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only do all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]

Cc: Dave Hansen
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro

npiggin@suse.de
2009-06-12 09:36:02 +0800