12 Jun, 2009
5 commits
-
__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
unnecessarily duplicated call to sync_blockdev() and prepares ground
for the changes to __fsync_super() in the following patches.Signed-off-by: Jan Kara
Signed-off-by: Al Viro -
sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
->sync_fs() isn't called. This does not really make much sence since s_dirt is
generally used by a filesystem to mean that ->write_super() needs to be called.
But ->sync_fs() does different things. I even suspect that some filesystems
(btrfs?) sets s_dirt just to fool this logic.Signed-off-by: Jan Kara
Signed-off-by: Al Viro -
So far, do_sync() called:
sync_inodes(0);
sync_supers();
sync_filesystems(0);
sync_filesystems(1);
sync_inodes(1);This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
and others are hit by this) when racing e.g. with background writeback. A
similar problem hits also other filesystems (e.g. ext2) because of
write_supers() being called before the sync_inodes(1).Change the ordering of calls in do_sync() - this requires a new function
sync_blockdevs() to preserve the property that block devices are always synced
after write_super() / sync_fs() call.The same issue is fixed in __fsync_super() function used on umount /
remount read-only.[AV: build fixes]
Signed-off-by: Jan Kara
Signed-off-by: Al Viro -
Remove the unused s_async_list in the superblock, a leftover of the
broken async inode deletion code that leaked into mainline. Having this
in the middle of the sync/unmount path is not helpful for the following
cleanups.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro -
This function walks the s_files lock, and operates primarily on the
files in a superblock, so it better belongs here (eg. see also
fs_may_remount_ro).[AV: ... and it shouldn't be static after that move]
Signed-off-by: Nick Piggin
Signed-off-by: Al Viro
09 May, 2009
2 commits
-
Signed-off-by: H Hartley Sweeten
Cc: Subrata Modak
Signed-off-by: Al Viro -
Does equivalent of up_write(&s->s_umount); deactivate_super(s);
However, it does not does not unlock it until it's all over.
As the result, it's safe to use to dispose of new superblock on ->get_sb()
failure exits - nobody will see the sucker until it's all over.
Equivalent using up_write/deactivate_super is safe for that purpose
if superblock is either safe to use or has NULL ->s_root when we unlock.
Normally filesystems take the required precautions, but
a) we do have bugs in that area in some of them.
b) up_write/deactivate_super sequence is extremely common,
so the helper makes sense anyway.Signed-off-by: Al Viro
07 Apr, 2009
1 commit
-
The mqueuefs filesystem will use this helper as well. Proc's main get_sb
could also be made to use it, but that will require a bit more rework.Signed-off-by: Serge E. Hallyn
Cc: Cedric Le Goater
Cc: Alexey Dobriyan
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Apr, 2009
1 commit
-
Export a number of functions for CacheFiles's use.
Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne
28 Mar, 2009
3 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
fs: avoid I_NEW inodes
Merge code for single and multiple-instance mounts
Remove get_init_pts_sb()
Move common mknod_ptmx() calls into caller
Parse mount options just once and copy them to super block
Unroll essentials of do_remount_sb() into devpts
vfs: simple_set_mnt() should return void
fs: move bdev code out of buffer.c
constify dentry_operations: rest
constify dentry_operations: configfs
constify dentry_operations: sysfs
constify dentry_operations: JFS
constify dentry_operations: OCFS2
constify dentry_operations: GFS2
constify dentry_operations: FAT
constify dentry_operations: FUSE
constify dentry_operations: procfs
constify dentry_operations: ecryptfs
constify dentry_operations: CIFS
constify dentry_operations: AFS
... -
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
ext2: Zero our b_size in ext2_quota_read()
trivial: fix typos/grammar errors in fs/Kconfig
quota: Coding style fixes
quota: Remove superfluous inlines
quota: Remove uppercase aliases for quota functions.
nfsd: Use lowercase names of quota functions
jfs: Use lowercase names of quota functions
udf: Use lowercase names of quota functions
ufs: Use lowercase names of quota functions
reiserfs: Use lowercase names of quota functions
ext4: Use lowercase names of quota functions
ext3: Use lowercase names of quota functions
ext2: Use lowercase names of quota functions
ramfs: Remove quota call
vfs: Use lowercase names of quota functions
quota: Remove dqbuf_t and other cleanups
quota: Remove NODQUOT macro
quota: Make global quota locks cacheline aligned
quota: Move quota files into separate directory
ext4: quota reservation for delayed allocation
... -
simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly cleanup if
it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev()
should:up_write(sb->s_unmount);
deactivate_super(sb);if simple_set_mnt() fails.
Since simple_set_mnt() never fails, would be cleaner if it did not
return anything.[akpm@linux-foundation.org: fix build]
Signed-off-by: Sukadev Bhattiprolu
Acked-by: Serge Hallyn
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro
26 Mar, 2009
2 commits
-
Opencode a cheasy approach with kevent. The idea here is that we'll
add some generic delayed work infrastructure, which probably wont be
based on pdflush (or maybe it will, in which case we can just add it
back).This is in preparation for getting rid of pdflush completely.
Signed-off-by: Jens Axboe
-
Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara
CC: Alexander Viro
13 Mar, 2009
1 commit
-
In sget(), destroy_super(s) is called with s->s_umount held, which makes
lockdep unhappy.Signed-off-by: Li Zefan
Cc: Al Viro
Acked-by: Peter Zijlstra
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Feb, 2009
1 commit
-
Li Zefan said:
Thread 1:
for ((; ;))
{
mount -t cpuset xxx /mnt > /dev/null 2>&1
cat /mnt/cpus > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}Thread 2:
for ((; ;))
{
mount -t cpuset xxx /mnt > /dev/null 2>&1
umount /mnt > /dev/null 2>&1
}(Note: It is irrelevant which cgroup subsys is used.)
After a while a lockdep warning showed up:
=============================================
[ INFO: possible recursive locking detected ]
2.6.28 #479
---------------------------------------------
mount/13554 is trying to acquire lock:
(&type->s_umount_key#19){--..}, at: [] sget+0x5e/0x321but task is already holding lock:
(&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321other info that might help us debug this:
1 lock held by mount/13554:
#0: (&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321stack backtrace:
Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
Call Trace:
[] validate_chain+0x4c6/0xbbd
[] __lock_acquire+0x676/0x700
[] lock_acquire+0x5d/0x7a
[] ? sget+0x5e/0x321
[] down_write+0x34/0x50
[] ? sget+0x5e/0x321
[] sget+0x5e/0x321
[] ? cgroup_set_super+0x0/0x3e
[] ? cgroup_test_super+0x0/0x2f
[] cgroup_get_sb+0x98/0x2e7
[] cpuset_get_sb+0x4a/0x5f
[] vfs_kern_mount+0x40/0x7b
[] do_kern_mount+0x37/0xbf
[] do_mount+0x5c3/0x61a
[] ? copy_mount_options+0x2c/0x111
[] sys_mount+0x69/0xa0
[] sysenter_do_call+0x12/0x31The cause is after alloc_super() and then retry, an old entry in list
fs_supers is found, so grab_super(old) is called, but both functions hold
s_umount lock:struct super_block *sget(...)
{
...
retry:
spin_lock(&sb_lock);
if (test) {
list_for_each_entry(old, &type->fs_supers, s_instances) {
if (!test(old, data))
continue;
if (!grab_super(old)) s_umount);
goto retry;
if (s)
destroy_super(s);
return old;
}
}
if (!s) {
spin_unlock(&sb_lock);
s = alloc_super(type); s_umount)
if (!s)
return ERR_PTR(-ENOMEM);
goto retry;
}
...
}It seems like a false positive, and seems like VFS but not cgroup needs to
be fixed.Peter said:
We can simply put the new s_umount instance in a but lockdep doesn't
particularly cares about subclass order.If there's any issue with the callers of sget() assuming the s_umount lock
being of sublcass 0, then there is another annotation we can use to fix
that, but lets not bother with that if this is sufficient.Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673
Signed-off-by: Peter Zijlstra
Tested-by: Li Zefan
Reported-by: Li Zefan
Cc: Al Viro
Cc: Paul Menage
Cc: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Feb, 2009
1 commit
-
Rename the async_*_special() functions to async_*_domain(), which
describes the purpose of these functions much better.
[Broke up long lines to silence checkpatch]Signed-off-by: Cornelia Huck
Signed-off-by: Arjan van de Ven
14 Jan, 2009
1 commit
-
Signed-off-by: Heiko Carstens
09 Jan, 2009
2 commits
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
ext4: Remove "extents" mount option
block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
ext4: Make printk's consistently prefixed with "EXT4-fs: "
ext4: Add sanity checks for the superblock before mounting the filesystem
ext4: Add mount option to set kjournald's I/O priority
jbd2: Submit writes to the journal using WRITE_SYNC
jbd2: Add pid and journal device name to the "kjournald2 starting" message
ext4: Add markers for better debuggability
ext4: Remove code to create the journal inode
ext4: provide function to release metadata pages under memory pressure
ext3: provide function to release metadata pages under memory pressure
add releasepage hooks to block devices which can be used by file systems
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
ext4: Init the complete page while building buddy cache
ext4: Don't allow new groups to be added during block allocation
ext4: mark the blocks/inode bitmap beyond end of group as used
ext4: Use new buffer_head flag to check uninit group bitmaps initialization
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
ext4: code cleanup
... -
sync_filesystems() shouldn't be calling async_synchronize_full_special
while holding a spinlock. The second while loop in that function is the
right place for this anyway.Signed-off-by: Dave Kleikamp
Cc: Arjan van de Ven
Reported-by: Grissiom
Signed-off-by: Linus Torvalds
08 Jan, 2009
1 commit
-
this makes "rm -rf" on a (names cached) kernel tree go from
11.6 to 8.6 seconds on an ext3 filesystemSigned-off-by: Arjan van de Ven
03 Jan, 2009
1 commit
-
Implement blkdev_releasepage() to release the buffer_heads and pages
after we release private data belonging to a mounted filesystem.Cc: Toshiyuki Okajima
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: "Theodore Ts'o"
20 Dec, 2008
1 commit
-
Pass mount flags to security_sb_kern_mount(), so security modules
can determine if a mount operation is being performed by the kernel.Signed-off-by: James Morris
Acked-by: Stephen Smalley
24 Oct, 2008
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...
23 Oct, 2008
2 commits
-
Signed-off-by: Alexey Dobriyan
-
Signed-off-by: Alexey Dobriyan
21 Oct, 2008
1 commit
-
replace open_bdev_excl/close_bdev_excl with variants taking fmode_t.
superblock gets the value used to mount it stored in sb->s_modeSigned-off-by: Al Viro
25 Jul, 2008
1 commit
-
[Summary]
Split LRU-list of unused dentries to one per superblock to avoid soft
lock up during NFS mounts and remounting of any filesystem.Previously I posted here:
http://lkml.org/lkml/2008/3/5/590[Descriptions]
- background
dentry_unused is a list of dentries which are not referenced.
dentry_unused grows up when references on directories or files are
released. This list can be very long if there is huge free memory.- the problem
When shrink_dcache_sb() is called, it scans all dentry_unused linearly
under spin_lock(), and if dentry->d_sb is differnt from given
superblock, scan next dentry. This scan costs very much if there are
many entries, and very ineffective if there are many superblocks.IOW, When we need to shrink unused dentries on one dentry, but scans
unused dentries on all superblocks in the system. For example, we scan
500 dentries to unmount a filesystem, but scans 1,000,000 or more unused
dentries on other superblocks.In our case , At mounting NFS*, shrink_dcache_sb() is called to shrink
unused dentries on NFS, but scans 100,000,000 unused dentries on
superblocks in the system such as local ext3 filesystems. I hear NFS
mounting took 1 min on some system in use.* : NFS uses virtual filesystem in rpc layer, so NFS is affected by
this problem.100,000,000 is possible number on large systems.
Per-superblock LRU of unused dentried can reduce the cost in
reasonable manner.- How to fix
I found this problem is solved by David Chinner's "Per-superblock
unused dentry LRU lists V3"(1), so I rebase it and add some fix to
reclaim with fairness, which is in Andrew Morton's comments(2).1) http://lkml.org/lkml/2006/5/25/318
2) http://lkml.org/lkml/2006/5/25/320Split LRU-list of unused dentries to each superblocks. Then, NFS
mounting will check dentries under a superblock instead of all. But
this spliting will break LRU of dentry-unused. So, I've attempted to
make reclaim unused dentrins with fairness by calculate number of
dentries to scan on this sb based on following waynumber of dentries to scan on this sb =
count * (number of dentries on this sb / number of dentries in the machine)- ToDo
- I have to measuring performance number and do stress tests.- When unmount occurs during prune_dcache(), scanning on same
superblock, It is unable to reach next superblock because it is gone
away. We restart scannig superblock from first one, it causes
unfairness of reclaim unused dentries on first superblock. But I think
this happens very rarely.- Test Results
Result on 6GB boxes with excessive unused dentries.
Without patch:
$ cat /proc/sys/fs/dentry-state
10181835 10180203 45 0 0 0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real 0m1.830s
user 0m0.001s
sys 0m1.653sWith this patch:
$ cat /proc/sys/fs/dentry-state
10236610 10234751 45 0 0 0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real 0m0.106s
user 0m0.002s
sys 0m0.032s[akpm@linux-foundation.org: fix comments]
Signed-off-by: Kentaro Makita
Cc: Neil Brown
Cc: Trond Myklebust
Cc: David Chinner
Cc: "J. Bruce Fields"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Apr, 2008
1 commit
-
Make the needlessly global __put_super() static.
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Apr, 2008
1 commit
-
Currently, we just turn quotas off on remount of filesystem to read-only
state. The patch below adds necessary framework so that we can turn quotas
off on remount RO but we are able to automatically reenable them again when
filesystem is remounted to RW state. All we need to do is to keep references
to inodes of quota files when remounting RO and using these references to
reenable quotas when remounting RW.Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Apr, 2008
1 commit
-
Signed-off-by: Al Viro
19 Apr, 2008
2 commits
-
There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.This patch provides a very simple debugging framework to catch these kinds of
bugs. It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.[hch: made it conditional on a debug option.
But it's still a little bit too ugly][hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]
Signed-off-by: Dave Hansen
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro -
The emergency remount code forcibly removes FMODE_WRITE from
filps. The r/o bind mount code notices that this was done
without a proper mnt_drop_write() and properly gives a
warning.This patch does a mnt_drop_write() to keep everything
balanced.Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Al Viro
25 Mar, 2008
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] get stack footprint of pathname resolution back to relative sanity
[PATCH] double iput() on failure exit in hugetlb
[PATCH] double dput() on failure exit in tiny-shmem
[PATCH] fix up new filp allocators
[PATCH] check for null vfsmount in dentry_open()
[PATCH] reiserfs: eliminate private use of struct file in xattr
[PATCH] sanitize hppfs
hppfs pass vfsmount to dentry_open()
[PATCH] restore export of do_kern_mount()
20 Mar, 2008
1 commit
-
Fix kernel-doc notation warnings in fs/.
Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
* mark_files_ro
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
* lookup_one_len: filesystem helper to lookup single pathname component
Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
* bh_uptodate_or_lock: Test whether the buffer is uptodate
Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
* bh_submit_read: Submit a locked buffer for reading
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
* writeback_acquire: attempt to get exclusive writeback access to a device
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
* writeback_in_progress: determine whether there is writeback in progress
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
* writeback_release: relinquish exclusive writeback access against a device.
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
* void journal_invalidatepage()Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Mar, 2008
1 commit
-
vfs_kern_mount() requires having a reference to fs type, which
makes it impossible for module to create procfs, etc. private
mount. Open-coding is not an option, since e.g. put_filesystem()
is _not_ exported, and for a good reason.Signed-off-by: Al Viro
06 Mar, 2008
1 commit
-
Introduce new LSM interfaces to allow an FS to deal with their own mount
options. This includes a new string parsing function exported from the
LSM that an FS can use to get a security data blob and a new security
data blob. This is particularly useful for an FS which uses binary
mount data, like NFS, which does not pass strings into the vfs to be
handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK
when dealing with binary mount data. If the binary mount data is less
than one page the copy_page() in security_sb_copy_data() can cause an
illegal page fault and boom. Remove all NFSisms from the SELinux code
since they were broken by past NFS changes.Signed-off-by: Eric Paris
Acked-by: Stephen Smalley
Acked-by: Casey Schaufler
Signed-off-by: James Morris
09 Feb, 2008
2 commits
-
Turn off quotas before filesystem is remounted read only. Otherwise quota
will try to write to read-only filesystem which does no good... We could
also just refuse to remount ro when quota is enabled but turning quota off
is consistent with what we do on umount.Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a new s_options field to struct super_block. Filesystems can save
mount options passed to them in mount or remount. It is automatically
freed when the superblock is destroyed.A new helper function, generic_show_options() is introduced, which uses
this field to display the mount options in /proc/mounts.Another helper function, save_mount_options() may be used by
filesystems to save the options in the super block.Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Oct, 2007
1 commit
-
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt
Signed-off-by: Adrian Bunk