Eric Lee / smarc-fsl-linux-kernel

25 Jul, 2008

1 commit

da3bbdd46 fix soft lock up at NFS mount via per-SB LRU-list of unused dentries ... Browse Code »

[Summary]

Split LRU-list of unused dentries to one per superblock to avoid soft
lock up during NFS mounts and remounting of any filesystem.

Previously I posted here:
http://lkml.org/lkml/2008/3/5/590

[Descriptions]

- background

dentry_unused is a list of dentries which are not referenced.
dentry_unused grows up when references on directories or files are
released. This list can be very long if there is huge free memory.

- the problem

When shrink_dcache_sb() is called, it scans all dentry_unused linearly
under spin_lock(), and if dentry->d_sb is differnt from given
superblock, scan next dentry. This scan costs very much if there are
many entries, and very ineffective if there are many superblocks.

IOW, When we need to shrink unused dentries on one dentry, but scans
unused dentries on all superblocks in the system. For example, we scan
500 dentries to unmount a filesystem, but scans 1,000,000 or more unused
dentries on other superblocks.

In our case , At mounting NFS*, shrink_dcache_sb() is called to shrink
unused dentries on NFS, but scans 100,000,000 unused dentries on
superblocks in the system such as local ext3 filesystems. I hear NFS
mounting took 1 min on some system in use.

* : NFS uses virtual filesystem in rpc layer, so NFS is affected by
this problem.

100,000,000 is possible number on large systems.

Per-superblock LRU of unused dentried can reduce the cost in
reasonable manner.

- How to fix

I found this problem is solved by David Chinner's "Per-superblock
unused dentry LRU lists V3"(1), so I rebase it and add some fix to
reclaim with fairness, which is in Andrew Morton's comments(2).

1) http://lkml.org/lkml/2006/5/25/318
2) http://lkml.org/lkml/2006/5/25/320

Split LRU-list of unused dentries to each superblocks. Then, NFS
mounting will check dentries under a superblock instead of all. But
this spliting will break LRU of dentry-unused. So, I've attempted to
make reclaim unused dentrins with fairness by calculate number of
dentries to scan on this sb based on following way

number of dentries to scan on this sb =
count * (number of dentries on this sb / number of dentries in the machine)

- ToDo
- I have to measuring performance number and do stress tests.

- When unmount occurs during prune_dcache(), scanning on same
superblock, It is unable to reach next superblock because it is gone
away. We restart scannig superblock from first one, it causes
unfairness of reclaim unused dentries on first superblock. But I think
this happens very rarely.

- Test Results

Result on 6GB boxes with excessive unused dentries.

Without patch:

$ cat /proc/sys/fs/dentry-state
10181835 10180203 45 0 0 0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real 0m1.830s
user 0m0.001s
sys 0m1.653s

With this patch:
$ cat /proc/sys/fs/dentry-state
10236610 10234751 45 0 0 0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real 0m0.106s
user 0m0.002s
sys 0m0.032s

[akpm@linux-foundation.org: fix comments]
Signed-off-by: Kentaro Makita
Cc: Neil Brown
Cc: Trond Myklebust
Cc: David Chinner
Cc: "J. Bruce Fields"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kentaro Makita
2008-07-25 01:47:15 +0800

29 Apr, 2008

1 commit

6b09ae669 make __put_super() static ... Browse Code »

Make the needlessly global __put_super() static.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-04-29 23:06:00 +0800

28 Apr, 2008

1 commit

0ff5af834 quota: quota core changes for quotaon on remount ... Browse Code »

Currently, we just turn quotas off on remount of filesystem to read-only
state. The patch below adds necessary framework so that we can turn quotas
off on remount RO but we are able to automatically reenable them again when
filesystem is remounted to RW state. All we need to do is to keep references
to inodes of quota files when remounting RO and using these references to
reenable quotas when remounting RW.

Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2008-04-28 23:58:33 +0800

22 Apr, 2008

1 commit

6d59e7f58 [PATCH] move a bunch of declarations to fs/internal.h ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2008-04-22 11:11:01 +0800

19 Apr, 2008

2 commits

ad775f5a8 [PATCH] r/o bind mounts: debugging for missed calls ... Browse Code »

There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.

This patch provides a very simple debugging framework to catch these kinds of
bugs. It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.

I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.

[hch: made it conditional on a debug option.
But it's still a little bit too ugly]

[hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]

Signed-off-by: Dave Hansen
Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:29:28 +0800
49e0d02cf [PATCH] r/o bind mounts: drop write during emergency remount ... Browse Code »

The emergency remount code forcibly removes FMODE_WRITE from
filps. The r/o bind mount code notices that this was done
without a proper mnt_drop_write() and properly gives a
warning.

This patch does a mnt_drop_write() to keep everything
balanced.

Acked-by: Al Viro
Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Hansen
Signed-off-by: Al Viro

Dave Hansen
2008-04-19 12:25:33 +0800

25 Mar, 2008

1 commit

7ed7fe5e8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] get stack footprint of pathname resolution back to relative sanity
[PATCH] double iput() on failure exit in hugetlb
[PATCH] double dput() on failure exit in tiny-shmem
[PATCH] fix up new filp allocators
[PATCH] check for null vfsmount in dentry_open()
[PATCH] reiserfs: eliminate private use of struct file in xattr
[PATCH] sanitize hppfs
hppfs pass vfsmount to dentry_open()
[PATCH] restore export of do_kern_mount()

Linus Torvalds
2008-03-25 23:57:47 +0800

20 Mar, 2008

1 commit

a6b91919e fs: fix kernel-doc notation warnings ... Browse Code »

Fix kernel-doc notation warnings in fs/.

Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
* mark_files_ro
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
* lease_get_mtime
Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
* lookup_one_len: filesystem helper to lookup single pathname component
Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
* bh_uptodate_or_lock: Test whether the buffer is uptodate
Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
* bh_submit_read: Submit a locked buffer for reading
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
* writeback_acquire: attempt to get exclusive writeback access to a device
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
* writeback_in_progress: determine whether there is writeback in progress
Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
* writeback_release: relinquish exclusive writeback access against a device.
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
* void journal_invalidatepage()

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2008-03-20 09:53:36 +0800

18 Mar, 2008

1 commit

8a4e98d9d [PATCH] restore export of do_kern_mount() ... Browse Code »

vfs_kern_mount() requires having a reference to fs type, which
makes it impossible for module to create procfs, etc. private
mount. Open-coding is not an option, since e.g. put_filesystem()
is _not_ exported, and for a good reason.

Signed-off-by: Al Viro

Al Viro
2008-03-18 10:58:04 +0800

06 Mar, 2008

1 commit

e00075298 LSM/SELinux: Interfaces to allow FS to control mount options ... Browse Code »

Introduce new LSM interfaces to allow an FS to deal with their own mount
options. This includes a new string parsing function exported from the
LSM that an FS can use to get a security data blob and a new security
data blob. This is particularly useful for an FS which uses binary
mount data, like NFS, which does not pass strings into the vfs to be
handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK
when dealing with binary mount data. If the binary mount data is less
than one page the copy_page() in security_sb_copy_data() can cause an
illegal page fault and boom. Remove all NFSisms from the SELinux code
since they were broken by past NFS changes.

Signed-off-by: Eric Paris
Acked-by: Stephen Smalley
Acked-by: Casey Schaufler
Signed-off-by: James Morris

Eric Paris
2008-03-06 05:40:53 +0800

09 Feb, 2008

2 commits

66191dc62 quota: turn quotas off when remounting read-only ... Browse Code »

Turn off quotas before filesystem is remounted read only. Otherwise quota
will try to write to read-only filesystem which does no good... We could
also just refuse to remount ro when quota is enabled but turning quota off
is consistent with what we do on umount.

Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2008-02-09 01:22:44 +0800
b3b304a23 mount options: add generic_show_options() ... Browse Code »

Add a new s_options field to struct super_block. Filesystems can save
mount options passed to them in mount or remount. It is automatically
freed when the superblock is destroyed.

A new helper function, generic_show_options() is introduced, which uses
this field to display the mount options in /proc/mounts.

Another helper function, save_mount_options() may be used by
filesystems to save the options in the super block.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2008-02-09 01:22:39 +0800

20 Oct, 2007

5 commits

96de0e252 Convert files to UTF-8 and some cleanups ... Browse Code »

* Convert files to UTF-8.

* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)

* Correct town names (Goettingen -> Göttingen)

* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)

Signed-off-by: Jan Engelhardt
Signed-off-by: Adrian Bunk

Jan Engelhardt
2007-10-20 05:21:04 +0800
3a4fa0a25 Fix misspellings of "system", "controller", "interrupt" and "necessary". ... Browse Code »

Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".

Signed-off-by: Robert P. J. Day
Signed-off-by: Adrian Bunk

Robert P. J. Day
2007-10-20 05:10:43 +0800
8bf9725c2 pid namespaces: introduce MS_KERNMOUNT flag ... Browse Code »

This flag tells the .get_sb callback that this is a kern_mount() call so that
it can trust *data pointer to be valid in-kernel one. If this flag is passed
from the user process, it is cleared since the *data pointer is not a valid
kernel object.

Running a few steps forward - this will be needed for proc to create the
superblock and store a valid pid namespace on it during the namespace
creation. The reason, why the namespace cannot live without proc mount is
described in the appropriate patch.

Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: Sukadev Bhattiprolu
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:38 +0800
d47301271 fs/super.c: use list_for_each_entry() instead of list_for_each() ... Browse Code »

fs/super.c: use list_for_each_entry() instead of list_for_each() in
sget()

[akpm@linux-foundation.org: clean up some crap while we're there]
Signed-off-by: Matthias Kaehlcke
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthias Kaehlcke
2007-10-20 02:53:38 +0800
c18479fe0 put declaration of put_filesystem() in fs.h ... Browse Code »

Declarations go into headers.

Signed-off-by: Miklos Szeredi
Cc: Ram Pai
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-10-20 02:53:33 +0800

17 Oct, 2007

1 commit

0e0f4fc22 writeback: fix periodic superblock dirty inode flushing ... Browse Code »

Current -mm tree has bucketful of bug fixes in periodic writeback path.
However, we still hit a glitch where dirty pages on a given inode aren't
completely flushed to the disk, and system will accumulate large amount of
dirty pages beyond what dirty_expire_interval is designed for.

The problem is __sync_single_inode() will move an inode to sb->s_dirty list
even when there are more pending dirty pages on that inode. If there is
another inode with a small number of dirty pages, we hit a case where the loop
iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0.
Thus leaving the inode that has large amount of dirty pages behind and it has
to wait for another dirty_writeback_interval before we flush it again. We
effectively only write out MAX_WRITEBACK_PAGES every dirty_writeback_interval.
If the rate of dirtying is sufficiently high, the system will start
accumulate a large number of dirty pages.

So fix it by having another sb->s_more_io list on which to park the inode
while we iterate through sb->s_io and to allow each dirty inode which resides
on that sb to have an equal chance of flushing some amount of dirty pages.

Signed-off-by: Ken Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken Chen
2007-10-17 23:43:02 +0800

17 Jul, 2007

1 commit

b4c07bce7 hugetlbfs: handle empty options string ... Browse Code »

I was seeing a null pointer deref in fs/super.c:vfs_kern_mount().
Some file system get_sb() handler was returning NULL mnt_sb with
a non-negative return value. I also noticed a "hugetlbfs: Bad
mount option:" message in the log.

Turns out that hugetlbfs_parse_options() was not checking for an
empty option string after call to strsep(). On failure,
hugetlbfs_parse_options() returns 1. hugetlbfs_fill_super() just
passed this return code back up the call stack where
vfs_kern_mount() missed the error and proceeded with a NULL mnt_sb.

Apparently introduced by patch:
hugetlbfs-use-lib-parser-fix-docs.patch

The problem was exposed by this line in my fstab:

none /huge hugetlbfs defaults 0 0

It can also be demonstrated by invoking mount of hugetlbfs
directly with no options or a bogus option.

This patch:

1) adds the check for empty option to hugetlbfs_parse_options(),
2) enhances the error message to bracket any unrecognized
option with quotes ,
3) modifies hugetlbfs_parse_options() to return -EINVAL on any
unrecognized option,
4) adds a BUG_ON() to vfs_kern_mount() to catch any get_sb()
handler that returns a NULL mnt->mnt_sb with a return value
>= 0.

Signed-off-by: Lee Schermerhorn
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2007-07-17 00:05:46 +0800

09 May, 2007

1 commit

79c0b2df7 add filesystem subtype support ... Browse Code »

There's a slight problem with filesystem type representation in fuse
based filesystems.

From the kernel's view, there are just two filesystem types: fuse and
fuseblk. From the user's view there are lots of different filesystem
types. The user is not even much concerned if the filesystem is fuse based
or not. So there's a conflict of interest in how this should be
represented in fstab, mtab and /proc/mounts.

The current scheme is to encode the real filesystem type in the mount
source. So an sshfs mount looks like this:

sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,...

This url-ish syntax works OK for sshfs and similar filesystems. However
for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
the kernel expects the mount source to be a real device name.

A possibly better scheme would be to encode the real type in the type
field as "type.subtype". So fuse mounts would look like this:

/dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,...
user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,...

This patch adds the necessary code to the kernel so that this can be
correctly displayed in /proc/mounts.

Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miklos Szeredi
2007-05-09 02:15:01 +0800

28 Apr, 2007

1 commit

3106d46f5 the overdue removal of the mount/umount uevents ... Browse Code »

This patch contains the overdue removal of the mount/umount uevents.

Signed-off-by: Adrian Bunk
Signed-off-by: Greg Kroah-Hartman

Adrian Bunk
2007-04-28 01:57:31 +0800

13 Feb, 2007

1 commit

ee9b6d61a [PATCH] Mark struct super_operations const ... Browse Code »

This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".

Compile tested with gcc & sparse.

Signed-off-by: Josef 'Jeff' Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef 'Jeff' Sipek
2007-02-13 01:48:47 +0800

12 Jan, 2007

1 commit

f73ca1b76 [PATCH] Revert bd_mount_mutex back to a semaphore ... Browse Code »

Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

(XFS unlocks the semaphore from a different task, by design. The mutex
code warns about this)

Signed-off-by: Dave Chinner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Chinner
2007-01-12 10:18:21 +0800

09 Dec, 2006

1 commit

0f7fc9e4d [PATCH] VFS: change struct file to use struct path ... Browse Code »

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800

04 Dec, 2006

1 commit

914e26379 [PATCH] severing fs.h, radix-tree.h -> sched.h ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2006-12-04 15:00:24 +0800

12 Oct, 2006

1 commit

c636ebdb1 [PATCH] VFS: Destroy the dentries contributed by a superblock on unmounting ... Browse Code »

The attached patch destroys all the dentries attached to a superblock in one go
by:

(1) Destroying the tree rooted at s_root.

(2) Destroying every entry in the anon list, one at a time.

(3) Each entry in the anon list has its subtree consumed from the leaves
inwards.

This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.

Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
functions added by this patch. This is because:

(1) at the point the filesystem calls generic_shutdown_super(), it is not
permitted to further touch the superblock's set of dentries, and nor may
it remove aliases from inodes;

(2) the dcache memory shrinker now skips dentries that are being unmounted;
and

(3) the superblock no longer has any external references through which the VFS
can reach it.

Given these points, the only locking we need to do is when we remove dentries
from the unused list and the name hashes, which we do a directory's worth at a
time.

We also don't need to guard against reference counts going to zero unexpectedly
and removing bits of the tree we're working on as nothing else can call dput().

A cut down version of dentry_iput() has been folded into
shrink_dcache_for_umount_subtree() function. Apart from not needing to unlock
things, it also doesn't need to check for inotify watches.

In this version of the patch, the complaint about a dentry still being in use
has been expanded from a single BUG_ON() and now gives much more information.

Signed-off-by: David Howells
Acked-by: NeilBrown
Acked-by: Ian Kent
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-10-12 02:14:25 +0800

01 Oct, 2006

2 commits

9361401eb [PATCH] BLOCK: Make it possible to disable the block layer [try #6] ... Browse Code »

Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.

This patch does the following:

(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.

(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:

(*) Block I/O tracing.

(*) Disk partition code.

(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.

(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.

(*) MTD blockdev handling and FTL.

(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.

(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.

(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.

(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.

(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:

(*) Default blockdev file operations (to give error ENODEV on opening).

(*) Makes some /proc changes:

(*) /proc/devices does not list any blockdevs.

(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.

(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).

(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe

David Howells
2006-10-01 02:52:31 +0800
cf9a2ae8d [PATCH] BLOCK: Move functions out of buffer code [try #6] ... Browse Code »

Move some functions out of the buffering code that aren't strictly buffering
specific. This is a precursor to being able to disable the block layer.

(*) Moved some stuff out of fs/buffer.c:

(*) The file sync and general sync stuff moved to fs/sync.c.

(*) The superblock sync stuff moved to fs/super.c.

(*) do_invalidatepage() moved to mm/truncate.c.

(*) try_to_release_page() moved to mm/filemap.c.

(*) Moved some related declarations between header files:

(*) declarations for do_invalidatepage() and try_to_release_page() moved
to linux/mm.h.

(*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe

David Howells
2006-10-01 02:31:19 +0800

30 Sep, 2006

1 commit

9c4dbee79 [PATCH] fs: add lock annotation to grab_super ... Browse Code »

grab_super gets called with sb_lock held, and releases it. Add a lock
annotation to this function so that sparse can check callers for lock
pairing, and so that sparse will not complain about this function since it
intentionally uses the lock in this manner.

Signed-off-by: Josh Triplett
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josh Triplett
2006-09-30 00:18:08 +0800

07 Sep, 2006

1 commit

fe2bbc483 [PATCH] add missing desctiption in super.c ... Browse Code »

Adds kernel-doc for alloc_super() type in fs/super.c.

Signed-off-by: Henrik Kretzschmar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Henrik Kretzschmar
2006-09-07 02:00:01 +0800

04 Jul, 2006

2 commits

897c6ff95 [PATCH] lockdep: annotate sb ->s_umount ... Browse Code »

The s_umount rwsem needs to be classified as per-superblock since it's
perfectly legit to keep multiple of those recursively in the VFS locking
rules.

Has no effect on non-lockdep kernels.

Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-07-04 06:27:09 +0800
cf5162499 [PATCH] lockdep: annotate ->s_lock ... Browse Code »

Teach special (per-filesystem) locking code to the lock validator.

Minimal effect on non-lockdep kernels: one extra parameter to alloc_super().

Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-07-04 06:27:09 +0800

01 Jul, 2006

1 commit

6ab3d5624 Remove obsolete #include <linux/config.h> ... Browse Code »

Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk

Jörn Engel
2006-07-01 01:25:36 +0800

25 Jun, 2006

1 commit

816724e65 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ ... Browse Code »

Conflicts:

fs/nfs/inode.c
fs/super.c

Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch
'VFS: Permit filesystem to override root dentry on mount'

Trond Myklebust
2006-06-25 01:07:53 +0800

23 Jun, 2006

3 commits

726c33422 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry ... Browse Code »

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800
454e2398b [PATCH] VFS: Permit filesystem to override root dentry on mount ... Browse Code »

Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.

(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().

(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().

This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.

However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.

[*] Anonymous until discovered from another tree.

(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800
0feae5c47 [PATCH] Fix dcache race during umount ... Browse Code »

The race is that the shrink_dcache_memory shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.

If it does, then it will call in to iput on the inode while the dentry is
no longer able to be found by the umounting process. If iput takes a
while, generic_shutdown_super could get all the way though
shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to touch
it (which some filesystems certainly do), it will lose. The promised
"Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry on
someone else's dentry. As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any
such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it is
doing the prune on behalf of so that it can be careful of other
filesystems. shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon list
out of the superblock, so it can get the s_anon list itself, and can pass
the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where it
is (at the tail of the list) and exits, on the assumption that some other
thread will be removing that dentry soon. To try to make sure that some
work gets done, a limited number of dnetries which are untouchable are
skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck
Acked-by: Kirill Korotaev
Cc: Olaf Hering
Acked-by: Balbir Singh
Signed-off-by: Neil Brown
Signed-off-by: Balbir Singh
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-23 06:05:57 +0800

09 Jun, 2006

2 commits

1f5ce9e93 VFS: Unexport do_kern_mount() and clean up simple_pin_fs() ... Browse Code »

Replace all module uses with the new vfs_kern_mount() interface, and fix up
simple_pin_fs().

Signed-off-by: Trond Myklebust

Trond Myklebust
2006-06-09 21:34:16 +0800
bb4a58bf4 VFS: Add GPL_EXPORTED function vfs_kern_mount() ... Browse Code »

do_kern_mount() does not allow the kernel to use private mount interfaces
without exposing the same interfaces to userland. The problem is that the
filesystem is referenced by name, thus meaning that it and its mount
interface must be registered in the global filesystem list.

vfs_kern_mount() passes the struct file_system_type as an explicit
parameter in order to overcome this limitation.

Signed-off-by: Trond Myklebust

Trond Myklebust
2006-06-09 21:34:15 +0800

27 Mar, 2006

1 commit

353ab6e97 [PATCH] sem2mutex: fs/ ... Browse Code »

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar
Cc: Eric Van Hensbergen
Cc: Robert Love
Cc: Thomas Gleixner
Cc: David Woodhouse
Cc: Neil Brown
Cc: Trond Myklebust
Cc: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-03-27 00:56:55 +0800