17 Jul, 2007

1 commit

  • I was seeing a null pointer deref in fs/super.c:vfs_kern_mount().
    Some file system get_sb() handler was returning NULL mnt_sb with
    a non-negative return value. I also noticed a "hugetlbfs: Bad
    mount option:" message in the log.

    Turns out that hugetlbfs_parse_options() was not checking for an
    empty option string after the call to strsep(). On failure,
    hugetlbfs_parse_options() returns 1. hugetlbfs_fill_super() just
    passed this return code back up the call stack where
    vfs_kern_mount() missed the error and proceeded with a NULL mnt_sb.
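
A minimal userspace sketch of the kind of check that was missing, assuming a
comma-separated option string. The helpers next_token(), known_option() and
parse_options_sketch() are hypothetical; next_token() is a portable stand-in
for strsep(), and the option table is invented for illustration, so this is
not the hugetlbfs code itself.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for strsep(): return the next comma-separated token (which may
 * be empty) and advance *cursor; return NULL once the string is used up. */
static char *next_token(char **cursor)
{
    char *start = *cursor;
    size_t len;

    if (start == NULL)
        return NULL;
    len = strcspn(start, ",");
    if (start[len] == ',') {
        start[len] = '\0';
        *cursor = start + len + 1;
    } else {
        *cursor = NULL;
    }
    return start;
}

/* Hypothetical option table, for illustration only. */
static int known_option(const char *tok)
{
    return strncmp(tok, "size=", 5) == 0 || strncmp(tok, "mode=", 5) == 0;
}

/* Return 0 on success and a negative errno on an unrecognized option, so
 * that a caller testing for "< 0" cannot miss the failure. */
static int parse_options_sketch(char *options)
{
    char *tok;

    if (options == NULL)
        return 0;
    while ((tok = next_token(&options)) != NULL) {
        if (*tok == '\0')
            continue;               /* empty option: nothing to parse */
        if (!known_option(tok)) {
            fprintf(stderr, "sketch: Bad mount option: \"%s\"\n", tok);
            return -EINVAL;
        }
    }
    return 0;
}
```

With this shape, an empty string or "size=1G,,mode=755" parses cleanly, while
"bogus" yields -EINVAL rather than a positive value.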

    Apparently introduced by patch:
    hugetlbfs-use-lib-parser-fix-docs.patch

    The problem was exposed by this line in my fstab:

    none /huge hugetlbfs defaults 0 0

    It can also be demonstrated by invoking mount of hugetlbfs
    directly with no options or a bogus option.

    This patch:

    1) adds the check for empty option to hugetlbfs_parse_options(),
    2) enhances the error message to bracket any unrecognized
    option with quotes,
    3) modifies hugetlbfs_parse_options() to return -EINVAL on any
    unrecognized option,
    4) adds a BUG_ON() to vfs_kern_mount() to catch any get_sb()
    handler that returns a NULL mnt->mnt_sb with a return value
    >= 0.

    Signed-off-by: Lee Schermerhorn
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

09 May, 2007

1 commit

  • There's a slight problem with filesystem type representation in fuse
    based filesystems.

    From the kernel's view, there are just two filesystem types: fuse and
    fuseblk. From the user's view there are lots of different filesystem
    types. The user is not even much concerned if the filesystem is fuse based
    or not. So there's a conflict of interest in how this should be
    represented in fstab, mtab and /proc/mounts.

    The current scheme is to encode the real filesystem type in the mount
    source. So an sshfs mount looks like this:

    sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,...

    This url-ish syntax works OK for sshfs and similar filesystems. However
    for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
    the kernel expects the mount source to be a real device name.

    A possibly better scheme would be to encode the real type in the type
    field as "type.subtype". So fuse mounts would look like this:

    /dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,...
    user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,...

    This patch adds the necessary code to the kernel so that this can be
    correctly displayed in /proc/mounts.
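
As a rough illustration of the "type.subtype" convention, the sketch below
splits a type string at the first dot. The struct fs_type_name and
split_fs_type() names, and the 32-byte buffers, are assumptions made for this
example and do not come from the patch.

```c
#include <string.h>

/* Hypothetical holder for the two halves of "type.subtype". */
struct fs_type_name {
    char type[32];
    char subtype[32];
};

/* Split "fuseblk.ntfs-3g" into type "fuseblk" and subtype "ntfs-3g".
 * A string without a dot gets an empty subtype. Returns 0 on success and
 * -1 if a part is empty or does not fit the buffers. */
static int split_fs_type(const char *full, struct fs_type_name *out)
{
    const char *dot = strchr(full, '.');
    size_t type_len = dot ? (size_t)(dot - full) : strlen(full);
    const char *sub = dot ? dot + 1 : "";

    if (type_len == 0 || type_len >= sizeof(out->type))
        return -1;
    if (strlen(sub) >= sizeof(out->subtype))
        return -1;

    memcpy(out->type, full, type_len);
    out->type[type_len] = '\0';
    strcpy(out->subtype, sub);      /* length checked above */
    return 0;
}
```

The kernel only needs to select a driver by the part before the dot; the part
after it is what users see as the filesystem type.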

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

28 Apr, 2007

1 commit


13 Feb, 2007

1 commit


12 Jan, 2007

1 commit

  • Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
    xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.

    (XFS unlocks the semaphore from a different task, by design. The mutex
    code warns about this)

    Signed-off-by: Dave Chinner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Chinner
     

09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #defines to ease the transition for existing
    users of f_dentry and f_vfsmnt.
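
The layout change and the compatibility macros can be sketched in userspace
as follows. The *_sketch types are stand-ins invented for this example, and
the macro bodies are an assumption about what such transition #defines would
look like, not a quotation from the patch.

```c
/* Stand-in types; the real kernel structures carry many more fields. */
struct dentry_sketch { int unused; };
struct vfsmount_sketch { int unused; };

/* A (mount, dentry) pair kept together as one value. */
struct path_sketch {
    struct vfsmount_sketch *mnt;
    struct dentry_sketch *dentry;
};

struct file_sketch {
    struct path_sketch f_path;   /* replaces two independent pointers */
};

/* Transition aids: old member names keep compiling against the new layout. */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* New-style accessor, written against f_path directly. */
static struct vfsmount_sketch *file_mount_sketch(struct file_sketch *f)
{
    return f->f_path.mnt;
}
```

Code that still writes file->f_dentry expands to file->f_path.dentry, so old
and new spellings refer to the same storage.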

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

04 Dec, 2006

1 commit


12 Oct, 2006

1 commit

  • The attached patch destroys all the dentries attached to a superblock in one go
    by:

    (1) Destroying the tree rooted at s_root.

    (2) Destroying every entry in the anon list, one at a time.

    (3) Each entry in the anon list has its subtree consumed from the leaves
    inwards.

    This reduces the amount of work generic_shutdown_super() does, and avoids
    iterating through the dentry_unused list.
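
The leaves-inwards consumption in step (3) can be illustrated with a small
userspace sketch. struct node, node_add() and destroy_subtree() are
hypothetical names for this example; the kernel's dentry structures and
locking are not modelled.

```c
#include <stdlib.h>

/* Minimal tree node: parent pointer plus a singly linked child list. */
struct node {
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
};

/* Allocate a node and link it as the first child of parent (may be NULL). */
static struct node *node_add(struct node *parent)
{
    struct node *n = calloc(1, sizeof(*n));

    if (n == NULL)
        return NULL;
    n->parent = parent;
    if (parent != NULL) {
        n->next_sibling = parent->first_child;
        parent->first_child = n;
    }
    return n;
}

/* Free the subtree rooted at root without recursion: descend to a leaf,
 * free it, step back to its parent, repeat. Returns the number freed. */
static unsigned destroy_subtree(struct node *root)
{
    struct node *n = root;
    unsigned freed = 0;

    for (;;) {
        struct node *parent;
        int is_root;

        while (n->first_child != NULL)
            n = n->first_child;          /* walk down to a leaf */

        parent = n->parent;
        is_root = (n == root);
        if (!is_root)
            parent->first_child = n->next_sibling;  /* unlink the leaf */
        free(n);
        freed++;
        if (is_root)
            break;
        n = parent;                      /* parent may now be a leaf */
    }
    return freed;
}
```

Because nothing else can reach the tree at this point, the sketch takes no
locks, which mirrors the reasoning the commit message gives next.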

    Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
    functions added by this patch. This is because:

    (1) at the point the filesystem calls generic_shutdown_super(), it is not
    permitted to further touch the superblock's set of dentries, and nor may
    it remove aliases from inodes;

    (2) the dcache memory shrinker now skips dentries that are being unmounted;
    and

    (3) the superblock no longer has any external references through which the VFS
    can reach it.

    Given these points, the only locking we need to do is when we remove dentries
    from the unused list and the name hashes, which we do a directory's worth at a
    time.

    We also don't need to guard against reference counts going to zero unexpectedly
    and removing bits of the tree we're working on as nothing else can call dput().

    A cut-down version of dentry_iput() has been folded into the
    shrink_dcache_for_umount_subtree() function. Apart from not needing to
    unlock things, it also doesn't need to check for inotify watches.

    In this version of the patch, the complaint about a dentry still being in use
    has been expanded from a single BUG_ON() and now gives much more information.

    Signed-off-by: David Howells
    Acked-by: NeilBrown
    Acked-by: Ian Kent
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

01 Oct, 2006

2 commits

  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

        (*) Block I/O tracing.

        (*) Disk partition code.

        (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

        (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
        block layer to do scheduling. Some drivers that use SCSI facilities -
        such as USB storage - end up disabled indirectly from this.

        (*) Various block-based device drivers, such as IDE and the old CDROM
        drivers.

        (*) MTD blockdev handling and FTL.

        (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
        taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

        (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

        (*) /proc/devices does not list any blockdevs.

        (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given a command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.
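
The general pattern of making code contingent on a configuration symbol, with
a stub that reports ENODEV when the feature is compiled out, might look like
the sketch below. CONFIG_BLOCK_SKETCH and blkdev_open_sketch() are invented
for this example and are not taken from the patch.

```c
#include <errno.h>

/* Build with -DCONFIG_BLOCK_SKETCH to select the "block layer present"
 * branch; without it, only the stub is compiled. */
#ifdef CONFIG_BLOCK_SKETCH
static int blkdev_open_sketch(void)
{
    return 0;           /* pretend the block device opened */
}
#else
static int blkdev_open_sketch(void)
{
    return -ENODEV;     /* no block layer: opening always fails */
}
#endif
```

Callers use the same function name in both configurations, so they need no
#ifdef of their own.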

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

        (*) The file sync and general sync stuff moved to fs/sync.c.

        (*) The superblock sync stuff moved to fs/super.c.

        (*) do_invalidatepage() moved to mm/truncate.c.

        (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

        (*) declarations for do_invalidatepage() and try_to_release_page() moved
        to linux/mm.h.

        (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

30 Sep, 2006

1 commit

  • grab_super gets called with sb_lock held, and releases it. Add a lock
    annotation to this function so that sparse can check callers for lock
    pairing, and so that sparse will not complain about this function since it
    intentionally uses the lock in this manner.

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     

07 Sep, 2006

1 commit


04 Jul, 2006

2 commits

  • The s_umount rwsem needs to be classified as per-superblock, since the
    VFS locking rules make it perfectly legitimate to hold several of them
    recursively.

    Has no effect on non-lockdep kernels.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Teach special (per-filesystem) locking code to the lock validator.

    Minimal effect on non-lockdep kernels: one extra parameter to alloc_super().

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

01 Jul, 2006

1 commit


25 Jun, 2006

1 commit


23 Jun, 2006

3 commits

  • Give the statfs superblock operation a dentry pointer rather than a superblock
    pointer.

    This complements the get_sb() patch. That reduced the significance of
    sb->s_root, allowing NFS to place a fake root there. However, NFS does
    require a dentry to use as a target for the statfs operation. This permits
    the root in the vfsmount to be used instead.

    linux/mount.h has been added where necessary to make allyesconfig build
    successfully.

    Interest has also been expressed for use with the FUSE and XFS filesystems.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.
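
A userspace sketch of the new calling convention, assuming simplified
stand-in structures. All *_sketch names are hypothetical, and reference
counting on the root dentry is deliberately left out.

```c
#include <stddef.h>

struct dentry_sketch { int unused; };

struct super_block_sketch {
    struct dentry_sketch *s_root;
};

struct vfsmount_sketch {
    struct super_block_sketch *mnt_sb;
    struct dentry_sketch *mnt_root;
};

/* Default behaviour: point the mount at the superblock and at its s_root.
 * Always returns 0, so a get_sb()-style handler can tail-call it. */
static int simple_set_mnt_sketch(struct vfsmount_sketch *mnt,
                                 struct super_block_sketch *sb)
{
    mnt->mnt_sb = sb;
    mnt->mnt_root = sb->s_root;
    return 0;
}

static struct dentry_sketch demo_root;
static struct super_block_sketch demo_sb = { &demo_root };

/* A handler in the new style: fills in mnt and returns an int status
 * instead of returning the superblock pointer. */
static int demo_get_sb_sketch(struct vfsmount_sketch *mnt)
{
    return simple_set_mnt_sketch(mnt, &demo_sb);
}
```

A filesystem that shares one superblock between mounts would skip the helper
and assign mnt_sb and mnt_root itself, choosing a different root per mount.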

    The patch also makes the following changes:

    (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb().

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted, including the additional bit of
    changing ext2_* into foo_* in the documentation.

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The race is that the shrink_dcache_memory shrinker could get called while a
    filesystem is being unmounted, and could try to prune a dentry belonging to
    that filesystem.

    If it does, then it will call in to iput on the inode while the dentry is
    no longer able to be found by the umounting process. If iput takes a
    while, generic_shutdown_super could get all the way through
    shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
    ever waiting on this particular inode.

    Eventually the superblock gets freed anyway and if the iput tried to touch
    it (which some filesystems certainly do), it will lose. The promised
    "Self-destruct in 5 seconds" doesn't lead to a nice day.

    The race is closed by holding s_umount while calling prune_one_dentry on
    someone else's dentry. As a down_read_trylock is used,
    shrink_dcache_memory will no longer try to prune the dentry of a filesystem
    that is being unmounted, and unmount will not be able to start until any
    such active prune_one_dentry completes.

    This requires that prune_dcache *knows* which filesystem (if any) it is
    doing the prune on behalf of so that it can be careful of other
    filesystems. shrink_dcache_memory isn't called on behalf of any
    filesystem, and so is careful of everything.
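
The trylock rule described above can be sketched with a toy, single-threaded
model. struct rwsem_sketch is not a real lock (it has no atomicity or
blocking), and every name here is hypothetical; the point is only the
decision logic.

```c
#include <stddef.h>

/* Toy reader/writer state: NOT thread-safe, for illustration only. */
struct rwsem_sketch {
    int readers;
    int writer;     /* nonzero while an unmount holds the lock for write */
};

static int down_read_trylock_sketch(struct rwsem_sketch *s)
{
    if (s->writer)
        return 0;   /* would block: report failure instead */
    s->readers++;
    return 1;
}

static void up_read_sketch(struct rwsem_sketch *s)
{
    s->readers--;
}

struct sb_sketch {
    struct rwsem_sketch s_umount;
    int pruned;     /* counts dentries "pruned" in this model */
};

/* Prune one dentry belonging to victim. on_behalf_of is the superblock the
 * caller is already working for, or NULL for the global memory shrinker.
 * Returns 1 if a dentry was pruned, 0 if the victim was skipped. */
static int prune_one_sketch(struct sb_sketch *victim,
                            struct sb_sketch *on_behalf_of)
{
    if (victim == on_behalf_of) {
        victim->pruned++;           /* caller already owns this filesystem */
        return 1;
    }
    if (!down_read_trylock_sketch(&victim->s_umount))
        return 0;                   /* being unmounted: leave it alone */
    victim->pruned++;
    up_read_sketch(&victim->s_umount);
    return 1;
}
```

Holding the read side while pruning is what keeps an unmount from starting
until the prune has finished.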

    shrink_dcache_anon is now passed a super_block rather than the s_anon list
    out of the superblock, so it can get the s_anon list itself, and can pass
    the superblock down to prune_dcache.

    If prune_dcache finds a dentry that it cannot free, it leaves it where it
    is (at the tail of the list) and exits, on the assumption that some other
    thread will be removing that dentry soon. To try to make sure that some
    work gets done, a limited number of dentries which are untouchable are
    skipped over while choosing the dentry to work on.

    I believe this race was first found by Kirill Korotaev.

    Cc: Jan Blunck
    Acked-by: Kirill Korotaev
    Cc: Olaf Hering
    Acked-by: Balbir Singh
    Signed-off-by: Neil Brown
    Signed-off-by: Balbir Singh
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

09 Jun, 2006

2 commits


27 Mar, 2006

1 commit

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Cc: Eric Van Hensbergen
    Cc: Robert Love
    Cc: Thomas Gleixner
    Cc: David Woodhouse
    Cc: Neil Brown
    Cc: Trond Myklebust
    Cc: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

26 Mar, 2006

1 commit


24 Mar, 2006

1 commit

  • The meaning of MS_VERBOSE is backwards; if the bit is set, it really means,
    "don't be verbose". This is confusing and counter-intuitive.

    In addition, there is no way to set the MS_VERBOSE flag in the
    mount(8) program in util-linux. Interestingly, it does define options
    which would do the right thing if MS_SILENT were defined, which
    unfortunately it is not:

    #ifdef MS_SILENT
    { "quiet", 0, 0, MS_SILENT }, /* be quiet */
    { "loud", 0, 1, MS_SILENT }, /* print out messages. */
    #endif

    So the obvious fix is to deprecate the use of MS_VERBOSE and replace it
    with MS_SILENT.
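
With the flag renamed, the test reads the way the name suggests. The sketch
below is a userspace illustration; MS_SILENT_SKETCH and both helpers are
hypothetical, and the value 32768 is assumed here to match the Linux mount
flag rather than taken from this commit.

```c
#include <stdio.h>

/* Assumed flag value, for illustration only. */
#define MS_SILENT_SKETCH 32768UL

/* "Silent" set means: do not print. */
static int mount_should_log(unsigned long flags)
{
    return (flags & MS_SILENT_SKETCH) == 0;
}

/* Print msg unless the mount asked for silence; returns 1 if printed. */
static int report_mount_error(unsigned long flags, const char *msg)
{
    if (!mount_should_log(flags))
        return 0;
    fprintf(stderr, "%s\n", msg);
    return 1;
}
```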

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     

23 Mar, 2006

3 commits

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

23 Feb, 2006

1 commit

  • This change reverts the 033b96fd30db52a710d97b06f87d16fc59fee0f1 commit
    from Kay Sievers that removed the mount/umount uevents from the kernel.
    Some older versions of HAL still depend on these events to detect when a
    new device has been mounted. These events are not correctly emitted,
    and are broken by design, and so, should not be relied upon by any
    future program. Instead, the /proc/mounts file should be polled to
    properly detect this kind of event.

    A feature-removal-schedule.txt entry has been added, noting when this
    interface will be removed from the kernel.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

08 Feb, 2006

1 commit

  • We had a user trigger this message on a box that had a lot of different
    mounts, all with different options. It might help narrow down wtf happened
    if we print out which device failed.

    Signed-off-by: Dave Jones
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     

10 Jan, 2006

1 commit


09 Jan, 2006

1 commit


05 Jan, 2006

1 commit

  • The names of these events have been confusing from the beginning,
    as they have been more like claim/release events. We needed these
    events to notify HAL when storage devices have been mounted.

    Thanks to Al, we have the proper solution now and can poll()
    /proc/mounts instead to get notified about mount tree changes.
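
A minimal sketch of waiting on the mount table with poll(), assuming the
kernel signals a change through POLLERR and/or POLLPRI on the open file.
wait_for_mount_change() is a hypothetical helper, not an existing API.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for the file at path (normally "/proc/mounts") to
 * signal a change. Returns 1 on change, 0 on timeout, -1 on error. */
static int wait_for_mount_change(const char *path, int timeout_ms)
{
    struct pollfd pfd;
    int rc;
    int changed;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    pfd.fd = fd;
    pfd.events = POLLERR | POLLPRI;   /* assumed change notification bits */
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    changed = rc > 0 && (pfd.revents & (POLLERR | POLLPRI)) != 0;

    close(fd);
    if (rc < 0)
        return -1;
    return changed ? 1 : 0;
}
```

A long-running monitor would keep the descriptor open and re-read the file
after each wakeup, since this sketch cannot see changes made before open().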

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

08 Nov, 2005

1 commit

  • The way we currently deal with quota and process accounting that might
    keep vfsmount busy at umount time is inherently broken; we try to turn
    them off just in case (not quite correctly, at that) and

    a) pray umount doesn't fail (otherwise they'll stay turned off)
    b) pray nobody does anything funny just as we turn quota off

    Moreover, LSM provides hooks for doing the same sort of broken logic.

    The proper way to deal with that is to introduce a second kind of
    reference to vfsmount. Semantics:

    - when the last normal reference is dropped, all special ones are
    converted to normal ones and if there had been any, cleanup is done.
    - normal reference can be cloned into a special one
    - special reference can be converted to normal one; that's a no-op if
    we'd already passed the point of no return (i.e. mntput() had
    converted special references to normal and started cleanup).

    The way it works: e.g. starting process accounting converts the vfsmount
    reference pinned by the opened file into a special one and turns it back
    into a normal one when it gets shut down; acct_auto_close() is done when no
    normal references are left. That way it does *not* obstruct umount(2)
    and it silently gets turned off when the last normal reference to
    vfsmount is gone. Which is exactly what we want...
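
The semantics listed above can be modelled with two plain counters. This is a
single-threaded sketch with hypothetical names (struct mnt_sketch, pin_sketch,
unpin_sketch, put_sketch); it assumes the cleanup hook drops every reference
it is handed, and it ignores locking entirely.

```c
struct mnt_sketch {
    int normal;        /* ordinary references */
    int special;       /* references that must not keep the mount busy */
    int cleanup_runs;  /* how often the "last normal ref gone" hook ran */
    int freed;         /* set once the mount is finally released */
};

/* Clone a normal reference into a special one (caller holds a normal ref). */
static void pin_sketch(struct mnt_sketch *m)
{
    m->special++;
}

/* Convert a special reference back to normal; a no-op if the special
 * references were already converted by put_sketch(). */
static void unpin_sketch(struct mnt_sketch *m)
{
    if (m->special > 0) {
        m->special--;
        m->normal++;
    }
}

/* Drop a normal reference. */
static void put_sketch(struct mnt_sketch *m)
{
    if (--m->normal > 0)
        return;
    if (m->special == 0) {
        m->freed = 1;
        return;
    }
    /* Last normal reference gone: special ones become normal ... */
    m->normal = m->special;
    m->special = 0;
    m->cleanup_runs++;
    /* ... and the cleanup hook (assumed) drops each of them in turn. */
    while (!m->freed)
        put_sketch(m);
}
```

In this model a pinned mount never blocks the final put: dropping the last
normal reference triggers cleanup once and then releases the mount.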

    The same should be done by LSM module that holds some internal
    references to vfsmount and wants to shut them down on umount - it should
    make them special and security_sb_umount_close() will be called exactly
    when the last normal reference to vfsmount is gone.

    quota handling is even simpler - we don't use normal file IO anymore, so
    there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
    deactivate_super(), where it really belongs.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

07 Nov, 2005

1 commit


31 Oct, 2005

1 commit

  • Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
    in a memory location that is no longer used at call_rcu(&f->f_rcuhead,
    file_free_rcu) time, to reduce the size of this critical kernel object.

    The trick I used is to move f_rcuhead and f_list into a union called f_u.

    The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
    becomes f_u.f_list.
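
The space saving comes from overlaying two members that are never live at the
same time. The sketch below uses stand-in types; all *_sketch names and the
member names inside the union are illustrative rather than the kernel's.

```c
/* Stand-ins for the kernel's list and RCU callback heads. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

struct rcu_head_sketch {
    struct rcu_head_sketch *next;
    void (*func)(struct rcu_head_sketch *head);
};

/* Before: both members occupy their own storage. */
struct file_before_sketch {
    struct list_head_sketch f_list;
    struct rcu_head_sketch f_rcuhead;
    long other_state;
};

/* After: the list linkage is dead by the time the RCU head is needed,
 * so the two can share the same bytes. */
struct file_after_sketch {
    union {
        struct list_head_sketch fu_list;
        struct rcu_head_sketch fu_rcuhead;
    } f_u;
    long other_state;
};

/* Bytes saved per object by the overlay on this platform. */
static unsigned long bytes_saved_sketch(void)
{
    return (unsigned long)(sizeof(struct file_before_sketch) -
                           sizeof(struct file_after_sketch));
}
```

The overlay is only safe because the object has already been removed from its
list before the RCU callback is queued.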

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

08 Jul, 2005

1 commit

  • This patch sets ->mnt_namespace where it's actually added to the
    namespace.

    Previously mnt_namespace was set in do_kern_mount() even if the filesystem
    was never added to any process's namespace (most kernel-internal
    filesystems).

    This discrepancy doesn't actually cause any problems, but it's cleaner if
    mnt_namespace is NULL for these non-exported filesystems.

    Signed-off-by: Miklos Szeredi
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

24 Jun, 2005

1 commit

  • This patch removes O(n^2) super block loops in sync_inodes(),
    sync_filesystems() etc. in favour of using __put_super_and_need_restart()
    which I introduced earlier. We faced noticeably long freezes on sb
    syncing when there were thousands of super blocks in the system.

    Signed-Off-By: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     

22 Jun, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds