12 Jun, 2009

5 commits

  • __fsync_super() does the same thing as fsync_super(). So change the only
    caller to use fsync_super() and make __fsync_super() static. This removes
    an unnecessarily duplicated call to sync_blockdev() and prepares the ground
    for the changes to __fsync_super() in the following patches.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
    ->sync_fs() isn't called. This does not really make much sense since s_dirt is
    generally used by a filesystem to mean that ->write_super() needs to be called.
    But ->sync_fs() does different things. I even suspect that some filesystems
    (btrfs?) set s_dirt just to fool this logic.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • So far, do_sync() called:
    sync_inodes(0);
    sync_supers();
    sync_filesystems(0);
    sync_filesystems(1);
    sync_inodes(1);

    This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
    submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
    transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
    not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
    and others are hit by this) when racing e.g. with background writeback. A
    similar problem hits also other filesystems (e.g. ext2) because of
    write_supers() being called before the sync_inodes(1).

    Change the ordering of calls in do_sync() - this requires a new function
    sync_blockdevs() to preserve the property that block devices are always synced
    after write_super() / sync_fs() call.

    The same issue is fixed in __fsync_super() function used on umount /
    remount read-only.

    [AV: build fixes]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Remove the unused s_async_list in the superblock, a leftover of the
    broken async inode deletion code that leaked into mainline. Having this
    in the middle of the sync/unmount path is not helpful for the following
    cleanups.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • This function walks the s_files lock, and operates primarily on the
    files in a superblock, so it better belongs here (eg. see also
    fs_may_remount_ro).

    [AV: ... and it shouldn't be static after that move]

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
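
The do_sync() reordering described in the third commit of this group can be modelled in user space. The sketch below is not the kernel code: the stub functions only record the order in which they are called, and the exact sequence in do_sync_reordered() is an assumption inferred from the commit message (inodes are written and waited on before the superblock and ->sync_fs() work, and block devices are synced last via the new sync_blockdevs()). The names call_log, record() and call_index() are hypothetical helpers.

```c
#include <string.h>

/* Hypothetical call log: the real functions do I/O, these stubs only record order. */
#define MAX_CALLS 16
static const char *call_log[MAX_CALLS];
static int n_calls;

static void record(const char *name)
{
    if (n_calls < MAX_CALLS)
        call_log[n_calls++] = name;
}

static void sync_inodes(int wait)
{
    record(wait ? "sync_inodes(1)" : "sync_inodes(0)");
}

static void sync_supers(void)
{
    record("sync_supers");
}

static void sync_filesystems(int wait)
{
    record(wait ? "sync_filesystems(1)" : "sync_filesystems(0)");
}

static void sync_blockdevs(void)
{
    record("sync_blockdevs");
}

/*
 * Assumed ordering after the change: all inode writeback is submitted and
 * waited on first, so that ->sync_fs() sees everything; block devices are
 * synced only after write_super() / sync_fs().
 */
static void do_sync_reordered(void)
{
    n_calls = 0;
    sync_inodes(0);
    sync_inodes(1);
    sync_supers();
    sync_filesystems(0);
    sync_filesystems(1);
    sync_blockdevs();
}

/* Position of a call in the log, or -1 if it was never made. */
static int call_index(const char *name)
{
    for (int i = 0; i < n_calls; i++)
        if (strcmp(call_log[i], name) == 0)
            return i;
    return -1;
}
```

The point of the model is only the relative order: the waiting inode pass comes before any superblock-level work, which is the property the old sequence lacked.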
     

09 May, 2009

2 commits

  • Signed-off-by: H Hartley Sweeten
    Cc: Subrata Modak
    Signed-off-by: Al Viro

    H Hartley Sweeten
     
  • Does equivalent of up_write(&s->s_umount); deactivate_super(s);
    However, it does not unlock it until it's all over.
    As a result, it's safe to use to dispose of new superblock on ->get_sb()
    failure exits - nobody will see the sucker until it's all over.
    Equivalent using up_write/deactivate_super is safe for that purpose
    if superblock is either safe to use or has NULL ->s_root when we unlock.
    Normally filesystems take the required precautions, but
    a) we do have bugs in that area in some of them.
    b) up_write/deactivate_super sequence is extremely common,
    so the helper makes sense anyway.

    Signed-off-by: Al Viro

    Al Viro
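
The difference between the two sequences described above can be shown with a small user-space model. This is only a sketch under stated assumptions: struct toy_super and every toy_* function are hypothetical stand-ins, the "exposed" flag models the window in which a superblock is unlocked but not yet torn down, and none of this is the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a superblock; not the kernel structure. */
struct toy_super {
    bool umount_locked; /* models s_umount held for write */
    int  active;        /* models the active reference count */
    bool torn_down;     /* set when the last active reference is dropped */
    bool exposed;       /* set if it was ever unlocked while still alive */
};

/* Models up_write(&s->s_umount). */
static void toy_up_write(struct toy_super *s)
{
    s->umount_locked = false;
    if (!s->torn_down)
        s->exposed = true; /* unlocked but alive: others could look at it */
}

static void toy_teardown(struct toy_super *s)
{
    s->torn_down = true;
}

/* The common open-coded sequence: unlock first, then drop the reference. */
static void toy_unlock_then_deactivate(struct toy_super *s)
{
    toy_up_write(s);
    if (--s->active == 0)
        toy_teardown(s);
}

/* Helper-style sequence: drop the reference while locked, unlock at the end. */
static void toy_deactivate_locked(struct toy_super *s)
{
    if (--s->active == 0)
        toy_teardown(s);
    toy_up_write(s);
}
```

With a single active reference, the first variant leaves a window where the half-built superblock is visible and unlocked, while the second never does. That is the property that makes the helper safe on ->get_sb() failure exits.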
     

07 Apr, 2009

1 commit


03 Apr, 2009

1 commit


28 Mar, 2009

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
    fs: avoid I_NEW inodes
    Merge code for single and multiple-instance mounts
    Remove get_init_pts_sb()
    Move common mknod_ptmx() calls into caller
    Parse mount options just once and copy them to super block
    Unroll essentials of do_remount_sb() into devpts
    vfs: simple_set_mnt() should return void
    fs: move bdev code out of buffer.c
    constify dentry_operations: rest
    constify dentry_operations: configfs
    constify dentry_operations: sysfs
    constify dentry_operations: JFS
    constify dentry_operations: OCFS2
    constify dentry_operations: GFS2
    constify dentry_operations: FAT
    constify dentry_operations: FUSE
    constify dentry_operations: procfs
    constify dentry_operations: ecryptfs
    constify dentry_operations: CIFS
    constify dentry_operations: AFS
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
    ext2: Zero our b_size in ext2_quota_read()
    trivial: fix typos/grammar errors in fs/Kconfig
    quota: Coding style fixes
    quota: Remove superfluous inlines
    quota: Remove uppercase aliases for quota functions.
    nfsd: Use lowercase names of quota functions
    jfs: Use lowercase names of quota functions
    udf: Use lowercase names of quota functions
    ufs: Use lowercase names of quota functions
    reiserfs: Use lowercase names of quota functions
    ext4: Use lowercase names of quota functions
    ext3: Use lowercase names of quota functions
    ext2: Use lowercase names of quota functions
    ramfs: Remove quota call
    vfs: Use lowercase names of quota functions
    quota: Remove dqbuf_t and other cleanups
    quota: Remove NODQUOT macro
    quota: Make global quota locks cacheline aligned
    quota: Move quota files into separate directory
    ext4: quota reservation for delayed allocation
    ...

    Linus Torvalds
     
  • simple_set_mnt() is defined as returning 'int' but always returns 0.
    Callers assume simple_set_mnt() never fails and don't properly clean up if
    it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev()
    should:

    up_write(&sb->s_umount);
    deactivate_super(sb);

    if simple_set_mnt() fails.

    Since simple_set_mnt() never fails, it would be cleaner if it did not
    return anything.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Sukadev Bhattiprolu
     

26 Mar, 2009

2 commits


13 Mar, 2009

1 commit

  • In sget(), destroy_super(s) is called with s->s_umount held, which makes
    lockdep unhappy.

    Signed-off-by: Li Zefan
    Cc: Al Viro
    Acked-by: Peter Zijlstra
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     

19 Feb, 2009

1 commit

  • Li Zefan said:

    Thread 1:
    for ((; ;))
    {
    mount -t cpuset xxx /mnt > /dev/null 2>&1
    cat /mnt/cpus > /dev/null 2>&1
    umount /mnt > /dev/null 2>&1
    }

    Thread 2:
    for ((; ;))
    {
    mount -t cpuset xxx /mnt > /dev/null 2>&1
    umount /mnt > /dev/null 2>&1
    }

    (Note: It is irrelevant which cgroup subsys is used.)

    After a while a lockdep warning showed up:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.28 #479
    ---------------------------------------------
    mount/13554 is trying to acquire lock:
    (&type->s_umount_key#19){--..}, at: [] sget+0x5e/0x321

    but task is already holding lock:
    (&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321

    other info that might help us debug this:
    1 lock held by mount/13554:
    #0: (&type->s_umount_key#19){--..}, at: [] sget+0x1e2/0x321

    stack backtrace:
    Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
    Call Trace:
    [] validate_chain+0x4c6/0xbbd
    [] __lock_acquire+0x676/0x700
    [] lock_acquire+0x5d/0x7a
    [] ? sget+0x5e/0x321
    [] down_write+0x34/0x50
    [] ? sget+0x5e/0x321
    [] sget+0x5e/0x321
    [] ? cgroup_set_super+0x0/0x3e
    [] ? cgroup_test_super+0x0/0x2f
    [] cgroup_get_sb+0x98/0x2e7
    [] cpuset_get_sb+0x4a/0x5f
    [] vfs_kern_mount+0x40/0x7b
    [] do_kern_mount+0x37/0xbf
    [] do_mount+0x5c3/0x61a
    [] ? copy_mount_options+0x2c/0x111
    [] sys_mount+0x69/0xa0
    [] sysenter_do_call+0x12/0x31

    The cause is that after alloc_super() and the subsequent retry, an old
    entry in the fs_supers list is found, so grab_super(old) is called, and
    both functions take the s_umount lock:

    struct super_block *sget(...)
    {
        ...
    retry:
        spin_lock(&sb_lock);
        if (test) {
            list_for_each_entry(old, &type->fs_supers, s_instances) {
                if (!test(old, data))
                    continue;
                if (!grab_super(old))  <--- 2nd: down_write(&old->s_umount);
                    goto retry;
                if (s)
                    destroy_super(s);
                return old;
            }
        }
        if (!s) {
            spin_unlock(&sb_lock);
            s = alloc_super(type);  <--- 1st: down_write(&s->s_umount)
            if (!s)
                return ERR_PTR(-ENOMEM);
            goto retry;
        }
        ...
    }

    It seems like a false positive, and it seems that the VFS, not cgroup,
    needs to be fixed.

    Peter said:

    We can simply put the new s_umount instance in a different subclass, but
    lockdep doesn't particularly care about subclass order.

    If there's any issue with the callers of sget() assuming the s_umount lock
    being of subclass 0, then there is another annotation we can use to fix
    that, but let's not bother with that if this is sufficient.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673

    Signed-off-by: Peter Zijlstra
    Tested-by: Li Zefan
    Reported-by: Li Zefan
    Cc: Al Viro
    Cc: Paul Menage
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
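
The idea behind the annotation can be illustrated with a toy user-space checker. This is a deliberately simplified sketch, not how lockdep is implemented: it assumes a report is raised only when a task acquires a lock whose (class, subclass) pair it already holds, and struct toy_task, struct held_lock and toy_lock_acquire() are hypothetical names. In the kernel, a nested subclass is normally requested with down_write_nested().

```c
#include <stdbool.h>

#define MAX_HELD 8

/* One held lock, identified by its lock class and subclass. */
struct held_lock {
    int class_id;
    int subclass;
};

/* Hypothetical per-task record of the locks currently held. */
struct toy_task {
    struct held_lock held[MAX_HELD];
    int depth;
};

/*
 * Returns false, standing in for a "possible recursive locking" report,
 * when the task already holds a lock of the same class and subclass.
 * Also returns false if the held-lock table is full.
 */
static bool toy_lock_acquire(struct toy_task *t, int class_id, int subclass)
{
    for (int i = 0; i < t->depth; i++)
        if (t->held[i].class_id == class_id &&
            t->held[i].subclass == subclass)
            return false;

    if (t->depth == MAX_HELD)
        return false;

    t->held[t->depth].class_id = class_id;
    t->held[t->depth].subclass = subclass;
    t->depth++;
    return true;
}
```

In this model, taking the freshly allocated superblock's s_umount and then the old superblock's s_umount in the same subclass is flagged, while putting the new instance in subclass 1 lets both acquisitions pass.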
     

09 Feb, 2009

1 commit


14 Jan, 2009

1 commit


09 Jan, 2009

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
    jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
    ext4: Remove "extents" mount option
    block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
    ext4: Make printk's consistently prefixed with "EXT4-fs: "
    ext4: Add sanity checks for the superblock before mounting the filesystem
    ext4: Add mount option to set kjournald's I/O priority
    jbd2: Submit writes to the journal using WRITE_SYNC
    jbd2: Add pid and journal device name to the "kjournald2 starting" message
    ext4: Add markers for better debuggability
    ext4: Remove code to create the journal inode
    ext4: provide function to release metadata pages under memory pressure
    ext3: provide function to release metadata pages under memory pressure
    add releasepage hooks to block devices which can be used by file systems
    ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
    ext4: Init the complete page while building buddy cache
    ext4: Don't allow new groups to be added during block allocation
    ext4: mark the blocks/inode bitmap beyond end of group as used
    ext4: Use new buffer_head flag to check uninit group bitmaps initialization
    ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
    ext4: code cleanup
    ...

    Linus Torvalds
     
  • sync_filesystems() shouldn't be calling async_synchronize_full_special
    while holding a spinlock. The second while loop in that function is the
    right place for this anyway.

    Signed-off-by: Dave Kleikamp
    Cc: Arjan van de Ven
    Reported-by: Grissiom
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     

08 Jan, 2009

1 commit


03 Jan, 2009

1 commit


20 Dec, 2008

1 commit


24 Oct, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
    [PATCH] kill the rest of struct file propagation in block ioctls
    [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
    [PATCH] get rid of blkdev_locked_ioctl()
    [PATCH] get rid of blkdev_driver_ioctl()
    [PATCH] sanitize blkdev_get() and friends
    [PATCH] remember mode of reiserfs journal
    [PATCH] propagate mode through swsusp_close()
    [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
    [PATCH] pass fmode_t to blkdev_put()
    [PATCH] kill the unused bsize on the send side of /dev/loop
    [PATCH] trim file propagation in block/compat_ioctl.c
    [PATCH] end of methods switch: remove the old ones
    [PATCH] switch sr
    [PATCH] switch sd
    [PATCH] switch ide-scsi
    [PATCH] switch tape_block
    [PATCH] switch dcssblk
    [PATCH] switch dasd
    [PATCH] switch mtd_blkdevs
    [PATCH] switch mmc
    ...

    Linus Torvalds
     

23 Oct, 2008

2 commits


21 Oct, 2008

1 commit


25 Jul, 2008

1 commit

  • [Summary]

    Split LRU-list of unused dentries to one per superblock to avoid soft
    lock up during NFS mounts and remounting of any filesystem.

    Previously I posted here:
    http://lkml.org/lkml/2008/3/5/590

    [Descriptions]

    - background

    dentry_unused is a list of dentries which are not referenced.
    dentry_unused grows when references on directories or files are
    released. This list can be very long if there is a lot of free memory.

    - the problem

    When shrink_dcache_sb() is called, it scans all of dentry_unused linearly
    under spin_lock(), and if dentry->d_sb is different from the given
    superblock, it moves on to the next dentry. This scan is very costly if
    there are many entries, and very inefficient if there are many superblocks.

    IOW, when we need to shrink the unused dentries of one superblock, we scan
    the unused dentries of all superblocks in the system. For example, we scan
    500 dentries to unmount a filesystem, but also scan 1,000,000 or more
    unused dentries on other superblocks.

    In our case, when mounting NFS*, shrink_dcache_sb() is called to shrink
    unused dentries on NFS, but scans 100,000,000 unused dentries on
    superblocks in the system such as local ext3 filesystems. I hear NFS
    mounting took 1 min on some systems in use.

    * : NFS uses a virtual filesystem in the rpc layer, so NFS is affected by
    this problem.

    100,000,000 is a possible number on large systems.

    A per-superblock LRU of unused dentries can reduce the cost in a
    reasonable manner.

    - How to fix

    I found that this problem is solved by David Chinner's "Per-superblock
    unused dentry LRU lists V3"(1), so I rebased it and added a fix to make
    reclaim fair, as suggested in Andrew Morton's comments(2).

    1) http://lkml.org/lkml/2006/5/25/318
    2) http://lkml.org/lkml/2006/5/25/320

    Split the LRU list of unused dentries into one list per superblock. Then
    NFS mounting will check the dentries under one superblock instead of all
    of them. But this splitting breaks the global LRU order of dentry_unused.
    So I've attempted to reclaim unused dentries fairly by calculating the
    number of dentries to scan on each sb as follows:

    number of dentries to scan on this sb =
    count * (number of dentries on this sb / number of dentries in the machine)

    - ToDo
    - I have to measure performance numbers and do stress tests.

    - When an unmount occurs during prune_dcache() while it is scanning the
    same superblock, it is unable to reach the next superblock because it has
    gone away. We restart scanning superblocks from the first one, which
    causes unfair reclaim of unused dentries on the first superblock. But I
    think this happens very rarely.

    - Test Results

    Result on 6GB boxes with excessive unused dentries.

    Without patch:

    $ cat /proc/sys/fs/dentry-state
    10181835 10180203 45 0 0 0
    # mount -t nfs 10.124.60.70:/work/kernel-src nfs
    real 0m1.830s
    user 0m0.001s
    sys 0m1.653s

    With this patch:
    $ cat /proc/sys/fs/dentry-state
    10236610 10234751 45 0 0 0
    # mount -t nfs 10.124.60.70:/work/kernel-src nfs
    real 0m0.106s
    user 0m0.002s
    sys 0m0.032s

    [akpm@linux-foundation.org: fix comments]
    Signed-off-by: Kentaro Makita
    Cc: Neil Brown
    Cc: Trond Myklebust
    Cc: David Chinner
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kentaro Makita
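
The fairness formula in the commit above (count * dentries on this sb / dentries in the machine) can be written as a small helper. This is a minimal sketch, not the patch's code: the function name sb_scan_share() and its parameters are hypothetical, the result is rounded down, and the multiplication is done in 64 bits on the assumption that count * sb_unused does not overflow.

```c
#include <stdint.h>

/*
 * Share of a global scan budget that goes to one superblock:
 *   count * (unused dentries on this sb / unused dentries in the machine)
 * Multiplying before dividing keeps precision with integer arithmetic.
 * Returns 0 when there are no unused dentries at all.
 */
static uint64_t sb_scan_share(uint64_t count, uint64_t sb_unused,
                              uint64_t total_unused)
{
    if (total_unused == 0)
        return 0;
    return count * sb_unused / total_unused;
}
```

Because the result is rounded down, a superblock holding only a tiny fraction of the unused dentries gets a share of zero for small budgets; an implementation that wants every superblock to make progress may prefer to round up.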
     

29 Apr, 2008

1 commit


28 Apr, 2008

1 commit

  • Currently, we just turn quotas off on remount of filesystem to read-only
    state. The patch below adds necessary framework so that we can turn quotas
    off on remount RO but we are able to automatically reenable them again when
    filesystem is remounted to RW state. All we need to do is to keep references
    to inodes of quota files when remounting RO and use these references to
    reenable quotas when remounting RW.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
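
The suspend/resume scheme described above can be sketched as a small state machine. This is a user-space illustration only: struct toy_inode, struct toy_quota and the toy_quota_* functions are hypothetical names, and the reference counting is a simplified assumption rather than the kernel's quota code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode of a quota file. */
struct toy_inode {
    int refcount;
};

/* Hypothetical per-filesystem quota state. */
struct toy_quota {
    struct toy_inode *file; /* quota file inode, NULL when quota is fully off */
    bool enabled;
    bool suspended;
};

/* Turn quota on and take a reference to the quota file inode. */
static void toy_quota_on(struct toy_quota *q, struct toy_inode *file)
{
    file->refcount++;
    q->file = file;
    q->enabled = true;
    q->suspended = false;
}

/* Remount read-only: stop using quota but keep the inode reference. */
static void toy_quota_suspend(struct toy_quota *q)
{
    if (!q->enabled)
        return;
    q->enabled = false;
    q->suspended = true; /* q->file and its reference are deliberately kept */
}

/* Remount read-write: re-enable only what was suspended earlier. */
static void toy_quota_resume(struct toy_quota *q)
{
    if (!q->suspended)
        return;
    q->suspended = false;
    q->enabled = true;
}

/* Full quota-off (for example on unmount): drop the kept reference. */
static void toy_quota_off(struct toy_quota *q)
{
    if (q->file) {
        q->file->refcount--;
        q->file = NULL;
    }
    q->enabled = false;
    q->suspended = false;
}
```

Keeping the reference across the read-only period is what lets the resume step work without the administrator naming the quota files again.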
     

22 Apr, 2008

1 commit


19 Apr, 2008

2 commits

  • There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
    There was also a set of potentially missed mnt_want_write()s from
    dentry_open() calls.

    This patch provides a very simple debugging framework to catch these kinds of
    bugs. It will WARN_ON() them, but should stop us from having any oopses or
    mnt_writer count imbalances.

    I'm quite convinced that this is a good thing because it found bugs in the
    stuff I was working on as soon as I wrote it.

    [hch: made it conditional on a debug option.
    But it's still a little bit too ugly]

    [hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]

    Signed-off-by: Dave Hansen
    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
     
  • The emergency remount code forcibly removes FMODE_WRITE from
    filps. The r/o bind mount code notices that this was done
    without a proper mnt_drop_write() and properly gives a
    warning.

    This patch does a mnt_drop_write() to keep everything
    balanced.

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Al Viro

    Dave Hansen
     

25 Mar, 2008

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    [PATCH] get stack footprint of pathname resolution back to relative sanity
    [PATCH] double iput() on failure exit in hugetlb
    [PATCH] double dput() on failure exit in tiny-shmem
    [PATCH] fix up new filp allocators
    [PATCH] check for null vfsmount in dentry_open()
    [PATCH] reiserfs: eliminate private use of struct file in xattr
    [PATCH] sanitize hppfs
    hppfs pass vfsmount to dentry_open()
    [PATCH] restore export of do_kern_mount()

    Linus Torvalds
     

20 Mar, 2008

1 commit

  • Fix kernel-doc notation warnings in fs/.

    Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
    * mark_files_ro
    Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
    * lease_get_mtime
    Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
    * lease_get_mtime
    Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
    * lookup_one_len: filesystem helper to lookup single pathname component
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
    * bh_uptodate_or_lock: Test whether the buffer is uptodate
    Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
    * bh_submit_read: Submit a locked buffer for reading
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
    * writeback_acquire: attempt to get exclusive writeback access to a device
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
    * writeback_in_progress: determine whether there is writeback in progress
    Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
    * writeback_release: relinquish exclusive writeback access against a device.
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
    Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
    Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
    * void journal_invalidatepage()

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

18 Mar, 2008

1 commit

    vfs_kern_mount() requires having a reference to fs type, which
    makes it impossible for a module to create a private mount of
    procfs, etc. Open-coding is not an option, since e.g. put_filesystem()
    is _not_ exported, and for a good reason.

    Signed-off-by: Al Viro

    Al Viro
     

06 Mar, 2008

1 commit

  • Introduce new LSM interfaces to allow an FS to deal with their own mount
    options. This includes a new string parsing function exported from the
    LSM that an FS can use to get a security data blob and a new security
    data blob. This is particularly useful for an FS which uses binary
    mount data, like NFS, which does not pass strings into the vfs to be
    handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK
    when dealing with binary mount data. If the binary mount data is less
    than one page the copy_page() in security_sb_copy_data() can cause an
    illegal page fault and boom. Remove all NFSisms from the SELinux code
    since they were broken by past NFS changes.

    Signed-off-by: Eric Paris
    Acked-by: Stephen Smalley
    Acked-by: Casey Schaufler
    Signed-off-by: James Morris

    Eric Paris
     

09 Feb, 2008

2 commits

  • Turn off quotas before filesystem is remounted read only. Otherwise quota
    will try to write to read-only filesystem which does no good... We could
    also just refuse to remount ro when quota is enabled but turning quota off
    is consistent with what we do on umount.

    Signed-off-by: Jan Kara
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Add a new s_options field to struct super_block. Filesystems can save
    mount options passed to them in mount or remount. It is automatically
    freed when the superblock is destroyed.

    A new helper function, generic_show_options() is introduced, which uses
    this field to display the mount options in /proc/mounts.

    Another helper function, save_mount_options() may be used by
    filesystems to save the options in the super block.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
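
The life cycle of the saved options string described above can be modelled in user space. This is a sketch under stated assumptions, not the kernel helpers: struct toy_super and the toy_* functions are hypothetical, malloc()/free() stand in for kernel allocation, output goes to a caller-supplied buffer instead of /proc/mounts, and the int return value of toy_save_mount_options() is an illustrative choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the superblock; only the options field is modelled. */
struct toy_super {
    char *s_options;
};

/*
 * Models saving the options on mount or remount: any previously saved
 * copy is replaced. Returns 0 on success, -1 if allocation fails.
 */
static int toy_save_mount_options(struct toy_super *sb, const char *options)
{
    char *copy = NULL;

    if (options) {
        copy = malloc(strlen(options) + 1);
        if (!copy)
            return -1;
        strcpy(copy, options);
    }
    free(sb->s_options);
    sb->s_options = copy;
    return 0;
}

/* Models showing the options: writes ",<options>" to buf if any were saved. */
static void toy_show_options(const struct toy_super *sb, char *buf, size_t len)
{
    if (sb->s_options && sb->s_options[0])
        snprintf(buf, len, ",%s", sb->s_options);
    else if (len)
        buf[0] = '\0';
}

/* Models superblock destruction: the saved string is freed automatically. */
static void toy_destroy_super(struct toy_super *sb)
{
    free(sb->s_options);
    sb->s_options = NULL;
}
```

A filesystem following this pattern would save the string in its mount and remount paths and would not need its own show-options code.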
     

20 Oct, 2007

1 commit

  • * Convert files to UTF-8.

    * Also correct some people's names
    (one example is Eißfeldt, which was found in a source file.
    Given that the author used an ß at all in a source file
    indicates that the real name has in fact a 'ß' and not an 'ss',
    which is commonly used as a substitute for 'ß' when limited to
    7bit.)

    * Correct town names (Goettingen -> Göttingen)

    * Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)

    Signed-off-by: Jan Engelhardt
    Signed-off-by: Adrian Bunk

    Jan Engelhardt