07 Jul, 2009

1 commit

  • I run many ffsb test cases on JBODs (typically 12 or 13 disks). Compared
    with kernel 2.6.30, 2.6.31-rc1 shows about a 16% regression with
    ffsb_create_4k. The sub test case creates files continuously for 10
    minutes and every file is 1MB.

    Bisection located the patch below.

    5cee5815d1564bbbd505fea86f4550f1efdb5cd0 is first bad commit
    commit 5cee5815d1564bbbd505fea86f4550f1efdb5cd0
    Author: Jan Kara
    Date: Mon Apr 27 16:43:51 2009 +0200

    vfs: Make sys_sync() use fsync_super() (version 4)

    It is unnecessarily fragile to have two places (fsync_super() and do_sync())
    doing data integrity sync of the filesystem. Alter __fsync_super() to
    accommodate needs of both callers and use it. So after this patch
    __fsync_super() is the only place where we gather all the calls needed to
    properly send all data on a filesystem to disk.

    As a matter of fact, ffsb calls sys_sync at the end to make sure all data
    is flushed to disk, and the flushing time is counted in the result. vmstat
    shows that ffsb is blocked for a long time when syncing. With 2.6.30, ffsb
    is blocked for only a short time.

    I checked the patch and experimented with restoring the original behavior.
    The root cause is that the patch deletes the call to wakeup_pdflush when
    syncing, so only ffsb is blocked on disk I/O. With wakeup_pdflush, pdflush
    writes back pages at the same time as ffsb.

    [akpm@linux-foundation.org: restore comment too]
    Signed-off-by: Zhang Yanmin
    Cc: Jan Kara
    Cc: Al Viro
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
     

12 Jun, 2009

9 commits

  • Now that all filesystems provide ->sync_fs methods we can change
    __sync_filesystem to only call ->sync_fs.

    This gives us a clear separation between periodic writeouts which
    are driven by ->write_super and data integrity syncs that go
    through ->sync_fs. (modulo file_fsync which is also going away)

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Push down lock_super into ->write_super instances and remove it from the
    caller.

    The following filesystems don't need ->s_lock in ->write_super and are
    skipped:

    * bfs, nilfs2 - no other uses of s_lock and have internal locks in
      ->write_super
    * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
    * reiserfs - no other uses of s_lock as it has reiserfs_write_lock (BKL) in
      ->write_super
    * xfs - no other uses of s_lock and uses internal lock (buffer lock on
      superblock buffer) to serialize ->write_super. Also xfs_fs_write_super
      is superfluous and will go away in the next merge window

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Make sure a superblock really is writeable by checking MS_RDONLY
    under s_umount. sync_filesystems needed some rearrangement for
    that, but all but one sync_filesystem caller had the correct locking
    already so that we could add that check there. cachefiles grew
    s_umount locking.

    I've also added a WARN_ON to sync_filesystem to assert this for
    future callers.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Introduce this function which just writes all the quota structures but
    avoids all the syncing and cache pruning work to expose quota structures
    to userspace. Use this function from __sync_filesystem when wait == 0.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Currently the VFS calls vfs_dq_sync to sync out disk quotas for a given
    superblock. This is a small wrapper around sync_dquots which for the
    case of a non-NULL superblock is a small wrapper around quota_sync_sb.

    Just make quota_sync_sb global (rename it to sync_quota_sb) and call it
    directly. Also call it directly for those cases in quota.c that have a
    superblock and leave sync_dquots purely an iterator over sync_quota_sb and
    remove its superblock argument.

    To make this nicer move the check for the lack of a quota_sync method
    from the callers into sync_quota_sb.

    [folded build fix from Alexander Beregalov]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Rename the function so that it better describes what it really does. Also
    remove the unnecessary include of buffer_head.h.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Move sync_filesystems(), __fsync_super(), fsync_super() from
    super.c to sync.c where they fit better.

    [build fixes folded]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • It is unnecessarily fragile to have two places (fsync_super() and do_sync())
    doing data integrity sync of the filesystem. Alter __fsync_super() to
    accommodate needs of both callers and use it. So after this patch
    __fsync_super() is the only place where we gather all the calls needed to
    properly send all data on a filesystem to disk.

    A nice bonus is that we get complete livelock avoidance and write_supers()
    is now only used for periodic writeback of superblocks.

    sync_blockdevs() introduced a couple of patches ago is gone now.

    [build fixes folded]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • So far, do_sync() called:

        sync_inodes(0);
        sync_supers();
        sync_filesystems(0);
        sync_filesystems(1);
        sync_inodes(1);

    This ordering makes it kind of hard for filesystems as sync_inodes(0) need not
    submit all the IO (for example it skips inodes with I_SYNC set) so e.g. forcing
    transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
    not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
    and others are hit by this) when racing e.g. with background writeback. A
    similar problem also hits other filesystems (e.g. ext2) because
    write_supers() is called before sync_inodes(1).

    Change the ordering of calls in do_sync() - this requires a new function
    sync_blockdevs() to preserve the property that block devices are always synced
    after write_super() / sync_fs() call.

    The same issue is fixed in __fsync_super() function used on umount /
    remount read-only.

    [AV: build fixes]

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

28 Mar, 2009

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
    ext2: Zero our b_size in ext2_quota_read()
    trivial: fix typos/grammar errors in fs/Kconfig
    quota: Coding style fixes
    quota: Remove superfluous inlines
    quota: Remove uppercase aliases for quota functions.
    nfsd: Use lowercase names of quota functions
    jfs: Use lowercase names of quota functions
    udf: Use lowercase names of quota functions
    ufs: Use lowercase names of quota functions
    reiserfs: Use lowercase names of quota functions
    ext4: Use lowercase names of quota functions
    ext3: Use lowercase names of quota functions
    ext2: Use lowercase names of quota functions
    ramfs: Remove quota call
    vfs: Use lowercase names of quota functions
    quota: Remove dqbuf_t and other cleanups
    quota: Remove NODQUOT macro
    quota: Make global quota locks cacheline aligned
    quota: Move quota files into separate directory
    ext4: quota reservation for delayed allocation
    ...

    Linus Torvalds
     

26 Mar, 2009

2 commits


14 Jan, 2009

2 commits


07 Jan, 2009

1 commit

  • Chris Mason noticed do_sync_mapping_range didn't actually ask for data
    integrity writeout. Unfortunately, it is advertised as being usable for
    data integrity operations.

    This is a data integrity bug.

    Signed-off-by: Nick Piggin
    Cc: Chris Mason
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

06 Jan, 2009

1 commit

  • Fsync currently has a fdatawrite/fdatawait pair around the method call,
    and a mutex_lock/unlock of the inode mutex. All callers of fsync have
    to duplicate this, but we have a few and most of them don't quite get
    it right. This patch adds a new vfs_fsync that takes care of this.
    It's a little more complicated than usual as ->fsync might get a NULL file
    pointer and just a dentry from nfsd, but otherwise gets a file and we
    want to take the mapping and file operations from it when it is there.

    Notes on the fsync callers:

    - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
      lower file
    - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
      file, and was returning 0 when ->fsync was missing
    - shm was neither calling filemap_fdatawrite / filemap_fdatawait nor
      taking i_mutex. Given that shared memory doesn't have disk backing,
      not doing anything in fsync seems fine, and I left it out of the
      vfs_fsync conversion for now. In that case we might just not pass it
      through to the lower file at all and instead call the no-op
      simple_sync_file directly.

    [and now actually export vfs_fsync]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
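
The sequence this commit centralizes (start writeout, call ->fsync under the inode mutex, then wait) can be modelled in userspace. The following is only a toy sketch, not kernel code: `struct toy_file`, `toy_vfs_fsync()` and `toy_ok_fsync()` are hypothetical names, the write and wait steps are stubbed as stored return codes, and returning -EINVAL for a missing ->fsync is an assumption drawn from the note above that returning 0 in that case was a bug.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; this is not a kernel type. */
struct toy_file {
	int (*fsync)(struct toy_file *f, int datasync);	/* may be NULL */
	int write_err;	/* result reported by the stubbed "fdatawrite" step */
	int wait_err;	/* result reported by the stubbed "fdatawait" step */
	int locked;	/* models holding the inode mutex */
	char trace[8];	/* order of steps taken, e.g. "WLFUA" */
	size_t ntrace;
};

/* Append one step marker to the trace. */
static void toy_step(struct toy_file *f, char c)
{
	assert(f->ntrace < sizeof(f->trace) - 1);
	f->trace[f->ntrace++] = c;
	f->trace[f->ntrace] = '\0';
}

/* Start writeout, call ->fsync under the lock, then wait; first error wins. */
int toy_vfs_fsync(struct toy_file *f, int datasync)
{
	int ret, err;

	if (!f->fsync)
		return -EINVAL;		/* assumption: an error, not 0 */

	toy_step(f, 'W');		/* filemap_fdatawrite() analogue */
	ret = f->write_err;

	f->locked = 1;			/* mutex_lock() analogue */
	toy_step(f, 'L');
	err = f->fsync(f, datasync);
	toy_step(f, 'F');
	if (!ret)
		ret = err;
	f->locked = 0;			/* mutex_unlock() analogue */
	toy_step(f, 'U');

	toy_step(f, 'A');		/* filemap_fdatawait() analogue */
	err = f->wait_err;
	if (!ret)
		ret = err;
	return ret;
}

/* Example ->fsync method that checks it runs with the lock held. */
int toy_ok_fsync(struct toy_file *f, int datasync)
{
	(void)datasync;
	assert(f->locked);
	return 0;
}
```

Keeping the first error while still running the wait step mirrors the point of the commit: callers that open-code the pair tend to skip a step or drop an error.
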
     

25 Jul, 2008

1 commit


29 Apr, 2008

1 commit


29 Jun, 2007

1 commit

  • Not all the world is an i386. Many architectures need 64-bit arguments to be
    aligned in suitable pairs of registers, and the original
    sys_sync_file_range(int, loff_t, loff_t, int) was therefore wasting an
    argument register for padding after the first integer. Since we don't
    normally have more than 6 arguments for system calls, that left no room for
    the final argument on some architectures.

    Fix this by introducing sys_sync_file_range2(int, int, loff_t, loff_t) which
    all fits nicely. In fact, ARM already had that, but called it
    sys_arm_sync_file_range. Move it to fs/sync.c and rename it, then implement
    the needed compatibility routine. And stop the missing syscall check from
    bitching about the absence of sys_sync_file_range() if we've implemented
    sys_sync_file_range2() instead.

    Tested on PPC32 and with 32-bit and 64-bit userspace on PPC64.

    Signed-off-by: David Woodhouse
    Acked-by: Russell King
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Woodhouse
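
The register-padding argument above can be checked with a little arithmetic. This is a simplified sketch assuming a hypothetical 32-bit ABI in which every 64-bit argument must start in an even-numbered register; `arg_slots()`, `old_layout` and `new_layout` are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the 32-bit argument registers a signature occupies when 64-bit
 * arguments must begin in an even-numbered register.
 * sizes[i] is the width of argument i in bytes (4 or 8).
 */
int arg_slots(const int *sizes, size_t n)
{
	int slot = 0;

	for (size_t i = 0; i < n; i++) {
		assert(sizes[i] == 4 || sizes[i] == 8);
		if (sizes[i] == 8) {
			slot += slot & 1;	/* skip one register to align the pair */
			slot += 2;
		} else {
			slot += 1;
		}
	}
	return slot;
}

/* sys_sync_file_range(int, loff_t, loff_t, int) */
const int old_layout[] = { 4, 8, 8, 4 };

/* sys_sync_file_range2(int, int, loff_t, loff_t) */
const int new_layout[] = { 4, 4, 8, 8 };
```

Under that assumption the original ordering needs seven registers, one of them padding, while the reordered signature fits in six.
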
     

09 May, 2007

1 commit

  • Remove do_sync_file_range() and convert callers to just use
    do_sync_mapping_range().

    Signed-off-by: Mark Fasheh
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

27 Apr, 2007

1 commit

  • do_sync_file_range() accepts a file * from which it takes an address_space to
    sync. Abstract out the bulk of the function into do_sync_mapping_range()
    which takes the address_space directly. This way callers who want to sync an
    address_space directly can take advantage of the functionality provided.

    do_sync_file_range() is preserved as a small wrapper around
    do_sync_mapping_range().

    Ocfs2 in particular would like to use this to initiate a sync of a specific
    inode range during truncate, where a file * may not be available.

    Signed-off-by: Mark Fasheh
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton

    Mark Fasheh
     

09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

04 Dec, 2006

1 commit


01 Oct, 2006

1 commit

  • Move some functions out of the buffering code that aren't strictly buffering
    specific. This is a precursor to being able to disable the block layer.

    (*) Moved some stuff out of fs/buffer.c:

        (*) The file sync and general sync stuff moved to fs/sync.c.

        (*) The superblock sync stuff moved to fs/super.c.

        (*) do_invalidatepage() moved to mm/truncate.c.

        (*) try_to_release_page() moved to mm/filemap.c.

    (*) Moved some related declarations between header files:

        (*) declarations for do_invalidatepage() and try_to_release_page() moved
            to linux/mm.h.

        (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

23 Jun, 2006

1 commit

  • When a writeback_control's `start' and `end' fields are used to
    indicate a one-byte-range starting at file offset zero, the required
    values of .start=0,.end=0 mean that the ->writepages() implementation
    has no way of telling that it is being asked to perform a range
    request, because we're currently overloading (start == 0 && end == 0)
    to mean "this is not a write-a-range request".

    To make all this sane, the patch changes the range fields of
    writeback_control.

    The caller's side: whenever it calls ->writepages() to write pages, it
    always sets a range (range_start/range_end or range_cyclic).

    The callee's side: if range_cyclic is true, ->writepages() treats the
    range as cyclic; otherwise it just uses range_start and range_end.

    This patch:

    - Adds LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h.
      -1 is usually ok for range_end (type is long long). But if someone did

          range_end += val;               range_end is "val - 1"
          u64val = range_end >> bits;     u64val is "~(0ULL)"

      or something similar, the result would be wrong. So this adds LLONG_MAX
      to avoid nasty things, and uses LLONG_MAX for range_end.

    - Makes all callers of ->writepages() set range_start/end or range_cyclic.

    - Fixes updates of ->writeback_index. The old behavior already seems a bit
      strange: if writeback starts at 0 and is ended by the nr_to_write check,
      the saved index may reduce the chance of scanning the end of the file.
      So this updates ->writeback_index only if range_cyclic is true or the
      whole file is scanned.

    Signed-off-by: OGAWA Hirofumi
    Cc: Nathan Scott
    Cc: Anton Altaparmakov
    Cc: Steven French
    Cc: "Vladimir V. Saveliev"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
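
The range convention described in this commit can be sketched in plain C. `struct wb_range` is a simplified stand-in for the relevant fields of writeback_control, and the helper names are hypothetical; only range_start, range_end, range_cyclic and LLONG_MAX come from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified stand-in for the range fields of writeback_control. */
struct wb_range {
	long long range_start;
	long long range_end;	/* inclusive; LLONG_MAX means "to end of file" */
	bool range_cyclic;	/* continue from the saved writeback index */
};

/* A whole-file request starts at 0 and runs to LLONG_MAX. */
bool wb_range_whole_file(const struct wb_range *r)
{
	return !r->range_cyclic && r->range_start == 0 &&
	       r->range_end == LLONG_MAX;
}

/* Does a non-cyclic request cover the byte at offset pos? */
bool wb_range_contains(const struct wb_range *r, long long pos)
{
	assert(!r->range_cyclic);
	return pos >= r->range_start && pos <= r->range_end;
}

/* Rule from the commit: update the index only if cyclic or whole-file. */
bool wb_should_update_index(const struct wb_range *r)
{
	return r->range_cyclic || wb_range_whole_file(r);
}
```

With explicit fields, a one-byte request at offset zero (range_start = 0, range_end = 0) is no longer confused with "no range given", which is the ambiguity the commit set out to remove.
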
     

11 Apr, 2006

1 commit


01 Apr, 2006

1 commit

  • Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
    fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
    Reasons:

    - It's more flexible. Things which would require two or three syscalls with
    fadvise() can be done in a single syscall.

    - Using fadvise() in this manner is something not covered by POSIX.

    The patch wires up the syscall for x86.

    The syscall is implemented in the new fs/sync.c. The intention is that we can
    move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.

    Documentation for the syscall is in fs/sync.c.

    A test app (sync_file_range.c) is in
    http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

    The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
    say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
    NFS_DATA_SYNC which is hopefully the more common."

    Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
    the queue is congested. This is trivial to fix: add a new flag bit, set
    wbc->nonblocking. But I'm not sure that we want to expose implementation
    details down to that level.

    Note: it's notable that we can sync an fd which wasn't opened for writing.
    (Same with fsync() and fdatasync().)

    Note: the code takes some care to handle attempts to sync file contents
    outside the 16TB offset on 32-bit machines. It makes such attempts appear to
    succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
    requests fail...

    Cc: Nick Piggin
    Cc: Michael Kerrisk
    Cc: Ulrich Drepper
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
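
From userspace, the syscall is reachable on Linux through the glibc wrapper sync_file_range(). The sketch below shows one plausible use: write a buffer, then wait on earlier writeout, start writeout and wait for it, all for that range only. `write_and_sync_range()` is a hypothetical helper name. Note that the call does not write file metadata, so this is not a replacement for fsync() where full durability is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write buf at offset off, then push just that byte range out:
 * wait for writeout already in flight, start writeout, wait for it.
 * Returns 0 on success, -1 on failure (including a short write).
 */
int write_and_sync_range(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t n;

	assert(buf != NULL);
	n = pwrite(fd, buf, len, off);
	if (n < 0 || (size_t)n != len)
		return -1;

	return sync_file_range(fd, off, (off_t)len,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
}
```

Passing only SYNC_FILE_RANGE_WRITE would request the `async' writeout mode mentioned in the first note above.
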