03 Jul, 2013

1 commit

  • For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
    SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
    matter in lseek_execute() to update the current file offset
    to the desired offset if it is valid, ceph also does the
    simliar things at ceph_llseek().

    To reduce the duplications, this patch make lseek_execute()
    public accessible so that we can call it directly from the
    underlying file systems.

    Thanks Dave Chinner for this suggestion.

    [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]

    v2->v1:
    - Add kernel-doc comments for lseek_execute()
    - Call lseek_execute() in ceph->llseek()

    Signed-off-by: Jie Liu
    Cc: Dave Chinner
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: Ben Myers
    Cc: Ted Tso
    Cc: Hugh Dickins
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Sage Weil
    Signed-off-by: Al Viro

    Jie Liu
     

29 Jun, 2013

1 commit


08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

03 May, 2013

1 commit

  • Pull xfs update from Ben Myers:
    "For 3.10-rc1 we have a number of bug fixes and cleanups and a
    currently experimental feature from David Chinner, CRCs protection for
    metadata. CRCs are enabled by using mkfs.xfs to create a filesystem
    with the feature bits set.

    - numerous fixes for speculative preallocation
    - don't verify buffers on IO errors
    - rename of random32 to prandom32
    - refactoring/rearrangement in xfs_bmap.c
    - removal of unused m_inode_shrink in struct xfs_mount
    - fix error handling of xfs_bufs and readahead
    - quota driven preallocation throttling
    - fix WARN_ON in xfs_vm_releasepage
    - add ratelimited printk for different alert levels
    - fix spurious forced shutdowns due to freed Extent Free Intents
    - remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
    - remove some obsoleted comments
    - (experimental) CRC support for metadata"

    * tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs: (46 commits)
    xfs: fix da node magic number mismatches
    xfs: Remote attr validation fixes and optimisations
    xfs: Teach dquot recovery about CONFIG_XFS_QUOTA
    xfs: add metadata CRC documentation
    xfs: implement extended feature masks
    xfs: add CRC checks to the superblock
    xfs: buffer type overruns blf_flags field
    xfs: add buffer types to directory and attribute buffers
    xfs: add CRC protection to remote attributes
    xfs: split remote attribute code out
    xfs: add CRCs to attr leaf blocks
    xfs: add CRCs to dir2/da node blocks
    xfs: shortform directory offsets change for dir3 format
    xfs: add CRC checking to dir2 leaf blocks
    xfs: add CRC checking to dir2 data blocks
    xfs: add CRC checking to dir2 free blocks
    xfs: add CRC checks to block format directory blocks
    xfs: add CRC checks to remote symlinks
    xfs: split out symlink code into it's own file.
    xfs: add version 3 inode format with CRCs
    ...

    Linus Torvalds
     

28 Apr, 2013

1 commit


10 Apr, 2013

1 commit


23 Feb, 2013

1 commit


30 Nov, 2012

1 commit

  • XFS_IOC_ZERO_RANGE simply does not work properly for non page cache
    aligned ranges. Neither test 242 or 290 exercise this correctly, so
    the behaviour is completely busted even though the tests pass.

    Fix it to support full byte range granularity as was originally
    intended for this ioctl.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

16 Nov, 2012

2 commits


15 Nov, 2012

1 commit

  • It's just a simple wrapper around VFS functionality, and is actually
    bugging in that it doesn't remove mappings before invalidating the
    page cache. Remove it and replace it with the correct VFS
    functionality.

    Signed-off-by: Dave Chinner
    Reviewed-by: Andrew Dahl
    Signed-off-by: Ben Myers

    Dave Chinner
     

18 Oct, 2012

1 commit

  • We don't do any data writeback from XFS any more - the VFS is
    completely responsible for that, including for freeze. We can
    replace the remaining caller with a VFS level function that
    achieves the same thing, but without conflicting with current
    writeback work.

    This means we can remove the flush_work and xfs_flush_inodes() - the
    VFS functionality completely replaces the internal flush queue for
    doing this writeback work in a separate context to avoid stack
    overruns.

    This does have one complication - it cannot be called with page
    locks held. Hence move the flushing of delalloc space when ENOSPC
    occurs back up into xfs_file_aio_buffered_write when we don't hold
    any locks that will stall writeback.

    Unfortunately, writeback_inodes_sb_if_idle() is not sufficient to
    trigger delalloc conversion fast enough to prevent spurious ENOSPC
    whent here are hundreds of writers, thousands of small files and GBs
    of free RAM. Hence we need to use sync_sb_inodes() to block callers
    while we wait for writeback like the previous xfs_flush_inodes
    implementation did.

    That means we have to hold the s_umount lock here, but because this
    call can nest inside i_mutex (the parent directory in the create
    case, held by the VFS), we have to use down_read_trylock() to avoid
    potential deadlocks. In practice, this trylock will succeed on
    almost every attempt as unmount/remount type operations are
    exceedingly rare.

    Note: we always need to pass a count of zero to
    generic_file_buffered_write() as the previously written byte count.
    We only do this by accident before this patch by the virtue of ret
    always being zero when there are no errors. Make this explicit
    rather than needing to specifically zero ret in the ENOSPC retry
    case.

    Signed-off-by: Dave Chinner
    Tested-by: Brian Foster
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

09 Oct, 2012

1 commit

  • Move actual pte filling for non-linear file mappings into the new special
    vma operation: ->remap_pages().

    Filesystems must implement this method to get non-linear mapping support,
    if it uses filemap_fault() then generic_file_remap_pages() can be used.

    Now device drivers can implement this method and obtain nonlinear vma support.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf #arch/tile
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

25 Aug, 2012

4 commits


02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

31 Jul, 2012

1 commit

  • Generic code now blocks all writers from standard write paths. So we add
    blocking of all writers coming from ioctl (we get a protection of ioctl against
    racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
    non-racy freeze protection. We also keep freeze protection on transaction
    start to block internal filesystem writes such as removal of preallocated
    blocks.

    CC: Ben Myers
    CC: Alex Elder
    CC: xfs@oss.sgi.com
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

15 Jun, 2012

2 commits


02 Jun, 2012

1 commit

  • Btrfs has to make sure we have space to allocate new blocks in order to modify
    the inode, so updating time can fail. We've gotten around this by having our
    own file_update_time but this is kind of a pain, and Christoph has indicated he
    would like to make xfs do something different with atime updates. So introduce
    ->update_time, where we will deal with i_version an a/m/c time updates and
    indicate which changes need to be made. The normal version just does what it
    has always done, updates the time and marks the inode dirty, and then
    filesystems can choose to do something different.

    I've gone through all of the users of file_update_time and made them check for
    errors with the exception of the fault code since it's complicated and I wasn't
    quite sure what to do there, also Jan is going to be pushing the file time
    updates into page_mkwrite for those who have it so that should satisfy btrfs and
    make it not a big deal to check the file_update_time() return code in the
    generic fault path. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

15 May, 2012

5 commits

  • This patch adds lseek(2) SEEK_DATA/SEEK_HOLE functionality to xfs.

    Signed-off-by: Jie Liu
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Jeff Liu
     
  • With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Untangle the header file includes a bit by moving the definition of
    xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
    xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
    xfs_ag.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Instead of calling xfs_zero_eof with the ilock held only take it internally
    for the minimall required critical section around xfs_bmapi_read. This
    also requires changing the calling convention for xfs_zero_last_block
    slightly. The actual zeroing operation is still serialized by the iolock,
    which must be taken exclusively over the call to xfs_zero_eof.

    We could in fact use a shared lock for the xfs_bmapi_read calls as long as
    the extent list has been read in, but given that we already hold the iolock
    exclusively there is little reason to micro optimize this further.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • We do not need the ilock for generic_write_checks and the i_size_read,
    which are protected by i_mutex and/or iolock, so reduce the ilock
    critical section to just the call to xfs_zero_eof.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

14 Mar, 2012

2 commits

  • Add an in-memory only flag to say we logged timestamps only, and use it to
    check if fdatasync can optimize away the log force.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Timestamps on regular files are the last metadata that XFS does not update
    transactionally. Now that we use the delaylog mode exclusively and made
    the log scode scale extremly well there is no need to bypass that code for
    timestamp updates. Logging all updates allows to drop a lot of code, and
    will allow for further performance improvements later on.

    Note that this patch drops optimized handling of fdatasync - it will be
    added back in a separate commit.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

18 Jan, 2012

4 commits

  • With all the size field updates out of the way xfs_file_aio_write can
    be further simplified by pushing all iolock handling into
    xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
    the generic generic_write_sync helper for synchronous writes.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • While xfs_iunlock is fine with 0 lockflags the calling conventions are much
    cleaner if xfs_file_aio_write_checks never returns without the iolock held.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Now that we use the VFS i_size field throughout XFS there is no need for the
    i_new_size field any more given that the VFS i_size field gets updated
    in ->write_end before unlocking the page, and thus is always uptodate when
    writeback could see a page. Removing i_new_size also has the advantage that
    we will never have to trim back di_size during a failed buffered write,
    given that it never gets updated past i_size.

    Note that currently the generic direct I/O code only updates i_size after
    calling our end_io handler, which requires a small workaround to make
    sure di_size actually makes it to disk. I hope to fix this properly in
    the generic code.

    A downside is that we lose the support for parallel non-overlapping O_DIRECT
    appending writes that recently was added. I don't think keeping the complex
    and fragile i_new_size infrastructure for this is a good tradeoff - if we
    really care about parallel appending writers we should investigate turning
    the iolock into a range lock, which would also allow for parallel
    non-overlapping buffered writers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • There is no fundamental need to keep an in-memory inode size copy in the XFS
    inode. We already have the on-disk value in the dinode, and the separate
    in-memory copy that we need for regular files only in the XFS inode.

    Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
    VFS inode i_size field for regular files. Switch code that was directly
    accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
    we are limited to regular files direct access of the VFS inode i_size field.

    This also allows dropping some fairly complicated code in the write path
    which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
    that is getting updated inside ->write_end.

    Note that we do not bother resetting the VFS i_size when truncating a file
    that gets freed to zero as there is no point in doing so because the VFS inode
    is no longer in use at this point. Just relax the assert in xfs_ifree to
    only check the on-disk size instead.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

02 Dec, 2011

1 commit


12 Oct, 2011

6 commits

  • Directories are only updated transactionally, which means fsync only
    needs to flush the log the inode is currently dirty, but not bother
    with checking for dirty data, non-transactional updates, and most
    importanly doesn't have to flush disk caches except as part of a
    transaction commit.

    While the first two optimizations can't easily be measured, the
    latter actually makes a difference when doing lots of fsync that do
    not actually have to commit the inode, e.g. because an earlier fsync
    already pushed the log far enough.

    The new xfs_dir_fsync is identical to xfs_nfs_commit_metadata except
    for the prototype, but I'm not sure creating a common helper for the
    two is worth it given how simple the functions are.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • There is no reason to keep a reference to the inode even if we unlock
    it during transaction commit because we never drop a reference between
    the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
    back into xfs_trans_ijoin - the third argument decides if an unlock
    is needed now.

    I'm actually starting to wonder if allowing inodes to be unlocked
    at transaction commit really is worth the effort. The only real
    benefit is that they can be unlocked earlier when commiting a
    synchronous transactions, but that could be solved by doing the
    log force manually after the unlock, too.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Only read the LSN we need to push to with the ilock held, and then release
    it before we do the log force to improve concurrency.

    This also removes the only direct caller of _xfs_trans_commit, thus
    allowing it to be merged into the plain xfs_trans_commit again.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • xfs_bmapi() currently handles both extent map reading and
    allocation. As a result, the code is littered with "if (wr)"
    branches to conditionally do allocation operations if required.
    This makes the code much harder to follow and causes significant
    indent issues with the code.

    Given that read mapping is much simpler than allocation, we can
    split out read mapping from xfs_bmapi() and reuse the logic that
    we have already factored out do do all the hard work of handling the
    extent map manipulations. The results in a much simpler function for
    the common extent read operations, and will allow the allocation
    code to be simplified in another commit.

    Once xfs_bmapi_read() is implemented, convert all the callers of
    xfs_bmapi() that are only reading extents to use the new function.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Currently a buffered reader or writer can add pages to the pagecache
    while we are waiting for the iolock in xfs_file_dio_aio_write. Prevent
    this by re-checking mapping->nrpages after we got the iolock, and if
    nessecary upgrade the lock to exclusive mode. To simplify this a bit
    only take the ilock inside of xfs_file_aio_write_checks.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • We now have an i_dio_count filed and surrounding infrastructure to wait
    for direct I/O completion instead of i_icount, and we have never needed
    to iocount waits for buffered I/O given that we only set the page uptodate
    after finishing all required work. Thus remove i_iocount, and replace
    the actually needed waits with calls to inode_dio_wait.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig