02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

31 Jul, 2012

2 commits

  • Generic code now blocks all writers from standard write paths. So we add
    blocking of all writers coming from ioctl (we get a protection of ioctl against
    racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
    non-racy freeze protection. We also keep freeze protection on transaction
    start to block internal filesystem writes such as removal of preallocated
    blocks.

    CC: Ben Myers
    CC: Alex Elder
    CC: xfs@oss.sgi.com
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Pull xfs update from Ben Myers:
    "Numerous cleanups and several bug fixes. Here are some highlights:

    - Discontiguous directory buffer support
    - Inode allocator refactoring
    - Removal of the IO lock in inode reclaim
    - Implementation of .update_time
    - Fix for handling of EOF in xfs_vm_writepage
    - Fix for races in xfsaild, and idle mode is re-enabled
    - Fix for a crash in xfs_buf completion handlers on unmount."

    Fix up trivial conflicts in fs/xfs/{xfs_buf.c,xfs_log.c,xfs_log_priv.h}
    due to duplicate patches that had already been merged for 3.5.

    * tag 'for-linus-v3.6-rc1' of git://oss.sgi.com/xfs/xfs: (44 commits)
    xfs: wait for the write the superblock on unmount
    xfs: re-enable xfsaild idle mode and fix associated races
    xfs: remove iolock lock classes
    xfs: avoid the iolock in xfs_free_eofblocks for evicted inodes
    xfs: do not take the iolock in xfs_inactive
    xfs: remove xfs_inactive_attrs
    xfs: clean up xfs_inactive
    xfs: do not read the AGI buffer in xfs_dialloc until nessecary
    xfs: refactor xfs_ialloc_ag_select
    xfs: add a short cut to xfs_dialloc for the non-NULL agbp case
    xfs: remove the alloc_done argument to xfs_dialloc
    xfs: split xfs_dialloc
    xfs: remove xfs_ialloc_find_free
    Prefix IO_XX flags with XFS_IO_XX to avoid namespace colision.
    xfs: remove xfs_inotobp
    xfs: merge xfs_itobp into xfs_imap_to_bp
    xfs: handle EOF correctly in xfs_vm_writepage
    xfs: implement ->update_time
    xfs: fix comment typo of struct xfs_da_blkinfo.
    xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
    ...

    Linus Torvalds
     

30 Jul, 2012

12 commits

  • v2: Add the xfs_buf_lock to xfs_quiesce_attr().
    Add an explanation of why xfs_buf_lock() is used to wait for the write.

    xfs_wait_buftarg() does not wait for the completion of the write of the
    uncached superblock. This write can race with the shutdown of the log
    and causes a panic if the write does not win the race.

    During the log write, xfsaild_push() will lock the buffer and set the
    XBF_ASYNC flag. Because the XBF_ASYNC flag is set, complete() is not
    performed on the buffer's iowait entry, so we cannot call xfs_buf_iowait()
    to wait for the write to complete. The buffer's lock is held until the
    write is complete, so we can block on an xfs_buf_lock() request to be
    notified that the write is complete.

    Signed-off-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Mark Tinguely
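
    The waiting technique described in this commit can be illustrated with a
    toy user-space model. This is only a sketch in Python, not kernel code:
    submit_async_write and wait_for_write are hypothetical names, and a
    threading.Lock stands in for the buffer lock. The submitter takes the lock
    and the completion path releases it, so a waiter that blocks on the lock
    resumes only after the write has finished.

```python
import threading


def submit_async_write(buf_lock, log):
    """Model of an async write: lock the buffer, complete it on another thread."""
    buf_lock.acquire()  # the submitter locks the buffer before issuing the write

    def complete():
        log.append("write done")
        buf_lock.release()  # the lock is dropped only once the write is complete

    worker = threading.Thread(target=complete)
    worker.start()
    return worker


def wait_for_write(buf_lock, log):
    """Block until the in-flight write finishes by taking the buffer lock."""
    buf_lock.acquire()  # cannot succeed until the completion path releases it
    buf_lock.release()
    log.append("waiter resumed")
```

    A threading.Lock (unlike an RLock) may be released by a thread other than
    the one that acquired it, which is what makes this model work.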
     
  • xfsaild idle mode logic currently leads to a couple of hangs:

    1.) If xfsaild is rescheduled during an incremental scan
    (i.e., tout != 0) and the target has been updated since
    the previous run, we can hit the new target and go into
    idle mode with a still populated ail.
    2.) A wake up is only issued when the target is pushed forward.
    The wake up can race with xfsaild if it is currently in the
    process of entering idle mode, causing future wake up
    events to be lost.

    These hangs have been reproduced and verified as fixed by
    running xfstests 273 in a loop on a slightly modified upstream
    kernel. The kernel is modified to re-enable idle mode as
    previously implemented (when count == 0) and with a revert of
    commit 670ce93f, which includes performance improvements that
    make this harder to reproduce.

    The solution, the algorithm for which has been outlined by
    Dave Chinner, is to modify xfsaild to enter idle mode only when
    the ail is empty and the push target has not been moved forward
    since the last push.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Brian Foster
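
    The idle condition described in this commit can be written as a small
    predicate. A minimal Python sketch, assuming the AIL is a plain list and
    push targets are integers that only grow; should_idle and its parameter
    names are hypothetical, not the kernel's.

```python
def should_idle(ail, target, last_pushed_target):
    """Return True only when it is safe to enter idle mode.

    Idle requires both conditions from the commit message: the AIL is empty
    and the push target has not moved forward since the last push.
    """
    ail_is_empty = len(ail) == 0
    target_unchanged = target <= last_pushed_target
    return ail_is_empty and target_unchanged
```

    Checking both conditions together covers the first hang listed above:
    reaching the target with a still populated AIL no longer leads to idle
    mode.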
     
  • Now that we never take the iolock during inode reclaim we don't need
    to play games with lock classes.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Rich Johnston
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Same rationale as the last patch - these inodes are not reachable, so
    don't bother with locking.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Rich Johnston
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • An inode that enters xfs_inactive has been removed from all global
    lists but the inode hash, and can't be recycled in xfs_iget before
    it has been marked reclaimable. Thus taking the iolock in here
    is not necessary at all, and given the amount of lockdep false
    positives it has triggered already I'd rather remove the locking.

    The only change outside of xfs_inactive is relaxing an assert in
    xfs_itruncate_extents.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Rich Johnston
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Remove this helper as the code flow is a lot more obvious when it gets
    merged into its only caller.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Rich Johnston
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • The code to reserve log space and join the inode to the transaction is
    common for all cases, so don't duplicate it. Also remove the trivial
    xfs_inactive_symlink_local helper which can simply be open-coded now.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Rich Johnston
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Refactor the AG selection loop in xfs_dialloc to operate on the in-memory
    perag data as much as possible. We now read the AGI buffer only once we have
    selected an AG to allocate inodes from, instead of for every AG considered.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Loop over the in-core perag structures and prefer using pagi_freecount over
    going out to the AGI buffer where possible.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
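
    The selection idea in this commit and the previous one (decide from
    in-core counters, read the AGI buffer only for the AG that is chosen) can
    be sketched as follows. This is a simplified Python model under stated
    assumptions: select_ag, start_agno and read_agi are hypothetical names,
    each perag is a dict carrying only pagi_freecount, and the case where new
    inode chunks must be allocated is ignored.

```python
def select_ag(perags, start_agno, read_agi):
    """Return (agno, agi_buffer) for the first AG with free inodes, else None.

    perags     -- list of dicts, each with an in-memory "pagi_freecount"
    start_agno -- index of the AG to try first
    read_agi   -- callable that performs the (expensive) AGI buffer read
    """
    count = len(perags)
    for step in range(count):
        agno = (start_agno + step) % count
        # Decide from in-memory data only; rejected AGs cost no buffer read.
        if perags[agno]["pagi_freecount"] > 0:
            return agno, read_agi(agno)
    return None
```

    With three AGs of which only the last has free inodes, read_agi is called
    exactly once, for that AG.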
     
  • In this case we already have selected an AG and know it has free space
    because the buffer lock never got released. Jump directly into xfs_dialloc_ag
    and short cut the AG selection loop.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • We can simply check the IO_agbp pointer for being non-NULL instead of
    passing another argument through two layers of function calls.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Move the actual allocation once we have selected an allocation group into a
    separate helper, and make xfs_dialloc a wrapper around it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

24 Jul, 2012

1 commit

  • Pull the big VFS changes from Al Viro:
    "This one is *big* and changes quite a few things around VFS. What's in there:

    - the first of two really major architecture changes - death to open
    intents.

    The former is finally there; it was very long in making, but with
    Miklos getting through really hard and messy final push in
    fs/namei.c, we finally have it. Unlike his variant, this one
    doesn't introduce struct opendata; what we have instead is
    ->atomic_open() taking preallocated struct file * and passing
    everything via its fields.

    Instead of returning struct file *, it returns -E... on error, 0
    on success and 1 in "deal with it yourself" case (e.g. symlink
    found on server, etc.).

    See comments before fs/namei.c:atomic_open(). That made a lot of
    goodies finally possible and quite a few are in that pile:
    ->lookup(), ->d_revalidate() and ->create() do not get struct
    nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
    flags instead, ->create() gets "do we want it exclusive" flag.

    With the introduction of new helper (kern_path_locked()) we are rid
    of all struct nameidata instances outside of fs/namei.c; it's still
    visible in namei.h, but not for long. Come the next cycle,
    declaration will move either to fs/internal.h or to fs/namei.c
    itself. [me, miklos, hch]

    - The second major change: behaviour of final fput(). Now we have
    __fput() done without any locks held by caller *and* not from deep
    in call stack.

    That obviously lifts a lot of constraints on the locking in there.
    Moreover, it's legal now to call fput() from atomic contexts (which
    has immediately simplified life for aio.c). We also don't need
    anti-recursion logics in __scm_destroy() anymore.

    There is a price, though - the damn thing has become partially
    asynchronous. For fput() from normal process we are guaranteed
    that pending __fput() will be done before the caller returns to
    userland, exits or gets stopped for ptrace.

    For kernel threads and atomic contexts it's done via
    schedule_work(), so theoretically we might need a way to make sure
    it's finished; so far only one such place had been found, but there
    might be more.

    There's flush_delayed_fput() (do all pending __fput()) and there's
    __fput_sync() (fput() analog doing __fput() immediately). I hope
    we won't need them often; see warnings in fs/file_table.c for
    details. [me, based on task_work series from Oleg merged last
    cycle]

    - sync series from Jan

    - large part of "death to sync_supers()" work from Artem; the only
    bits missing here are exofs and ext4 ones. As far as I understand,
    those are going via the exofs and ext4 trees resp.; once they are
    in, we can put ->write_super() to rest, along with the thread
    calling it.

    - preparatory bits from unionmount series (from dhowells).

    - assorted cleanups and fixes all over the place, as usual.

    This is not the last pile for this cycle; there's at least jlayton's
    ESTALE work and fsfreeze series (the latter - in dire need of fixes,
    so I'm not sure it'll make the cut this cycle). I'll probably throw
    symlink/hardlink restrictions stuff from Kees into the next pile, too.
    Plus there's a lot of misc patches I hadn't thrown into that one -
    it's large enough as it is..."

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
    ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
    btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
    switch dentry_open() to struct path, make it grab references itself
    spufs: shift dget/mntget towards dentry_open()
    zoran: don't bother with struct file * in zoran_map
    ecryptfs: don't reinvent the wheels, please - use struct completion
    don't expose I_NEW inodes via dentry->d_inode
    tidy up namei.c a bit
    unobfuscate follow_up() a bit
    ext3: pass custom EOF to generic_file_llseek_size()
    ext4: use core vfs llseek code for dir seeks
    vfs: allow custom EOF in generic_file_llseek code
    vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
    vfs: Remove unnecessary flushing of block devices
    vfs: Make sys_sync writeout also block device inodes
    vfs: Create function for iterating over block devices
    vfs: Reorder operations during sys_sync
    quota: Move quota syncing to ->sync_fs method
    quota: Split dquot_quota_sync() to writeback and cache flushing part
    vfs: Move noop_backing_dev_info check from sync into writeback
    ...

    Linus Torvalds
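
    The ->atomic_open() return convention quoted in this pull message
    (-E... on error, 0 on success, 1 for the "deal with it yourself" case) can
    be summarised by a small classifier. This is a Python illustration of the
    convention only, not the kernel interface; interpret_atomic_open and the
    result labels are hypothetical names.

```python
import errno


def interpret_atomic_open(ret):
    """Classify a return value following the convention in the pull message."""
    if ret < 0:
        # Negative values are negated errno codes, e.g. -ENOENT.
        return ("error", errno.errorcode.get(-ret, "unknown"))
    if ret == 0:
        return ("opened", None)
    if ret == 1:
        # The filesystem did not open the file; the caller must finish the job
        # (the message gives "symlink found on server" as an example).
        return ("caller_finishes", None)
    raise ValueError("unexpected return value: %r" % (ret,))
```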
     

23 Jul, 2012

3 commits


22 Jul, 2012

5 commits


14 Jul, 2012

10 commits

  • A boolean "does it have to be exclusive?" flag is passed instead;
    local filesystems should just ignore it - the object is guaranteed
    not to be there yet.

    Signed-off-by: Al Viro

    Al Viro
     
  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     
  • xfs_bdstrat_cb only adds a check for a shutdown filesystem over
    xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
    filesystem a little earlier. In addition the shutdown handling in
    xfs_bdstrat_cb is not very suitable for this caller.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • If the b_iodone handler is run in calling context in xfs_buf_iorequest we
    can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
    into xfs_buf_iorequest because an I/O error happened, which keeps calling
    back into xfs_buf_iodone_callbacks. This chain will usually not take long
    since the filesystem gets shut down because of log I/O errors, but even
    over a short time it can cause stack overflows if run on the same context.

    As a short term workaround make sure we always call the iodone handler in
    workqueue context.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
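
    The workaround in this commit (always run the completion handler from a
    work queue rather than in the caller's context) can be modelled in user
    space. A rough Python sketch, assuming a simple FIFO stands in for the
    workqueue; run_with_workqueue, io_request and io_done are hypothetical
    names. It tracks call depth to show that resubmission after errors no
    longer nests.

```python
from collections import deque


def run_with_workqueue(max_retries):
    """Simulate an I/O that fails max_retries times before succeeding.

    Returns (attempts, max_depth), where max_depth is the deepest nesting of
    io_request/io_done calls that was observed.
    """
    queue = deque()
    state = {"attempts": 0, "depth": 0, "max_depth": 0}

    def enter():
        state["depth"] += 1
        state["max_depth"] = max(state["max_depth"], state["depth"])

    def io_request():
        enter()
        state["attempts"] += 1
        queue.append(io_done)  # defer completion instead of calling it here
        state["depth"] -= 1

    def io_done():
        enter()
        if state["attempts"] <= max_retries:  # simulated I/O error: resubmit
            io_request()
        state["depth"] -= 1

    io_request()
    while queue:  # the "worker" drains completions one at a time
        queue.popleft()()
    return state["attempts"], state["max_depth"]
```

    However many retries happen, the depth stays at two (a completion calling
    one resubmission), whereas calling io_done directly from io_request would
    nest once per retry.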
     
  • Almost all metadata allocations come from shallow stack usage
    situations. Avoid the overhead of switching the allocation to a
    workqueue as we are not in danger of running out of stack when
    making these allocations. Metadata allocations are already marked
    through the args that are passed down, so this is trivial to do.

    Signed-off-by: Dave Chinner
    Reported-by: Mel Gorman
    Tested-by: Mel Gorman
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The current cursor is reallocated when retrying the allocation, so
    the existing cursor needs to be destroyed in both the restart and
    the failure cases.

    Signed-off-by: Dave Chinner
    Tested-by: Mike Snitzer
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_bdstrat_cb only adds a check for a shutdown filesystem over
    xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
    filesystem a little earlier. In addition the shutdown handling in
    xfs_bdstrat_cb is not very suitable for this caller.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • If the b_iodone handler is run in calling context in xfs_buf_iorequest we
    can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
    into xfs_buf_iorequest because an I/O error happened, which keeps calling
    back into xfs_buf_iodone_callbacks. This chain will usually not take long
    since the filesystem gets shut down because of log I/O errors, but even
    over a short time it can cause stack overflows if run on the same context.

    As a short term workaround make sure we always call the iodone handler in
    workqueue context.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Almost all metadata allocations come from shallow stack usage
    situations. Avoid the overhead of switching the allocation to a
    workqueue as we are not in danger of running out of stack when
    making these allocations. Metadata allocations are already marked
    through the args that are passed down, so this is trivial to do.

    Signed-off-by: Dave Chinner
    Reported-by: Mel Gorman
    Tested-by: Mel Gorman
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The current cursor is reallocated when retrying the allocation, so
    the existing cursor needs to be destroyed in both the restart and
    the failure cases.

    Signed-off-by: Dave Chinner
    Tested-by: Mike Snitzer
    Signed-off-by: Ben Myers

    Dave Chinner
     

02 Jul, 2012

6 commits

  • The buffer reading code in xfs_dir2_leaf_getdents is complex and difficult to
    follow due to the readahead and all the context it carries. It is also badly
    indented and so difficult to read. Factor it out into a separate function to
    make it easier to understand and optimise in future patches.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The struct xfs_dabuf now only tracks a single xfs_buf and all the
    information it holds can be gained directly from the xfs_buf. Hence
    we can remove the struct dabuf and pass the xfs_buf around
    everywhere.

    Kill the struct dabuf and the associated infrastructure.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • First step in converting the directory code to use native
    discontiguous buffers and replacing the dabuf construct.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • discontiguous buffer in separate buffer format structures. This means log
    recovery will recover all the changes on a per segment basis without
    requiring any knowledge of the fact that it was logged from a
    compound buffer.

    To do this, we need to be able to determine what buffer segment any
    given offset into the compound buffer sits over. This enables us to
    translate the dirty bitmap into the number of separate buffer format
    structures required.

    We also need to be able to determine the number of bitmap elements
    that a given buffer segment has, as this determines the size of the
    buffer format structure. Hence we need to be able to determine
    both the start offset into the buffer and the length of a given
    segment to be able to calculate this.

    With this information, we can preallocate, build and format the
    correct log vector array for each segment in a compound buffer to
    appear exactly the same as individually logged buffers in the log.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
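
    The two lookups this commit message calls for (which segment a given
    offset falls in, and how many bitmap elements a segment needs) can be
    sketched in a few lines. This is an illustrative Python model, not the
    XFS code: segment_for_offset and bitmap_bits are hypothetical names, and
    the 128-byte granularity per bitmap bit is an assumption made for the
    example.

```python
from bisect import bisect_right
from itertools import accumulate

CHUNK_BYTES = 128  # assumed number of bytes covered by one dirty-bitmap bit


def segment_for_offset(seg_lengths, offset):
    """Return (index, start_offset, length) of the segment holding offset."""
    ends = list(accumulate(seg_lengths))  # cumulative end offset of each segment
    if not ends or not 0 <= offset < ends[-1]:
        raise ValueError("offset outside the compound buffer")
    index = bisect_right(ends, offset)  # first segment whose end is past offset
    start = ends[index] - seg_lengths[index]
    return index, start, seg_lengths[index]


def bitmap_bits(seg_length, chunk_bytes=CHUNK_BYTES):
    """Number of bitmap bits needed to cover a segment (rounded up)."""
    return -(-seg_length // chunk_bytes)
```

    For segments of 4096, 8192 and 4096 bytes, offset 4096 falls in the second
    segment, which starts at byte 4096 of the compound buffer.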
     
  • Now that the buffer cache supports discontiguous buffers, add
    support to the transaction buffer interface for getting and reading
    buffers.

    Note that this patch does not convert the buffer item logging to
    support discontiguous buffers. That will be done as a separate
    commit.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • With the internal interfaces supporting discontiguous buffer maps,
    add external lookup, read and get interfaces so they can start to be
    used.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner