26 Sep, 2008

2 commits

  • Yet another bug was found in xfs_iext_irec_compact_full() and while the
    source of the bug was found it wasn't an easy task to track it down
    because the conditions are very difficult to reproduce.

    A HUGE thank-you goes to Russell Cattelan and Eric Sandeen for their
    significant effort in tracking down the source of this corruption.

    xfs_iext_irec_compact_full() and xfs_iext_irec_compact_pages() are almost
    identical - they both compact indirect extent lists by moving extents from
    subsequent buffers into earlier ones. xfs_iext_irec_compact_pages() only
    moves extents if all of the extents in the next buffer will fit into the
    empty space in the buffer before it. xfs_iext_irec_compact_full() will go
    a step further and move part of the next buffer if all the extents wont
    fit. It will then shift the remaining extents in the next buffer up to the
    start of the buffer. The bug here was that we did not update er_extoff and
    this caused extent list corruption.

    It does not appear that this extra functionality gains us much. Calling
    xfs_iext_irec_compact_pages() instead will do a good enough job at
    compacting the indirect list and will be quicker too.

    For the case in xfs_iext_indirect_to_direct() the total number of extents
    in the indirect list will fit into one buffer so we will never need the
    extra functionality of xfs_iext_irec_compact_full() there.

    Also xfs_iext_irec_compact_pages() doesn't need to do a memmove() (the
    buffers will never overlap) so we don't want the performance hit that can
    incur.

    SGI-PV: 987159

    SGI-Modid: xfs-linux-melb:xfs-kern:32166a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Eric Sandeen

    Lachlan McIlroy
     
  • If we don't move all the records from the next buffer into the current
    buffer then we need to update the er_extoff field of the next buffer as we
    shift the remaining records to the start of the buffer.

    SGI-PV: 987159

    SGI-Modid: xfs-linux-melb:xfs-kern:32165a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Eric Sandeen
    Signed-off-by: Russell Cattelan

    Lachlan McIlroy
     

17 Sep, 2008

7 commits

  • When unreserving space with boundaries that are not block aligned we round
    up the start and round down the end boundaries and then use this function,
    xfs_zero_remaining_bytes(), to zero the parts of the blocks that got
    dropped during the rounding. The problem is we don't consider if these
    blocks are beyond eof. Worse still is if we encounter delayed allocations
    beyond eof we will try to use the magic delayed allocation block number as
    a real block number. If the file size is ever extended to expose these
    blocks then we'll go through xfs_zero_eof() to zero them anyway.

    SGI-PV: 983683

    SGI-Modid: xfs-linux-melb:xfs-kern:32055a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     
  • We have a use-after-free issue where log completions access buffers via
    the buffer log item and the buffer has already been freed. Fix this by
    taking a reference on the buffer when attaching the buffer log item and
    release the hold when the buffer log item is detached and we no longer
    need the buffer. Also create a new function xfs_buf_item_free() to combine
    some common code.

    SGI-PV: 985757

    SGI-Modid: xfs-linux-melb:xfs-kern:32025a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     
  • If we call xfs_lock_two_inodes() to grab both the iolock and the ilock,
    then drop the ilocks on both inodes, then grab them again (as
    xfs_swap_extents() does) then lockdep will report a locking order problem.
    This is a false positive.

    To avoid this, disallow xfs_lock_two_inodes() fom locking both inode locks
    at once - force calers to make two separate calls. This means that nested
    dropping and regaining of the ilocks will retain the same lockdep subclass
    and so lockdep will not see anything wrong with this code.

    SGI-PV: 986238

    SGI-Modid: xfs-linux-melb:xfs-kern:31999a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Peter Leckie
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • The current code in xlog_iodone() uses the wrong macro to check if the
    barrier has been cleared due to an EOPNOTSUPP error form the lower layer.

    SGI-PV: 986143

    SGI-Modid: xfs-linux-melb:xfs-kern:31984a

    Signed-off-by: David Chinner
    Signed-off-by: Nathaniel W. Turner
    Signed-off-by: Peter Leckie
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • With the help from some tracing I found that we try to map extents beyond
    eof when doing a direct I/O read. It appears that the way to inform the
    generic direct I/O path (ie do_direct_IO()) that we have breached eof is
    to return an unmapped buffer from xfs_get_blocks_direct(). This will cause
    do_direct_IO() to jump to the hole handling code where is will check for
    eof and then abort.

    This problem was found because a direct I/O read was trying to map beyond
    eof and was encountering delayed allocations. The delayed allocations
    beyond eof are speculative allocations and they didn't get converted when
    the direct I/O flushed the file because there was only enough space in the
    current AG to convert and write out the dirty pages within eof. Note that
    xfs_iomap_write_allocate() wont necessarily convert all the delayed
    allocation passed to it - it will return after allocating the first extent
    - so if the delayed allocation extends beyond eof then it will stay that
    way.

    SGI-PV: 983683

    SGI-Modid: xfs-linux-melb:xfs-kern:31929a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     
  • Logically we would return an error in xfs_fs_remount code to prevent users
    from believing they might have changed mount options using remount which
    can't be changed.

    But unfortunately mount(8) adds all options from mtab and fstab to the
    mount arguments in some cases so we can't blindly reject options, but have
    to check for each specified option if it actually differs from the
    currently set option and only reject it if that's the case.

    Until that is implemented we return success for every remount request, and
    silently ignore all options that we can't actually change.

    SGI-PV: 985710

    SGI-Modid: xfs-linux-melb:xfs-kern:31908a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Memory allocations for log->l_grant_trace and iclog->ic_trace are done on
    demand when the first event is logged. In xlog_state_get_iclog_space() we
    call xlog_trace_iclog() under a spinlock and allocating memory here can
    cause us to sleep with a spinlock held and deadlock the system.

    For the log grant tracing we use KM_NOSLEEP but that means we can lose
    trace entries. Since there is no locking to serialize the log grant
    tracing we could race and have multiple allocations and leak memory.

    So move the allocations to where we initialize the log/iclog structures.
    Use KM_NOFS to avoid recursing into the filesystem and drop log->l_trace
    since it's not even used.

    SGI-PV: 983738

    SGI-Modid: xfs-linux-melb:xfs-kern:31896a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    Lachlan McIlroy
     

25 Aug, 2008

2 commits


24 Aug, 2008

1 commit


14 Aug, 2008

1 commit

  • The patches that are intended to introduce copy-on-write credentials for 2.6.28
    require abstraction of access to some fields of the task structure,
    particularly for the case of one task accessing another's credentials where RCU
    will have to be observed.

    Introduced here are trivial no-op versions of the desired accessors for current
    and other tasks so that other subsystems can start to be converted over more
    easily.

    Wrappers are introduced into a new header (linux/cred.h) for UID/GID,
    EUID/EGID, SUID/SGID, FSUID/FSGID, cap_effective and current's subscribed
    user_struct. These wrappers are macros because the ordering between header
    files mitigates against making them inline functions.

    linux/cred.h is #included from linux/sched.h.

    Further, XFS is modified such that it no longer defines and uses parameterised
    versions of current_fs[ug]id(), thus getting rid of the namespace collision
    otherwise incurred.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     

13 Aug, 2008

27 commits

  • The ticket allocation code got reworked in 2.6.26 and we now free tickets
    whereas before we used to cache them so the use-after-free went
    undetected.

    SGI-PV: 985525

    SGI-Modid: xfs-linux-melb:xfs-kern:31877a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Lachlan McIlroy
     
  • xfs_bmap_count_leaves and xfs_bmap_disk_count_leaves always return always
    0, make them void.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31844a

    Signed-off-by: Ruben Porras
    Signed-off-by: Donald Douwsma
    Signed-off-by: Lachlan McIlroy

    Ruben Porras
     
  • Use KM_NOFS to prevent recursion back into the filesystem which can cause
    deadlocks.

    In the case of xfs_iread() we hold the lock on the inode cluster buffer
    while allocating memory for the trace buffers. If we recurse back into XFS
    to flush data that may require a transaction to allocate extents which
    needs log space. This can deadlock with the xfsaild thread which can't
    push the tail of the log because it is trying to get the inode cluster
    buffer lock.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31838a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Lachlan McIlroy
     
  • Use KM_MAYFAIL for the m_perag allocation, we can deal with the error
    easily and blocking forever during mount is not a good idea either.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31837a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_mount_free mostly frees the perag data, which is something that is
    duplicated in the mount error path.

    Move the XFS_QM_DONE call to the caller and remove the useless
    mutex_destroy/spinlock_destroy calls so that we can re-use it for the
    mount error path. Also rename it to xfs_free_perag to reflect what it
    does.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31836a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_readsb is called before xfs_mount so xfs_freesb should be called after
    xfs_unmountfs, too. This means it now happens after a few things during
    the of xfs_unmount which all have nothing to do with the superblock.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31835a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_unmounts can't and shouldn't return errors so declare it as returning
    void.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31833a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Remove all the useless flags and code keyed off it in xfs_mountfs.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31831a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • The root inode is allocated in xfs_mountfs so it should be release in
    xfs_unmountfs. For the unmount case that means we do it after the the
    xfs_sync(mp, SYNC_WAIT | SYNC_CLOSE) in the forced shutdown case and the
    dmapi unmount event. Note that both reference the rip variable which might
    be freed by that time in case inode flushing has kicked in, so strictly
    speaking this might count as a bug fix

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31830a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_ichtime updates the xfs_inode and Linux inode timestamps just fine, no
    need to call file_update_time and then copy the values over to the XFS
    inode. The only additional thing in file_update_time are checks not
    applicable to the write path.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31829a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Port a little optmization from file_update_time to xfs_ichgtime, and only
    update the timestamp and mark the inode dirty if the timestamp actually
    changes in the timer tick resultion supported by the running kernel.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31827a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • In xfs_ialloc we just want to set all timestamps to the current time. We
    don't need to mark the inode dirty like xfs_ichgtime does, and we don't
    need nor want the opimizations in xfs_ichgtime that I will introduce in
    the next patch.

    So just opencode the timestamp update in xfs_ialloc, and remove the new
    unused XFS_ICHGTIME_ACC case in xfs_ichgtime.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31825a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Now that all users of the sema_t are gone from XFS we can finally kill it.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31823a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • Use the new completion flush code to implement the dquot flush lock.
    Removes one of the final users of semaphores in the XFS code base.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31822a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • Use the new completion flush code to implement the inode flush lock.
    Removes one of the final users of semaphores in the XFS code base.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31817a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
    completion. Change it to a completion and remove the last user of the
    sema_t from XFS.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31815a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • A lot of code has been converted away from semaphores, but there are still
    comments that reference semaphore behaviour. The log code is the worst
    offender. Update the comments to reflect what the code really does now.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31814a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31813a

    Signed-off-by: Harvey Harrison
    Signed-off-by: Lachlan McIlroy

    Harvey Harrison
     
  • SGI-PV: 957103

    SGI-Modid: xfs-linux-melb:xfs-kern:31804a

    Signed-off-by: Lachlan McIlroy

    Lachlan McIlroy
     
  • The alloc and inobt btree use the same agbp/agno pair in the btree_cur
    union. Make them use the same bc_private.a union member so that code for
    these two short form btree implementations can be shared.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31788a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Remove unneeded xfs_btree_get_block forward declaration. Move
    xfs_btree_firstrec next to xfs_btree_lastrec.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31787a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Sanitize setting up the Linux indode.

    Setting up the xfs_inode inode link is opencoded in xfs_iget_core now
    because that's the only place it needs to be done, xfs_initialize_vnode is
    renamed to xfs_setup_inode and loses all superflous paramaters. The check
    for I_NEW is removed because it always is true and the di_mode check moves
    into xfs_iget_core because it's only needed there.

    xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode
    and the whole things is moved into xfs_iops.c where it belongs.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31782a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Niv Sardi
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • All remaining bhv_vnode_t instance are in code that's more or less Linux
    specific. (Well, for xfs_acl.c that could be argued, but that code is on
    the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the
    whole tree. We can clean up variable naming and some useless helpers
    later.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31781a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • In various places we can just move a VFS_I call into the argument list of
    called functions/macros instead of having a local bhv_vnode_t.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31776a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • When multiple inodes are locked in XFS it happens in order of the inode
    number, with the everything but the first inode trylocked if any of the
    previous inodes is in the AIL.

    Except for the sorting of the inodes this logic is implemented in
    xfs_lock_inodes, but also partially duplicated in xfs_lock_dir_and_entry
    in a particularly stupid way adds a lock roundtrip if the inode ordering
    is not optimal.

    This patch adds a new helper xfs_lock_two_inodes that takes two inodes and
    locks them in the most optimal way according to the above locking protocol
    and uses it for all places that want to lock two inodes.

    The only caller of xfs_lock_inodes is xfs_rename which might lock up to
    four inodes.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31772a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Donald Douwsma
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • All the error injection is already enabled through ifdef DEBUG, so kill
    the never set second cpp symbol to activate it without the rest of the
    debugging infrastructure.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31771a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Niv Sardi
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Now that all direct calls to VN_HOLD/VN_RELE are gone we can implement
    IHOLD/IRELE directly.

    For the IHOLD case also replace igrab with a direct increment of i_count
    because we are guaranteed to already have a live and referenced inode by
    the VFS. Also remove the vn_hold statistic because it's been rather
    meaningless for some time with most references done by other callers.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31764a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig