13 Sep, 2013

1 commit

  • Pull xfs update #2 from Ben Myers:
    "Here we have defrag support for v5 superblock, a number of bugfixes
    and a cleanup or two.

    - defrag support for CRC filesystems
    - fix endian worning in xlog_recover_get_buf_lsn
    - fixes for sparse warnings
    - fix for assert in xfs_dir3_leaf_hdr_from_disk
    - fix for log recovery of remote symlinks
    - fix for log recovery of btree root splits
    - fixes formemory allocation failures with ACLs
    - fix for assert in xfs_buf_item_relse
    - fix for assert in xfs_inode_buf_verify
    - fix an assignment in an assert that should be a test in
    xfs_bmbt_change_owner
    - remove dead code in xlog_recover_inode_pass2"

    * tag 'xfs-for-linus-v3.12-rc1-2' of git://oss.sgi.com/xfs/xfs:
    xfs: remove dead code from xlog_recover_inode_pass2
    xfs: = vs == typo in ASSERT()
    xfs: don't assert fail on bad inode numbers
    xfs: aborted buf items can be in the AIL.
    xfs: factor all the kmalloc-or-vmalloc fallback allocations
    xfs: fix memory allocation failures with ACLs
    xfs: ensure we copy buffer type in da btree root splits
    xfs: set remote symlink buffer type for recovery
    xfs: recovery of swap extents operations for CRC filesystems
    xfs: swap extents operations for CRC filesystems
    xfs: check magic numbers in dir3 leaf verifier first
    xfs: fix some minor sparse warnings
    xfs: fix endian warning in xlog_recover_get_buf_lsn()

    Linus Torvalds
     

11 Sep, 2013

2 commits

  • Convert superblock shrinker to use the new count/scan API, and propagate
    the API changes through to the filesystem callouts. The filesystem
    callouts already use a count/scan API, so it's just changing counters to
    longs to match the VM API.

    This requires the dentry and inode shrinker callouts to be converted to
    the count/scan API. This is mainly a mechanical change.

    [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • This is the recovery side of the btree block owner change operation
    performed by swapext on CRC enabled filesystems. We detect that an
    owner change is needed by the flag that has been placed on the inode
    log format flag field. Because the inode recovery is being replayed
    after the buffers that make up the BMBT in the given checkpoint, we
    can walk all the buffers and directly modify them when we see the
    flag set on an inode.

    Because the inode can be relogged and hence present in multiple
    chekpoints with the "change owner" flag set, we could do multiple
    passes across the inode to do this change. While this isn't optimal,
    we can't directly ignore the flag as there may be multiple
    independent swap extent operations being replayed on the same inode
    in different checkpoints so we can't ignore them.

    Further, because the owner change operation uses ordered buffers, we
    might have buffers that are newer on disk than the current
    checkpoint and so already have the owner changed in them. Hence we
    cannot just peek at a buffer in the tree and check that it has the
    correct owner and assume that the change was completed.

    So, for the moment just brute force the owner change every time we
    see an inode with the flag set. Note that we have to be careful here
    because the owner of the buffers may point to either the old owner
    or the new owner. Currently the verifier can't verify the owner
    directly, so there is no failure case here right now. If we verify
    the owner exactly in future, then we'll have to take this into
    account.

    This was tested in terms of normal operation via xfstests - all of
    the fsr tests now pass without failure. however, we really need to
    modify xfs/227 to stress v3 inodes correctly to ensure we fully
    cover this case for v5 filesystems.

    In terms of recovery testing, I used a hacked version of xfs_fsr
    that held the temp inode open for a few seconds before exiting so
    that the filesystem could be shut down with an open owner change
    recovery flags set on at least the temp inode. fsr leaves the temp
    inode unlinked and in btree format, so this was necessary for the
    owner change to be reliably replayed.

    logprint confirmed the tmp inode in the log had the correct flag set:

    INO: cnt:3 total:3 a:0x69e9e0 len:56 a:0x69ea20 len:176 a:0x69eae0 len:88
    INODE: #regs:3 ino:0x44 flags:0x209 dsize:88
    ^^^^^

    0x200 is set, indicating a data fork owner change needed to be
    replayed on inode 0x44. A printk in the revoery code confirmed that
    the inode change was recovered:

    XFS (vdc): Mounting Filesystem
    XFS (vdc): Starting recovery (logdev: internal)
    recovering owner change ino 0x44
    XFS (vdc): Version 5 superblock detected. This kernel L support enabled!
    Use of these features in this kernel is at your own risk!
    XFS (vdc): Ending recovery (logdev: internal)

    The script used to test this was:

    $ cat ./recovery-fsr.sh
    #!/bin/bash

    dev=/dev/vdc
    mntpt=/mnt/scratch
    testfile=$mntpt/testfile

    umount $mntpt
    mkfs.xfs -f -m crc=1 $dev
    mount $dev $mntpt
    chmod 777 $mntpt

    for i in `seq 10000 -1 0`; do
    xfs_io -f -d -c "pwrite $(($i * 4096)) 4096" $testfile > /dev/null 2>&1
    done
    xfs_bmap -vp $testfile |head -20

    xfs_fsr -d -v $testfile &
    sleep 10
    /home/dave/src/xfstests-dev/src/godown -f $mntpt
    wait
    umount $mntpt

    xfs_logprint -t $dev |tail -20
    time mount $dev $mntpt
    xfs_bmap -vp $testfile
    umount $mntpt
    $

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

16 Aug, 2013

1 commit


13 Aug, 2013

1 commit


27 Jun, 2013

1 commit


09 Nov, 2012

5 commits

  • Create a new mount workqueue and delayed_work to enable background
    scanning and freeing of eofblocks inodes. The scanner kicks in once
    speculative preallocation occurs and stops requeueing itself when
    no eofblocks inodes exist.

    The scan interval is based on the new
    'speculative_prealloc_lifetime' tunable (default to 5m). The
    background scanner performs unfiltered, best effort scans (which
    skips inodes under lock contention or with a dirty cache mapping).

    Signed-off-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Brian Foster
     
  • The XFS_IOC_FREE_EOFBLOCKS ioctl allows users to invoke an EOFBLOCKS
    scan. The xfs_eofblocks structure is defined to support the command
    parameters (scan mode).

    Signed-off-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Brian Foster
     
  • xfs_inodes_free_eofblocks() implements scanning functionality for
    EOFBLOCKS inodes. It uses the AG iterator to walk the tagged inodes
    and free post-EOF blocks via the xfs_inode_free_eofblocks() execute
    function. The scan can be invoked in best-effort mode or wait
    (force) mode.

    A best-effort scan (default) handles all inodes that do not have a
    dirty cache and we successfully acquire the io lock via trylock. In
    wait mode, we continue to cycle through an AG until all inodes are
    handled.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Brian Foster
     
  • Genericize xfs_inode_ag_walk() to support an optional radix tree tag
    and args argument for the execute function. Create a new wrapper
    called xfs_inode_ag_iterator_tag() that performs a tag based walk
    of perag's and inodes.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Brian Foster
     
  • Add the XFS_ICI_EOFBLOCKS_TAG inode tag to identify inodes with
    speculatively preallocated blocks beyond EOF. An inode is tagged
    when speculative preallocation occurs and untagged either via
    truncate down or when post-EOF blocks are freed via release or
    reclaim.

    The tag management is intentionally not aggressive to prefer
    simplicity over the complexity of handling all the corner cases
    under which post-EOF blocks could be freed (i.e., forward
    truncation, fallocate, write error conditions, etc.). This means
    that a tagged inode may or may not have post-EOF blocks after a
    period of time. The tag is eventually cleared when the inode is
    released or reclaimed.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Brian Foster
     

18 Oct, 2012

2 commits

  • The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c
    along with the other inode cache functions. This removes all functionality from
    xfs_iget.c, so the file can simply be removed.

    This move results in various functions now only having the scope of a single
    file (e.g. xfs_inode_free()), so clean up all the definitions and exported
    prototypes in xfs_icache.[ch] and xfs_inode.h appropriately.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_sync.c now only contains inode reclaim functions and inode cache
    iteration functions. It is not related to sync operations anymore.
    Rename to xfs_icache.c to reflect it's contents and prepare for
    consolidation with the other inode cache file that exists
    (xfs_iget.c).

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner