03 Jul, 2019

1 commit


21 Feb, 2019

1 commit

  • Add a mode where XFS never overwrites existing blocks in place. This
    is to aid debugging our COW code, and also put infrastructure in place
    for things like possible future support for zoned block devices, which
    can't support overwrites.

    This mode is enabled globally by doing a:

    echo 1 > /sys/fs/xfs/debug/always_cow

    Note that the parameter is global to allow running all tests in xfstests
    easily in this mode, which would not easily be possible with a per-fs
    sysfs file.

    In always_cow mode persistent preallocations are disabled, and fallocate
    will fail when called with a 0 mode (with or without
    FALLOC_FL_KEEP_SIZE), and will not create unwritten extents for zeroed
    space when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE.

    There are a few interesting xfstests failures when run in always_cow
    mode:

    - generic/392 fails because the bytes used in the file used to test
      hole punch recovery are less after the log replay. This is
      because the blocks written and then punched out are only freed
      with a delay due to the logging mechanism.
    - xfs/170 will fail as the already fragile file streams mechanism
      doesn't seem to interact well with the COW allocator.
    - xfs/180, xfs/182, xfs/192, xfs/198, xfs/204 and xfs/208 will claim
      the file system is badly fragmented, but there is not much we
      can do to avoid that when always writing out of place.
    - xfs/205 fails because overwriting a file in always_cow mode
      will require new space allocation and the assumptions in the
      test thus don't work anymore.
    - xfs/326 fails to modify the file at all in always_cow mode after
      injecting the refcount error, leading to an unexpected md5sum
      after the remount, but that again is expected.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Christoph Hellwig
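
A minimal sketch of how a test run might be wrapped in always_cow mode, using the sysfs path given in the commit message above. The helper name `with_always_cow` and the `ALWAYS_COW_KNOB` override variable are hypothetical and exist only for illustration. Writing to the real file requires root and a kernel built with XFS debugging.

```shell
# Hypothetical override so the helper can be pointed at a scratch file;
# the default is the path given in the commit message.
ALWAYS_COW_KNOB="${ALWAYS_COW_KNOB:-/sys/fs/xfs/debug/always_cow}"

# Run a command with always_cow enabled, then put the old value back.
# Returns the exit status of the wrapped command.
with_always_cow() {
    local old status
    old=$(cat "$ALWAYS_COW_KNOB") || return 1
    echo 1 > "$ALWAYS_COW_KNOB" || return 1
    "$@"
    status=$?
    echo "$old" > "$ALWAYS_COW_KNOB"
    return "$status"
}
```

From an xfstests checkout this could be used as, for example, `with_always_cow ./check -g quick`. Because the setting is global, it affects every mounted XFS file system while the command runs.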
     

07 Jun, 2018

1 commit

  • Remove the verbose license text from XFS files and replace them
    with SPDX tags. This does not change the license of any of the code,
    merely refers to the common, up-to-date license files in LICENSES/

    This change was mostly scripted. fs/xfs/Makefile and
    fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
    and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
        echo $f
        cat $f | awk -f hdr.awk > $f.new
        mv -f $f.new $f
    done

    And the hdr.awk script that did the modification (including
    detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
    is as follows:

    $ cat hdr.awk
    BEGIN {
        hdr = 1.0
        tag = "GPL-2.0"
        str = ""
    }

    /^ \* This program is free software/ {
        hdr = 2.0;
        next
    }

    /any later version./ {
        tag = "GPL-2.0+"
        next
    }

    /^ \*\// {
        if (hdr > 0.0) {
            print "// SPDX-License-Identifier: " tag
            print str
            print $0
            str=""
            hdr = 0.0
            next
        }
        print $0
        next
    }

    /^ \* / {
        if (hdr > 1.0)
            next
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
        next
    }

    /^ \*/ {
        if (hdr > 0.0)
            next
        print $0
        next
    }

    // {
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
    }

    END { }
    $

    Signed-off-by: Dave Chinner
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
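
One way to spot-check a scripted conversion like this is to list the files whose first line does not carry an SPDX tag. The sketch below is not part of the commit. The function name `missing_spdx` is hypothetical, and it relies only on the tag string that hdr.awk prints.

```shell
# Print every regular file under the given directory whose first line
# does not contain an SPDX tag.
missing_spdx() {
    find "$1" -type f | sort | while IFS= read -r f; do
        head -n 1 "$f" | grep -q "SPDX-License-Identifier:" || echo "$f"
    done
}
```

Run from the top of a kernel tree as `missing_spdx fs/xfs`. Any file it prints either was never converted or keeps its tag somewhere other than the first line.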
     

16 May, 2018

1 commit

  • Similar to log_recovery_delay, this delay occurs between the VFS
    superblock being initialised and the xfs_mount being fully
    initialised. It also poisons the per-ag radix tree node so that it
    can be used for triggering shrinker races during mount
    such as the following:

    $ cat dirty-mount.sh
    #! /bin/bash

    umount -f /dev/pmem0
    mkfs.xfs -f /dev/pmem0
    mount /dev/pmem0 /mnt/test
    rm -f /mnt/test/foo
    xfs_io -fxc "pwrite 0 4k" -c fsync -c "shutdown" /mnt/test/foo
    umount /dev/pmem0

    # let's crash it now!
    echo 30 > /sys/fs/xfs/debug/mount_delay
    mount /dev/pmem0 /mnt/test
    echo 0 > /sys/fs/xfs/debug/mount_delay
    umount /dev/pmem0
    $ sudo ./dirty-mount.sh
    .....
    [ 60.378118] CPU: 3 PID: 3577 Comm: fs_mark Tainted: G D W 4.16.0-rc5-dgc #440
    [ 60.378120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    [ 60.378124] RIP: 0010:radix_tree_next_chunk+0x76/0x320
    [ 60.378127] RSP: 0018:ffffc9000276f4f8 EFLAGS: 00010282
    [ 60.383670] RAX: a5a5a5a5a5a5a5a4 RBX: 0000000000000010 RCX: 000000000000001a
    [ 60.385277] RDX: 0000000000000000 RSI: ffffc9000276f540 RDI: 0000000000000000
    [ 60.386554] RBP: 0000000000000000 R08: 0000000000000000 R09: a5a5a5a5a5a5a5a5
    [ 60.388194] R10: 0000000000000006 R11: 0000000000000001 R12: ffffc9000276f598
    [ 60.389288] R13: 0000000000000040 R14: 0000000000000228 R15: ffff880816cd6458
    [ 60.390827] FS: 00007f5c124b9740(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000
    [ 60.392253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 60.393423] CR2: 00007f5c11bba0b8 CR3: 000000035580e001 CR4: 00000000000606e0
    [ 60.394519] Call Trace:
    [ 60.395252] radix_tree_gang_lookup_tag+0xc4/0x130
    [ 60.395948] xfs_perag_get_tag+0x37/0xf0
    [ 60.396522] xfs_reclaim_inodes_count+0x32/0x40
    [ 60.397178] xfs_fs_nr_cached_objects+0x11/0x20
    [ 60.397837] super_cache_count+0x35/0xc0
    [ 60.399159] shrink_slab.part.66+0xb1/0x370
    [ 60.400194] shrink_node+0x7e/0x1a0
    [ 60.401058] try_to_free_pages+0x199/0x470
    [ 60.402081] __alloc_pages_slowpath+0x3a1/0xd20
    [ 60.403729] __alloc_pages_nodemask+0x1c3/0x200
    [ 60.404941] cache_grow_begin+0x20b/0x2e0
    [ 60.406164] fallback_alloc+0x160/0x200
    [ 60.407088] kmem_cache_alloc+0x111/0x4e0
    [ 60.408038] ? xfs_buf_rele+0x61/0x430
    [ 60.408925] kmem_zone_alloc+0x61/0xe0
    [ 60.409965] xfs_inode_alloc+0x24/0x1d0
    .....

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Dave Chinner
     

19 Jun, 2017

1 commit

  • In DEBUG mode, assert failures unconditionally trigger a kernel BUG.
    This is useful in diagnostic situations to panic a system and
    collect detailed state information at the time of a failure.

    This can also cause problems in cases where DEBUG mode code is
    desired but it is preferable not to trigger kernel BUGs on assert
    failure. For example, during development of new code or during
    certain xfstests tests that intentionally cause corruption and test
    the kernel for survival (but otherwise may expect to trigger assert
    failures).

    To provide additional flexibility, create the
    /sys/fs/xfs/debug/bug_on_assert tunable to configure assert
    failure behavior at runtime. This tunable is only available in DEBUG
    mode and is enabled by default to preserve existing default
    behavior. When disabled, assert failures in DEBUG mode result in
    kernel warnings.

    Signed-off-by: Brian Foster
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Brian Foster
     

06 Oct, 2016

1 commit

  • Trim CoW reservations made on behalf of a cowextsz hint if they get too
    old or we run low on quota, so long as we don't have dirty data awaiting
    writeback or directio operations in progress.

    Garbage collection of the cowextsize extents is kept separate from
    prealloc extent reaping because setting the CoW prealloc lifetime to a
    (much) higher value than the regular prealloc extent lifetime has been
    useful for combatting CoW fragmentation on VM hosts where the VMs
    experience bursty write behaviors and we can keep the utilization ratios
    low enough that we don't start to run out of space. IOWs, it benefits
    us to keep the CoW fork reservations around for as long as we can unless
    we run out of blocks or hit inode reclaim.

    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig

    Darrick J. Wong
     

09 Sep, 2014

1 commit

  • XFS log recovery has been discovered to have race conditions with
    buffers when I/O errors occur. External tools are available to simulate
    I/O errors to XFS, but this alone is not sufficient for testing log
    recovery. XFS unconditionally resets the inactive region of the log
    prior to log recovery to avoid confusion over processing any partially
    written log records that might have been written before an unclean
    shutdown. Therefore, unconditional write I/O failures at mount time are
    caught by the reset sequence rather than log recovery and hinder the
    ability to test the latter.

    The device-mapper dm-flakey module uses an up/down timer to define a
    cycle for when to fail I/Os. Create a pre log recovery delay tunable
    that can be used to coordinate XFS log recovery with I/O errors
    simulated by dm-flakey. This facilitates coordination in userspace that
    allows the reset of stale log blocks to succeed and writes due to log
    recovery to fail. For example, define a dm-flakey instance with an
    uptime long enough to allow log reset to succeed and a log recovery
    delay long enough to allow the dm-flakey uptime to expire.

    The 'log_recovery_delay' sysfs tunable is exported under
    /sys/fs/xfs/debug and is only enabled for kernels compiled in XFS debug
    mode. The value is exported in units of seconds and allows for a delay
    of up to 60 seconds. Note that this is for XFS debug and test
    instrumentation purposes only and should not be used by applications. No
    delay is enabled by default.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Brian Foster
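
The coordination described above can be sketched as follows. The dm-flakey table layout (`start length flakey device offset up down`) follows the device-mapper documentation. The device name, the intervals, and the helper name `flakey_table` are assumptions chosen for illustration, and the commented commands need root and an XFS debug kernel.

```shell
# Print a dm-flakey table line for a device of the given size in
# 512-byte sectors: I/O succeeds for $3 seconds, then fails for $4.
# Arguments: device, size in sectors, up interval, down interval.
flakey_table() {
    echo "0 $2 flakey $1 0 $3 $4"
}

# Example values (assumptions): the device stays healthy for 10 seconds,
# which should cover the log reset, and log recovery is delayed by 15
# seconds so that its writes fall into the failing window.
#
# dmsetup create flaky --table \
#     "$(flakey_table /dev/sdb1 "$(blockdev --getsz /dev/sdb1)" 10 60)"
# echo 15 > /sys/fs/xfs/debug/log_recovery_delay
# mount /dev/mapper/flaky /mnt/test
```

The up interval starts counting when the table is loaded, so the mount should be issued right after `dmsetup create`. The delay must stay within the 60-second limit stated in the commit message.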
     

09 Nov, 2012

1 commit

  • Create a new mount workqueue and delayed_work to enable background
    scanning and freeing of eofblocks inodes. The scanner kicks in once
    speculative preallocation occurs and stops requeueing itself when
    no eofblocks inodes exist.

    The scan interval is based on the new
    'speculative_prealloc_lifetime' tunable (defaults to 5m). The
    background scanner performs unfiltered, best-effort scans (which
    skip inodes under lock contention or with a dirty cache mapping).

    Signed-off-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Brian Foster
     

13 Aug, 2011

1 commit

  • Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
    annoying subdirectories in the XFS source code. Besides the large
    number of file renames, the only changes are to the Makefile, a few
    files including headers with the subdirectory prefix, and the binary
    sysctl compat code that includes a header under fs/xfs/ from
    kernel/.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig