12 Apr, 2014

1 commit

  • Pull second set of btrfs updates from Chris Mason:
    "The most important changes here are from Josef, fixing a btrfs
    regression in 3.14 that can cause corruptions in the extent allocation
    tree when snapshots are in use.

    Josef also fixed some deadlocks in send/recv and other assorted races
    when balance is running"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (23 commits)
    Btrfs: fix compile warnings on on avr32 platform
    btrfs: allow mounting btrfs subvolumes with different ro/rw options
    btrfs: export global block reserve size as space_info
    btrfs: fix crash in remount(thread_pool=) case
    Btrfs: abort the transaction when we don't find our extent ref
    Btrfs: fix EINVAL checks in btrfs_clone
    Btrfs: fix unlock in __start_delalloc_inodes()
    Btrfs: scrub raid56 stripes in the right way
    Btrfs: don't compress for a small write
    Btrfs: more efficient io tree navigation on wait_extent_bit
    Btrfs: send, build path string only once in send_hole
    btrfs: filter invalid arg for btrfs resize
    Btrfs: send, fix data corruption due to incorrect hole detection
    Btrfs: kmalloc() doesn't return an ERR_PTR
    Btrfs: fix snapshot vs nocow writting
    btrfs: Change the expanding write sequence to fix snapshot related bug.
    btrfs: make device scan less noisy
    btrfs: fix lockdep warning with reclaim lock inversion
    Btrfs: hold the commit_root_sem when getting the commit root during send
    Btrfs: remove transaction from send
    ...

    Linus Torvalds
     

08 Apr, 2014

4 commits

  • Introduce a block group type bit for a global reserve and fill the space
    info for SPACE_INFO ioctl. This should replace the newly added ioctl
    (01e219e8069516cdb98594d417b8bb8d906ed30d) to get just the 'size' part
    of the global reserve, while the actual usage can be now visible in the
    'btrfs fi df' output during ENOSPC stress.

    The unpatched userspace tools will show the blockgroup as 'unknown'.

    CC: Jeff Mahoney
    CC: Josef Bacik
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • btrfs_drop_extents can now return -EINVAL, but only one caller
    in btrfs_clone was checking for it. This adds it to the
    caller for inline extents, which is where we really need it.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Originally, the following invalid commands were accepted:
    # btrfs fi resize -10A
    # btrfs fi resize -10Gaha
    Reject such arguments by checking the pointer returned by memparse().

    Signed-off-by: Gui Hecheng
    Signed-off-by: Chris Mason

    Gui Hecheng
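
    The check described in this commit can be sketched in userspace. The
    following is a simplified Python model, not the kernel code: parse_size()
    and is_valid_resize_amount() are hypothetical names, the parser handles
    decimal digits only, and special values such as "max" are ignored. It
    shows why inspecting the unparsed remainder rejects "-10A" and "-10Gaha".

```python
import re

# Binary size suffixes and their shift amounts (K = 2**10 ... E = 2**60).
SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30, "T": 40, "P": 50, "E": 60}


def parse_size(text):
    """Simplified model of memparse(): return (value, unparsed_rest)."""
    match = re.match(r"[0-9]+", text)
    if match is None:
        return 0, text  # nothing was consumed
    value = int(match.group())
    rest = text[match.end():]
    # Consume at most one size suffix, as a size parser would.
    if rest and rest[0].upper() in SUFFIX_SHIFT:
        value <<= SUFFIX_SHIFT[rest[0].upper()]
        rest = rest[1:]
    return value, rest


def is_valid_resize_amount(arg):
    """Accept the argument only if the parser consumed all of it."""
    body = arg[1:] if arg.startswith(("+", "-")) else arg
    _, rest = parse_size(body)
    return body != "" and rest == ""


if __name__ == "__main__":
    for candidate in ("-10G", "-10A", "-10Gaha"):
        print(candidate, is_valid_resize_amount(candidate))
```

    In this model "-10G" is accepted, while "-10A" leaves "A" unparsed and
    "-10Gaha" leaves "aha" unparsed, so both are rejected.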
     
  • The error handling was copied and pasted from the memdup_user() case.
    Since kmalloc() returns NULL on failure rather than an ERR_PTR, it
    should be checking for NULL.

    Fixes: abccd00f8af2 ('btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl')
    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
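
    Why an ERR_PTR-style check misses a NULL return can be shown with a small
    Python model of error-encoded pointers. This is an illustration under
    stated assumptions: pointers are modeled as 64-bit unsigned integers, and
    err_ptr() and is_err() are stand-ins for the kernel helpers, not the
    helpers themselves.

```python
MAX_ERRNO = 4095   # error pointers occupy the top 4095 address values
WORD_BITS = 64     # assumption: 64-bit pointers
ENOMEM = 12
NULL = 0


def err_ptr(errno_value):
    """Encode -errno as a pointer-sized unsigned integer."""
    return (-errno_value) % (1 << WORD_BITS)


def is_err(ptr):
    """True only for values in the top MAX_ERRNO slots of the address space."""
    return ptr >= (1 << WORD_BITS) - MAX_ERRNO


if __name__ == "__main__":
    # An allocator that signals failure with an encoded errno is caught...
    print(is_err(err_ptr(ENOMEM)))
    # ...but an allocator that signals failure with NULL is not.
    print(is_err(NULL))
```

    A NULL pointer is not in the error range, so a failure reported as NULL
    passes the check unnoticed and has to be tested for explicitly.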
     

05 Apr, 2014

1 commit

  • Pull btrfs changes from Chris Mason:
    "This is a pretty long stream of bug fixes and performance fixes.

    Qu Wenruo has replaced the btrfs async threads with regular kernel
    workqueues. We'll keep an eye out for performance differences, but
    it's nice to be using more generic code for this.

    We still have some corruption fixes and other patches coming in for
    the merge window, but this batch is tested and ready to go"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (108 commits)
    Btrfs: fix a crash of clone with inline extents's split
    btrfs: fix uninit variable warning
    Btrfs: take into account total references when doing backref lookup
    Btrfs: part 2, fix incremental send's decision to delay a dir move/rename
    Btrfs: fix incremental send's decision to delay a dir move/rename
    Btrfs: remove unnecessary inode generation lookup in send
    Btrfs: fix race when updating existing ref head
    btrfs: Add trace for btrfs_workqueue alloc/destroy
    Btrfs: less fs tree lock contention when using autodefrag
    Btrfs: return EPERM when deleting a default subvolume
    Btrfs: add missing kfree in btrfs_destroy_workqueue
    Btrfs: cache extent states in defrag code path
    Btrfs: fix deadlock with nested trans handles
    Btrfs: fix possible empty list access when flushing the delalloc inodes
    Btrfs: split the global ordered extents mutex
    Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock
    Btrfs: reclaim delalloc metadata more aggressively
    Btrfs: remove unnecessary lock in may_commit_transaction()
    Btrfs: remove the unnecessary flush when preparing the pages
    Btrfs: just do dirty page flush for the inode with compression before direct IO
    ...

    Linus Torvalds
     

22 Mar, 2014

1 commit

  • xfstests's btrfs/035 triggers a BUG_ON, which we use to detect the split
    of inline extents in __btrfs_drop_extents().

    For inline extents, we cannot duplicate another EXTENT_DATA item, because
    it breaks the rule of inline extents, that is, 'start offset' needs to be 0.

    We have set limitations for the source inode's compressed inline extents,
    because it needs to decompress and recompress. Now the destination inode's
    inline extents also need similar limitations.

    With this, xfstests btrfs/035 no longer panics.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     

21 Mar, 2014

3 commits

  • When finding new extents during an autodefrag, don't do so many fs tree
    lookups to find an extent with a size smaller than the target threshold.
    Instead, after each fs tree forward search immediately unlock upper
    levels and process the entire leaf while holding a read lock on the leaf,
    since our leaf processing is very fast.
    This reduces lock contention, allowing for higher concurrency when other
    tasks want to write/update items related to other inodes in the fs tree,
    as we're not holding read locks on upper tree levels while processing the
    leaf and we do fewer tree searches.

    Test:

    sysbench --test=fileio --file-num=512 --file-total-size=16G \
    --file-test-mode=rndrw --num-threads=32 --file-block-size=32768 \
    --file-rw-ratio=3 --file-io-mode=sync --max-time=1800 \
    --max-requests=10000000000 [prepare|run]

    (filesystem mounted with -o autodefrag, averages of 5 runs)

    Before this change: 58.852Mb/sec throughput, read 77.589Gb, written 25.863Gb
    After this change: 63.034Mb/sec throughput, read 83.102Gb, written 27.701Gb

    Test machine: quad core intel i5-3570K, 32Gb of RAM, SSD.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     
  • The error message is confusing:

    # btrfs sub delete /mnt/mysub/
    Delete subvolume '/mnt/mysub'
    ERROR: cannot delete '/mnt/mysub' - Directory not empty

    The error message does not make sense to me: It's not about deleting a
    directory but it's a subvolume, and it doesn't matter if the subvolume is
    empty or not.

    Maybe EPERM is more appropriate in this case, combined with an explanatory
    kernel log message. (e.g. "subvolume with ID 123 cannot be deleted because
    it is configured as default subvolume.")

    Reported-by: Koen De Wit
    Signed-off-by: Guangyu Sun
    Reviewed-by: David Sterba
    Signed-off-by: Chris Mason

    Guangyu Sun
     
  • When locking file ranges in the inode's io_tree, cache the first
    extent state that belongs to the target range, so that when unlocking
    the range we don't need to search in the io_tree again, reducing CPU
    time and therefore holding the io_tree's lock for a shorter period.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

11 Mar, 2014

6 commits

  • We needn't flush all delalloc inodes when we don't get the s_umount lock,
    or we would make the tasks wait for a long time.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • If the snapshot creation happened after the nocow write but before the dirty
    data flush, we would fail to flush the dirty data because of no space.

    So we must keep track of when those nocow write operations start and when
    they end; if there are nocow writers, the snapshot creators must wait. In
    order to implement this, I introduce btrfs_{start, end}_nocow_write(),
    which is similar to mnt_{want,drop}_write().

    These two functions are only used for nocow file write operations.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • When using prealloc extents, a file defragment operation may actually
    fragment the file and increase the amount of data space used by the file.
    This change fixes that behaviour.

    Example:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt
    $ cd /mnt
    $ xfs_io -f -c 'falloc 0 1048576' foobar && sync
    $ xfs_io -c 'pwrite -S 0xff -b 100000 5000 100000' foobar
    $ xfs_io -c 'pwrite -S 0xac -b 100000 200000 100000' foobar
    $ xfs_io -c 'pwrite -S 0xe1 -b 100000 900000 100000' foobar && sync

    Before defragmenting the file:

    $ btrfs filesystem df /mnt
    Data, single: total=8.00MiB, used=1.25MiB
    System, DUP: total=8.00MiB, used=16.00KiB
    System, single: total=4.00MiB, used=0.00
    Metadata, DUP: total=1.00GiB, used=112.00KiB
    Metadata, single: total=8.00MiB, used=0.00

    $ btrfs-debug-tree /dev/sdb3
    (...)
    item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
    prealloc data disk byte 12845056 nr 1048576
    prealloc data offset 0 nr 4096
    item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53
    extent data disk byte 12845056 nr 1048576
    extent data offset 4096 nr 102400 ram 1048576
    extent compression 0
    item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53
    prealloc data disk byte 12845056 nr 1048576
    prealloc data offset 106496 nr 90112
    item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53
    extent data disk byte 12845056 nr 1048576
    extent data offset 196608 nr 106496 ram 1048576
    extent compression 0
    item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53
    prealloc data disk byte 12845056 nr 1048576
    prealloc data offset 303104 nr 593920
    item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53
    extent data disk byte 12845056 nr 1048576
    extent data offset 897024 nr 106496 ram 1048576
    extent compression 0
    item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53
    prealloc data disk byte 12845056 nr 1048576
    prealloc data offset 1003520 nr 45056
    (...)

    Now defragmenting the file results in more data space used than before:

    $ btrfs filesystem defragment -f foobar && sync
    $ btrfs filesystem df /mnt
    Data, single: total=8.00MiB, used=1.55MiB
    System, DUP: total=8.00MiB, used=16.00KiB
    System, single: total=4.00MiB, used=0.00
    Metadata, DUP: total=1.00GiB, used=112.00KiB
    Metadata, single: total=8.00MiB, used=0.00

    And the corresponding file extent items are now no longer perfectly sequential
    as before, and we're now needlessly using more space from data block groups:

    $ btrfs-debug-tree /dev/sdb3
    (...)
    item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
    extent data disk byte 12845056 nr 1048576
    extent data offset 0 nr 4096 ram 1048576
    extent compression 0
    item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53
    extent data disk byte 13893632 nr 102400
    extent data offset 0 nr 102400 ram 102400
    extent compression 0
    item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53
    extent data disk byte 12845056 nr 1048576
    extent data offset 106496 nr 90112 ram 1048576
    extent compression 0
    item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53
    extent data disk byte 13996032 nr 106496
    extent data offset 0 nr 106496 ram 106496
    extent compression 0
    item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53
    prealloc data disk byte 12845056 nr 1048576
    prealloc data offset 303104 nr 593920
    item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53
    extent data disk byte 14102528 nr 106496
    extent data offset 0 nr 106496 ram 106496
    extent compression 0
    item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53
    extent data disk byte 12845056 nr 1048576
    extent data offset 1003520 nr 45056 ram 1048576
    extent compression 0
    (...)

    With this change, the above example will no longer cause allocation of new data
    space nor change the sequentiality of the file extents; that is, the defragment
    will have no effect, leaving all extent items pointing to the extent starting
    at disk byte 12845056.

    In a 20Gb filesystem I had, mounted with the autodefrag option and holding 20
    files of 400Mb each (each initially consisting of a single prealloc extent of
    400Mb), random writes happening at a low rate led to a total of over 17Gb of
    data space used, not far from eventually reaching an ENOSPC state.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik

    Filipe Manana
     
  • When the defrag flag BTRFS_DEFRAG_RANGE_START_IO is set and compression
    is enabled, we weren't flushing completely, as writing compressed extents
    is a two-step process: one step to compress the data and another to write
    the compressed data to disk.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik

    Filipe Manana
     
  • The structure for BTRFS_SET_RECEIVED_IOCTL packs differently on 32-bit
    and 64-bit systems. This means that it is impossible to use btrfs
    receive on a system with a 64-bit kernel and 32-bit userspace, because
    the structure size (and hence the ioctl number) is different.

    This patch adds a compatibility structure and ioctl to deal with the
    above case.

    Signed-off-by: Hugo Mills
    Signed-off-by: Josef Bacik

    Hugo Mills
     
  • EXDEV seems an appropriate error if an operation fails because it
    crosses file system boundaries.

    Reviewed-by: David Sterba
    Signed-off-by: Kusanagi Kouichi
    Signed-off-by: Josef Bacik

    Kusanagi Kouichi
     

17 Feb, 2014

1 commit

  • Pull btrfs fixes from Chris Mason:
    "We have a small collection of fixes in my for-linus branch.

    The big thing that stands out is a revert of a new ioctl. Users
    haven't shipped yet in btrfs-progs, and Dave Sterba found a better way
    to export the information"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: use right clone root offset for compressed extents
    btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
    Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
    Btrfs: fix max_inline mount option
    Btrfs: fix a lockdep warning when cleaning up aborted transaction
    Revert "btrfs: add ioctl to export size of global metadata reservation"

    Linus Torvalds
     

15 Feb, 2014

1 commit


10 Feb, 2014

1 commit

  • Pull btrfs fixes from Chris Mason:
    "This is a small collection of fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix data corruption when reading/updating compressed extents
    Btrfs: don't loop forever if we can't run because of the tree mod log
    btrfs: reserve no transaction units in btrfs_ioctl_set_features
    btrfs: commit transaction after setting label and features
    Btrfs: fix assert screwup for the pending move stuff

    Linus Torvalds
     

09 Feb, 2014

2 commits


31 Jan, 2014

1 commit

  • Pull btrfs updates from Chris Mason:
    "This is a pretty big pull, and most of these changes have been
    floating in btrfs-next for a long time. Filipe's properties work is a
    cool building block for inheriting attributes like compression down on
    a per inode basis.

    Jeff Mahoney kicked in code to export filesystem info into sysfs.

    Otherwise, lots of performance improvements, cleanups and bug fixes.

    Looks like there are still a few other small pending incrementals, but
    I wanted to get the bulk of this in first"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
    Btrfs: fix spin_unlock in check_ref_cleanup
    Btrfs: setup inode location during btrfs_init_inode_locked
    Btrfs: don't use ram_bytes for uncompressed inline items
    Btrfs: fix btrfs_search_slot_for_read backwards iteration
    Btrfs: do not export ulist functions
    Btrfs: rework ulist with list+rb_tree
    Btrfs: fix memory leaks on walking backrefs failure
    Btrfs: fix send file hole detection leading to data corruption
    Btrfs: add a reschedule point in btrfs_find_all_roots()
    Btrfs: make send's file extent item search more efficient
    Btrfs: fix to catch all errors when resolving indirect ref
    Btrfs: fix protection between walking backrefs and root deletion
    btrfs: fix warning while merging two adjacent extents
    Btrfs: fix infinite path build loops in incremental send
    btrfs: undo sysfs when open_ctree() fails
    Btrfs: fix snprintf usage by send's gen_unique_name
    btrfs: fix defrag 32-bit integer overflow
    btrfs: sysfs: list the NO_HOLES feature
    btrfs: sysfs: don't show reserved incompat feature
    btrfs: call permission checks earlier in ioctls and return EPERM
    ...

    Linus Torvalds
     

29 Jan, 2014

14 commits

  • When defragging a very large file, the cluster variable can wrap its 32-bit
    signed int type and become negative, which eventually gets passed to
    btrfs_force_ra() as a very large unsigned long value. On 32-bit platforms,
    this eventually results in an Oops from the SLAB allocator.

    Change the cluster and max_cluster signed int variables to unsigned long to
    match the readahead functions. This also allows the min() comparison in
    btrfs_defrag_file() to work as intended.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Justin Maggard
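
    The wrap-around described in this commit can be reproduced with plain
    integer arithmetic. The sketch below is a Python model, not the defrag
    code: as_signed_int32() and as_unsigned_long() are hypothetical helpers,
    and the page count is an arbitrary illustrative value.

```python
def as_signed_int32(value):
    """Wrap an integer the way a 32-bit two's-complement int would hold it."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def as_unsigned_long(value, bits):
    """Reinterpret a (possibly negative) integer as an unsigned long."""
    return value & ((1 << bits) - 1)


# Hypothetical page count that no longer fits in a signed 32-bit int.
pages = 2**31 + 10
cluster = as_signed_int32(pages)

if __name__ == "__main__":
    print("stored in a signed int:", cluster)
    print("passed on as a 64-bit unsigned long:", as_unsigned_long(cluster, 64))
    # A signed comparison now picks the bogus negative value.
    print("min(cluster, 1024):", min(cluster, 1024))
```

    Keeping the value in an unsigned long from the start avoids both the
    negative intermediate value and the misleading min() result.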
     
  • The owner and capability checks in IOC_SUBVOL_SETFLAGS and
    SET_RECEIVED_SUBVOL should be called before any other checks are done.

    Also unify the error code to EPERM.

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
     
  • Currently, any user can snapshot any subvolume if the path is accessible and
    thus indirectly create and keep files he does not own under his directories.
    This is not possible with traditional directories.

    In a security context, a user can snapshot the root filesystem and pin any
    potentially buggy binaries, even if the updates are applied.

    All the snapshots are visible to the administrator, so it's possible to
    verify if there are suspicious snapshots.

    Another more practical problem is that any user can pin the space used
    by eg. root and cause ENOSPC.

    Original report:
    https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786

    CC: stable@vger.kernel.org
    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
     
  • When we are looking for file extent items that intersect the cloning
    range, for each one that falls completely outside the range, don't
    release the path and do another full tree search - just move on
    to the next slot and copy the file extent item into our buffer only
    if the item intersects the cloning range.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • In the clone ioctl, when the source and target inodes are different,
    we can acquire their mutexes in 2 possible different orders. After
    we're done cloning, we were always releasing the mutexes in the same
    order - the most correct way of doing it is to release them in the
    reverse of the order in which they were acquired.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
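
    The locking discipline from this commit, sketched with Python's threading
    locks as stand-ins for the inode mutexes. lock_pair() is a hypothetical
    helper, and ordering the locks by id() is an assumption made for the
    sketch; the point is only that the release order mirrors the acquisition
    order.

```python
import threading
from contextlib import contextmanager


@contextmanager
def lock_pair(a, b):
    """Hold both locks; release them in the reverse of the acquisition order."""
    if a is b:
        order = [a]                       # same object: lock it only once
    else:
        order = sorted((a, b), key=id)    # assumed, arbitrary but stable order
    for lock in order:
        lock.acquire()
    try:
        yield order
    finally:
        for lock in reversed(order):
            lock.release()


if __name__ == "__main__":
    src, dst = threading.Lock(), threading.Lock()
    with lock_pair(src, dst):
        print("both held:", src.locked() and dst.locked())
    print("both released:", not src.locked() and not dst.locked())
```

    Whichever of the two orders was used to take the locks, the finally block
    undoes it last-in, first-out.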
     
  • We don't have to keep subvolume's block_rsv during transaction commit,
    and within transaction commit, we may also need the free space reclaimed
    from this block_rsv to process delayed refs.

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Liu Bo
     
  • This change adds infrastructure to allow for generic properties for
    inodes. Properties are name/value pairs that can be associated with
    inodes for different purposes. They are stored as xattrs with the
    prefix "btrfs."

    Properties can be inherited - this means when a directory inode has
    inheritable properties set, these are added to new inodes created
    under that directory. Further, subvolumes can also have properties
    associated with them, and they can be inherited from their parent
    subvolume. Naturally, directory properties have priority over subvolume
    properties (in practice a subvolume property is just a regular
    property associated with the root inode, objectid 256, of the
    subvolume's fs tree).

    This change also adds one specific property implementation, named
    "compression", whose values can be "lzo" or "zlib" and it's an
    inheritable property.

    The corresponding changes to btrfs-progs were also implemented.
    A patch with xfstests for this feature will follow once there's
    agreement on this change/feature.

    Further, the script at the bottom of this commit message was used to
    do some benchmarks to measure any performance penalties of this feature.

    Basically the tests correspond to:

    Test 1 - create a filesystem and mount it with compress-force=lzo,
    then sequentially create N files of 64Kb each, measure how long it took
    to create the files, unmount the filesystem, mount the filesystem and
    perform an 'ls -lha' against the test directory holding the N files, and
    report the time the command took.

    Test 2 - create a filesystem and don't use any compression option when
    mounting it - instead set the compression property of the subvolume's
    root to 'lzo'. Then create N files of 64Kb, and report the time it took.
    Then unmount the filesystem, mount it again and perform an 'ls -lha' like
    in the former test. This means every single file ends up with a property
    (xattr) associated to it.

    Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
    compression property, have no real effect other than adding more work
    when inheriting properties and taking more btree leaf space.

    Test 4 - same as test 3 but with 10 properties per file.

    Results (in seconds, and averages of 5 runs each), for different N
    numbers of files follow.

    * Without properties (test 1)

                        file creation time   ls -lha time
    10 000 files                      3.49           0.76
    100 000 files                    47.19           8.37
    1 000 000 files                 518.51         107.06

    * With 1 property (compression property set to lzo - test 2)

                        file creation time   ls -lha time
    10 000 files                      3.63           0.93
    100 000 files                    48.56           9.74
    1 000 000 files                 537.72         125.11

    * With 4 properties (test 3)

                        file creation time   ls -lha time
    10 000 files                      3.94           1.20
    100 000 files                    52.14          11.48
    1 000 000 files                 572.70         142.13

    * With 10 properties (test 4)

                        file creation time   ls -lha time
    10 000 files                      4.61           1.35
    100 000 files                    58.86          13.83
    1 000 000 files                 656.01         177.61

    The increased latencies with properties are essentially because of:

    *) When creating an inode, we now synchronously write 1 more item
    (an xattr item) for each property inherited from the parent dir
    (or subvolume). This could be done in an asynchronous way such
    as we do for dir index items (delayed-inode.c), which could help
    reduce the file creation latency;

    *) With properties, we now have larger fs trees. For this particular
    test each xattr item uses 75 bytes of leaf space in the fs tree.
    This could be less by using a new item for xattr items, instead of
    the current btrfs_dir_item, since we could cut the 'location' and
    'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
    total of 26 bytes per xattr item) from the btrfs_dir_item type.

    Also tried batching the xattr insertions (ignoring proper hash
    collision handling, since it didn't exist) when creating files that
    inherit properties from their parent inode/subvolume, but the end
    results were (surprisingly) essentially the same.

    Test script:

    $ cat test.pl
    #!/usr/bin/perl -w

    use strict;
    use Time::HiRes qw(time);
    use constant NUM_FILES => 10_000;
    use constant FILE_SIZES => (64 * 1024);
    use constant DEV => '/dev/sdb4';
    use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
    use constant TEST_DIR => (MNT_POINT . '/testdir');

    system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";

    # following line for testing without properties
    #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";

    # following 2 lines for testing with properties
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
    system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";

    system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
    my ($t1, $t2);

    $t1 = time();
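    for (my $i = 1; $i <= NUM_FILES; $i++) {
    # NOTE: the file name pattern below is an assumption.
    my $p = TEST_DIR . '/file_' . $i;
    open(my $f, '>', $p) or die "Error opening file!";
    $f->autoflush(1);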
    for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
    print $f ('A' x 4096) or die "Error writing to file!";
    }
    close($f);
    }
    $t2 = time();
    print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";

    $t1 = time();
    system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
    $t2 = time();
    print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
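
    The priority rule stated in this commit message (directory properties win
    over subvolume properties) can be sketched as a dictionary merge. This is
    a simplified Python model, not the kernel implementation:
    inherited_props() is a hypothetical name, and the model assumes every
    xattr with the "btrfs." prefix is inheritable.

```python
PREFIX = "btrfs."


def inherited_props(subvol_root_xattrs, parent_dir_xattrs):
    """Properties a new inode would start with in this simplified model.

    Sources are applied from lowest to highest priority, so a property set
    on the parent directory overrides the same property on the subvolume.
    """
    merged = {}
    for source in (subvol_root_xattrs, parent_dir_xattrs):
        for name, value in source.items():
            if name.startswith(PREFIX):   # only properties, not other xattrs
                merged[name] = value
    return merged


if __name__ == "__main__":
    subvol = {"btrfs.compression": "zlib"}
    parent = {"btrfs.compression": "lzo", "user.note": "not a property"}
    print(inherited_props(subvol, parent))
```

    Here the new inode ends up with compression set to "lzo" from its parent
    directory, and the unrelated user xattr is not carried over.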
     
  • The local variable 'new_size' comes from userspace. If a large number
    was passed, there would be an integer overflow in the following line:
    new_size = old_size + new_size;

    Signed-off-by: Wenliang Fan
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Wenliang Fan
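
    A guard against this kind of overflow can be sketched as follows. This is
    a Python model that assumes the sizes are unsigned 64-bit values;
    grow_size() and wrapped_add() are hypothetical names, and the error
    raised here is illustrative rather than the code the kernel returns.

```python
U64_MAX = 2**64 - 1


def wrapped_add(old_size, delta):
    """What an unchecked unsigned 64-bit addition would produce."""
    return (old_size + delta) & U64_MAX


def grow_size(old_size, delta):
    """Return old_size + delta, refusing sums that do not fit in 64 bits."""
    # Compare before adding, so the check itself cannot overflow.
    if delta > U64_MAX - old_size:
        raise OverflowError("new size does not fit in 64 bits")
    return old_size + delta


if __name__ == "__main__":
    old = 1 << 40                      # 1 TiB, an arbitrary example
    print("unchecked:", wrapped_add(old, U64_MAX))
    try:
        grow_size(old, U64_MAX)
    except OverflowError as exc:
        print("checked:", exc)
```

    The unchecked sum silently wraps to a value smaller than the old size,
    which is why the comparison has to happen before the addition.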
     
  • Convert all applicable cases of printk and pr_* to the btrfs_* macros.

    Fix all uses of the BTRFS prefix.

    Signed-off-by: Frank Holton
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Frank Holton
     
  • All the subvolumes that are involved in send must be read-only during the
    whole operation. The ioctl SUBVOL_SETFLAGS could be used to change the
    status to read-write and the result of send stream is undefined if the
    data change unexpectedly.

    Fix that by adding a refcount for all involved roots and verify that
    there's no send in progress during SUBVOL_SETFLAGS ioctl call that does
    read-only -> read-write transition.

    We need refcounts because there are no restrictions on the number of
    parallel send operations currently running on a single subvolume, be it
    source, parent or one of the multiple clone sources.

    Kernel is silent when the RO checks fail and returns EPERM. The same set
    of checks is done already in userspace before send starts.

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
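
    The refcount-based guard described in this commit, as a minimal
    single-threaded Python model. The Subvolume class and its method names
    are hypothetical, and the real code would need locking around the
    counter, which this sketch omits.

```python
class Subvolume:
    """Toy model of a subvolume root that can take part in send operations."""

    def __init__(self, read_only=True):
        self.read_only = read_only
        self.send_in_progress = 0   # number of sends currently using this root

    def start_send(self):
        if not self.read_only:
            raise PermissionError("send requires a read-only subvolume")
        self.send_in_progress += 1

    def end_send(self):
        self.send_in_progress -= 1

    def set_read_only(self, read_only):
        # Refuse the read-only -> read-write transition while a send runs.
        if self.read_only and not read_only and self.send_in_progress > 0:
            raise PermissionError("send in progress")
        self.read_only = read_only


if __name__ == "__main__":
    root = Subvolume()
    root.start_send()
    root.start_send()               # a second, parallel send on the same root
    try:
        root.set_read_only(False)
    except PermissionError as exc:
        print("refused:", exc)
    root.end_send()
    root.end_send()
    root.set_read_only(False)
    print("read-only now:", root.read_only)
```

    A counter rather than a boolean is needed because the flag may only be
    flipped once every one of the parallel sends has finished.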
     
  • Clean up btrfs_lookup_dentry() to never return NULL, but ERR_PTR(-ENOENT)
    instead. This keeps the return value convention consistent.

    Callers who use btrfs_lookup_dentry() require a trivial update.

    create_snapshot() in particular looks like it can also lose a BUG_ON(!inode)
    which is not really needed - there seems less harm in returning ENOENT to
    userspace at that point in the stack than there is to crash the machine.

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • btrfs filesystem df output will show the size of the metadata space
    and how much of it is used, and the user assumes that the difference
    is all usable space. Since that's not actually the case due to the
    global metadata reservation, we should provide the full picture to the
    user.

    This patch adds an ioctl that exports the size of the global metadata
    reservation so that btrfs filesystem df can report it.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Jeff Mahoney
     
  • Now that we have the feature name strings available in the kernel via
    the sysfs attributes, we can use them for printing better failure
    messages from the ioctl path.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Jeff Mahoney
     
  • There are some feature bits that require no offline setup and can
    be enabled online. I've only reviewed extended irefs, but there will
    probably be more.

    We introduce three new ioctls:
    - BTRFS_IOC_GET_SUPPORTED_FEATURES: query the kernel for supported features.
    - BTRFS_IOC_GET_FEATURES: query the kernel for enabled features on a per-fs
    basis, as well as querying for which features are changeable while mounted.
    - BTRFS_IOC_SET_FEATURES: change features on a per-fs basis.

    We introduce two new masks per feature set (_SAFE_SET and _SAFE_CLEAR) that
    allow us to define which features are safe to change at runtime.

    The failure modes for BTRFS_IOC_SET_FEATURES are as follows:
    - Enabling a completely unsupported feature: warns and returns -ENOTSUPP
    - Enabling a feature that can only be done offline: warns and returns -EPERM

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Jeff Mahoney
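
    The two failure modes listed for BTRFS_IOC_SET_FEATURES amount to two
    mask checks. The sketch below is a Python model covering only the
    "enable" direction: the bit values and the check_feature_enable() name
    are hypothetical, and error names are returned as strings instead of
    kernel error codes.

```python
# Hypothetical feature bits, for illustration only.
FEATURE_A = 1 << 0   # supported, safe to enable at runtime
FEATURE_B = 1 << 1   # supported, but can only be enabled offline
FEATURE_C = 1 << 2   # not supported by this kernel

SUPPORTED = FEATURE_A | FEATURE_B
SAFE_SET = FEATURE_A


def check_feature_enable(requested, supported=SUPPORTED, safe_set=SAFE_SET):
    """Return the name of the error for a request, or None if it is allowed."""
    if requested & ~supported:
        return "ENOTSUPP"   # at least one bit is completely unsupported
    if requested & ~safe_set:
        return "EPERM"      # supported, but not changeable while mounted
    return None


if __name__ == "__main__":
    for name, bits in (("A", FEATURE_A), ("B", FEATURE_B), ("C", FEATURE_C)):
        print(name, check_feature_enable(bits))
```

    The unsupported check runs first, so a request mixing an unsupported bit
    with an offline-only bit reports the unsupported feature.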
     

25 Jan, 2014

1 commit

  • * don't assume that ->dest_count won't change between copy_from_user()
    and memdup_user()
    * use fdget instead of fget
    * don't bother comparing superblocks when we'd already compared vfsmounts
    * get rid of excessive goto
    * use file_inode() instead of open-coding the sucker

    Signed-off-by: Al Viro

    Al Viro
     

12 Dec, 2013

1 commit


15 Nov, 2013

2 commits

  • 3 of 4 callers actually want file_inode()...

    Signed-off-by: Al Viro
    Signed-off-by: Chris Mason

    Al Viro
     
  • Heiko Carstens noticed that btrfs was using empty_zero_page
    incorrectly. He explained:

    The definition of empty_zero_page is architecture specific. It
    is (currently) either a character array, an unsigned long
    containing the address of the empty_zero_page, or even worse
    only the address of the struct page belonging to the
    empty_zero_page.

    This commit changes btrfs to use a for-loop instead. On x86
    the resulting .ko is smaller, and we're no longer worrying about
    how each arch builds its zeros.

    Reported-by: Heiko Carstens
    Signed-off-by: Chris Mason

    Chris Mason