12 Apr, 2014

1 commit

  • Pull second set of btrfs updates from Chris Mason:
    "The most important changes here are from Josef, fixing a btrfs
    regression in 3.14 that can cause corruptions in the extent allocation
    tree when snapshots are in use.

    Josef also fixed some deadlocks in send/recv and other assorted races
    when balance is running"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (23 commits)
    Btrfs: fix compile warnings on on avr32 platform
    btrfs: allow mounting btrfs subvolumes with different ro/rw options
    btrfs: export global block reserve size as space_info
    btrfs: fix crash in remount(thread_pool=) case
    Btrfs: abort the transaction when we don't find our extent ref
    Btrfs: fix EINVAL checks in btrfs_clone
    Btrfs: fix unlock in __start_delalloc_inodes()
    Btrfs: scrub raid56 stripes in the right way
    Btrfs: don't compress for a small write
    Btrfs: more efficient io tree navigation on wait_extent_bit
    Btrfs: send, build path string only once in send_hole
    btrfs: filter invalid arg for btrfs resize
    Btrfs: send, fix data corruption due to incorrect hole detection
    Btrfs: kmalloc() doesn't return an ERR_PTR
    Btrfs: fix snapshot vs nocow writting
    btrfs: Change the expanding write sequence to fix snapshot related bug.
    btrfs: make device scan less noisy
    btrfs: fix lockdep warning with reclaim lock inversion
    Btrfs: hold the commit_root_sem when getting the commit root during send
    Btrfs: remove transaction from send
    ...

    Linus Torvalds
     

08 Apr, 2014

2 commits

  • Introduce a block group type bit for the global reserve and fill in the space
    info for the SPACE_INFO ioctl. This should replace the newly added ioctl
    (01e219e8069516cdb98594d417b8bb8d906ed30d) that returns just the 'size' part
    of the global reserve, while the actual usage can now be visible in the
    'btrfs fi df' output during ENOSPC stress.

    The unpatched userspace tools will show the blockgroup as 'unknown'.

    CC: Jeff Mahoney
    CC: Josef Bacik
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • We currently rely too heavily on roots being read-only to save us from just
    accessing root->commit_root. We can easily balance blocks out from underneath a
    read only root, so to save us from getting screwed make sure we only access
    root->commit_root under the commit root sem. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

07 Apr, 2014

1 commit

  • Let's try this again. We can deadlock the box if we send on a box and try to
    write onto the same fs with the app that is trying to listen to the send pipe.
    This is because the writer could get stuck waiting for a transaction commit
    which is being blocked by the send. So fix this by making sure looking at the
    commit roots is always going to be consistent. We do this by keeping track of
    which roots need to have their commit roots swapped during commit, and then
    taking the commit_root_sem and swapping them all at once. Then make sure we
    take a read lock on the commit_root_sem in cases where we search the commit root
    to make sure we're always looking at a consistent view of the commit roots.
    Previously we had problems with this because we would swap a fs tree commit root
    and then swap the extent tree commit root independently which would cause the
    backref walking code to screw up sometimes. With this patch we no longer
    deadlock and pass all the weird send/receive corner cases. Thanks,

    Reported-by: Hugo Mills
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

05 Apr, 2014

1 commit

  • Pull btrfs changes from Chris Mason:
    "This is a pretty long stream of bug fixes and performance fixes.

    Qu Wenruo has replaced the btrfs async threads with regular kernel
    workqueues. We'll keep an eye out for performance differences, but
    it's nice to be using more generic code for this.

    We still have some corruption fixes and other patches coming in for
    the merge window, but this batch is tested and ready to go"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (108 commits)
    Btrfs: fix a crash of clone with inline extents's split
    btrfs: fix uninit variable warning
    Btrfs: take into account total references when doing backref lookup
    Btrfs: part 2, fix incremental send's decision to delay a dir move/rename
    Btrfs: fix incremental send's decision to delay a dir move/rename
    Btrfs: remove unnecessary inode generation lookup in send
    Btrfs: fix race when updating existing ref head
    btrfs: Add trace for btrfs_workqueue alloc/destroy
    Btrfs: less fs tree lock contention when using autodefrag
    Btrfs: return EPERM when deleting a default subvolume
    Btrfs: add missing kfree in btrfs_destroy_workqueue
    Btrfs: cache extent states in defrag code path
    Btrfs: fix deadlock with nested trans handles
    Btrfs: fix possible empty list access when flushing the delalloc inodes
    Btrfs: split the global ordered extents mutex
    Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock
    Btrfs: reclaim delalloc metadata more aggressively
    Btrfs: remove unnecessary lock in may_commit_transaction()
    Btrfs: remove the unnecessary flush when preparing the pages
    Btrfs: just do dirty page flush for the inode with compression before direct IO
    ...

    Linus Torvalds
     

11 Mar, 2014

22 commits

  • We didn't have a lock to protect access to the delalloc inodes list, so we
    might access an empty delalloc inodes list if someone started flushing
    delalloc inodes, because the delalloc inodes are moved into another list
    temporarily. Fix it by wrapping the access with a lock.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • When we create a snapshot, we just need to wait for the ordered extents in
    the source fs/file root. But because we used a global mutex to protect the
    ordered extents list of the source fs/file root to avoid accessing an empty
    list, if someone else took the mutex to access the ordered extents list
    of another fs/file root, we had to wait.

    This patch splits the above global mutex, now every fs/file root has
    its own mutex to protect its own list.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • We needn't flush all delalloc inodes when we don't get the s_umount lock,
    or we would make the tasks wait for a long time.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • If the snapshot creation happens after a nocow write but before the dirty
    data flush, we will fail to flush the dirty data because of lack of space.

    So we must keep track of when those nocow write operations start and when
    they end; if there are nocow writers, the snapshot creators must wait. To
    implement this, I introduce btrfs_{start, end}_nocow_write(),
    which is similar to mnt_{want,drop}_write().

    These two functions are only used for nocow file write operations.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
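    The start/end pairing described above behaves like a reader-counted gate.
    A minimal userspace sketch, with pthreads standing in for the kernel's
    wait machinery (all names here are illustrative, not the kernel's):

```c
/* Sketch: nocow writers bump a counter; the snapshot creator flips a
 * "blocked" flag and waits for the counter to drain. */
#include <pthread.h>
#include <stdbool.h>

struct nocow_gate {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    int writers;        /* nocow writes in flight */
    bool blocked;       /* snapshot creation pending */
};

/* Returns false if a snapshot is being created; the caller falls
 * back to the COW write path instead. */
bool start_nocow_write(struct nocow_gate *g)
{
    bool ok;
    pthread_mutex_lock(&g->lock);
    ok = !g->blocked;
    if (ok)
        g->writers++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

void end_nocow_write(struct nocow_gate *g)
{
    pthread_mutex_lock(&g->lock);
    if (--g->writers == 0)
        pthread_cond_broadcast(&g->drained);
    pthread_mutex_unlock(&g->lock);
}

/* Snapshot creator: block new nocow writers, then wait for the
 * in-flight ones to finish. */
void wait_for_nocow_writers(struct nocow_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->blocked = true;
    while (g->writers > 0)
        pthread_cond_wait(&g->drained, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```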
     
  • Since the "_struct" suffix is mainly used to distinguish the different
    btrfs_work types between the original implementation and the newly created
    one, there is no need for the suffix now that all btrfs_workers are changed
    into btrfs_workqueue.

    This patch also fixes some code whose style was affected by the overly
    long "_struct" suffix.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Since all the btrfs_workers are replaced with the newly created
    btrfs_workqueue, the old code can be easily removed.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->scrub_* with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->qgroup_rescan_worker with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->delayed_workers with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->fixup_workers with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->readahead_workers with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->cache_workers with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->rmw_workers with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->endio_* workqueues with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Replace the fs_info->submit_workers with the newly created
    btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Much like the fs_info->workers, replace the fs_info->submit_workers
    with the same btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Much like the fs_info->workers, replace the fs_info->delalloc_workers
    with the same btrfs_workqueue.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • Use the newly created btrfs_workqueue_struct to replace the original
    fs_info->workers.

    Signed-off-by: Qu Wenruo
    Tested-by: David Sterba
    Signed-off-by: Josef Bacik

    Qu Wenruo
     
  • We might commit a log sub-transaction which didn't contain the metadata we
    logged. This was because we didn't record the log transid and just selected
    the current log sub-transaction to commit, but the right one might already
    have been committed by another task. In that case we actually needn't do
    anything, and it is safe to just return.

    This patch improves the log sync using the above idea. We record the transid
    of the log sub-transaction in which we logged the metadata, and the transid
    of the log sub-transaction we have committed. If the committed transid
    is >= the transid we recorded when logging the metadata, we just return.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
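    The skip check described above reduces to a single comparison. A tiny
    sketch (field names are illustrative; the kernel keeps equivalents on the
    root and its log context):

```c
/* Sketch of the "skip redundant log commits" decision. */
#include <stdbool.h>

struct log_root_state {
    int last_log_commit;  /* transid of the last committed log sub-transaction */
};

/* fsync logged its metadata in log sub-transaction @log_transid. If a
 * sub-transaction with transid >= log_transid was already committed by
 * another task, our metadata is on disk and we can return directly. */
bool log_commit_needed(const struct log_root_state *s, int log_transid)
{
    return s->last_log_commit < log_transid;
}
```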
     
  • It is possible for many tasks to sync the log tree at the same time, but
    only one task can do the sync work; the others wait for it. However, the
    waiting tasks didn't get the result of the log sync and returned 0 when the
    wait ended. This caused them to skip error handling, and the serious
    problem was that they told users the file sync succeeded when in fact it
    had failed.

    This patch fixes the problem by introducing a log context structure that
    we insert into a global list. When the sync fails, we set the error number
    on every log context in the list; the waiting tasks then read the error
    number from their log context and handle the error if needed.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
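    The propagation scheme above can be sketched as a plain linked list of
    contexts (names are illustrative, not the kernel's):

```c
/* Sketch: each waiter registers a context on a global list; the task
 * that performs the sync writes the result into every registered
 * context, so waiters no longer assume success. */
#include <stddef.h>

struct log_ctx {
    int log_ret;            /* result of the sync, filled in by the syncer */
    struct log_ctx *next;   /* linkage on the global context list */
};

static struct log_ctx *ctx_list;  /* contexts waiting for the current sync */

void log_ctx_register(struct log_ctx *ctx)
{
    ctx->log_ret = 0;
    ctx->next = ctx_list;
    ctx_list = ctx;
}

/* On failure, the syncing task propagates the error to every waiter;
 * each one then returns ctx->log_ret instead of an unconditional 0. */
void log_ctx_set_error(int err)
{
    for (struct log_ctx *c = ctx_list; c; c = c->next)
        c->log_ret = err;
    ctx_list = NULL;
}
```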
     
  • The log trans id is initialized to 0 every time we create a log tree,
    and the log tree needs to be re-created after a new transaction is started,
    so the log trans id is unlikely to be a huge number. We can therefore use a
    signed integer instead of an unsigned long to save a bit of space.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • During a device replace test, we hit a null pointer dereference (it was very
    easy to reproduce by running xfstests' btrfs/011 on devices with the virtio
    scsi driver). Two bugs caused this problem:
    - We might allocate new chunks on the replaced device after we updated
    the mapping tree, and we forgot to replace the source device in the
    mappings of those new chunks.
    - We might get mapping information which included the source device
    before the mapping information update, and then submit a bio based
    on that mapping information after we freed the source device.

    For the first bug, we can fix it by doing mapping tree update and source
    device remove in the same context of the chunk mutex. The chunk mutex is
    used to protect the allocable device list, the above method can avoid
    the new chunk allocation, and after we remove the source device, all
    the new chunks will be allocated on the new device. So it can fix
    the first bug.

    For the second bug, we need to make sure all in-flight bios are finished
    and no new bios are produced while we are removing the source device. To
    fix this problem, we introduced a global @bio_counter; we not only inc/dec
    @bio_counter outside of map_blocks, but also inc it before submitting a bio
    and dec @bio_counter when ending bios.

    Since Raid56 is a little different and device replace doesn't support raid56
    yet, it is not addressed in this patch, and I added comments to make sure we
    will fix it in the future.

    Reported-by: Qu Wenruo
    Signed-off-by: Wang Shilong
    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
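    The @bio_counter protocol above can be sketched in a few lines, with C11
    atomics standing in for the kernel's percpu counter (names are
    illustrative):

```c
/* Sketch: bios hold a reference for their whole lifetime, and device
 * replace waits until the count drains before freeing the source
 * device, so no bio built from the old mapping can still be in flight. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int bio_counter;

/* Taken before submitting a bio (and around map_blocks). */
void bio_counter_inc(void) { atomic_fetch_add(&bio_counter, 1); }

/* Dropped when the bio ends. */
void bio_counter_dec(void) { atomic_fetch_sub(&bio_counter, 1); }

/* The replace finish path may only tear down the source device once
 * the counter has drained. */
bool can_free_source_device(void)
{
    return atomic_load(&bio_counter) == 0;
}
```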
     

31 Jan, 2014

1 commit

  • Pull btrfs updates from Chris Mason:
    "This is a pretty big pull, and most of these changes have been
    floating in btrfs-next for a long time. Filipe's properties work is a
    cool building block for inheriting attributes like compression down on
    a per inode basis.

    Jeff Mahoney kicked in code to export filesystem info into sysfs.

    Otherwise, lots of performance improvements, cleanups and bug fixes.

    Looks like there are still a few other small pending incrementals, but
    I wanted to get the bulk of this in first"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
    Btrfs: fix spin_unlock in check_ref_cleanup
    Btrfs: setup inode location during btrfs_init_inode_locked
    Btrfs: don't use ram_bytes for uncompressed inline items
    Btrfs: fix btrfs_search_slot_for_read backwards iteration
    Btrfs: do not export ulist functions
    Btrfs: rework ulist with list+rb_tree
    Btrfs: fix memory leaks on walking backrefs failure
    Btrfs: fix send file hole detection leading to data corruption
    Btrfs: add a reschedule point in btrfs_find_all_roots()
    Btrfs: make send's file extent item search more efficient
    Btrfs: fix to catch all errors when resolving indirect ref
    Btrfs: fix protection between walking backrefs and root deletion
    btrfs: fix warning while merging two adjacent extents
    Btrfs: fix infinite path build loops in incremental send
    btrfs: undo sysfs when open_ctree() fails
    Btrfs: fix snprintf usage by send's gen_unique_name
    btrfs: fix defrag 32-bit integer overflow
    btrfs: sysfs: list the NO_HOLES feature
    btrfs: sysfs: don't show reserved incompat feature
    btrfs: call permission checks earlier in ioctls and return EPERM
    ...

    Linus Torvalds
     

29 Jan, 2014

12 commits

  • If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
    the new size. The fix uses the size directly from the item header when
    reading uncompressed inlines, and also fixes truncate to update the
    size as it goes.

    Reported-by: Jens Axboe
    Signed-off-by: Chris Mason
    CC: stable@vger.kernel.org

    Chris Mason
     
  • It is better for the lock to be placed close to the data it protects,
    because they may then share a cache line and we will load fewer cache lines
    when accessing them. So we rearrange the members of the btrfs_space_info
    structure to place the lock closer to its data.

    Signed-off-by: Miao Xie
    Reviewed-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Miao Xie
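    The locality argument above amounts to a layout choice. A made-up analog
    of the idea (this is not btrfs_space_info's real layout):

```c
/* Sketch: keep the lock immediately adjacent to the hot counters it
 * guards, so lock and data are likely fetched in the same cache line. */
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

struct space_info_like {
    /* rarely-written configuration, grouped separately */
    uint64_t flags;

    /* the lock sits right before the fields it protects */
    pthread_mutex_t lock;
    uint64_t bytes_used;
    uint64_t bytes_reserved;
};
```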
     
  • Add a noinode_cache mount option for btrfs.

    Since the inode map cache involves the btrfs_find_free_ino/return_ino
    machinery, just toggling the mount option would mean an inode number
    obtained from the inode map cache is never returned to it.

    To keep finding and returning inodes consistent with each other,
    a new mount_opt bit, CHANGE_INODE_CACHE, is introduced.
    CHANGE_INODE_CACHE is set/cleared on remount, and the original
    INODE_MAP_CACHE is set/cleared according to CHANGE_INODE_CACHE after a
    successful transaction.
    Since finding/returning an inode happens entirely between
    btrfs_start_transaction and btrfs_commit_transaction, this keeps the
    behavior consistent.

    Also, the noinode_cache mount option does not stop the caching_kthread.

    Cc: David Sterba
    Signed-off-by: Miao Xie
    Signed-off-by: Qu Wenruo
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Qu Wenruo
     
  • There is a bug when using btrfs_previous_item() to search for a metadata
    extent item. In btrfs_previous_item() we require the types to match;
    however, since skinny metadata was introduced by Josef, we may mix these
    two types, so just using btrfs_previous_item() does not work right.

    To keep btrfs_previous_item() behaving like a normal tree search, I
    introduce another function, btrfs_previous_extent_item().

    Signed-off-by: Wang Shilong
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Wang Shilong
     
  • On one of our gluster clusters we noticed some pretty big lag spikes. This
    turned out to be because our transaction commit was taking like 3 minutes to
    complete. This is because we have like 30 gigs of metadata, so our global
    reserve would end up being the max which is like 512 mb. So our throttling code
    would allow a ridiculous amount of delayed refs to build up and then they'd all
    get run at transaction commit time, and for a cold mounted file system that
    could take up to 3 minutes to run. So fix the throttling to be based on both
    the size of the global reserve and how long it takes us to run delayed refs.
    This patch tracks the time it takes to run delayed refs and then only allows 1
    second's worth of outstanding delayed refs at a time. This way it will auto-tune
    itself from cold cache up to when everything is in memory and it no longer has
    to go to disk. This makes our transaction commits take much less time to run.
    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
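    The throttling rule described above reduces to simple arithmetic: cap the
    backlog at roughly one second's worth of refs, derived from the measured
    average cost of running one. A sketch (names are illustrative):

```c
/* Sketch: how many outstanding delayed refs we tolerate before
 * forcing the caller to help run them, given a running average of
 * how long one ref takes. */
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

uint64_t max_delayed_refs(uint64_t avg_ref_runtime_ns)
{
    if (avg_ref_runtime_ns == 0)
        return UINT64_MAX;   /* nothing measured yet: don't throttle */
    return NSEC_PER_SEC / avg_ref_runtime_ns;
}
```

    With a cold cache a ref might take, say, 1 ms (mostly disk reads), allowing
    only ~1000 outstanding refs; as everything lands in memory the per-ref cost
    drops and the allowance grows, which is the auto-tuning behavior.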
     
  • This change adds infrastructure to allow for generic properties for
    inodes. Properties are name/value pairs that can be associated with
    inodes for different purposes. They are stored as xattrs with the
    prefix "btrfs."

    Properties can be inherited - this means when a directory inode has
    inheritable properties set, these are added to new inodes created
    under that directory. Further, subvolumes can also have properties
    associated with them, and they can be inherited from their parent
    subvolume. Naturally, directory properties have priority over subvolume
    properties (in practice a subvolume property is just a regular
    property associated with the root inode, objectid 256, of the
    subvolume's fs tree).

    This change also adds one specific property implementation, named
    "compression", whose values can be "lzo" or "zlib" and it's an
    inheritable property.

    The corresponding changes to btrfs-progs were also implemented.
    A patch with xfstests for this feature will follow once there's
    agreement on this change/feature.

    Further, the script at the bottom of this commit message was used to
    do some benchmarks to measure any performance penalties of this feature.

    Basically the tests correspond to:

    Test 1 - create a filesystem and mount it with compress-force=lzo,
    then sequentially create N files of 64Kb each, measure how long it took
    to create the files, unmount the filesystem, mount the filesystem and
    perform an 'ls -lha' against the test directory holding the N files, and
    report the time the command took.

    Test 2 - create a filesystem and don't use any compression option when
    mounting it - instead set the compression property of the subvolume's
    root to 'lzo'. Then create N files of 64Kb, and report the time it took.
    Then unmount the filesystem, mount it again and perform an 'ls -lha' like
    in the former test. This means every single file ends up with a property
    (xattr) associated to it.

    Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
    compression property, have no real effect other than adding more work
    when inheriting properties and taking more btree leaf space.

    Test 4 - same as test 3 but with 10 properties per file.

    Results (in seconds, and averages of 5 runs each), for different N
    numbers of files follow.

    * Without properties (test 1)

                          file creation time   ls -lha time
    10 000 files                 3.49              0.76
    100 000 files               47.19              8.37
    1 000 000 files            518.51            107.06

    * With 1 property (compression property set to lzo - test 2)

                          file creation time   ls -lha time
    10 000 files                 3.63              0.93
    100 000 files               48.56              9.74
    1 000 000 files            537.72            125.11

    * With 4 properties (test 3)

                          file creation time   ls -lha time
    10 000 files                 3.94              1.20
    100 000 files               52.14             11.48
    1 000 000 files            572.70            142.13

    * With 10 properties (test 4)

                          file creation time   ls -lha time
    10 000 files                 4.61              1.35
    100 000 files               58.86             13.83
    1 000 000 files            656.01            177.61

    The increased latencies with properties are essentially because of:

    *) When creating an inode, we now synchronously write 1 more item
    (an xattr item) for each property inherited from the parent dir
    (or subvolume). This could be done in an asynchronous way such
    as we do for dir index items (delayed-inode.c), which could help
    reduce the file creation latency;

    *) With properties, we now have larger fs trees. For this particular
    test each xattr item uses 75 bytes of leaf space in the fs tree.
    This could be less by using a new item for xattr items, instead of
    the current btrfs_dir_item, since we could cut the 'location' and
    'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
    total of 26 bytes per xattr item) from the btrfs_dir_item type.

    Also tried batching the xattr insertions (ignoring proper hash
    collision handling, since it didn't exist) when creating files that
    inherit properties from their parent inode/subvolume, but the end
    results were (surprisingly) essentially the same.

    Test script:

    $ cat test.pl
    #!/usr/bin/perl -w

    use strict;
    use Time::HiRes qw(time);
    use constant NUM_FILES => 10_000;
    use constant FILE_SIZES => (64 * 1024);
    use constant DEV => '/dev/sdb4';
    use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
    use constant TEST_DIR => (MNT_POINT . '/testdir');

    system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";

    # following line for testing without properties
    #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";

    # following 2 lines for testing with properties
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
    system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";

    system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
    my ($t1, $t2);

    $t1 = time();
    for (my $i = 1; $i <= NUM_FILES; $i++) {
        my $p = TEST_DIR . '/file_' . $i;
        open(my $f, '>', $p) or die "Error opening file!";
        $f->autoflush(1);
        for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
            print $f ('A' x 4096) or die "Error writing to file!";
        }
        close($f);
    }
    $t2 = time();
    print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";

    $t1 = time();
    system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
    $t2 = time();
    print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • When writing to a file we drop existing file extent items that cover the
    write range and then add a new file extent item that represents that write
    range.

    Before this change we were doing a tree lookup to remove the file extent
    items, and then after we did another tree lookup to insert the new file
    extent item.
    Most of the time all the file extent items we need to drop are located
    within a single leaf - this is the leaf where our new file extent item ends
    up at. Therefore, in this common case just combine these 2 operations into
    a single one.

    By avoiding the second btree navigation for insertion of the new file extent
    item, we reduce btree node/leaf lock acquisitions/releases, btree block/leaf
    COW operations, CPU time on btree node/leaf key binary searches, etc.

    Besides for file writes, this is an operation that happens for file fsync's
    as well. However log btrees are much less likely to big as big as regular
    fs btrees, therefore the impact of this change is smaller.

    The following benchmark was performed against an SSD drive and a
    HDD drive, both for random and sequential writes:

    sysbench --test=fileio --file-num=4096 --file-total-size=8G \
    --file-test-mode=[rndwr|seqwr] --num-threads=512 \
    --file-block-size=8192 --max-requests=1000000 \
    --file-fsync-freq=0 --file-io-mode=sync [prepare|run]

    All results below are averages of 10 runs of the respective test.

    ** SSD sequential writes

    Before this change: 225.88 Mb/sec
    After this change: 277.26 Mb/sec

    ** SSD random writes

    Before this change: 49.91 Mb/sec
    After this change: 56.39 Mb/sec

    ** HDD sequential writes

    Before this change: 68.53 Mb/sec
    After this change: 69.87 Mb/sec

    ** HDD random writes

    Before this change: 13.04 Mb/sec
    After this change: 14.39 Mb/sec

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • Convert all applicable cases of printk and pr_* to the btrfs_* macros.

    Fix all uses of the BTRFS prefix.

    Signed-off-by: Frank Holton
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Frank Holton
     
  • All the subvolumes that are involved in send must be read-only during the
    whole operation. The ioctl SUBVOL_SETFLAGS could be used to change the
    status to read-write, and the result of the send stream is undefined if the
    data changes unexpectedly.

    Fix that by adding a refcount for all involved roots and verify that
    there's no send in progress during SUBVOL_SETFLAGS ioctl call that does
    read-only -> read-write transition.

    We need refcounts because there are no restrictions on number of send
    parallel operations currently run on a single subvolume, be it source,
    parent or one of the multiple clone sources.

    Kernel is silent when the RO checks fail and returns EPERM. The same set
    of checks is done already in userspace before send starts.

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    David Sterba
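    The refcount scheme described above can be sketched as follows; struct and
    function names here are illustrative, not the kernel's:

```c
/* Sketch: every send operation using a root (source, parent, or clone
 * source) takes a reference, and the RO->RW transition in
 * SUBVOL_SETFLAGS is refused with EPERM while any are held. */
#include <stdatomic.h>
#include <errno.h>

struct subvol_root {
    atomic_int send_in_progress;
    int readonly;
};

void send_hold(struct subvol_root *r)    { atomic_fetch_add(&r->send_in_progress, 1); }
void send_release(struct subvol_root *r) { atomic_fetch_sub(&r->send_in_progress, 1); }

/* The RO -> RW flip is only allowed when no send references the root. */
int set_subvol_rw(struct subvol_root *r)
{
    if (atomic_load(&r->send_in_progress) > 0)
        return -EPERM;
    r->readonly = 0;
    return 0;
}
```

    Refcounting (rather than a single flag) matters because any number of send
    operations may run on one subvolume in parallel.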
     
  • It's not used anywhere, so just drop it.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Filipe David Borba Manana
     
  • I need to create a fake tree to test qgroups and I don't want to have to setup a
    fake btree_inode. The fact is we only use the radix tree for the fs_info, so
    everybody else who allocates an extent_io_tree is just wasting the space anyway.
    This patch moves the radix tree and its lock into btrfs_fs_info so there is less
    stuff I have to fake to do qgroup sanity tests. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The kernel macro pr_debug is defined as an empty statement when DEBUG is
    not defined. Make btrfs_debug match pr_debug to avoid spamming
    the kernel log with debug messages.

    Signed-off-by: Frank Holton
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Frank Holton