30 Apr, 2019

2 commits

  • Commit 41bd6067692382 ("Btrfs: fix fsync of files with multiple hard
    links in new directories") introduced a path that makes fsync fall back
    to a full transaction commit in order to avoid losing hard links and new
    ancestors of the fsynced inode. That path is triggered only when the
    inode has more than one hard link and either has a new hard link created
    in the current transaction or the inode was evicted and reloaded in the
    current transaction.

    That path ends up getting triggered very often (hundreds of times) during
    the course of pgbench benchmarks, resulting in performance drops of about
    20%.

    This change restores the performance by not triggering the full
    transaction commit in those cases, and instead iterating the
    fs/subvolume tree in search of all possible new ancestors, for all hard
    links, to log them.
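
    A rough sketch of the new strategy (the helper and field names here are
    illustrative, not necessarily the exact ones from the patch):

    /* Sketch: instead of returning "commit the whole transaction", walk
     * the fs/subvolume tree and log every new ancestor of every hard
     * link of the inode being fsynced. */
    if (S_ISREG(inode->vfs_inode.i_mode) && inode->vfs_inode.i_nlink > 1) {
            ret = log_all_new_ancestors(trans, inode, ctx);
            if (ret)
                    goto end_trans;
    }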

    Reported-by: Zhao Yuhu
    Tested-by: James Wang
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     
  • Deduplicate the btrfs file type conversion implementation - file systems
    that use the same file types as defined by POSIX do not need to define
    their own versions and can use the common helper functions declared in
    fs_types.h and implemented in fs_types.c.

    The common implementation can be found in commit bbe7449e2599 ("fs:
    common implementation of file type").
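
    For example, a filesystem can map an inode mode to a file type with the
    common helper (a sketch assuming the helpers declared in fs_types.h,
    such as fs_umode_to_ftype()):

    #include <linux/fs_types.h>

    /* Sketch: replace a filesystem-private conversion table with the
     * common helper. */
    static u8 example_inode_type(struct inode *inode)
    {
            return fs_umode_to_ftype(inode->i_mode);        /* FT_* value */
    }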

    Reviewed-by: Jan Kara
    Signed-off-by: Amir Goldstein
    Signed-off-by: Phillip Potter
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Phillip Potter
     

17 Dec, 2018

4 commits

  • The log tree has a long-standing problem: when a file is fsync'ed we
    only check for new ancestors, created in the current transaction, by
    following the hard link for which the fsync was issued. We follow the
    ancestors using the VFS' dget_parent() API. This means that if we create
    a new link for a file in a new directory (or in any other new ancestor
    directory) and then fsync the file using an old hard link, we end up not
    logging the new ancestor, and on log replay that new hard link and
    ancestor do not exist. In some cases, involving renames, the file will
    not exist at all.

    Example:

    mkfs.btrfs -f /dev/sdb
    mount /dev/sdb /mnt

    mkdir /mnt/A
    touch /mnt/foo
    ln /mnt/foo /mnt/A/bar
    xfs_io -c fsync /mnt/foo

    In this example, after log replay only the hard link named 'foo' exists
    and directory A does not exist, which is unexpected. In other major
    Linux filesystems, such as ext4, xfs and f2fs, both hard links exist and
    so does directory A after mounting the filesystem again.

    Checking whether any ancestors are new and need to be logged was added
    in 2009 by commit 12fcfd22fe5b ("Btrfs: tree logging unlink/rename
    fixes"), however only for the ancestors of the hard link (dentry) for
    which the fsync was issued, instead of checking all ancestors of all of
    the inode's hard links.

    So fix this by tracking the id of the last transaction in which a hard
    link was created for an inode, and on fsync falling back to a full
    transaction commit when the inode has more than one hard link and at
    least one new hard link was created in the current transaction. This is
    the simplest solution since this is not a common use case (frequently
    adding hard links for which there's an ancestor created in the current
    transaction and then fsyncing the file). In case it ever becomes a
    common use case, a solution that consists of iterating the fs/subvol
    btree for each hard link and checking if any ancestor is new could be
    implemented.
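
    A minimal sketch of that fallback check (the last_link_trans field is
    from the description above; the surrounding code is illustrative):

    /* Force a full transaction commit on fsync when the inode has
     * multiple hard links and at least one of them was created in the
     * current transaction. */
    if (inode->vfs_inode.i_nlink > 1 &&
        inode->last_link_trans > last_committed) {
            ret = 1;        /* caller falls back to a transaction commit */
            goto end_no_trans;
    }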

    This solves many unexpected scenarios reported by Jayashree Mohan and
    Vijay Chidambaram, and for which there is a new test case for fstests
    under review.

    Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
    CC: stable@vger.kernel.org # 4.4+
    Reported-by: Vijay Chidambaram
    Reported-by: Jayashree Mohan
    Signed-off-by: Filipe Manana
    Signed-off-by: David Sterba

    Filipe Manana
     
  • The first value auto-assigned to an enum is 0, so we can rely on that
    and not explicitly initialize members where the auto-increment does the
    same. This is used only for values that are not part of the on-disk
    format.
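
    A sketch of the pattern (the member names are illustrative):

    /* Before: explicit initializers that just mirror the auto-increment. */
    enum { EXAMPLE_FLAG_A = 0, EXAMPLE_FLAG_B = 1, EXAMPLE_FLAG_C = 2 };

    /* After: the first enumerator is 0 by definition, so let the compiler
     * auto-increment. Safe only for values never written to disk. */
    enum { EXAMPLE_FLAG_A, EXAMPLE_FLAG_B, EXAMPLE_FLAG_C };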

    Reviewed-by: Omar Sandoval
    Reviewed-by: Qu Wenruo
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: David Sterba

    David Sterba
     
  • Snapshot is expected to be fast. But if there are writers steadily
    creating dirty pages in our subvolume, the snapshot may take a very long
    time to complete. To fix the problem, we use tagged writepage for the
    snapshot flusher, as the generic write_cache_pages() does, so we can
    omit pages dirtied after the snapshot command.

    This does not change the semantics regarding which data get into the
    snapshot if pages are being dirtied during the snapshotting operation.
    There's a sync called before the snapshot is taken in both the old and
    the new code; any IO in flight just after that may end up in the
    snapshot, but this depends on other system effects that might still sync
    the IO.
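
    The tagged-writepage pattern, roughly as used by the generic
    write_cache_pages() (a sketch, not the btrfs patch itself):

    /* Tag the pages that are dirty right now, then write back only
     * tagged pages; pages dirtied later (after the snapshot command)
     * are not tagged and therefore skipped. */
    if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
            tag = PAGECACHE_TAG_TOWRITE;
    else
            tag = PAGECACHE_TAG_DIRTY;
    if (tag == PAGECACHE_TAG_TOWRITE)
            tag_pages_for_writeback(mapping, index, end);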

    We do a simple snapshot speed test on an Intel D-1531 box:

    fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G
    --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120
    --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
    time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

    original: 1m58sec
    patched: 6.54sec

    This is the best case for this patch since for a sequential write case,
    we omit nearly all pages dirtied after the snapshot command.

    For a multi-writer, random write test:

    fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G
    --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120
    --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5;
    time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio

    original: 15.83sec
    patched: 10.35sec

    The improvement is smaller compared to the sequential write case,
    since we omit only half of the pages dirtied after the snapshot command.

    Reviewed-by: Nikolay Borisov
    Signed-off-by: Ethan Lien
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Ethan Lien
     
  • This will be used in future patches that remove the optional
    extent_io_ops callbacks.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     

15 Oct, 2018

1 commit

  • There are two members in struct btrfs_root which indicate root's
    objectid: objectid and root_key.objectid.

    They are both set to the same value in __setup_root():

    static void __setup_root(struct btrfs_root *root,
                             struct btrfs_fs_info *fs_info,
                             u64 objectid)
    {
            ...
            root->objectid = objectid;
            ...
            root->root_key.objectid = objectid;
            ...
    }

    and not changed to other value after initialization.

    grep in btrfs directory shows both are used in many places:
    $ grep -rI "root->root_key.objectid" | wc -l
    133
    $ grep -rI "root->objectid" | wc -l
    55
    (4.17, inc. some noise)

    It is confusing to have two similar variable names, and there seems to
    be no rule about which should be used in a given case.

    Since ->root_key itself is needed for the tree reloc tree, let's remove
    the 'objectid' member and unify the code to use ->root_key.objectid in
    all places.

    Signed-off-by: Misono Tomohiro
    Reviewed-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Misono Tomohiro
     

06 Aug, 2018

1 commit

  • While the regular inode timestamps all use timespec64 now, the i_otime
    field is btrfs specific and still needs to be converted to correctly
    represent times beyond 2038.
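
    The change itself is small (a sketch; the struct is abbreviated):

    struct btrfs_inode {
            /* ... */
            struct timespec64 i_otime;      /* was: struct timespec i_otime; */
            /* ... */
    };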

    Signed-off-by: Arnd Bergmann
    Reviewed-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Arnd Bergmann
     

29 May, 2018

3 commits

  • We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and
    BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make
    them consecutive again.

    Signed-off-by: Omar Sandoval
    [ switch them to enums so we don't have to do that again ]
    Signed-off-by: David Sterba

    Omar Sandoval
     
  • Now that we don't keep long-standing reservations for orphan items,
    root->orphan_block_rsv isn't used. We can get rid of it, along with:

    - root->orphan_lock, which was used to protect root->orphan_block_rsv
    - root->orphan_inodes, which was used as a refcount for root->orphan_block_rsv
    - BTRFS_INODE_ORPHAN_META_RESERVED, which was used to track reservations
    in root->orphan_block_rsv
    - btrfs_orphan_commit_root(), which was the last user of any of these
    and does nothing else

    Reviewed-by: Nikolay Borisov
    Signed-off-by: Omar Sandoval
    Signed-off-by: David Sterba

    Omar Sandoval
     
  • Now that we don't add orphan items for truncate, there can't be races on
    adding or deleting an orphan item, so this bit is unnecessary.

    Reviewed-by: Nikolay Borisov
    Signed-off-by: Omar Sandoval
    Signed-off-by: David Sterba

    Omar Sandoval
     


26 Mar, 2018

1 commit

  • delayed_iput_count was supposed to be used to implement, well, delayed
    iput. The idea is that we keep accumulating the number of iputs we do
    until eventually the inode is deleted. Turns out we never really
    switched delayed_iput_count from 0 to 1, hence all conditional code
    relying on the value of that member being different than 0 was never
    executed. This, as it turns out, didn't cause any problem due to the
    simple fact that the generic inode's i_count member was always used to
    count the number of iputs. So let's just remove the unused member and
    all unused code. This patch provides no functional changes. While at
    it, also add proper documentation for btrfs_add_delayed_iput.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    [ reformat comment ]
    Signed-off-by: David Sterba

    Nikolay Borisov
     

02 Nov, 2017

3 commits

  • The way we handle delalloc metadata reservations has gotten
    progressively more complicated over the years. There is so much cruft
    and weirdness around keeping the reserved count and outstanding counters
    consistent and handling the error cases that it's impossible to
    understand.

    Fix this by making the delalloc block rsv per-inode. This way we can
    calculate the actual size of the outstanding metadata reservations every
    time we make a change, and then reserve the delta based on that amount.
    This greatly simplifies the code everywhere, and makes the error
    handling in btrfs_delalloc_reserve_metadata far less terrifying.
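
    A rough sketch of the delta-based idea (the function and helper names
    are illustrative, not the actual patch):

    /* Recompute the full requirement from the current outstanding
     * metadata, then adjust the per-inode rsv by the delta only. */
    static int update_inode_block_rsv(struct btrfs_inode *inode, u64 needed)
    {
            struct btrfs_block_rsv *rsv = &inode->block_rsv;
            s64 delta = (s64)needed - (s64)rsv->size;

            if (delta > 0)
                    return reserve_metadata_bytes(rsv, delta);      /* grow */
            if (delta < 0)
                    release_metadata_bytes(rsv, -delta);            /* shrink */
            return 0;
    }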

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     
  • This is handy for tracing problems with modifying the outstanding
    extents counters.

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     
  • Right now we do a lot of weird hoops around outstanding_extents in order
    to keep the extent count consistent. This is because we logically
    transfer the outstanding_extent count from the initial reservation
    through the set_delalloc_bits. This makes it pretty difficult to get a
    handle on how and when we need to mess with outstanding_extents.

    Fix this by revamping the rules of how we deal with outstanding_extents.
    Now instead everybody that is holding on to a delalloc extent is
    required to increase the outstanding extents count for itself. This
    means we'll have something like this

    btrfs_delalloc_reserve_metadata - outstanding_extents = 1
    btrfs_set_extent_delalloc - outstanding_extents = 2
    btrfs_delalloc_release_extents - outstanding_extents = 1

    for an initial file write. Now take the append write where we extend an
    existing delalloc range but still under the maximum extent size

    btrfs_delalloc_reserve_metadata - outstanding_extents = 2
    btrfs_set_extent_delalloc
    btrfs_set_bit_hook - outstanding_extents = 3
    btrfs_merge_extent_hook - outstanding_extents = 2
    btrfs_delalloc_release_extents - outstanding_extents = 1

    In order to make the ordered extent transition we of course must now
    make ordered extents carry their own outstanding_extent reservation, so
    for cow_file_range we end up with

    btrfs_add_ordered_extent - outstanding_extents = 2
    clear_extent_bit - outstanding_extents = 1
    btrfs_remove_ordered_extent - outstanding_extents = 0

    This makes all manipulations of outstanding_extents much more explicit.
    Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
    combined with btrfs_delalloc_release_extents, even in the error case, as
    that is the only function that actually modifies the
    outstanding_extents counter.
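
    A sketch of the required pairing (error handling elided; the exact
    signatures vary between kernel versions):

    ret = btrfs_delalloc_reserve_metadata(inode, len);
    if (ret)
            return ret;
    ret = do_the_buffered_write(inode, pos, len);   /* illustrative */
    /* Must run on both the success and the error path: this is the only
     * function that actually decrements outstanding_extents. */
    btrfs_delalloc_release_extents(inode, len);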

    The drawback to this is now we are much more likely to have transient
    cases where outstanding_extents is much larger than it actually should
    be. This could happen before as we manipulated the delalloc bits, but
    now it happens basically at every write. This may put more pressure on
    the ENOSPC flushing code, but I think making this code simpler is worth
    the cost. I have another change coming to mitigate this side-effect
    somewhat.

    I also added trace points for the counter manipulation. These were used
    by a bpf script I wrote to help track down leak issues.

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     

16 Aug, 2017

3 commits

  • Add a new value for compression to distinguish between defrag and
    property. Previously, a single variable was used and this caused clashes
    when the per-file 'compression' property was set and a defrag -c was
    called.

    The property compression is loaded when the file is opened; defrag would
    overwrite the same variable and reset it to 0 (ie. NONE) when the file
    defragmentation finished. That's considered a usability bug.

    Now we won't touch the property value and use the defrag compression
    instead. The precedence of defrag is higher than that of the property
    (and of the whole filesystem).
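
    A sketch of the resulting precedence when picking the compression type
    (assuming per-inode fields along the lines of defrag_compress and
    prop_compress):

    /* defrag -c wins over the per-file property, which wins over the
     * filesystem-wide mount option. */
    static int effective_compress_type(struct btrfs_inode *inode,
                                       struct btrfs_fs_info *fs_info)
    {
            if (inode->defrag_compress)
                    return inode->defrag_compress;
            if (inode->prop_compress)
                    return inode->prop_compress;
            return fs_info->compress_type;
    }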

    Signed-off-by: David Sterba

    David Sterba
     
  • This is preparatory work for separating inode compression requested by
    defrag from compression set via properties. This will fix a usability
    bug where defrag resets the compression type to NONE: if the file has
    compression set via property, it would no longer apply (until the next
    mount or a reset through the command line).

    We're going to fix that by adding another variable just for the defrag
    call, without touching the property. Defrag will have higher priority
    when deciding whether to compress the data.

    Signed-off-by: David Sterba

    David Sterba
     
  • Tracepoint arguments are all read-only. If we mark the arguments
    as const, we're able to keep or convert those arguments to const
    where appropriate.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.
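
    The conversion pattern at a bio completion handler, as a sketch (the
    handler body is illustrative):

    static void example_end_io(struct bio *bio)
    {
            blk_status_t status = bio->bi_status;   /* was: bio->bi_error */

            if (status)
                    pr_err("I/O failed: %d\n", blk_status_to_errno(status));
            bio_put(bio);
    }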

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

26 Apr, 2017

1 commit

  • Currently when there are buffered writes that were not yet flushed and
    they fall within allocated ranges of the file (that is, not in holes or
    beyond eof, assuming there are no prealloc extents beyond eof), btrfs
    simply reports an incorrect number of used blocks through the stat(2)
    system call (or any of its variants), regardless of mount options or
    inode flags (compress, compress-force, nodatacow). This is because the
    number of blocks used that is reported is based on the current number
    of bytes in the vfs inode plus the number of delalloc bytes in the btrfs
    inode. The latter covers bytes that fall both within allocated regions
    of the file and within holes.

    Example scenarios where the number of reported blocks is wrong while the
    buffered writes are not flushed:

    $ mkfs.btrfs -f /dev/sdc
    $ mount /dev/sdc /mnt/sdc

    $ xfs_io -f -c "pwrite -S 0xaa 0 64K" /mnt/sdc/foo1
    wrote 65536/65536 bytes at offset 0
    64 KiB, 16 ops; 0.0000 sec (259.336 MiB/sec and 66390.0415 ops/sec)

    $ sync

    $ xfs_io -c "pwrite -S 0xbb 0 64K" /mnt/sdc/foo1
    wrote 65536/65536 bytes at offset 0
    64 KiB, 16 ops; 0.0000 sec (192.308 MiB/sec and 49230.7692 ops/sec)

    # The following should have reported 64K...
    $ du -h /mnt/sdc/foo1
    128K /mnt/sdc/foo1

    $ sync

    # After flushing the buffered write, it now reports the correct value.
    $ du -h /mnt/sdc/foo1
    64K /mnt/sdc/foo1

    $ xfs_io -f -c "falloc -k 0 128K" -c "pwrite -S 0xaa 0 64K" /mnt/sdc/foo2
    wrote 65536/65536 bytes at offset 0
    64 KiB, 16 ops; 0.0000 sec (520.833 MiB/sec and 133333.3333 ops/sec)

    $ sync

    $ xfs_io -c "pwrite -S 0xbb 64K 64K" /mnt/sdc/foo2
    wrote 65536/65536 bytes at offset 65536
    64 KiB, 16 ops; 0.0000 sec (260.417 MiB/sec and 66666.6667 ops/sec)

    # The following should have reported 128K...
    $ du -h /mnt/sdc/foo2
    192K /mnt/sdc/foo2

    $ sync

    # After flushing the buffered write, it now reports the correct value.
    $ du -h /mnt/sdc/foo2
    128K /mnt/sdc/foo2

    So the number of used file blocks is simply incorrect, unlike in other
    filesystems such as ext4 and xfs for example, but only while the buffered
    writes are not flushed.

    Fix this by tracking the number of delalloc bytes that fall within holes
    and beyond eof of a file, and use instead this new counter when reporting
    the number of used blocks for an inode.

    Another, different problem is that the delalloc bytes counter is reset
    when writeback starts (by clearing the EXTENT_DELALLOC flag from the
    respective range in the inode's io tree) while the vfs inode's bytes
    counter is only incremented when writeback finishes (through
    insert_reserved_file_extent()). Therefore while writeback is ongoing we
    simply report a wrong number of blocks used by an inode if the write
    operation covers a range previously unallocated. While this change does
    not fix that problem, it does minimize it a lot by shortening that time
    window, as the new delalloc bytes counter (new_delalloc_bytes) is only
    decremented when writeback finishes, right before updating the vfs
    inode's bytes counter. Fully fixing this second problem is not trivial
    and will be addressed later by a different patch.
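
    A sketch of the resulting stat(2) computation (new_delalloc_bytes is the
    counter named above; the surrounding code is simplified):

    /* Report the bytes already accounted in the vfs inode plus only the
     * delalloc bytes that fall in holes or beyond eof, avoiding double
     * counting of unflushed writes over allocated ranges. */
    u64 delalloc;

    spin_lock(&BTRFS_I(inode)->lock);
    delalloc = BTRFS_I(inode)->new_delalloc_bytes;
    spin_unlock(&BTRFS_I(inode)->lock);
    stat->blocks = (ALIGN(inode_get_bytes(inode), blocksize) +
                    ALIGN(delalloc, blocksize)) >> 9;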

    Signed-off-by: Filipe Manana

    Filipe Manana
     

17 Feb, 2017

1 commit

  • The original csum error message only outputs the inode number, offset,
    checksum and expected checksum.

    However, no root objectid is output, which sometimes makes debugging
    quite painful in the multi-subvolume case (including relocation).

    Also the checksum output is decimal, which seldom makes sense for
    users/developers and is hard to read most of the time.

    This patch adds the root objectid, which will be %lld for rootids larger
    than LAST_FREE_OBJECTID, and hex csum output for better readability.
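
    The improved message would look roughly like this (a sketch of the
    format, not necessarily verbatim):

    btrfs_warn_rl(fs_info,
            "csum failed root %lld ino %llu off %llu csum 0x%08x expected csum 0x%08x",
            root->root_key.objectid, btrfs_ino(inode), offset,
            csum, csum_expected);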

    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     

14 Feb, 2017

2 commits

  • Signed-off-by: Nikolay Borisov
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • Currently btrfs_ino takes a struct inode and this causes a lot of
    internal btrfs functions which consume this ino to take a VFS inode,
    rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
    of VFS structs into the internals of btrfs, it's first necessary to
    eliminate all uses of struct inode for the purpose of obtaining the ino.
    This patch does that by using BTRFS_I to convert an inode to a
    btrfs_inode. With this problem eliminated, subsequent patches will start
    eliminating the passing of struct inode altogether, eventually resulting
    in much cleaner code.
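
    BTRFS_I() is the standard container_of() helper from the VFS inode to
    the btrfs inode, so the call-site conversion is mechanical (a sketch):

    static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
    {
            return container_of(inode, struct btrfs_inode, vfs_inode);
    }

    /* call sites: */
    u64 ino = btrfs_ino(BTRFS_I(inode));    /* was: btrfs_ino(inode) */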

    Signed-off-by: Nikolay Borisov
    [ fix btrfs_get_extent tracepoint prototype ]
    Signed-off-by: David Sterba

    Nikolay Borisov
     

26 Sep, 2016

1 commit

  • We have a lot of random ints in btrfs_fs_info that can be put into
    flags. This is mostly equivalent, with the exception of how we deal with
    quota going on or off: now we set a flag when we are turning it on or
    off and deal with that appropriately, rather than just having a pending
    state to which the current quota_enabled gets set. Thanks,
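
    A sketch of the pattern (BTRFS_FS_QUOTA_ENABLED is used as an example
    bit; the accounting call is illustrative):

    /* Before: 'int quota_enabled' read and assigned directly.
     * After: one unsigned long bitmap with atomic bit operations. */
    set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
    if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
            do_qgroup_accounting();         /* illustrative */
    clear_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);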

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     

13 May, 2016

1 commit

  • Due to the optimization of lockless direct IO writes (the inode's
    i_mutex is not held) introduced in commit 38851cc19adb ("Btrfs:
    implement unlocked dio write"), we started having races between such
    writes and concurrent fsync operations that use the fast fsync path.
    These races were addressed in the patches titled "Btrfs: fix race
    between fsync and lockless direct IO writes" and "Btrfs: fix race
    between fsync and direct IO writes for prealloc extents". The races
    happened because the direct IO path, like every other write path,
    creates extent maps followed by the corresponding ordered extents, while
    the fast fsync path collected ordered extents first and then collected
    extent maps. This made it possible to log file extent items (based on
    the collected extent maps) without waiting for the corresponding ordered
    extents to complete (get their IO done). The two fixes mentioned before
    added a solution that consists of making the direct IO path create the
    ordered extents first and then the extent maps, while the fsync path
    attempts to collect any new ordered extents once it collects the extent
    maps. This was simple and did not require adding any synchronization
    primitive to any data structure (struct btrfs_inode for example), but it
    makes things more fragile for future development endeavours and adds an
    exceptional approach compared to the other write paths.

    This change adds a read-write semaphore to the btrfs inode structure and
    makes the direct IO path create the extent maps and the ordered extents
    while holding read access on that semaphore, while the fast fsync path
    collects extent maps and ordered extents while holding write access on
    that semaphore. The logic for direct IO write path is encapsulated in a
    new helper function that is used both for cow and nocow direct IO writes.
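
    A sketch of the locking scheme (the semaphore lives in struct
    btrfs_inode; the name dio_sem and the helpers are illustrative):

    /* Direct IO write: ordered extent first, then extent map, under read
     * access so concurrent dio writes don't serialize each other. */
    down_read(&inode->dio_sem);
    create_ordered_extent_then_extent_map(inode);   /* illustrative */
    up_read(&inode->dio_sem);

    /* Fast fsync: collect both under write access, excluding dio writes. */
    down_write(&inode->dio_sem);
    collect_extent_maps_and_ordered_extents(inode); /* illustrative */
    up_write(&inode->dio_sem);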

    Signed-off-by: Filipe Manana
    Reviewed-by: Josef Bacik

    Filipe Manana
     

07 Jan, 2016

1 commit

  • Inodes for delayed iput allocate a trivial helper structure; let's
    place the list hook directly into the inode and save a kmalloc (killing
    a __GFP_NOFAIL as a bonus), at the cost of increasing the size of
    btrfs_inode.

    The inode can be put onto the delayed_iputs list more than once and we
    have to keep the count. This means we can't use list_splice to process a
    bunch of inodes, because we'd lose track of the count if the inode is
    put onto the delayed iputs list again while it's being processed.
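
    A simplified sketch of why the processing loop pops inodes one at a time
    instead of using list_splice (the re-queue counting is omitted):

    spin_lock(&fs_info->delayed_iput_lock);
    while (!list_empty(&fs_info->delayed_iputs)) {
            struct btrfs_inode *inode;

            inode = list_first_entry(&fs_info->delayed_iputs,
                                     struct btrfs_inode, delayed_iput);
            list_del_init(&inode->delayed_iput);
            spin_unlock(&fs_info->delayed_iput_lock);
            /* iput() may re-add the inode to delayed_iputs; a spliced
             * private list would never see that. */
            iput(&inode->vfs_inode);
            spin_lock(&fs_info->delayed_iput_lock);
    }
    spin_unlock(&fs_info->delayed_iput_lock);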

    Signed-off-by: David Sterba

    David Sterba
     

22 Sep, 2015

1 commit

  • The following call trace is seen when the generic/095 test is executed:

    WARNING: CPU: 3 PID: 2769 at /home/chandan/code/repos/linux/fs/btrfs/inode.c:8967 btrfs_destroy_inode+0x284/0x2a0()
    Modules linked in:
    CPU: 3 PID: 2769 Comm: umount Not tainted 4.2.0-rc5+ #31
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20150306_163512-brownie 04/01/2014
    ffffffff81c08150 ffff8802ec9cbce8 ffffffff81984058 ffff8802ffd8feb0
    0000000000000000 ffff8802ec9cbd28 ffffffff81050385 ffff8802ec9cbd38
    ffff8802d12f8588 ffff8802d12f8588 ffff8802f15ab000 ffff8800bb96c0b0
    Call Trace:
    [] dump_stack+0x45/0x57
    [] warn_slowpath_common+0x85/0xc0
    [] warn_slowpath_null+0x15/0x20
    [] btrfs_destroy_inode+0x284/0x2a0
    [] destroy_inode+0x37/0x60
    [] evict+0x109/0x170
    [] dispose_list+0x35/0x50
    [] evict_inodes+0xaa/0x100
    [] generic_shutdown_super+0x47/0xf0
    [] kill_anon_super+0x11/0x20
    [] btrfs_kill_super+0x13/0x110
    [] deactivate_locked_super+0x39/0x70
    [] deactivate_super+0x5f/0x70
    [] cleanup_mnt+0x3e/0x90
    [] __cleanup_mnt+0xd/0x10
    [] task_work_run+0x96/0xb0
    [] do_notify_resume+0x3d/0x50
    [] int_signal+0x12/0x17

    This means that the inode had non-zero "outstanding extents" during
    eviction. This occurs because, during direct I/O, a task which
    successfully used up its reserved data space would set the
    BTRFS_INODE_DIO_READY bit and not clear it after finishing the DIO
    write. A future DIO write could then actually fail, and the unused
    reserved space would not be freed because of the previously set
    BTRFS_INODE_DIO_READY bit.

    Clearing the BTRFS_INODE_DIO_READY bit in btrfs_direct_IO() caused the
    following issue:
    |-----------------------------------+-------------------------------------|
    | Task A                            | Task B                              |
    |-----------------------------------+-------------------------------------|
    | Start direct i/o write on inode X.|                                     |
    |   reserve space                   |                                     |
    |   Allocate ordered extent         |                                     |
    |   release reserved space          |                                     |
    |   Set BTRFS_INODE_DIO_READY bit.  |                                     |
    |                                   | splice()                            |
    |                                   |   Transfer data from pipe buffer to |
    |                                   |   destination file.                 |
    |                                   |   - kmap(pipe buffer page)          |
    |                                   |   - Start direct i/o write on       |
    |                                   |     inode X.                        |
    |                                   |     - reserve space                 |
    |                                   |     - dio_refill_pages()            |
    |                                   |       - sdio->blocks_available == 0 |
    |                                   |       - Since a kernel address is   |
    |                                   |         being passed instead of a   |
    |                                   |         user space address,         |
    |                                   |         iov_iter_get_pages() returns|
    |                                   |         -EFAULT.                    |
    |                                   |   - Since BTRFS_INODE_DIO_READY is  |
    |                                   |     set, we don't release reserved  |
    |                                   |     space.                          |
    |                                   |   - Clear BTRFS_INODE_DIO_READY bit.|
    | -EIOCBQUEUED is returned.         |                                     |
    |-----------------------------------+-------------------------------------|

    Hence this commit introduces "struct btrfs_dio_data" to track the usage of
    reserved data space. The remaining unused "reserve space" can now be freed
    reliably.
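
    A sketch of the new structure (the name is from the description; the
    members are illustrative):

    /* Per-call tracking of dio reservations, instead of an inode runtime
     * bit that racing writers can clobber. */
    struct btrfs_dio_data {
            u64 outstanding_extents;
            u64 reserve;    /* reserved data space not yet consumed */
    };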

    Signed-off-by: Chandan Rajendra
    Reviewed-by: Liu Bo
    Signed-off-by: Chris Mason

    chandan
     

02 Jul, 2015

1 commit

  • While running generic/019, dmesg got several warnings from
    btrfs_free_reserved_data_space().

    Test generic/019 produces some disk failures, so the submitted dio will
    get errors, in which case btrfs_direct_IO() goes to the error handling
    path and frees bytes_may_use; but the problem is that bytes_may_use has
    already been freed during get_block().

    This adds a runtime flag to show whether we've gone through get_block();
    if so, we don't do the cleanup work.
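
    A sketch of the flag's use (the bit name and cleanup call are
    illustrative):

    /* In get_block(): remember the reservation was already consumed. */
    set_bit(BTRFS_INODE_DIO_READY, &BTRFS_I(inode)->runtime_flags);

    /* In the btrfs_direct_IO() error path: clean up only if get_block()
     * didn't already free bytes_may_use. */
    if (!test_and_clear_bit(BTRFS_INODE_DIO_READY,
                            &BTRFS_I(inode)->runtime_flags))
            btrfs_free_reserved_data_space(inode, offset, count);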

    Signed-off-by: Liu Bo
    Reviewed-by: Filipe Manana
    Tested-by: Filipe Manana
    Signed-off-by: Chris Mason

    Liu Bo