27 Jul, 2020

1 commit

  • Last touched in 2013 by commit de78b51a2852 ("btrfs: remove cache only
    arguments from defrag path") that was the only code that used the value.
    Now it's only set but never used for anything, so we can remove it.

    Reviewed-by: Nikolay Borisov
    Reviewed-by: Anand Jain
    Signed-off-by: David Sterba

    David Sterba
     

25 May, 2020

1 commit

  • The name BTRFS_ROOT_REF_COWS is not very clear about the meaning.

    In fact, that bit can only be set on these trees:

    - Subvolume roots
    - Data reloc root
    - Reloc roots for above roots

    No other tree gets this bit set. It follows that roots with this bit
    set can have tree blocks shared with other trees, either by snapshots
    or by reloc roots (a special snapshot created by relocation).

    This patch renames BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE to
    make it easier to understand, and updates all comments mentioning
    "reference counted" to follow the rename.

    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     

25 Feb, 2019

1 commit


12 Apr, 2018

1 commit


31 Mar, 2018

1 commit


18 Dec, 2015

1 commit

  • When running fstests btrfs/070, with a higher number of fsstress
    operations, I ran frequently into two different locking bugs when
    defragging directories.

    The first bug produced the following traces:

    [133860.229792] ------------[ cut here ]------------
    [133860.251062] WARNING: CPU: 2 PID: 26057 at fs/btrfs/locking.c:46 btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]()
    [133860.253576] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 psmouse parport
    [133860.282566] CPU: 2 PID: 26057 Comm: btrfs Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1
    [133860.284393] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
    [133860.286827] 0000000000000000 ffff880207697b78 ffffffff812566f4 0000000000000000
    [133860.288341] ffff880207697bb0 ffffffff8104d0a6 ffffffffa052d4c1 ffff880178f60e00
    [133860.294219] ffff880178f60e00 0000000000000000 00000000000000f6 ffff880207697bc0
    [133860.295831] Call Trace:
    [133860.306518] [] dump_stack+0x4e/0x79
    [133860.307473] [] warn_slowpath_common+0x9f/0xb8
    [133860.308619] [] ? btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]
    [133860.310068] [] warn_slowpath_null+0x1a/0x1c
    [133860.312552] [] btrfs_set_lock_blocking_rw+0x57/0xbd [btrfs]
    [133860.314630] [] btrfs_set_lock_blocking+0xe/0x10 [btrfs]
    [133860.323596] [] btrfs_realloc_node+0xb3/0x341 [btrfs]
    [133860.325233] [] btrfs_defrag_leaves+0x239/0x2fa [btrfs]
    [133860.332427] [] btrfs_defrag_root+0x63/0xca [btrfs]
    [133860.337259] [] btrfs_ioctl_defrag+0x78/0x14e [btrfs]
    [133860.340147] [] btrfs_ioctl+0x746/0x24c6 [btrfs]
    [133860.344833] [] ? arch_local_irq_save+0x9/0xc
    [133860.346343] [] ? __might_fault+0x4c/0xa7
    [133860.353248] [] ? __might_fault+0x4c/0xa7
    [133860.354242] [] ? __might_fault+0xa5/0xa7
    [133860.355232] [] ? cp_new_stat+0x15d/0x174
    [133860.356237] [] do_vfs_ioctl+0x427/0x4e6
    [133860.358587] [] ? SYSC_newfstat+0x25/0x2e
    [133860.360195] [] ? __fget_light+0x4d/0x71
    [133860.361380] [] SyS_ioctl+0x57/0x79
    [133860.363578] [] entry_SYSCALL_64_fastpath+0x12/0x6f
    [133860.366217] ---[ end trace 2cadb2f653437e49 ]---
    [133860.367399] ------------[ cut here ]------------
    [133860.368162] kernel BUG at fs/btrfs/locking.c:307!
    [133860.369430] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [133860.370205] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 psmouse parport
    [133860.370205] CPU: 2 PID: 26057 Comm: btrfs Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1
    [133860.370205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
    [133860.370205] task: ffff8800aec6db40 ti: ffff880207694000 task.ti: ffff880207694000
    [133860.370205] RIP: 0010:[] [] btrfs_assert_tree_locked+0x10/0x14 [btrfs]
    [133860.370205] RSP: 0018:ffff880207697bc0 EFLAGS: 00010246
    [133860.370205] RAX: 0000000000000000 RBX: ffff880178f60e00 RCX: 0000000000000000
    [133860.370205] RDX: ffff88023ec4fb50 RSI: 00000000ffffffff RDI: ffff880178f60e00
    [133860.370205] RBP: ffff880207697bc0 R08: 0000000000000001 R09: 0000000000000000
    [133860.370205] R10: 0000160000000000 R11: ffffffff81651000 R12: ffff880178f60e00
    [133860.370205] R13: 0000000000000000 R14: 00000000000000f6 R15: ffff8801ff409000
    [133860.370205] FS: 00007f763efd48c0(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000
    [133860.370205] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [133860.370205] CR2: 0000000002158048 CR3: 000000003fd6c000 CR4: 00000000000006e0
    [133860.370205] Stack:
    [133860.370205] ffff880207697bd8 ffffffffa052d4d0 0000000000000000 ffff880207697be8
    [133860.370205] ffffffffa04d5787 ffff880207697c80 ffffffffa04d99cb ffff8801ff409590
    [133860.370205] ffff880207697ca8 000000f507697c80 ffff880183c11bb8 0000000000000000
    [133860.370205] Call Trace:
    [133860.370205] [] btrfs_set_lock_blocking_rw+0x66/0xbd [btrfs]
    [133860.370205] [] btrfs_set_lock_blocking+0xe/0x10 [btrfs]
    [133860.370205] [] btrfs_realloc_node+0xb3/0x341 [btrfs]
    [133860.370205] [] btrfs_defrag_leaves+0x239/0x2fa [btrfs]
    [133860.370205] [] btrfs_defrag_root+0x63/0xca [btrfs]
    [133860.370205] [] btrfs_ioctl_defrag+0x78/0x14e [btrfs]
    [133860.370205] [] btrfs_ioctl+0x746/0x24c6 [btrfs]
    [133860.370205] [] ? arch_local_irq_save+0x9/0xc
    [133860.370205] [] ? __might_fault+0x4c/0xa7
    [133860.370205] [] ? __might_fault+0x4c/0xa7
    [133860.370205] [] ? __might_fault+0xa5/0xa7
    [133860.370205] [] ? cp_new_stat+0x15d/0x174
    [133860.370205] [] do_vfs_ioctl+0x427/0x4e6
    [133860.370205] [] ? SYSC_newfstat+0x25/0x2e
    [133860.370205] [] ? __fget_light+0x4d/0x71
    [133860.370205] [] SyS_ioctl+0x57/0x79
    [133860.370205] [] entry_SYSCALL_64_fastpath+0x12/0x6f

    This bug happened because we assumed that by setting keep_locks to 1 in
    our search path, our path after a call to btrfs_search_slot() would have
    all nodes locked. That is not always true, because unlock_up() (called by
    btrfs_search_slot()) will unlock a node in a path if the slot of the node
    below it doesn't point to the last item or beyond the last item. For
    example, when the tree has a height of 2 and path->slots[0] has a value
    smaller than btrfs_header_nritems(path->nodes[0]) - 1, the node at level 2
    will be unlocked (also because lowest_unlock is set to 1 due to the fact
    that the value passed as ins_len to btrfs_search_slot() is 0).
    This caused btrfs_find_next_key(), called before btrfs_realloc_node(),
    to release our path and call btrfs_search_slot() again, but this time with
    the cow parameter set to 0, meaning the resulting path got only read locks.
    Therefore, when we called btrfs_realloc_node() with path->nodes[1] having
    a read lock, it resulted in the warning and BUG_ON when calling
    btrfs_set_lock_blocking() against the node, as that function expects the
    node to have a write lock.

    The second bug happened often when the first bug didn't happen, and made
    us hang and hit the following warning at fs/btrfs/locking.c:

    251 void btrfs_tree_lock(struct extent_buffer *eb)
    252 {
    253 WARN_ON(eb->lock_owner == current->pid);

    This happened because the tree search we made at btrfs_defrag_leaves()
    before calling btrfs_find_next_key() locked a leaf and all the other
    nodes in the path, so btrfs_find_next_key() had no need to release the
    path and make a new search (with path->lowest_level set to 1). This
    made btrfs_realloc_node() attempt to write lock the same leaf again,
    resulting in a hang/deadlock.

    So fix these issues by calling btrfs_find_next_key() after calling
    btrfs_realloc_node() and setting the search path's lowest_level to 1
    to avoid the hang/deadlock when attempting to write lock the leaves
    at btrfs_realloc_node().

    Signed-off-by: Filipe Manana

    Filipe Manana
     

01 Sep, 2015

1 commit


03 Jun, 2015

1 commit

  • A long time ago (2008), defrag was automatic for new b-tree writes, but
    it was disabled after performance problems. There was a leftover in
    tree-defrag.c that effectively stops any defragmentation on b-trees.
    This is a bit unexpected and IMHO undesired. The SSD mode is an
    optimization and defrag is supposed to work if the user asks for it.

    Related commits:

    6702ed490ca0bb44e17131818a5a18b773957c5a
    Btrfs: Add run time btree defrag, and an ioctl to force btree defrag

    e18e4809b10e6c9efb5fe10c1ddcb4ebb690d517
    Btrfs: Add mount -o ssd, which includes optimizations for seek free
    storage

    b3236e68bf86b3ae87f58984a1822369225211cb
    Btrfs: Leave on the tree defragger in mount -o ssd, it still helps there

    9afbb0b752ef30a429c45b9de6706e28ad1a36e1
    Btrfs: Disable tree defrag in SSD mode

    The last three commits switch the defrag+ssd off/on/off and the last one

    3f157a2fd2ad731e1ed9964fecdc5f459f04a4a4
    Btrfs: Online btree defragmentation fixes

    misses the bits from tree-defrag.c to revert to the behaviour introduced
    in e18e4809b10e.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     

10 Jun, 2014

1 commit


12 Nov, 2013

2 commits


21 Feb, 2013

1 commit


02 May, 2011

1 commit


30 Oct, 2010

1 commit

  • These are all the cases where a variable is set but never read; as far
    as I can see they are not bugs, simply leftovers.

    Still needs more review.

    Found by gcc 4.6's new warnings
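
For illustration only, the kind of leftover this cleanup targets looks roughly like the following. The function and variable names are hypothetical and not taken from btrfs. When built with `gcc -Wall` (which enables `-Wunused-but-set-variable` in gcc 4.6 and later), the compiler is expected to flag `last` as set but not used.

```c
/* Hypothetical example, not taken from the btrfs sources. */
int sum_doubled(const int *v, int n)
{
	int last = 0;	/* leftover: assigned in the loop but never read */
	int total = 0;

	for (int i = 0; i < n; i++) {
		last = v[i];		/* dead store, safe to delete */
		total += 2 * v[i];
	}
	return total;
}
```

Deleting `last` and its assignment changes nothing in the result, which is why such variables are leftovers rather than bugs.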

    Signed-off-by: Andi Kleen
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Andi Kleen
     

25 May, 2010

1 commit


25 Mar, 2009

1 commit

  • The extent allocation tree maintains a reference count and full
    back reference information for every extent allocated in the
    filesystem. For subvolume and snapshot trees, every time
    a block goes through COW, the new copy of the block adds a reference
    on every block it points to.

    If a btree node points to 150 leaves, then the COW code needs to go
    and add backrefs on 150 different extents, which might be spread all
    over the extent allocation tree.

    These updates currently happen during btrfs_cow_block, and most COWs
    happen during btrfs_search_slot. btrfs_search_slot has locks held
    on both the parent and the node we are COWing, and so we really want
    to avoid IO during the COW if we can.

    This commit adds an rbtree of pending reference count updates and extent
    allocations. The tree is ordered by byte number of the extent and byte number
    of the parent for the back reference. The tree allows us to:

    1) Modify back references in something close to disk order, reducing seeks
    2) Significantly reduce the number of modifications made as block pointers
    are balanced around
    3) Do all of the extent insertion and back reference modifications outside
    of the performance critical btrfs_search_slot code.
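
As a rough sketch of benefits 1) and 2), the queue of pending updates can be modeled as entries keyed by (extent byte number, parent byte number). The code below is not the kernel implementation: it uses a sorted array with `qsort()` in place of an rbtree, and the names `pending_ref`, `pending_ref_cmp`, `merge_pending_refs` and the `ref_mod` field are assumptions made for this illustration. Dropping entries whose net change is zero is likewise an assumption about how opposite updates could cancel.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one queued reference count update. */
struct pending_ref {
	uint64_t bytenr;	/* byte number of the extent */
	uint64_t parent;	/* byte number of the parent block */
	int ref_mod;		/* +1 to add a reference, -1 to drop one */
};

/* Order by extent byte number first, then by parent byte number. */
static int pending_ref_cmp(const void *a, const void *b)
{
	const struct pending_ref *x = a;
	const struct pending_ref *y = b;

	if (x->bytenr != y->bytenr)
		return x->bytenr < y->bytenr ? -1 : 1;
	if (x->parent != y->parent)
		return x->parent < y->parent ? -1 : 1;
	return 0;
}

/*
 * Sort the queue into key order, then fold updates that share a key
 * into one entry. Entries with a net change of zero are dropped.
 * Returns the number of entries left at the front of the array.
 */
static size_t merge_pending_refs(struct pending_ref *refs, size_t n)
{
	size_t out = 0;

	qsort(refs, n, sizeof(*refs), pending_ref_cmp);
	for (size_t i = 0; i < n; ) {
		struct pending_ref acc = refs[i++];

		while (i < n && refs[i].bytenr == acc.bytenr &&
		       refs[i].parent == acc.parent)
			acc.ref_mod += refs[i++].ref_mod;
		if (acc.ref_mod != 0)
			refs[out++] = acc;
	}
	return out;
}
```

Processing the merged entries front to back touches extents in ascending byte order, and an add followed by a drop on the same key never reaches the extent tree at all.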

    #3 has the added benefit of greatly reducing the btrfs stack footprint.
    The extent allocation tree modifications are done without the deep
    (and somewhat recursive) call chains used in the past.

    These delayed back reference updates must be done before the transaction
    commits, and so the rbtree is tied to the transaction. Throttling is
    implemented to help keep the queue of backrefs at a reasonable size.

    Since there was a similar mechanism in place for the extent tree
    extents, that is removed and replaced by the delayed reference tree.

    Yan Zheng helped review and fixup this code.

    Signed-off-by: Chris Mason

    Chris Mason
     

04 Feb, 2009

1 commit

  • Most of the btrfs metadata operations can be protected by a spinlock,
    but some operations still need to schedule.

    So far, btrfs has been using a mutex along with a trylock loop.
    Most of the time it is able to avoid going for the full mutex, so
    the trylock loop is a big performance gain.

    This commit is step one for getting rid of the blocking locks entirely.
    btrfs_tree_lock takes a spinlock, and the code explicitly switches
    to a blocking lock when it starts an operation that can schedule.

    We'll be able to get rid of the blocking locks in smaller pieces over time.
    Tracing allows us to find the most common cause of blocking, so we
    can start with the hot spots first.

    The basic idea is:

    btrfs_tree_lock() returns with the spin lock held

    btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
    the extent buffer flags, and then drops the spin lock. The buffer is
    still considered locked by all of the btrfs code.

    If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
    the spin lock and waits on a wait queue for the blocking bit to go away.

    Much of the code that needs to set the blocking bit finishes without actually
    blocking a good percentage of the time. So, an adaptive spin is still
    used against the blocking bit to avoid very high context switch rates.

    btrfs_clear_lock_blocking() clears the blocking bit and returns
    with the spinlock held again.

    btrfs_tree_unlock() can be called on either blocking or spinning locks;
    it does the right thing based on the blocking bit.

    ctree.c has a helper function to set/clear all the locked buffers in a
    path as blocking.
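
The state transitions described above can be sketched as a small single-threaded model. This is not the kernel code: the `model_` names and the two boolean fields are assumptions made for illustration, a trylock stands in for the real lock acquisition, and the wait queue and adaptive spinning are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock state of an extent buffer. */
struct model_eb {
	bool spin_held;	/* the spinlock is currently held */
	bool blocking;	/* models the EXTENT_BUFFER_BLOCKING bit */
};

/* The buffer counts as locked in either the spinning or blocking state. */
static bool model_is_locked(const struct model_eb *eb)
{
	return eb->spin_held || eb->blocking;
}

/* Succeeds only if nobody holds the lock in either form. */
static bool model_tree_trylock(struct model_eb *eb)
{
	if (model_is_locked(eb))
		return false;
	eb->spin_held = true;
	return true;
}

/* Spinning -> blocking: set the bit, then drop the spinlock. */
static void model_set_lock_blocking(struct model_eb *eb)
{
	assert(eb->spin_held);
	eb->blocking = true;
	eb->spin_held = false;
}

/* Blocking -> spinning: return with the spinlock held again. */
static void model_clear_lock_blocking(struct model_eb *eb)
{
	assert(eb->blocking);
	eb->spin_held = true;
	eb->blocking = false;
}

/* Works on either form, decided by the blocking bit. */
static void model_tree_unlock(struct model_eb *eb)
{
	assert(model_is_locked(eb));
	if (eb->blocking)
		eb->blocking = false;	/* the real code would wake waiters here */
	else
		eb->spin_held = false;
}
```

The point of the model is that a buffer in the blocking state still refuses new lockers even though the spinlock itself has been released.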

    Signed-off-by: Chris Mason

    Chris Mason
     

06 Jan, 2009

1 commit


30 Oct, 2008

1 commit

  • This patch removes the giant fs_info->alloc_mutex and replaces it with a bunch
    of little locks.

    There is now a pinned_mutex, which is used when messing with the pinned_extents
    extent io tree, and the extent_ins_mutex which is used with the pending_del and
    extent_ins extent io trees.

    The locking for the extent tree stuff was inspired by a patch that Yan Zheng
    wrote to fix a race condition. I cleaned it up some and changed the locking
    around a little bit, but the idea remains the same. Basically, instead of
    holding the extent_ins_mutex throughout the processing of an extent on the
    extent_ins or pending_del trees, we just hold it while we're searching and when
    we clear the bits on those trees, and lock the extent for the duration of the
    operations on the extent.

    Also, to keep from getting hung up waiting to lock an extent, I've added a
    try_lock_extent, so if we cannot lock the extent we move on to the next one in
    the tree and come back to that one later. I have tested this heavily and it does
    not appear to break anything. This has to be applied on top of my
    find_free_extent redo patch.
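
The skip-and-retry idea can be sketched as follows. This is only a model under stated assumptions: the extents are a plain array rather than an extent io tree, the per-extent lock is a C11 `atomic_flag`, and every `model_` name as well as `process_pass` is hypothetical, not the real try_lock_extent interface.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one extent waiting to be processed. */
struct model_extent {
	atomic_flag locked;
	bool processed;
};

static void model_extent_init(struct model_extent *e)
{
	atomic_flag_clear(&e->locked);
	e->processed = false;
}

/* Non-blocking lock attempt: true if we took the lock. */
static bool model_try_lock(struct model_extent *e)
{
	return !atomic_flag_test_and_set(&e->locked);
}

static void model_unlock(struct model_extent *e)
{
	atomic_flag_clear(&e->locked);
}

/*
 * One pass over the extents: handle every extent we can lock and
 * skip the ones held by someone else instead of waiting for them.
 * Returns how many extents were skipped and need another pass.
 */
static size_t process_pass(struct model_extent *ext, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (ext[i].processed)
			continue;
		if (!model_try_lock(&ext[i])) {
			skipped++;	/* come back to this one later */
			continue;
		}
		ext[i].processed = true;
		model_unlock(&ext[i]);
	}
	return skipped;
}
```

A caller would repeat `process_pass()` until it returns 0, so a single contended extent delays only itself rather than everything queued behind it.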

    I tested this patch on top of Yan's space rebalancing code and it worked fine.
    The only thing that has changed since the last version is that I pulled out all
    my debugging stuff; apparently I forgot to run guilt refresh before I sent the
    last patch out. Thank you,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

30 Sep, 2008

1 commit

  • This improves the comments at the top of many functions. It didn't
    dive into the guts of functions because I was trying to
    avoid merging problems with the new allocator and back reference work.

    extent-tree.c and volumes.c were both skipped, and there is definitely
    more work to do in cleaning and commenting the code.

    Signed-off-by: Chris Mason

    Chris Mason
     

25 Sep, 2008

20 commits