27 Sep, 2016

1 commit

  • For many printks, we want to know which file system issued the message.

    This patch converts most pr_* calls to use the btrfs_* versions instead.
    In some cases, this means adding plumbing to allow call sites access to
    an fs_info pointer.

    fs/btrfs/check-integrity.c is left alone for another day.
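
    A minimal before/after sketch of the conversion (the call site and
    message text are illustrative; btrfs_info() is one of the helpers the
    patch converts to):

        /* before: no way to tell which mounted file system spoke up */
        pr_info("BTRFS: checking UUID tree\n");

        /* after: the helper prefixes the message with the fs identity */
        btrfs_info(fs_info, "checking UUID tree");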

    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Jeff Mahoney
     

26 Sep, 2016

1 commit

  • We have a lot of random ints in btrfs_fs_info that can be put into flags. This
    is mostly equivalent, with the exception of how we deal with quota going on or
    off: instead of just having a pending state that the current quota_enabled gets
    set to, we now set a flag when we are turning it on or off and deal with that
    appropriately. Thanks,
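
    A hedged sketch of the flag-based approach; the bit names are assumed,
    and the call site is illustrative:

        /* a standalone int such as fs_info->quota_enabled becomes a bit
         * in a shared fs_info->flags word */
        if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags))
                return 0;       /* quota already on */
        set_bit(BTRFS_FS_QUOTA_ENABLING, &fs_info->flags);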

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     

25 Aug, 2016

1 commit

  • Refactor the btrfs_qgroup_insert_dirty_extent() function into two functions:
    1. btrfs_qgroup_insert_dirty_extent_nolock()
    Almost the same as the original code.
    For delayed_ref usage, where the caller already holds the delayed refs lock.

    Change the return value type to int, since the caller never needs the
    pointer, but only needs to know whether it has to free the allocated
    memory.

    2. btrfs_qgroup_insert_dirty_extent()
    The more encapsulated version.

    It takes the delayed_refs lock, allocates the memory, checks whether
    quota is enabled, and so on (see the sketch below).

    The original design was to keep the exported functions to a minimum, but
    as more btrfs hacks get exposed, like replacing paths during balance, we
    need to record dirty extents manually, so we have to add such functions.

    Also, add comments for both functions, to inform developers how to keep
    qgroups correct when doing such hacks.
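
    A rough sketch of the encapsulated variant from point 2, assuming the
    record structure and locking look like the surrounding qgroup code of
    that era (not the exact upstream body):

        int btrfs_qgroup_insert_dirty_extent(struct btrfs_trans_handle *trans,
                                             struct btrfs_fs_info *fs_info,
                                             u64 bytenr, u64 num_bytes)
        {
                struct btrfs_qgroup_extent_record *record;
                struct btrfs_delayed_ref_root *delayed_refs;
                int ret;

                /* quota enabled check */
                if (!fs_info->quota_enabled)
                        return 0;

                /* memory allocation */
                record = kmalloc(sizeof(*record), GFP_NOFS);
                if (!record)
                        return -ENOMEM;

                record->bytenr = bytenr;
                record->num_bytes = num_bytes;

                /* delayed_refs lock, then the _nolock worker */
                delayed_refs = &trans->transaction->delayed_refs;
                spin_lock(&delayed_refs->lock);
                ret = btrfs_qgroup_insert_dirty_extent_nolock(fs_info,
                                delayed_refs, record);
                spin_unlock(&delayed_refs->lock);

                /* nonzero return: extent already recorded, free our copy */
                if (ret > 0)
                        kfree(record);
                return 0;
        }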

    Cc: Mark Fasheh
    Signed-off-by: Qu Wenruo
    Reviewed-and-Tested-by: Goldwyn Rodrigues
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    Qu Wenruo
     

26 Jul, 2016

2 commits

  • When using trace events to debug a problem, it's impossible to determine
    which file system generated a particular event. This patch adds a
    macro to prefix standard information to the head of a trace event.

    The extent_state alloc/free events are all that's left without an
    fs_info available.
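
    A sketch of how such a prefix macro can look; the macro name and the
    fsid plumbing are assumptions, %pU is the kernel's UUID format
    specifier:

        /* each btrfs event stores the fsid and prints it first */
        #define TP_printk_btrfs(fmt, args...) \
                TP_printk("%pU: " fmt, __entry->fsid, args)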

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • BTRFS uses a variety of slab caches to satisfy internal needs.
    Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT flag,
    meaning allocations from the caches are going to be accounted as
    SReclaimable. At the same time btrfs is not registering any shrinkers
    whatsoever, thus preventing memory from the slabs from ever being shrunk.
    This means those caches are not in fact reclaimable.

    To fix this, remove SLAB_RECLAIM_ACCOUNT from all caches apart from the
    inode cache, since that one is freed by the generic VFS super_block
    shrinker. Also mark the transaction-related caches as SLAB_TEMPORARY,
    to better document the lifetime of the objects (it just translates
    to SLAB_RECLAIM_ACCOUNT).
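
    A before/after sketch for one of the caches, using the transaction
    handle cache as the example (flag combinations are illustrative):

        /* before: accounted as SReclaimable although no shrinker exists */
        cache = kmem_cache_create("btrfs_trans_handle",
                                  sizeof(struct btrfs_trans_handle), 0,
                                  SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
                                  NULL);

        /* after: SLAB_TEMPORARY documents the short object lifetime
         * (today it just translates to SLAB_RECLAIM_ACCOUNT) */
        cache = kmem_cache_create("btrfs_trans_handle",
                                  sizeof(struct btrfs_trans_handle), 0,
                                  SLAB_TEMPORARY | SLAB_MEM_SPREAD,
                                  NULL);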

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     

07 Jan, 2016

1 commit

  • btrfs_delayed_extent_op can be packed in a better way: it's 40 bytes now
    and has 8 unused bytes. Reducing the level type to u8 makes it possible
    to squeeze it into the padding byte after the key. The bitfields were
    switched to bool as there's space to store the full byte without
    increasing the whole structure; besides that, the generated assembly is
    smaller.

    struct btrfs_delayed_extent_op {
            struct btrfs_disk_key  key;             /*     0    17 */
            u8                     level;           /*    17     1 */
            bool                   update_key;      /*    18     1 */
            bool                   update_flags;    /*    19     1 */
            bool                   is_data;         /*    20     1 */

            /* XXX 3 bytes hole, try to pack */

            u64                    flags_to_set;    /*    24     8 */

            /* size: 32, cachelines: 1, members: 6 */
            /* sum members: 29, holes: 1, sum holes: 3 */
            /* last cacheline: 32 bytes */
    };

    The final size is 32 bytes, which gives +26 objects per slab page.

       text    data     bss     dec     hex filename
     938811   43670   23144 1005625   f5839 fs/btrfs/btrfs.ko.before
     938747   43670   23144 1005561   f57f9 fs/btrfs/btrfs.ko.after

    Signed-off-by: David Sterba

    David Sterba
     

27 Oct, 2015

1 commit

  • Between btrfs_alloc_reserved_file_extent() and
    btrfs_add_delayed_qgroup_reserve(), there is a window in which the
    delayed_refs are run and the delayed ref head may be freed before
    btrfs_add_delayed_qgroup_reserve().

    This will cause btrfs_add_delayed_qgroup_reserve() to return -ENOENT,
    and cause the transaction to be aborted.

    This patch records the qgroup reserve space info in the delayed_ref_head
    at btrfs_add_delayed_ref() time, to eliminate the race window.
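
    A minimal sketch of the idea; the two fields on the delayed ref head
    are assumptions based on the description above:

        /* at btrfs_add_delayed_ref() time, stash the reserved space on
         * the head itself, so nobody has to look the head up again later
         * (and race with it being run and freed) */
        head_ref->qgroup_ref_root = ref_root;
        head_ref->qgroup_reserved = reserved;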

    Reported-by: Filipe Manana
    Signed-off-by: Qu Wenruo
    Signed-off-by: Chris Mason

    Qu Wenruo
     

26 Oct, 2015

2 commits

  • In the kernel 4.2 merge window we had big changes to the implementation
    of delayed references and qgroups which made the no_quota field of delayed
    references unused. More specifically, the no_quota field is not used
    anymore as of:

    commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.")

    Leaving the no_quota field in place actually prevents delayed references
    from getting merged, which in turn causes the following BUG_ON(), at
    fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:

    static int run_delayed_tree_ref(...)
    {
            (...)
            BUG_ON(node->ref_mod != 1);
            (...)
    }

    This happens on a scenario like the following:

    1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.

    2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
    It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.

    3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
    It's not merged with the reference at the tail of the list of refs
    for bytenr X because that reference, Ref2, is incompatible
    due to Ref2->no_quota != Ref3->no_quota.

    4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
    It's not merged with the reference at the tail of the list of refs
    for bytenr X because that reference, Ref3, is incompatible
    due to Ref3->no_quota != Ref4->no_quota.

    5) We run delayed references, trigger merging of delayed references,
    through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().

    6) Ref1 and Ref3 are merged as Ref1->no_quota = Ref3->no_quota and
    all other conditions are satisfied too. So Ref1 gets a ref_mod
    value of 2.

    7) Ref2 and Ref4 are merged as Ref2->no_quota = Ref4->no_quota and
    all other conditions are satisfied too. So Ref2 gets a ref_mod
    value of 2.

    8) Ref1 and Ref2 aren't merged, because they have different values
    for their no_quota field.

    9) Delayed reference Ref1 is picked for running (select_delayed_ref()
    always prefers references with an action == BTRFS_ADD_DELAYED_REF).
    So run_delayed_tree_ref() is called for Ref1, which triggers the
    BUG_ON because Ref1->ref_mod != 1 (it equals 2).

    So fix this by removing the no_quota field, as it's not used anymore as
    of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented
    qgroup mechanism.").

    The use of no_quota was also buggy in at least two places:

    1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
    no_quota to 0 instead of 1 when the following condition was true:
    is_fstree(ref_root) || !fs_info->quota_enabled

    2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
    reset a node's no_quota when the condition "!is_fstree(root_objectid)
    || !root->fs_info->quota_enabled" was true but we did it only in
    an unused local stack variable, that is, we never reset the no_quota
    value in the node itself.
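
    A sketch of the second bug; the surrounding code is approximate, but it
    shows how the result only ever lived in a local variable:

        int no_quota = node->no_quota;

        if (!is_fstree(root_objectid) || !root->fs_info->quota_enabled)
                no_quota = 1;   /* bug: node->no_quota is never updated */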

    This fixes the remainder of the problems several people have been having
    when running delayed references, mostly while a balance is running in
    parallel, on a 4.2+ kernel.

    Very special thanks to Stéphane Lesimple for helping debugging this issue
    and testing this fix on his multi terabyte filesystem (which took more
    than one day to balance alone, plus fsck, etc).

    Also, this fixes a deadlock issue when using the clone ioctl with qgroups
    enabled, as reported by Elias Probst on the mailing list. The deadlock
    happens because after calling btrfs_insert_empty_item() we have our path
    holding a write lock on a leaf of the fs/subvol tree, and then, before
    releasing the path, we called check_ref(), which did backref walking when
    qgroups are enabled, and tried to read lock the same leaf. The trace for
    this case is the following:

    INFO: task systemd-nspawn:6095 blocked for more than 120 seconds.
    (...)
    Call Trace:
    [] schedule+0x74/0x83
    [] btrfs_tree_read_lock+0xc0/0xea
    [] ? wait_woken+0x74/0x74
    [] btrfs_search_old_slot+0x51a/0x810
    [] btrfs_next_old_leaf+0xdf/0x3ce
    [] ? ulist_add_merge+0x1b/0x127
    [] __resolve_indirect_refs+0x62a/0x667
    [] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
    [] find_parent_nodes+0xaf3/0xfc6
    [] __btrfs_find_all_roots+0x92/0xf0
    [] btrfs_find_all_roots+0x45/0x65
    [] ? btrfs_get_tree_mod_seq+0x2b/0x88
    [] check_ref+0x64/0xc4
    [] btrfs_clone+0x66e/0xb5d
    [] btrfs_ioctl_clone+0x48f/0x5bb
    [] ? native_sched_clock+0x28/0x77
    [] btrfs_ioctl+0xabc/0x25cb
    (...)

    The problem goes away by eliminating check_ref(), which is no longer
    needed as its purpose was to get a value for the no_quota field of
    a delayed reference (this patch removes the no_quota field as mentioned
    earlier).

    Reported-by: Stéphane Lesimple
    Tested-by: Stéphane Lesimple
    Reported-by: Elias Probst
    Reported-by: Peter Becker
    Reported-by: Malte Schröder
    Reported-by: Derek Dongray
    Reported-by: Erkki Seppala
    Cc: stable@vger.kernel.org # 4.2+
    Signed-off-by: Filipe Manana
    Reviewed-by: Qu Wenruo

    Filipe Manana
     
  • In the kernel 4.2 merge window we had a refactoring/rework of the delayed
    references implementation in order to fix certain problems with qgroups.
    However that rework introduced one more regression that leads to the
    following trace when running delayed references for metadata:

    [35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
    [35908.065201] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc psmouse i2
    [35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1
    [35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
    [35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
    [35908.065201] task: ffff880114b7d780 ti: ffff88010c4c8000 task.ti: ffff88010c4c8000
    [35908.065201] RIP: 0010:[] [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
    [35908.065201] RSP: 0018:ffff88010c4cbb08 EFLAGS: 00010293
    [35908.065201] RAX: 0000000000000000 RBX: ffff88008a661000 RCX: 0000000000000000
    [35908.065201] RDX: ffffffffa04dd58f RSI: 0000000000000001 RDI: 0000000000000000
    [35908.065201] RBP: ffff88010c4cbb40 R08: 0000000000001000 R09: ffff88010c4cb9f8
    [35908.065201] R10: 0000000000000000 R11: 000000000000002c R12: 0000000000000000
    [35908.065201] R13: ffff88020a74c578 R14: 0000000000000000 R15: 0000000000000000
    [35908.065201] FS: 0000000000000000(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
    [35908.065201] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [35908.065201] CR2: 00000000015e8708 CR3: 0000000102185000 CR4: 00000000000006e0
    [35908.065201] Stack:
    [35908.065201] ffff88010c4cbb18 0000000000000f37 ffff88020a74c578 ffff88015a408000
    [35908.065201] ffff880154a44000 0000000000000000 0000000000000005 ffff88010c4cbbd8
    [35908.065201] ffffffffa0492b9a 0000000000000005 0000000000000000 0000000000000000
    [35908.065201] Call Trace:
    [35908.065201] [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
    [35908.065201] [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 [btrfs]
    [35908.065201] [] __btrfs_run_delayed_refs+0xafa/0xd33 [btrfs]
    [35908.065201] [] ? join_transaction.isra.10+0x25/0x41f [btrfs]
    [35908.065201] [] ? join_transaction.isra.10+0xa8/0x41f [btrfs]
    [35908.065201] [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
    [35908.065201] [] delayed_ref_async_start+0x3c/0x7b [btrfs]
    [35908.065201] [] normal_work_helper+0x14c/0x32a [btrfs]
    [35908.065201] [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
    [35908.065201] [] process_one_work+0x24a/0x4ac
    [35908.065201] [] worker_thread+0x206/0x2c2
    [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
    [35908.065201] [] ? rescuer_thread+0x2cb/0x2cb
    [35908.065201] [] kthread+0xef/0xf7
    [35908.065201] [] ? kthread_parkme+0x24/0x24
    [35908.065201] [] ret_from_fork+0x3f/0x70
    [35908.065201] [] ? kthread_parkme+0x24/0x24
    [35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 0b 4c 8b 45 30 8b 4d 28 45 31
    [35908.065201] RIP [] insert_inline_extent_backref+0x52/0xb1 [btrfs]
    [35908.065201] RSP
    [35908.310885] ---[ end trace fe4299baf0666457 ]---

    This happens because the new delayed references code no longer merges
    delayed references that have different sequence values. The following
    steps are an example sequence leading to this issue:

    1) Transaction N starts, fs_info->tree_mod_seq has value 0;

    2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
    bytenr A is created, with a value of 1 and a seq value of 0;

    3) fs_info->tree_mod_seq is incremented to 1;

    4) Extent buffer A is deleted through btrfs_del_items(), which calls
    btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
    latter returns the metadata extent associated with extent buffer A to
    the free space cache (the range is not pinned), because the extent
    buffer was created in the current transaction (N) and writeback never
    happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
    in the extent buffer).
    This creates the delayed reference Ref2 for bytenr A, with a value
    of -1 and a seq value of 1;

    5) Delayed reference Ref2 is not merged with Ref1 when we create it,
    because they have different sequence numbers (decided at
    add_delayed_ref_tail_merge());

    6) fs_info->tree_mod_seq is incremented to 2;

    7) Some task attempts to allocate a new extent buffer (done at
    extent-tree.c:find_free_extent()), but due to heavy fragmentation
    and running low on metadata space the clustered allocation fails
    and we fall back to unclustered allocation, which finds the
    extent at offset A, so a new extent buffer at offset A is allocated.
    This creates delayed reference Ref3 for bytenr A, with a value of 1
    and a seq value of 2;

    8) Ref3 is merged with neither Ref2 nor Ref1, again because they
    all have different seq values;

    9) We start running the delayed references (__btrfs_run_delayed_refs());

    10) The delayed Ref1 is the first one being applied, which ends up
    creating an inline extent backref in the extent tree;

    11) Next the delayed reference Ref3 is selected for execution, and not
    Ref2, because select_delayed_ref() always gives preference to
    positive references (those having an action of BTRFS_ADD_DELAYED_REF);

    12) When running Ref3 we encounter the inline extent backref already
    in the extent tree at insert_inline_extent_backref(), which makes
    us hit the following BUG_ON:

    BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);

    This is always true because owner corresponds to the level of the
    extent buffer/btree node in the btree.

    For the scenario described above we hit the BUG_ON because we never merge
    references that have different seq values.
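
    A much simplified sketch of what the merging must achieve for the
    scenario above; btrfs_check_delayed_seq() is the existing hold-back
    check, the rest of the names are approximate:

        /* refs queued for bytenr A before merging:
         *   Ref1: ADD,  ref_mod 1, seq 0
         *   Ref2: DROP, ref_mod 1, seq 1
         *   Ref3: ADD,  ref_mod 1, seq 2
         * If no backref walker still pins a tree_mod_seq covering these
         * values, Ref1 and Ref2 cancel out and only Ref3 is run, so
         * insert_inline_extent_backref() is reached exactly once. */
        if (!btrfs_check_delayed_seq(fs_info, delayed_refs, ref->seq))
                merge_ref(trans, delayed_refs, head, ref, 0);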

    We used to do the merging before the 4.2 kernel; more specifically,
    before the commits:

    c6fc24549960 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
    c43d160fcd5e ("btrfs: delayed-ref: Cleanup the unneeded functions.")

    This issue became more exposed after the following change that was added
    to 4.2 as well:

    cffc3374e567 ("Btrfs: fix order by which delayed references are run")

    That change in turn fixed another regression introduced by the two
    commits previously mentioned.

    So fix this by bringing back the delayed reference merge code, with the
    proper adaptations so that it operates against the new data structure
    (linked list vs old red black tree implementation).

    This issue was hit running fstest btrfs/063 in a loop. Several people
    have reported this issue on the mailing list when running on kernels 4.2+.

    Very special thanks to Stéphane Lesimple for helping debugging this issue
    and testing this fix on his multi terabyte filesystem (which took more
    than one day to balance alone, plus fsck, etc).

    Fixes: c6fc24549960 ("btrfs: delayed-ref: Use list to replace the ref_root in ref_head.")
    Reported-by: Peter Becker
    Reported-by: Stéphane Lesimple
    Tested-by: Stéphane Lesimple
    Reported-by: Malte Schröder
    Reported-by: Derek Dongray
    Reported-by: Erkki Seppala
    Cc: stable@vger.kernel.org # 4.2+
    Signed-off-by: Filipe Manana
    Reviewed-by: Liu Bo

    Filipe Manana
     

11 Apr, 2015

1 commit

  • As we delete large extents, we end up doing huge amounts of COW in order
    to delete the corresponding crcs. This adds accounting so that we keep
    track of that space, and flushing of delayed refs so that we don't build
    up too much delayed crc work.

    This helps limit the delayed work that must be done at commit time and
    tries to avoid ENOSPC aborts because the crcs eat all the global
    reserves.

    Signed-off-by: Chris Mason

    Josef Bacik
     

10 Jun, 2014

1 commit

  • Currently qgroups account for space by intercepting delayed ref updates to
    fs trees. It does this by adding sequence numbers to delayed ref updates so
    that it can figure out how the tree looked before the update, so we can
    adjust the counters properly. The problem with this is that it does not
    allow delayed refs to be merged, so if you are, say, defragging an extent
    with 5k snapshots pointing to it, we will thrash the delayed ref lock
    because we need to go back and manually merge these things together.
    Instead we want to process quota changes when we know they are going to
    happen, like when we first allocate an extent, when we free a reference
    for an extent, when we add new references, etc. This patch accomplishes
    this by only adding qgroup operations for real ref changes. We only modify
    the sequence number when we need to look up roots for bytenrs; this
    reduces the amount of churn on the sequence number and allows us to merge
    delayed refs as we add them most of the time. This patch encompasses a
    bunch of architectural changes:

    1) qgroup ref operations: instead of tracking qgroup operations through
    the delayed refs, we simply add new ref operations whenever we notice
    that we need to, i.e. when we've modified the refs themselves (see the
    sketch below).

    2) tree mod seq: we no longer have this separation of major/minor
    counters. This makes the sequence number handling much more sane and we
    can remove some locking that was needed to protect the counter.

    3) delayed ref seq: we now read the tree mod seq number and use that as
    our sequence. This means each new delayed ref doesn't have its own unique
    sequence number; rather, whenever we go to look up backrefs we increment
    the sequence number so we can make sure to keep any new operations from
    screwing up our world view at that given point. This allows us to merge
    delayed refs during runtime.

    With all of these changes the delayed ref stuff is a little saner and the qgroup
    accounting stuff no longer goes negative in some cases like it was before.
    Thanks,
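
    A hedged sketch of change 1: a qgroup operation is recorded explicitly
    at the point the real reference set changes. The function name and the
    operation enum are assumptions modeled on this description:

        /* called where we actually add or remove a reference, instead of
         * tagging every delayed ref update with a sequence number */
        ret = btrfs_qgroup_record_ref(trans, root->fs_info,
                                      root->root_key.objectid,
                                      bytenr, num_bytes,
                                      BTRFS_QGROUP_OPER_ADD_EXCL, 0);
        if (ret)
                return ret;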

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

21 Mar, 2014

1 commit

  • While we update an existing ref head's extent_op, we're not holding
    its spinlock, so while we're updating its extent_op contents (key,
    flags) we can have a task running __btrfs_run_delayed_refs() that
    holds the ref head's lock and sets its extent_op to NULL right after
    the updating task checked that the extent_op was not NULL.
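
    A sketch of the fix under the assumption that the update simply has to
    happen while holding the head's spinlock (names approximate; "update"
    stands for the incoming extent_op):

        spin_lock(&existing_ref->lock);
        /* now __btrfs_run_delayed_refs() cannot set extent_op to NULL
         * between our check and our update */
        if (existing_ref->extent_op) {
                if (update->update_key)
                        existing_ref->extent_op->key = update->key;
                existing_ref->extent_op->flags_to_set |=
                        update->flags_to_set;
        }
        spin_unlock(&existing_ref->lock);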

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     

11 Mar, 2014

2 commits

  • The argument 'last' wasn't used; all callers supplied a NULL value
    for it. Also removed unnecessary intermediate storage of the result
    of key comparisons.

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik

    Filipe Manana
     
  • When we didn't find the exact ref head we were looking for, if
    return_bigger != 0 we set a new search key to match either the
    next node after the last one we found or the first one in the
    ref heads rb tree, and then did another full tree search. In both
    cases this ended up being pointless, as we would end up returning
    an entry we already had before repeating the search.
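
    A sketch of the simplification: step to the next rbtree node (or wrap
    around to the first) instead of repeating the whole search. The field
    names are assumptions matching this file at the time:

        if (entry && return_bigger) {
                n = rb_next(&entry->href_node);
                if (!n)
                        n = rb_first(&delayed_refs->href_root);
                entry = rb_entry(n, struct btrfs_delayed_ref_head,
                                 href_node);
                return entry;
        }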

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Josef Bacik

    Filipe Manana
     

29 Jan, 2014

3 commits

  • Currently we have two rb-trees, one for delayed ref heads and one for all of the
    delayed refs, including the delayed ref heads. When we process the delayed refs
    we have to hold onto the delayed ref lock for all of the selecting and merging
    and such, which results in quite a bit of lock contention. This was solved
    by having a waitqueue and allowing only one flusher at a time; however,
    this hurts if we get a lot of delayed refs queued up.

    So instead just have an rb tree for the delayed ref heads, and then attach the
    delayed ref updates to an rb tree that is per delayed ref head. Then we only
    need to take the delayed ref lock when adding new delayed refs and when
    selecting a delayed ref head to process, all the rest of the time we deal with a
    per delayed ref head lock which will be much less contentious.

    The locking rules for this get a little more complicated since we have to lock
    up to 3 things to properly process delayed refs, but I will address that problem
    later. For now this passes all of xfstests and my overnight stress tests.
    Thanks,
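
    A structural sketch of the change, with members trimmed to the ones
    discussed (not the full upstream definitions):

        struct btrfs_delayed_ref_root {
                struct rb_root href_root;   /* ref heads only */
                spinlock_t lock;            /* taken to add refs and to
                                               select a head to process */
                /* other members omitted */
        };

        struct btrfs_delayed_ref_head {
                struct rb_node href_node;   /* linked into href_root */
                struct rb_root ref_root;    /* this head's own updates */
                spinlock_t lock;            /* per head, far less
                                               contended */
                /* other members omitted */
        };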

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • When we have data deduplication on, we'll hang in the merge part
    because it needs to verify every queued delayed data ref related to
    this disk offset, but we may have millions of refs.

    And in the case of delayed data refs, we don't usually have too many
    data refs to merge.

    So it's safe to shut merging down for data refs.
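
    A minimal sketch of the change, assuming the merge entry point can see
    the head's is_data flag:

        /* data refs: potentially millions of entries per disk offset when
         * dedup is in use, and little to gain, so don't try to merge */
        if (head->is_data)
                return;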

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Liu Bo
     
  • The way we process delayed refs is:
    1) get a bunch of head refs,
    2) pick up one head ref,
    3) go one node back for any delayed ref updates.

    The head refs are also linked in the same rbtree as the delayed refs, so
    in stage 1) we have to walk them one by one, including not only head refs
    but also delayed refs.

    When we have a great number of delayed refs pending to process,
    this will cost a lot of time.

    Here we introduce a head-ref-specific rbtree: it only holds head refs,
    so those troubles go away.

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Liu Bo
     

01 Sep, 2013

2 commits

  • make C=2 fs/btrfs/ CF=-D__CHECK_ENDIAN__

    I tried to filter out the warnings for which patches have already
    been sent to the mailing list and are pending inclusion in btrfs-next.

    All these changes should be obviously safe.

    Signed-off-by: Stefan Behrens
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Stefan Behrens
     
  • This shows exactly how btrfs processes the delayed refs onto disk,
    which is very helpful for understanding the delayed ref mechanism and
    debugging related bugs.

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Liu Bo
     

07 May, 2013

2 commits

  • Sequence numbers for delayed refs have been introduced in the first version
    of the qgroup patch set. To solve the problem of find_all_roots on a busy
    file system, the tree mod log was introduced. The sequence numbers for that
    were simply shared between those two users.

    However, at one point in qgroup's quota accounting, there's a statement
    accessing the previous sequence number that's still just doing (seq - 1),
    just as it had to in the very first version.

    To satisfy that requirement, this patch makes the sequence number counter 64
    bit and splits it into a major part (used for qgroup sequence number
    counting) and a minor part (incremented for each tree modification in the
    log). This enables us to go exactly one major step backwards, as required
    for qgroups, while still incrementing the sequence counter for tree mod log
    insertions to keep track of their order. Keeping them in a single variable
    means there's no need to change all the code dealing with comparisons of two
    sequence numbers.

    The sequence number is reset to 0 on commit (not new in this patch), which
    ensures we won't overflow the two 32 bit counters.

    Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
    from the tree mod log code may happen.
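
    A hedged sketch of the split, with illustrative helpers rather than the
    exact upstream ones:

        /* upper 32 bits: major (qgroup) part; lower 32 bits: minor part,
         * bumped for every tree mod log insertion */
        static inline u64 seq_major(u64 seq)
        {
                return seq >> 32;
        }

        static inline u64 seq_inc_minor(u64 seq)
        {
                return seq + 1;
        }

        /* "exactly one major step backwards", as the qgroup accounting
         * requires */
        static inline u64 seq_prev_major(u64 seq)
        {
                return (seq_major(seq) - 1) << 32;
        }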

    Signed-off-by: Jan Schmidt
    Signed-off-by: Josef Bacik

    Jan Schmidt
     
  • A user reported a panic while running a balance. What was happening was he
    was relocating a block, which added the reference to the relocation tree.
    Then relocation would walk through the relocation tree, drop that
    reference and free that block, and then it would walk down a snapshot
    which referenced the same block and add another ref to the block. The
    problem is this was all happening in the same transaction, so the parent
    block was freed up when we dropped our reference, making it immediately
    available for allocation, and then it was used _again_ to add a reference
    for the same block from a different snapshot. This resulted in something
    like this in the delayed ref tree

    add ref to 90234880, parent=2067398656, ref_root 1766, level 1
    del ref to 90234880, parent=2067398656, ref_root 18446744073709551608, level 1
    add ref to 90234880, parent=2067398656, ref_root 1767, level 1

    As you can see, the ref_roots don't match, because when we inc the ref we
    use the header owner, which is the original tree the block belonged to,
    instead of the data reloc tree. Then when we remove the extent we use the
    reloc tree objectid. But none of this matters, since it is a shared
    reference, which means only the parent matters. When the delayed ref stuff
    runs it adds all the increments first, and then does all the drops, to
    make sure that we don't delete the ref if we net a positive ref count. But
    tree blocks aren't allowed to have multiple refs from the same block, so
    this panics when it tries to add the second ref. We need the add and the
    drop to cancel each other out in memory so we only do the final add.

    So to fix this we need to adjust how the delayed refs are added to the tree.
    Only the ref_root matters when it is a normal backref, and only the parent
    matters when it is a shared backref. So make our decision based on what ref
    type we have. This allows us to keep the ref_root in memory in case anybody
    wants to use it for something else, and it allows the delayed refs to be merged
    properly so we don't end up with this panic.

    With this patch the user's image no longer panics on mount, and it has a
    clean fsck after a normal mount/umount cycle. Thanks,
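
    A simplified sketch of the decision rule; the helper and field names
    are approximate, not the exact upstream code:

        static int comp_tree_refs(struct btrfs_delayed_tree_ref *a,
                                  struct btrfs_delayed_tree_ref *b)
        {
                /* shared backref: only the parent identifies the ref */
                if (a->node.type == BTRFS_SHARED_BLOCK_REF_KEY)
                        return a->parent != b->parent;
                /* normal backref: only the ref_root identifies the ref */
                return a->root != b->root;
        }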

    Cc: stable@vger.kernel.org
    Reported-by: Roman Mamedov
    Signed-off-by: Josef Bacik

    Josef Bacik
     

29 Aug, 2012

2 commits

  • Daniel Blueman reported a bug with fio+balance on a ramdisk setup.
    Basically what happens is the balance relocates a tree block, which will
    drop the implicit refs for all of its children and add a full backref.
    Once the block is relocated we have to add the implicit refs back, so when
    we cow the block again we add the implicit refs for its children back.
    The problem comes when the original drop ref doesn't get run before we add
    the implicit refs back. The delayed ref stuff will specifically prefer ADD
    operations over DROP to keep us from freeing up an extent that will have
    references to it, so we try to add the implicit ref before it is actually
    removed and we panic. This worked fine before because the add would have
    just canceled the drop out and we would have been fine. But the backref
    walking work needs to be able to freeze the delayed ref stuff in time, so
    we have this ever increasing sequence number that gets attached to all new
    delayed ref updates, which keeps us from merging refs, and we run into
    this issue.

    So to fix this we need to merge delayed refs. So every time we run a
    clustered ref we need to try and merge all of its delayed refs. The
    backref walking stuff locks the delayed ref head before processing, so if
    we have it locked we are safe to merge any refs inside of the sequence
    number. If there is no sequence number we can merge all refs. Doing this
    not only fixes our bug but keeps the delayed ref code from adding and
    removing useless refs, and batches together multiple refs into one search
    instead of one search per delayed ref, which will really help our commit
    times. I ran this with Daniel's test and 276 and I haven't seen any
    problems. Thanks,

    Reported-by: Daniel J Blueman
    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Commit a168650c introduced a waiting mechanism to prevent busy waiting in
    btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
    where a tree_mod_seq is held while waiting for the io to complete, while
    the end_io calls btrfs_run_delayed_refs.

    This whole mechanism is unnecessary. If not enough runnable refs are
    available to satisfy count, just return, as count is more like a guideline
    than a strict requirement.

    In case we have to run all refs, the transaction commit makes sure that no
    other threads are working in the transaction anymore, so we just assert
    here that no refs are blocked.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     

10 Jul, 2012

1 commit

  • We've got two mechanisms, both required for reliable backref resolving
    (tree mod log and holding back delayed refs). You cannot make use of one
    without the other. So instead of requiring the user of this mechanism to
    set up both correctly, we join them into a single interface.

    Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list
    as we did before, which was of no value.
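
    A sketch of the joined interface; btrfs_get_tree_mod_seq() and
    btrfs_put_tree_mod_seq() are the entry points described, the usage
    around them is illustrative:

        struct seq_list elem = {};

        /* one call now both logs tree modifications and holds back
         * delayed refs at this sequence point */
        btrfs_get_tree_mod_seq(fs_info, &elem);
        /* ... resolve backrefs against a consistent view ... */
        btrfs_put_tree_mod_seq(fs_info, &elem);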

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     

31 May, 2012

1 commit

  • The sequence number for delayed refs is needed to postpone certain delayed
    refs for a very short period while walking backrefs. Before the tree
    modification log, we thought we'd only have to hold back those references
    that don't have a counter operation.

    Now that we have the tree mod log, we're rewinding fs tree blocks to a
    defined consistent state. We cannot know in advance for which tree block
    we'll be doing rewind operations later. Therefore, we must postpone all
    the delayed refs for fs-tree blocks, even those having a counter operation.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     

04 Jan, 2012

2 commits

  • Now that we may be holding back delayed refs for a limited period, we
    might end up having no runnable delayed refs. Without this commit, we'd
    do busy waiting in that thread until another (runnable) ref arrives.
    Instead, we detect this situation and use a waitqueue, such that
    we only try to run more refs after
    a) another runnable ref was added, or
    b) delayed refs are no longer held back
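
    A hedged sketch of the mechanism; the waitqueue field and the
    num_runnable_refs() helper are assumptions standing in for the real
    names:

        /* instead of spinning, sleep until woken by either a) or b) */
        wait_event(delayed_refs->seq_wait,
                   num_runnable_refs(delayed_refs) > 0);

        /* both a) and b) then do: wake_up(&delayed_refs->seq_wait); */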

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • When processing a delayed ref, first check if there are still old refs in
    the process of being added. If so, put this ref back into the tree. To
    avoid looping on this ref, choose a newer one in the next loop;
    btrfs_find_ref_cluster has to take care of that.

    Signed-off-by: Arne Jansen
    Signed-off-by: Jan Schmidt

    Arne Jansen