27 Jul, 2020

6 commits

  • This patch will add the following sysfs interface:

    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/referenced
    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/exclusive
    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_referenced
    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/max_exclusive
    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/limit_flags

    These are also available in the output of "btrfs qgroup show".

    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_data
    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_pertrans
    /sys/fs/btrfs/<UUID>/qgroups/<qgroup_id>/rsv_meta_prealloc

    The last 3 rsv related members are not visible to users, but can be very
    useful to debug qgroup limit related bugs.

    Also, to avoid '/' being used in <qgroup_id>, the separator between the
    qgroup level and the qgroup id is changed to '_'.

    The interface is not hidden behind 'debug' as we want this interface to
    be included into production build and to provide another way to read the
    qgroup information besides the ioctls.
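
    As a quick sketch of how this can be read back (the mount point /mnt
    and the qgroup 0/256 here are hypothetical; note the '_' separator in
    the directory name):

    $ UUID=$(findmnt -no UUID /mnt)
    $ cat /sys/fs/btrfs/$UUID/qgroups/0_256/referenced
    $ cat /sys/fs/btrfs/$UUID/qgroups/0_256/rsv_data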

    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • vfs_inode is used only for the inode number; everything else requires
    btrfs_inode.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    [ use btrfs_ino ]
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • There's only a single use of vfs_inode in a tracepoint so let's take
    btrfs_inode directly.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • It just forwards its argument to __btrfs_qgroup_release_data.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • It passes btrfs_inode to its callee, so change the interface accordingly.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • Before this patch, qgroup completely relies on the per-inode extent io
    tree to detect reserved data space leaks.

    However, a previous bug has already shown how releasing a page before
    btrfs_finish_ordered_io() could lead to a leak, and since that clears
    the QGROUP_RESERVED bit without updating the qgroup rsv, such a leak
    can't be detected by the per-inode extent io tree.

    So this patch adds another (and hopefully the final) safety net to catch
    qgroup data reserved space leak. At least the new safety net catches
    all the leaks during development, so it should be pretty useful in the
    real world.

    Reviewed-by: Josef Bacik
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     

19 Feb, 2020

1 commit

  • We clean up the delayed references when we abort a transaction, but we
    leave the pending qgroup extent records behind, leaking memory.

    This patch destroys the extent records when we destroy the delayed refs
    and makes sure they're gone before releasing the transaction.

    Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Josef Bacik
    Signed-off-by: Jeff Mahoney
    [ Rebased to latest upstream, remove to_qgroup() helper, use
    rbtree_postorder_for_each_entry_safe() wrapper ]
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Jeff Mahoney
     

19 Nov, 2019

1 commit

  • The type name is misleading: a single entry is named 'cache' while this
    normally means a collection of objects. Rename that everywhere. Also,
    the identifier was quite long, making function prototypes harder to
    format.

    Suggested-by: Nikolay Borisov
    Reviewed-by: Qu Wenruo
    Signed-off-by: David Sterba

    David Sterba
     

25 Feb, 2019

4 commits

  • Move reserved data accounting from btrfs_delayed_ref_head to
    btrfs_qgroup_extent_record

    [BUG]
    Btrfs/139 will fail with a high probability if the testing machine (VM)
    has only 2G RAM.

    The result is that the final write succeeds while it should fail with
    EDQUOT, and the fs ends up exceeding the quota limit by 16K.

    The simplified reproducer will be (needs a VM with 2G of RAM):

    $ mkfs.btrfs -f $dev
    $ mount $dev $mnt

    $ btrfs subv create $mnt/subv
    $ btrfs quota enable $mnt
    $ btrfs quota rescan -w $mnt
    $ btrfs qgroup limit -e 1G $mnt/subv

    $ for i in $(seq -w 1 8); do
          xfs_io -f -c "pwrite 0 128M" $mnt/subv/file_$i > /dev/null
          echo "file $i written" > /dev/kmsg
      done
    $ sync
    $ btrfs qgroup show -pcre --raw $mnt

    The last pwrite will not trigger EDQUOT and final 'qgroup show' will
    show something like:

    qgroupid         rfer         excl     max_rfer     max_excl parent  child
    --------         ----         ----     --------     -------- ------  -----
    0/5             16384        16384         none         none ---     ---
    0/256      1073758208   1073758208         none   1073741824 ---     ---

    And 1073758208 is larger than 1073741824.

    [CAUSE]
    It's a bug in btrfs qgroup data reserved space management.

    For the quota limit, we must ensure that:
    reserved (data + metadata) + rfer/excl <= limit

    Since rfer/excl are only updated at transaction commit time, special
    care must be taken with the reserved space.

    One important part of reserved space is data, and for a new data extent
    written to disk, we still need to take the reserved space until
    rfer/excl numbers get updated.

    Originally when an ordered extent finishes, we migrate the reserved
    qgroup data space from extent_io tree to delayed ref head of the data
    extent, expecting delayed ref will only be cleaned up at commit
    transaction time.

    However, on a machine with little RAM, memory pressure can cause dirty
    pages to be flushed back to disk without committing a transaction.

    The related events will be something like:

    file 1 written
    btrfs_finish_ordered_io: ino=258 ordered offset=0 len=54947840
    btrfs_finish_ordered_io: ino=258 ordered offset=54947840 len=5636096
    btrfs_finish_ordered_io: ino=258 ordered offset=61153280 len=57344
    btrfs_finish_ordered_io: ino=258 ordered offset=61210624 len=8192
    btrfs_finish_ordered_io: ino=258 ordered offset=60583936 len=569344
    cleanup_ref_head: num_bytes=54947840
    cleanup_ref_head: num_bytes=5636096
    cleanup_ref_head: num_bytes=569344
    cleanup_ref_head: num_bytes=57344
    cleanup_ref_head: num_bytes=8192
    ^^^^^^^^^^^^^^^^ This will free qgroup data reserved space
    file 2 written
    ...
    file 8 written
    cleanup_ref_head: num_bytes=8192
    ...
    btrfs_commit_transaction <<< the only transaction committed during
    the test

    When file 2 is written, we have already freed the 128M of qgroup data
    space reserved for ino 258. Thus later writes won't trigger EDQUOT.

    This allows us to write more data beyond qgroup limit.

    In my 2G ram VM, it could reach about 1.2G before hitting EDQUOT.

    [FIX]
    By moving reserved qgroup data space from btrfs_delayed_ref_head to
    btrfs_qgroup_extent_record, we can ensure that reserved qgroup data
    space won't be freed half way before commit transaction, thus fix the
    problem.
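
    With the fix applied, re-running the reproducer above should make the
    later writes fail instead, roughly like this (hypothetical output;
    "Disk quota exceeded" is the strerror text for EDQUOT):

    $ xfs_io -f -c "pwrite 0 128M" $mnt/subv/file_8
    pwrite: Disk quota exceeded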

    Fixes: f64d5ca86821 ("btrfs: delayed_ref: Add new function to record reserved space into delayed ref")
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>

    Qu Wenruo
     
  • Since it's replaced by the new delayed subtree swap code, remove the
    original code.

    The cleanup is small since most of its core functionality is still used
    by the delayed subtree swap tracing.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Before this patch, qgroup code traces the whole subtree of subvolume and
    reloc trees unconditionally.

    This makes qgroup numbers consistent, but it could cause tons of
    unnecessary extent tracing, which causes a lot of overhead.

    However, for the subtree swap done by balance, we can just swap both
    subtrees, because they contain the same contents and tree structure,
    so qgroup numbers won't change.

    It's only the race window between subtree swap and transaction commit
    that could cause qgroup numbers to change.

    This patch will delay the qgroup subtree scan until COW happens for the
    subtree root.

    So if there are no other operations on the fs, balance won't cause any
    extra qgroup overhead (the best case scenario).
    Depending on the workload, most of the subtree scan can still be
    avoided.

    Only in the worst case scenario does it fall back to the old subtree
    swap overhead (scanning all swapped subtrees).

    [[Benchmark]]
    Hardware:
    VM 4G vRAM, 8 vCPUs,
    disk is using 'unsafe' cache mode,
    backing device is SAMSUNG 850 evo SSD.
    Host has 16G ram.

    Mkfs parameter:
    --nodesize 4K (To bump up tree size)

    Initial subvolume contents:
    4G data copied from /usr and /lib.
    (With enough regular small files)

    Snapshots:
    16 snapshots of the original subvolume.
    each snapshot has 3 random files modified.

    balance parameter:
    -m

    So the content should be pretty similar to a real world root fs layout.

    And after file system population, there is no other activity, so it
    should be the best case scenario.

                          | v4.20-rc1 | w/ patchset | diff
    -----------------------------------------------------------
    relocated extents     | 22615     | 22457       |  -0.1%
    qgroup dirty extents  | 163457    | 121606      | -25.6%
    time (sys)            | 22.884s   | 18.842s     | -17.6%
    time (real)           | 27.724s   | 22.884s     | -17.5%

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • To allow delayed subtree swap rescan, btrfs needs to record per-root
    information about which tree blocks get swapped. This patch introduces
    the required infrastructure.

    The designed workflow will be:

    1) Record the subtree root block that gets swapped.

    During subtree swap:
    O = Old tree blocks
    N = New tree blocks
             reloc tree                     subvolume tree X
                Root                              Root
               /    \                            /    \
             NA      OB                        OA      OB
            / |      | \                      / |      | \
          NC  ND    OE  OF                  OC  OD    OE  OF

    In this case, NA and OA are going to be swapped, record (NA, OA) into
    subvolume tree X.

    2) After subtree swap.
             reloc tree                     subvolume tree X
                Root                              Root
               /    \                            /    \
             OA      OB                        NA      OB
            / |      | \                      / |      | \
          OC  OD    OE  OF                  NC  ND    OE  OF

    3a) COW happens for OB
    If we are going to COW tree block OB, we check OB's bytenr against
    tree X's swapped_blocks structure.
    If it doesn't match any record, nothing will happen.

    3b) COW happens for NA
    Check NA's bytenr against tree X's swapped_blocks, and get a hit.
    Then we do subtree scan on both subtrees OA and NA.
    This results in 6 tree blocks to be scanned (OA, OC, OD, NA, NC, ND).

    Then no matter what we do to subvolume tree X, qgroup numbers will
    still be correct.
    Then NA's record gets removed from X's swapped_blocks.

    4) Transaction commit
    Any record in X's swapped_blocks gets removed; since there was no
    modification to the swapped subtrees, there is no need to trigger a
    heavy qgroup subtree rescan for them.

    This will introduce 128 bytes of overhead for each btrfs_root even if
    qgroup is not enabled. This is to reduce memory allocations and
    potential failures.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     

15 Oct, 2018

3 commits

  • Some qgroup trace events like btrfs_qgroup_release_data() and
    btrfs_qgroup_free_delayed_ref() can still be triggered even if qgroup is
    not enabled.

    This is caused by the lack of a qgroup status check before calling some
    qgroup functions. Thankfully the functions handle the quota-disabled
    case well and just do nothing in that case.

    This patch does the check earlier, before triggering the related trace
    events.

    And for the enable/disable race cases:

    1) For the enabled -> disabled case
    Disabling will wipe out all qgroup data, including reservations and
    excl/rfer numbers. Even if we leak some reservation or numbers, they
    will still be cleared, so nothing will go wrong.

    2) For the disabled -> enabled case
    The current btrfs_qgroup_release_data() uses the extent_io tree to
    ensure we won't underflow the reservation. And for delayed_ref we use
    head->qgroup_reserved to record the reserved space, so in that case
    head->qgroup_reserved should be 0 and we won't underflow.
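
    For example, to watch these trace events from userspace (a sketch;
    assumes tracefs is mounted at /sys/kernel/tracing):

    $ cd /sys/kernel/tracing
    $ echo 1 > events/btrfs/btrfs_qgroup_release_data/enable
    $ echo 1 > events/btrfs/btrfs_qgroup_free_delayed_ref/enable
    $ cat trace_pipe

    With this patch, these events should no longer fire while quotas are
    disabled.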

    CC: stable@vger.kernel.org # 4.14+
    Reported-by: Chris Murphy
    Link: https://lore.kernel.org/linux-btrfs/CAJCQCtQau7DtuUUeycCkZ36qjbKuxNzsgqJ7+sJ6W0dK_NLE3w@mail.gmail.com/
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • For qgroup_trace_extent_swap(), if we find one leaf that needs to be
    traced, we will also iterate all file extents and trace them.

    This is OK if we're relocating data block groups, but if we're
    relocating metadata block groups, the balance code itself has ensured
    that both subtrees of the file tree and reloc tree contain the same
    contents.

    That is to say, if we're relocating metadata block groups, all file
    extents in the reloc and file trees should match, so there is no need
    to trace them. This should reduce the total number of dirty extents
    processed in metadata block group balance.

    [[Benchmark]] (with all previous enhancement)
    Hardware:
    VM 4G vRAM, 8 vCPUs,
    disk is using 'unsafe' cache mode,
    backing device is SAMSUNG 850 evo SSD.
    Host has 16G ram.

    Mkfs parameter:
    --nodesize 4K (To bump up tree size)

    Initial subvolume contents:
    4G data copied from /usr and /lib.
    (With enough regular small files)

    Snapshots:
    16 snapshots of the original subvolume.
    each snapshot has 3 random files modified.

    balance parameter:
    -m

    So the content should be pretty similar to a real world root fs layout.

                          | v4.19-rc1 | w/ patchset | diff (*)
    -----------------------------------------------------------
    relocated extents     | 22929     | 22851       |  -0.3%
    qgroup dirty extents  | 227757    | 140886      | -38.1%
    time (sys)            | 65.253s   | 37.464s     | -42.6%
    time (real)           | 74.032s   | 44.722s     | -39.6%

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Before this patch, with quota enabled during balance, we need to mark
    the whole subtree dirty for quota.

    E.g.
    OO = Old tree blocks (from file tree)
    NN = New tree blocks (from reloc tree)

         File tree (src)                  Reloc tree (dst)
             OO (a)                           NN (a)
            /    \                           /    \
       (b) OO    OO (c)                 (b) NN    NN (c)
          / \    / \                       / \    / \
        OO  OO  OO  OO (d)               OO  OO  OO  NN (d)

    For old balance + quota case, quota will mark the whole src and dst tree
    dirty, including all the 3 old tree blocks in reloc tree.

    This is doable for a small file tree, or when the new tree blocks are
    all located at a lower level.

    But for a large file tree, or when the new tree blocks are located at
    a higher level, this leads to marking the whole tree dirty, which is
    unbelievably slow.

    This patch will change how we handle such balance with quota enabled
    case.

    Now we will search from (b) and (c) for any new tree blocks whose
    generation is equal to @last_snapshot, and only mark them dirty.

    In the above case, we only need to trace tree blocks NN(b), NN(c) and
    NN(d) (NN(a) will be traced when COW happens for nodeptr modification),
    and also tree blocks OO(b), OO(c), OO(d) (OO(a) will be traced when COW
    happens for nodeptr modification).

    For the above case we only skip 3 tree blocks, but for a larger tree we
    can skip tons of unmodified tree blocks, hugely speeding up balance.

    This patch will introduce a new function,
    btrfs_qgroup_trace_subtree_swap(), which will do the following main
    work:

    1) Read out real root eb
    And setup basic dst_path for later calls
    2) Call qgroup_trace_new_subtree_blocks()
    To trace all new tree blocks in the reloc tree and their counterparts
    in the file tree.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     

31 Mar, 2018

5 commits

  • For meta_prealloc reservation users, after btrfs_join_transaction() the
    caller will modify trees, so part (or even all) of the meta_prealloc
    reservation should be converted to meta_pertrans and kept until
    transaction commit time.

    This patch introduces a new function,
    btrfs_qgroup_convert_reserved_meta() to do this for META_PREALLOC
    reservation user.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Btrfs uses 2 different methods to reserve metadata qgroup space.

    1) Reserve at btrfs_start_transaction() time
    This is quite straightforward: the caller will use the allocated trans
    handle to modify b-trees.

    In this case, reserved metadata should be kept until qgroup numbers
    are updated.

    2) Reserve by using block_rsv first, and later btrfs_join_transaction()
    This is more complicated: the caller will reserve space using block_rsv
    first, and then later call btrfs_join_transaction() to get a trans
    handle.

    In this case, before we modify trees, the reserved space can be
    modified on demand, and after btrfs_join_transaction(), such reserved
    space should also be kept until qgroup numbers are updated.

    Since these two types behave differently, split the original "META"
    reservation type into 2 sub-types:

    META_PERTRANS:
    For above case 1)

    META_PREALLOC:
    For reservations that happened before btrfs_join_transaction() of
    case 2)

    NOTE: This patch only converts existing qgroup meta reservation callers
    according to their situation; it does not ensure that all callers
    reserve at the correct time.
    Such fixes will be added in later patches.

    Signed-off-by: Qu Wenruo
    [ update comments ]
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • With this, qgroup is switched to the new separate-type reservation
    system.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Instead of a single qgroup->reserved, use a new structure,
    btrfs_qgroup_rsv, to store the different types of reservations.

    This patch only updates the header, which is all that is needed to
    compile.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • It's provided by the transaction handle.

    Signed-off-by: Nikolay Borisov
    Signed-off-by: David Sterba

    Nikolay Borisov
     

30 Jun, 2017

3 commits

  • [BUG]
    For the following case, btrfs can underflow qgroup reserved space
    at an error path:
    (Page size 4K, function names without the "btrfs_" prefix)

    Task A                         | Task B
    ---------------------------------------------------------------------
    Buffered_write [0, 2K)         |
    |- check_data_free_space()     |
    |  |- qgroup_reserve_data()    |
    |     Range aligned to page    |
    |     range [0, 4K)      <<<   |
    |     4K bytes reserved  <<<   |
    |- copy pages to page cache    |
                                   | Buffered_write [2K, 4K)
                                   | |- check_data_free_space()
                                   | |  |- qgroup_reserve_data()
                                   | |     Range aligned to page
                                   | |     range [0, 4K)
                                   | |     Already reserved by A <<<
                                   | |     0 bytes reserved      <<<
                                   | |- delalloc_reserve_metadata()
                                   | |  And it *FAILED* (maybe EDQUOT)
                                   | |- free_reserved_data_space()
                                   |    |- qgroup_free_data()
                                   |       Range aligned to page range
                                   |       [0, 4K)
                                   |       Freeing 4K
    (Special thanks to Chandan for the detailed report and analysis)

    [CAUSE]
    Above, Task B is freeing the reserved data range [0, 4K) which was
    actually reserved by Task A.

    And at writeback time, the page dirtied by Task A will go through the
    writeback routine, which will free the 4K reserved data space at file
    extent insert time, causing the qgroup underflow.

    [FIX]
    For btrfs_qgroup_free_data(), add a @reserved parameter to only free
    data ranges reserved by a previous btrfs_qgroup_reserve_data().
    So in the above case, Task B will try to free 0 bytes, and there is no
    underflow.

    Reported-by: Chandan Rajendra
    Signed-off-by: Qu Wenruo
    Reviewed-by: Chandan Rajendra
    Tested-by: Chandan Rajendra
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Introduce a new parameter, struct extent_changeset, for
    btrfs_qgroup_reserve_data() and its callers.

    Such an extent_changeset is used in btrfs_qgroup_reserve_data() to
    record which ranges it reserved in the current reserve, so it can free
    them in error paths.

    The reason we need to expose it to callers is that, in the buffered
    write error path, without knowing exactly which ranges we reserved in
    the current allocation, we could free space that was not reserved by
    us.

    This will lead to qgroup reserved space underflow.

    Reviewed-by: Chandan Rajendra
    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Quite a lot of qgroup corruption happens due to
    btrfs_qgroup_prepare_account_extents() being called at the wrong time.

    Since the safest time is to call it just before
    btrfs_qgroup_account_extents(), there is no need to keep these 2
    functions separate.

    Merging them makes the code cleaner and less bug prone.

    Signed-off-by: Qu Wenruo
    [ changelog and comment adjustments ]
    Signed-off-by: David Sterba

    Qu Wenruo
     

18 Apr, 2017

2 commits

  • The newly introduced qgroup reserved space trace points are normally
    nested inside several common qgroup operations.

    However, some other trace points are not well placed to cooperate with
    them, causing confusing output.

    This patch rearranges the trace_btrfs_qgroup_release_data() and
    trace_btrfs_qgroup_free_delayed_ref() trace points so they are
    triggered before the reserved space ones.

    Signed-off-by: Qu Wenruo
    Reviewed-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Introduce the following trace points:
    qgroup_update_reserve
    qgroup_meta_reserve

    These trace points are handy for tracing qgroup reserve space related
    problems.

    Also export the btrfs_qgroup structure; since we now pass the structure
    directly to the trace points, its definition needs to be visible to
    them.
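
    A sketch of how one might enable these from userspace (assumes tracefs
    is mounted at /sys/kernel/tracing and that the events land in the btrfs
    event group, as btrfs trace events normally do):

    $ cd /sys/kernel/tracing
    $ echo 1 > events/btrfs/qgroup_update_reserve/enable
    $ echo 1 > events/btrfs/qgroup_meta_reserve/enable
    $ cat trace_pipe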

    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo