Eric Lee / smarc-fsl-linux-kernel

14 Nov, 2018

2 commits

08374d8bc btrfs: qgroup: Dirty all qgroups before rescan ... Browse Code »

commit 9c7b0c2e8dbfbcd80a71e2cbfe02704f26c185c6 upstream.

[BUG]
In the following case, rescan won't zero out the number of qgroup 1/0:

$ mkfs.btrfs -fq $DEV
$ mount $DEV /mnt

$ btrfs quota enable /mnt
$ btrfs qgroup create 1/0 /mnt
$ btrfs sub create /mnt/sub
$ btrfs qgroup assign 0/257 1/0 /mnt

$ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
$ btrfs sub snap /mnt/sub /mnt/snap
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgroupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/5 16.00KiB 16.00KiB none none --- ---
0/257 1016.00KiB 16.00KiB none none 1/0 ---
0/258 1016.00KiB 16.00KiB none none --- ---
1/0 1016.00KiB 16.00KiB none none --- 0/257

So far so good, but:

$ btrfs qgroup remove 0/257 1/0 /mnt
WARNING: quotas may be inconsistent, rescan needed
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgoupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/5 16.00KiB 16.00KiB none none --- ---
0/257 1016.00KiB 16.00KiB none none --- ---
0/258 1016.00KiB 16.00KiB none none --- ---
1/0 1016.00KiB 16.00KiB none none --- ---
^^^^^^^^^^ ^^^^^^^^ not cleared

[CAUSE]
Before rescan we call qgroup_rescan_zero_tracking() to zero out all
qgroups' accounting numbers.

However we don't mark all qgroups dirty, but rely on rescan to do so.

If we have any high level qgroup without children, it won't be marked
dirty during rescan, since we cannot reach that qgroup.

This will cause QGROUP_INFO items of childless qgroups never get updated
in the quota tree, thus their numbers will stay the same in "btrfs
qgroup show" output.

[FIX]
Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if
we have childless qgroups, their QGROUP_INFO items will still get
updated during rescan.

Reported-by: Misono Tomohiro
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo
Reviewed-by: Misono Tomohiro
Tested-by: Misono Tomohiro
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Qu Wenruo
2018-11-14 03:15:14 +0800
2f7e525a2 btrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabled ... Browse Code »

commit 3628b4ca64f24a4ec55055597d0cb1c814729f8b upstream.

Some qgroup trace events like btrfs_qgroup_release_data() and
btrfs_qgroup_free_delayed_ref() can still be triggered even if qgroup is
not enabled.

This is caused by the lack of qgroup status check before calling some
qgroup functions. Thankfully the functions can handle quota disabled
case well and just do nothing for qgroup disabled case.

This patch will do earlier check before triggering related trace events.

And for enabled disabled race case:

1) For enabled->disabled case
Disable will wipe out all qgroups data including reservation and
excl/rfer. Even if we leak some reservation or numbers, it will
still be cleared, so nothing will go wrong.

2) For disabled -> enabled case
Current btrfs_qgroup_release_data() will use extent_io tree to ensure
we won't underflow reservation. And for delayed_ref we use
head->qgroup_reserved to record the reserved space, so in that case
head->qgroup_reserved should be 0 and we won't underflow.

CC: stable@vger.kernel.org # 4.14+
Reported-by: Chris Murphy
Link: https://lore.kernel.org/linux-btrfs/CAJCQCtQau7DtuUUeycCkZ36qjbKuxNzsgqJ7+sJ6W0dK_NLE3w@mail.gmail.com/
Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Qu Wenruo
2018-11-14 03:15:13 +0800

04 Nov, 2018

1 commit

f72388e36 btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf ... Browse Code »

[ Upstream commit 6f7de19ed3d4d3526ca5eca428009f97cf969c2f ]

Commit ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf
of extent tree") added a new exit for rescan finish.

However after finishing quota rescan, we set
fs_info->qgroup_rescan_progress to (u64)-1 before we exit through the
original exit path.
While we missed that assignment of (u64)-1 in the new exit path.

The end result is, the quota status item doesn't have the same value.
(-1 vs the last bytenr + 1)
Although it doesn't affect quota accounting, it's still better to keep
the original behavior.

Reported-by: Misono Tomohiro
Fixes: ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree")
Signed-off-by: Qu Wenruo
Reviewed-by: Misono Tomohiro
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin

Qu Wenruo
2018-11-04 21:52:45 +0800

03 Aug, 2018

1 commit

2737a4ade btrfs: qgroup: Finish rescan when hit the last leaf of extent tree ... Browse Code »

[ Upstream commit ff3d27a048d926b3920ccdb75d98788c567cae0d ]

Under the following case, qgroup rescan can double account cowed tree
blocks:

In this case, extent tree only has one tree block.

-
| transid=5 last committed=4
| btrfs_qgroup_rescan_worker()
| |- btrfs_start_transaction()
| | transid = 5
| |- qgroup_rescan_leaf()
| |- btrfs_search_slot_for_read() on extent tree
| Get the only extent tree block from commit root (transid = 4).
| Scan it, set qgroup_rescan_progress to the last
| EXTENT/META_ITEM + 1
| now qgroup_rescan_progress = A + 1.
|
| fs tree get CoWed, new tree block is at A + 16K
| transid 5 get committed
-
| transid=6 last committed=5
| btrfs_qgroup_rescan_worker()
| btrfs_qgroup_rescan_worker()
| |- btrfs_start_transaction()
| | transid = 5
| |- qgroup_rescan_leaf()
| |- btrfs_search_slot_for_read() on extent tree
| Get the only extent tree block from commit root (transid = 5).
| scan it using qgroup_rescan_progress (A + 1).
| found new tree block beyong A, and it's fs tree block,
| account it to increase qgroup numbers.
-

In above case, tree block A, and tree block A + 16K get accounted twice,
while qgroup rescan should stop when it already reach the last leaf,
other than continue using its qgroup_rescan_progress.

Such case could happen by just looping btrfs/017 and with some
possibility it can hit such double qgroup accounting problem.

Fix it by checking the path to determine if we should finish qgroup
rescan, other than relying on next loop to exit.

Reported-by: Nikolay Borisov
Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Qu Wenruo
2018-08-03 13:50:28 +0800

26 Sep, 2017

2 commits

36b96fdc6 btrfs: Report error on removing qgroup if del_qgroup_item fails ... Browse Code »

Previously, we were calling del_qgroup_item, and ignoring the return code
resulting in a potential to have divergent in-memory state without an
error. Perhaps, it makes sense to handle this error code, and put the
filesystem into a read only, or similar state.

This patch only adds reporting of the error if the error is fatal,
(any error other than qgroup not found).

Signed-off-by: Sargun Dhillon
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Sargun Dhillon
2017-09-26 20:54:01 +0800
c2faff790 btrfs: remove BTRFS_FS_QUOTA_DISABLING flag ... Browse Code »

Currently, "btrfs quota enable" would fail after "btrfs quota disable" on
the first time with syslog output "qgroup_rescan_init failed with -22", but
it would succeed on the second time.

When "quota disable" is called, BTRFS_FS_QUOTA_DISABLING flag bit will be
set in fs_info->flags in btrfs_quota_disable(), but it will not be droppd
in btrfs_run_qgroups() (which is called in btrfs_commit_transaction())
because quota_root has already been freed. If "quota enable" is called
after that, both BTRFS_FS_QUOTA_DISABLING and BTRFS_FS_QUOTA_ENABLED flag
would be dropped in the btrfs_run_qgroups() since quota_root is not NULL.
This leads to the failure of "quota enable" on the first time.

BTRFS_FS_QUOTA_DISABLING flag is not used outside of "quota disable"
context and is equivalent to whether quota_root is NULL or not.
btrfs_run_qgroups() checks whether quota_root is NULL or not in the first
place.

So, let's remove BTRFS_FS_QUOTA_DISABLING flag.

Signed-off-by: Tomohiro Misono
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Misono, Tomohiro
2017-09-26 20:52:57 +0800

21 Aug, 2017

1 commit

1cd5447eb btrfs: pass fs_info to btrfs_del_root instead of tree_root ... Browse Code »

btrfs_del_roots always uses the tree_root. Let's pass fs_info instead.

Signed-off-by: Jeff Mahoney
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Jeff Mahoney
2017-08-21 23:49:54 +0800

16 Aug, 2017

2 commits

913e15357 btrfs: drop newlines from strings when using btrfs_* helpers ... Browse Code »

The helpers append "\n" so we can keep the actual strings shorter. The
extra newline will print an empty line. Some messages have been
slightly modified to be more consistent with the rest (lowercase first
letter).

Reviewed-by: Nikolay Borisov
Signed-off-by: David Sterba

David Sterba
2017-08-16 22:12:02 +0800
b6e6bca51 btrfs: qgroups: Fix BUG_ON condition in tree level check ... Browse Code »

The current code was erroneously checking for
root_level > BTRFS_MAX_LEVEL. If we had a root_level of 8 then the check
won't trigger and we could potentially hit a buffer overflow. The
correct check should be root_level >= BTRFS_MAX_LEVEL .

Signed-off-by: Nikolay Borisov
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-16 22:12:02 +0800

30 Jun, 2017

6 commits

6374e57ad btrfs: fix integer overflow in calc_reclaim_items_nr ... Browse Code »

Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6. This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number. It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.

This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.

Reported-by: Dave Jones
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Chris Mason
2017-06-30 02:17:02 +0800
bc42bda22 btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges ... Browse Code »

[BUG]
For the following case, btrfs can underflow qgroup reserved space
at an error path:
(Page size 4K, function name without "btrfs_" prefix)

Task A | Task B
----------------------------------------------------------------------
Buffered_write [0, 2K) |
|- check_data_free_space() |
| |- qgroup_reserve_data() |
| Range aligned to page |
| range [0, 4K) <<< |
| 4K bytes reserved <<< |
|- copy pages to page cache |
| Buffered_write [2K, 4K)
| |- check_data_free_space()
| | |- qgroup_reserved_data()
| | Range alinged to page
| | range [0, 4K)
| | Already reserved by A <<<
| | 0 bytes reserved <<<
| |- delalloc_reserve_metadata()
| | And it *FAILED* (Maybe EQUOTA)
| |- free_reserved_data_space()
|- qgroup_free_data()
Range aligned to page range
[0, 4K)
Freeing 4K
(Special thanks to Chandan for the detailed report and analyse)

[CAUSE]
Above Task B is freeing reserved data range [0, 4K) which is actually
reserved by Task A.

And at writeback time, page dirty by Task A will go through writeback
routine, which will free 4K reserved data space at file extent insert
time, causing the qgroup underflow.

[FIX]
For btrfs_qgroup_free_data(), add @reserved parameter to only free
data ranges reserved by previous btrfs_qgroup_reserve_data().
So in above case, Task B will try to free 0 byte, so no underflow.

Reported-by: Chandan Rajendra
Signed-off-by: Qu Wenruo
Reviewed-by: Chandan Rajendra
Tested-by: Chandan Rajendra
Signed-off-by: David Sterba

Qu Wenruo
2017-06-30 02:17:02 +0800
364ecf365 btrfs: qgroup: Introduce extent changeset for qgroup reserve functions ... Browse Code »

Introduce a new parameter, struct extent_changeset for
btrfs_qgroup_reserved_data() and its callers.

Such extent_changeset was used in btrfs_qgroup_reserve_data() to record
which range it reserved in current reserve, so it can free it in error
paths.

The reason we need to export it to callers is, at buffered write error
path, without knowing what exactly which range we reserved in current
allocation, we can free space which is not reserved by us.

This will lead to qgroup reserved space underflow.

Reviewed-by: Chandan Rajendra
Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba

Qu Wenruo
2017-06-30 02:17:02 +0800
7bc329c18 btrfs: qgroup: Return actually freed bytes for qgroup release or free data ... Browse Code »

btrfs_qgroup_release/free_data() only returns 0 or a negative error
number (ENOMEM is the only possible error).

This is normally good enough, but sometimes we need the exact byte
count it freed/released.

Change it to return actually released/freed bytenr number instead of 0
for success.
And slightly modify related extent_changeset structure, since in btrfs
one no-hole data extent won't be larger than 128M, so "unsigned int"
is large enough for the use case.

Signed-off-by: Qu Wenruo
Signed-off-by: David Sterba

Qu Wenruo
2017-06-30 02:17:02 +0800
d1b8b94a2 btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function ... Browse Code »

Quite a lot of qgroup corruption happens due to wrong time of calling
btrfs_qgroup_prepare_account_extents().

Since the safest time is to call it just before
btrfs_qgroup_account_extents(), there is no need to separate these 2
functions.

Merging them will make code cleaner and less bug prone.

Signed-off-by: Qu Wenruo
[ changelog and comment adjustments ]
Signed-off-by: David Sterba

Qu Wenruo
2017-06-30 02:17:02 +0800
5edfd9fdc btrfs: qgroup: Add quick exit for non-fs extents ... Browse Code »

Modify btrfs_qgroup_account_extent() to exit quicker for non-fs extents.

The quick exit condition is:
1) The extent belongs to a non-fs tree
Only fs-tree extents can affect qgroup numbers and is the only case
where extent can be shared between different trees.

Although strictly speaking extent in data-reloc or tree-reloc tree
can be shared, data/tree-reloc root won't appear in the result of
btrfs_find_all_roots(), so we can ignore such case.

So we can check the first root in old_roots/new_roots ulist.
- if we find the 1st root is a not a fs/subvol root, then we can skip
the extent
- if we find the 1st root is a fs/subvol root, then we must continue
calculation

OR

2) both 'nr_old_roots' and 'nr_new_roots' are 0
This means either such extent got allocated then freed in current
transaction or it's a new reloc tree extent, whose nr_new_roots is 0.
Either way it won't affect qgroup accounting and can be skipped
safely.

Such quick exit can make trace output more quite and less confusing:
(example with fs uuid and time stamp removed)

Before:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
btrfs_qgroup_account_extent: bytenr=29556736 num_bytes=16384 nr_old_roots=0 nr_new_roots=1
------
Extent tree block will trigger btrfs_qgroup_account_extent() trace point
while no qgroup number is changed, as extent tree won't affect qgroup
accounting.

After:
------
add_delayed_tree_ref: bytenr=29556736 num_bytes=16384 action=ADD_DELAYED_REF parent=0(-) ref_root=2(EXTENT_TREE) level=0 type=TREE_BLOCK_REF seq=0
------
Now such unrelated extent won't trigger btrfs_qgroup_account_extent()
trace point, making the trace less noisy.

Signed-off-by: Qu Wenruo
[ changelog and comment adjustments ]
Signed-off-by: David Sterba

Qu Wenruo
2017-06-30 02:17:02 +0800

21 Jun, 2017

1 commit

cddf3b2cb btrfs: add cond_resched to btrfs_qgroup_trace_leaf_items ... Browse Code »

On an uncontended system, we can end up hitting soft lockups while
doing replace_path. At the core, and frequently called is
btrfs_qgroup_trace_leaf_items, so it makes sense to add a cond_resched
there.

Signed-off-by: Jeff Mahoney
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Jeff Mahoney
2017-06-21 21:48:01 +0800

20 Jun, 2017

1 commit

f29efe292 btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE ... Browse Code »

This patch introduces the quota override flag to btrfs_fs_info, and a
change to quota limit checking code to temporarily allow for quota to be
overridden for processes with CAP_SYS_RESOURCE.

It's useful for administrative programs, such as log rotation, that may
need to temporarily use more disk space in order to free up a greater
amount of overall disk space without yielding more disk space to the
rest of userland.

Eventually, we may want to add the idea of an operator-specific quota,
operator reserved space, or something else to allow for administrative
override, but this is perhaps the simplest solution.

Signed-off-by: Sargun Dhillon
Reviewed-by: David Sterba
[ minor changelog edits ]
Signed-off-by: David Sterba

Sargun Dhillon
2017-06-20 00:25:58 +0800

10 May, 2017

1 commit

1176032cb Merge branch 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs updates from Chris Mason:
"This has fixes and cleanups Dave Sterba collected for the merge
window.

The biggest functional fixes are between btrfs raid5/6 and scrub, and
raid5/6 and device replacement. Some of our pending qgroup fixes are
included as well while I bash on the rest in testing.

We also have the usual set of cleanups, including one that makes
__btrfs_map_block() much more maintainable, and conversions from
atomic_t to refcount_t"

* 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (71 commits)
btrfs: fix the gfp_mask for the reada_zones radix tree
Btrfs: fix reported number of inode blocks
Btrfs: send, fix file hole not being preserved due to inline extent
Btrfs: fix extent map leak during fallocate error path
Btrfs: fix incorrect space accounting after failure to insert inline extent
Btrfs: fix invalid attempt to free reserved space on failure to cow range
btrfs: Handle delalloc error correctly to avoid ordered extent hang
btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error
btrfs: check if the device is flush capable
btrfs: delete unused member nobarriers
btrfs: scrub: Fix RAID56 recovery race condition
btrfs: scrub: Introduce full stripe lock for RAID56
btrfs: Use ktime_get_real_ts for root ctime
Btrfs: handle only applicable errors returned by btrfs_get_extent
btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
btrfs: use q which is already obtained from bdev_get_queue
Btrfs: switch to div64_u64 if with a u64 divisor
Btrfs: update scrub_parity to use u64 stripe_len
Btrfs: enable repair during read for raid56 profile
btrfs: use clear_page where appropriate
...

Linus Torvalds
2017-05-10 23:33:17 +0800

19 Apr, 2017

1 commit

338bd52f3 btrfs: qgroup: move noisy underflow warning to debugging build ... Browse Code »

The WARN_ON and warning from report_reserved_underflow can become very
noisy and is visible unconditionally although this is namely for
debugging. The patch "btrfs: Add WARN_ON for qgroup reserved underflow"
(18dc22c19bef520cca11ce4c0807ac9dec48d31f) went to 4.11-rc1 and the plan
was to get the fix as well, but this hasn't happened.

CC: Qu Wenruo
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

David Sterba
2017-04-19 18:40:49 +0800

18 Apr, 2017

5 commits

d51ea5dd2 btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint ... Browse Code »

Newly introduced qgroup reserved space trace points are normally nested
into several common qgroup operations.

While some other trace points are not well placed to co-operate with
them, causing confusing output.

This patch re-arrange trace_btrfs_qgroup_release_data() and
trace_btrfs_qgroup_free_delayed_ref() trace points so they are triggered
before reserved space ones.

Signed-off-by: Qu Wenruo
Reviewed-by: Jeff Mahoney
Signed-off-by: David Sterba

Qu Wenruo
2017-04-18 20:07:26 +0800
3159fe7ba btrfs: qgroup: Add trace point for qgroup reserved space ... Browse Code »

Introduce the following trace points:
qgroup_update_reserve
qgroup_meta_reserve

These trace points are handy to trace qgroup reserve space related
problems.

Also export btrfs_qgroup structure, as now we directly pass btrfs_qgroup
structure to trace points, so that structure needs to be exported.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Qu Wenruo
2017-04-18 20:07:26 +0800
48a89bc4f btrfs: qgroups: Retry after commit on getting EDQUOT ... Browse Code »

We are facing the same problem with EDQUOT which was experienced with
ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
here is a quick fix, which may be too big a hammer.

Quotas are reserved during the start of an operation, incrementing
qg->reserved. However, it is written to disk in a commit_transaction
which could take as long as commit_interval. In the meantime there
could be deletions which are not accounted for because deletions are
accounted for only while committed (free_refroot). So, when we get
a EDQUOT flush the data to disk and try again.

This fixes fstests btrfs/139.

Here is a sample script which shows this issue.

DEVICE=/dev/vdb
MOUNTPOINT=/mnt
TESTVOL=$MOUNTPOINT/tmp
QUOTA=5
PROG=btrfs
DD_BS="4k"
DD_COUNT="256"
RUN_TIMES=5000

mkfs.btrfs -f $DEVICE
mount -o commit=240 $DEVICE $MOUNTPOINT
$PROG subvolume create $TESTVOL
$PROG quota enable $TESTVOL
$PROG qgroup limit ${QUOTA}G $TESTVOL

typeset -i DD_RUN_GOOD
typeset -i QUOTA

function _check_cmd() {
if [[ ${?} > 0 ]]; then
echo -n "$(date) E: Running previous command"
echo ${*}
echo "Without sync"
$PROG qgroup show -pcreFf ${TESTVOL}
echo "With sync"
$PROG qgroup show -pcreFf --sync ${TESTVOL}
exit 1
fi
}

while true; do
DD_RUN_GOOD=$RUN_TIMES

while (( ${DD_RUN_GOOD} != 0 )); do
dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}
_check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}"
DD_RUN_GOOD=(${DD_RUN_GOOD}-1)
done

$PROG qgroup show -pcref $TESTVOL
echo "----------- Cleanup ---------- "
rm $TESTVOL/quotatest*

done

Signed-off-by: Goldwyn Rodrigues
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Goldwyn Rodrigues
2017-04-18 20:07:25 +0800
de47c9d3f btrfs: replace hardcoded value with SEQ_LAST macro ... Browse Code »

Define the SEQ_LAST macro to replace (u64)-1 in places where said
value triggers a special-case ref search behavior.

Signed-off-by: Edmund Nadolski
Reviewed-by: Jeff Mahoney
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Edmund Nadolski
2017-04-18 20:07:25 +0800
f486135eb btrfs: remove unused qgroup members from btrfs_trans_handle ... Browse Code »

The members have been effectively unused since "Btrfs: rework qgroup
accounting" (fcebe4562dec83b3), there's no substitute for
assert_qgroups_uptodate so it's removed as well.

Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

David Sterba
2017-04-18 20:07:25 +0800

29 Mar, 2017

1 commit

ce0dcee62 btrfs: Change qgroup_meta_rsv to 64bit ... Browse Code »

Using an int value is causing qg->reserved to become negative and
exclusive -EDQUOT to be reached prematurely.

This affects exclusive qgroups only.

TEST CASE:

DEVICE=/dev/vdb
MOUNTPOINT=/mnt
SUBVOL=$MOUNTPOINT/tmp

umount $SUBVOL
umount $MOUNTPOINT

mkfs.btrfs -f $DEVICE
mount /dev/vdb $MOUNTPOINT
btrfs quota enable $MOUNTPOINT
btrfs subvol create $SUBVOL
umount $MOUNTPOINT
mount /dev/vdb $MOUNTPOINT
mount -o subvol=tmp $DEVICE $SUBVOL
btrfs qgroup limit -e 3G $SUBVOL

btrfs quota rescan /mnt -w

for i in `seq 1 44000`; do
dd if=/dev/zero of=/mnt/tmp/test_$i bs=10k count=1
if [[ $? > 0 ]]; then
btrfs qgroup show -pcref $SUBVOL
exit 1
fi
done

Signed-off-by: Goldwyn Rodrigues
[ add reproducer to changelog ]
Signed-off-by: David Sterba

Goldwyn Rodrigues
2017-03-29 20:29:08 +0800

17 Feb, 2017

12 commits

fb235dc06 btrfs: qgroup: Move half of the qgroup accounting time out of commit trans ... Browse Code »

Just as Filipe pointed out, the most time consuming parts of qgroup are
btrfs_qgroup_account_extents() and
btrfs_qgroup_prepare_account_extents().
Which both call btrfs_find_all_roots() to get old_roots and new_roots
ulist.

What makes things worse is, we're calling that expensive
btrfs_find_all_roots() at transaction committing time with
TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction.

Such behavior is necessary for @new_roots search as current
btrfs_find_all_roots() can't do it correctly so we do call it just
before switch commit roots.

However for @old_roots search, it's not necessary as such search is
based on commit_root, so it will always be correct and we can move it
out of transaction committing.

This patch moves the @old_roots search part out of
commit_transaction(), so in theory we can half the time qgroup time
consumption at commit_transaction().

But please note that, this won't speedup qgroup overall, the total time
consumption is still the same, just reduce the performance stall.

Cc: Filipe Manana
Signed-off-by: Qu Wenruo
Reviewed-by: Filipe Manana
Signed-off-by: David Sterba

Qu Wenruo
2017-02-17 19:03:55 +0800
15b34517a btrfs: remove unused parameter from adjust_slots_upwards ... Browse Code »

Never used.

Reviewed-by: Liu Bo
Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:55 +0800
7c302b49d btrfs: remove unused parameter from clean_tree_block ... Browse Code »

Added but never needed.

Reviewed-by: Liu Bo
Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:51 +0800
6655bc3de btrfs: ulist: rename ulist_fini to ulist_release ... Browse Code »

Change the name so it matches the naming we already use eg. for
btrfs_path.

Suggested-by: Qu Wenruo
Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:50 +0800
4ae8553c2 btrfs: remove pointless rcu protection from btrfs_qgroup_inherit ... Browse Code »

There was never need for RCU protection around reading nodesize or other
fairly constant filesystem data.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:50 +0800
0b08e1f4f btrfs: qgroups: opencode qgroup_free helper ... Browse Code »

The helper name is not too helpful and is just wrapping a simple call.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:50 +0800
81353d50f btrfs: check quota status earlier and don't do unnecessary frees ... Browse Code »

Status of quotas should be the first check in
btrfs_qgroup_account_extent and we can return immediatelly, no need to
do no-op ulist frees.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:50 +0800
53d323599 btrfs: embed extent_changeset::range_changed to the structure ... Browse Code »

We can embed range_changed to the extent changeset to address following
problems:

- no need to allocate ulist dynamically, we also get rid of the GFP_NOFS
for free
- fix lack of allocation failure checking in btrfs_qgroup_reserve_data

The stack consuption where extent_changeset is used slightly increases:

before: 16
after: 16 - 8 (for pointer) + 32 (sizeof ulist) = 40

Which is bearable.

Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:49 +0800
025db916a btrfs: qgroups: make __del_qgroup_relation static ... Browse Code »

Internal helper.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:49 +0800
6602caf14 btrfs: use GFP_KERNEL in btrfs_add/del_qgroup_relation ... Browse Code »

Qgroup relations are added/deleted from ioctl, we hold the high level
qgroup lock, no deadlocks or recursion from the allocation possible
here.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:49 +0800
52bf8e7ae btrfs: use GFP_KERNEL in btrfs_quota_enable ... Browse Code »

We don't need to use GFP_NOFS here as this is called from ioctls an the
only lock held is the subvol_sem, which is of a high level and protects
creation/renames/deletion and is never held in the writeout paths.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:49 +0800
323b88f4a btrfs: use GFP_KERNEL in btrfs_read_qgroup_config ... Browse Code »

The qgroup config is read during mount, we do not have to use NOFS.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:49 +0800

14 Feb, 2017

2 commits

003d7c59e btrfs: allow unlink to exceed subvolume quota ... Browse Code »

Once a qgroup limit is exceeded, it's impossible to restore normal
operation to the subvolume without modifying the limit or removing
the subvolume. This is a surprising situation for many users used
to the typical workflow with quotas on other file systems where it's
possible to remove files until the used space is back under the limit.

When we go to unlink a file and start the transaction, we'll hit
the qgroup limit while trying to reserve space for the items we'll
modify while removing the file. We discussed last month how best
to handle this situation and agreed that there is no perfect solution.
The best principle-of-least-surprise solution is to handle it similarly
to how we already handle ENOSPC when unlinking, which is to allow
the operation to succeed with the expectation that it will ultimately
release space under most circumstances.

This patch modifies the transaction start path to select whether to
honor the qgroups limits. btrfs_start_transaction_fallback_global_rsv
is the only caller that skips enforcement. The reservation and tracking
still happens normally -- it just skips the enforcement step.

Signed-off-by: Jeff Mahoney
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Jeff Mahoney
2017-02-14 22:50:59 +0800
18dc22c19 btrfs: Add WARN_ON for qgroup reserved underflow ... Browse Code »

Goldwyn Rodrigues has exposed and fixed a bug which underflows btrfs
qgroup reserved space, and leads to non-writable fs.

This reminds us that we don't have enough underflow check for qgroup
reserved space.

For underflow case, we should not really underflow the numbers but warn
and keeps qgroup still work.

So add more check on qgroup reserved space and add WARN_ON() and
btrfs_warn() for any underflow case.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Reviewed-by: Goldwyn Rodrigues
Signed-off-by: David Sterba

Qu Wenruo
2017-02-14 22:50:49 +0800

14 Dec, 2016

1 commit

5f52a2c51 Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/f… ... Browse Code »

…dmanana/linux into for-linus-4.10

Patches queued up by Filipe:

The most important change is still the fix for the extent tree
corruption that happens due to balance when qgroups are enabled (a
regression introduced in 4.7 by a fix for a regression from the last
qgroups rework). This has been hitting SLE and openSUSE users and QA
very badly, where transactions keep getting aborted when running
delayed references leaving the root filesystem in RO mode and nearly
unusable. There are fixes here that allow us to run xfstests again
with the integrity checker enabled, which has been impossible since 4.8
(apparently I'm the only one running xfstests with the integrity
checker enabled, which is useful to validate dirtied leafs, like
checking if there are keys out of order, etc). The rest are just some
trivial fixes, most of them tagged for stable, and two cleanups.

Signed-off-by: Chris Mason <clm@fb.com>

Chris Mason
2016-12-14 01:14:42 +0800