06 Dec, 2018
1 commit
-
We want to release the unused reservation we have since it refills the
delayed refs reserve, which will make everything go smoother when
running the delayed refs if we're short on our reservation.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Omar Sandoval
Reviewed-by: Liu Bo
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
14 Nov, 2018
1 commit
-
commit 30928e9baac238a7330085a1c5747f0b5df444b4 upstream.
This could result in a really bad case where we do something like

evict
  evict_refill_and_join
    btrfs_commit_transaction
      btrfs_run_delayed_iputs
        evict
          evict_refill_and_join
            btrfs_commit_transaction
... forever

We have plenty of other places where we run delayed iputs that are much
safer, let those do the work.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana
Signed-off-by: Josef Bacik
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
30 May, 2018
1 commit
-
…created with quota enabled
[ Upstream commit 4d31778aa2fa342f5f92ca4025b293a1729161d1 ]
When multiple pending snapshots referring to the same source subvolume
are executed, enabled quota will cause root item corruption, where root
items are using old bytenr (no backref in extent tree).

This can be triggered by fstests btrfs/152.

The cause is when source subvolume is still dirty, extra commit
(simplified transaction commit) of qgroup_account_snapshot() can skip
dirty roots not recorded in current transaction, making root item of
source subvolume not updated.

Fix it by forcing recording source subvolume in current transaction
before qgroup sub-transaction commit.

Reported-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
19 Mar, 2018
1 commit
-
This reverts commit 3c181c12c431fe33b669410d663beb9cceefcd1b as it
causes breakage on big endian systems with btrfs images.

Reported-by: Christoph Biedl
Cc: Anand Jain
Cc: Liu Bo
Cc: David Sterba
Signed-off-by: Greg Kroah-Hartman
09 Mar, 2018
1 commit
-
commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value. This was missing in update_super_roots and in sysfs readers.

Moving between opposite endianness hosts will report bogus numbers in
sysfs, and mount may fail as the root will not be restored correctly. If
the filesystem is always used on hosts of the same endianness, this will
not be a problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.

CC: stable@vger.kernel.org
Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain
Reviewed-by: Liu Bo
Reviewed-by: David Sterba
[ update changelog ]
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
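The accessor requirement described in this commit can be sketched in plain C. This is only an illustration under stated assumptions: struct demo_super and the demo_* helpers are hypothetical stand-ins, not the kernel's btrfs_super_*() accessors. The point is that a little-endian on-disk field has to be decoded and encoded explicitly, so the result does not depend on the host byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for a byte copy of an on-disk structure.
 * The field is kept as raw little-endian bytes, as it sits on disk. */
struct demo_super {
	uint8_t generation[8];
};

/* Decode the little-endian field into a host-order value. */
static uint64_t demo_super_generation(const struct demo_super *s)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | s->generation[i];
	return v;
}

/* Encode a host-order value into the little-endian field. */
static void demo_set_super_generation(struct demo_super *s, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		s->generation[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}
```

Reading or writing the raw bytes directly, without helpers like these, gives the right answer only on a little-endian host, which matches the failure mode the changelog describes.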
30 Jun, 2017
2 commits
-
Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6. This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number. It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.

This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.

Reported-by: Dave Jones
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason
Reviewed-by: David Sterba
Signed-off-by: David Sterba
-
Quite a lot of qgroup corruption happens due to calling
btrfs_qgroup_prepare_account_extents() at the wrong time.

Since the safest time is to call it just before
btrfs_qgroup_account_extents(), there is no need to separate these 2
functions.

Merging them will make code cleaner and less bug prone.
Signed-off-by: Qu Wenruo
[ changelog and comment adjustments ]
Signed-off-by: David Sterba
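The 64->32 bit narrowing behind the WARN_ON(nr < 0) report in the first commit of this group can be reproduced in miniature. The sketch below is an assumption-laden illustration: the demo_* names and the per-item cost are hypothetical and do not reproduce the real calc_reclaim_items_nr() math. It only contrasts an int return type with a 64-bit one.

```c
#include <stdint.h>

/* Hypothetical per-item cost, chosen only for the illustration. */
#define DEMO_BYTES_PER_ITEM 4096ULL

/* Narrowing shape: the 64-bit quotient is squeezed into an int.
 * Converting an out-of-range value to int is implementation-defined;
 * GCC and Clang reduce it modulo 2^32, so a large quotient can come
 * out negative. */
static int demo_items_nr_int(uint64_t to_reclaim)
{
	return (int)(to_reclaim / DEMO_BYTES_PER_ITEM);
}

/* Widened shape: the result stays 64-bit all the way to the caller. */
static uint64_t demo_items_nr_u64(uint64_t to_reclaim)
{
	return to_reclaim / DEMO_BYTES_PER_ITEM;
}
```

With these assumed numbers, a request of 1ULL << 43 bytes yields a quotient of 2^31, which no longer fits in a signed 32-bit int.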
20 Jun, 2017
3 commits
-
We can keep the state among the other fs_info flags, there's no reason
why fs_frozen would need to be separate.

Reviewed-by: Nikolay Borisov
Signed-off-by: David Sterba
-
Observing the number of slab objects of btrfs_transaction, there's just
one active on an almost quiescent filesystem, and the number of objects
goes to about ten when sync is in progress. Then the number goes down to
1. This matches the expectations of the transaction lifetime.

For such use the separate slab cache is not justified, as we do not
reuse objects frequently. For the shortlived transaction, the generic
slab (size 512) should be ok. We can optimistically expect that the 512
slabs are not all used (fragmentation) and there are free slots to take
when we do the allocation, compared to potentially allocating a whole new
page for the separate slab.

We'll lose the stats about the object use, which could be added later if
we really need them.

Signed-off-by: David Sterba
-
For extent_io trees we have carried the address_mapping of the inode
around in the io tree in order to pull the inode back out for calling
into various tree ops hooks. This works fine when everything that has
an extent_io_tree has an inode. But we are going to remove the
btree_inode, so we need to change this. Instead just have a generic
void * for private data that we can initialize with, and have all the
tree ops use that instead. This had a lot of cascading changes but
should be relatively straightforward.

Signed-off-by: Josef Bacik
Reviewed-by: Chandan Rajendra
Reviewed-by: David Sterba
[ minor reordering of the callback prototypes ]
Signed-off-by: David Sterba
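The "generic void * for private data" idea from the last commit above can be sketched as follows. All names here (demo_io_tree, demo_tree_ops, demo_owner) are hypothetical and much simpler than the real extent_io structures. The sketch only shows hooks receiving an opaque pointer instead of digging an inode out of an address_mapping.

```c
#include <stddef.h>

/* Hypothetical hook table: every hook takes the opaque private pointer. */
struct demo_tree_ops {
	int (*writepage_end_hook)(void *private_data,
				  unsigned long start, unsigned long end);
};

/* Hypothetical tree: it stores whatever pointer it was initialized with
 * and never interprets it. */
struct demo_io_tree {
	const struct demo_tree_ops *ops;
	void *private_data;
};

static void demo_io_tree_init(struct demo_io_tree *tree,
			      const struct demo_tree_ops *ops,
			      void *private_data)
{
	tree->ops = ops;
	tree->private_data = private_data;
}

/* One possible owner; it does not have to be an inode. */
struct demo_owner {
	unsigned long bytes_written;
};

static int demo_end_hook(void *private_data,
			 unsigned long start, unsigned long end)
{
	struct demo_owner *owner = private_data;

	owner->bytes_written += end - start + 1;
	return 0;
}

/* The tree calls the hook with its stored pointer, if a hook is set. */
static int demo_finish_range(struct demo_io_tree *tree,
			     unsigned long start, unsigned long end)
{
	if (!tree->ops || !tree->ops->writepage_end_hook)
		return 0;
	return tree->ops->writepage_end_hook(tree->private_data, start, end);
}
```

The design choice is that only the hook implementation knows the concrete type behind the pointer, so a tree without an inode can still use the same callbacks.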
18 Apr, 2017
3 commits
-
[BUG]
The easiest way to reproduce the bug is:
------
# mkfs.btrfs -f $dev -n 16K
# mount $dev $mnt -o inode_cache
# btrfs quota enable $mnt
# btrfs quota rescan -w $mnt
# btrfs qgroup show $mnt
qgroupid         rfer         excl
--------         ----         ----
0/5          32.00KiB     32.00KiB
             ^^ Twice the correct value
------

The fstests btrfs qgroup test group can easily detect this with the
inode_cache mount option, although some of the failures are false alerts
since old test cases use fixed golden output, while new test cases use
"btrfs check" to detect qgroup mismatches.

[CAUSE]
The inode_cache mount option makes commit_fs_roots() call
btrfs_save_ino_cache() to update fs/subvol trees, which generates new
delayed refs.

However we call btrfs_qgroup_prepare_account_extents() too early, before
commit_fs_roots().
This makes the "old_roots" for newly generated extents always NULL.
For the freeing extent case, this makes both new_roots and old_roots
empty, while the correct old_roots should not be empty.
This causes qgroup numbers to not be decreased correctly.

[FIX]
Modify the timing of calling btrfs_qgroup_prepare_account_extents() to
just before btrfs_qgroup_account_extents(), and add needed delayed_refs
handler.
So qgroup can handle inode_map mount options correctly.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
-
The members have been effectively unused since "Btrfs: rework qgroup
accounting" (fcebe4562dec83b3), there's no substitute for
assert_qgroups_uptodate so it's removed as well.

Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba
-
The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This avoids accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David Sterba
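Why a dedicated reference-counter type helps against overflow can be sketched with C11 atomics. This is a simplified userspace illustration, not the kernel's refcount_t implementation: the demo_refcount names are hypothetical, and the only property modeled is that the counter saturates instead of wrapping around to zero.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct demo_refcount {
	atomic_uint refs;
};

/* Increment, but pin the counter at UINT_MAX instead of wrapping. */
static void demo_refcount_inc(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return;	/* saturated: stays pinned */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Decrement; returns true only when the last reference is dropped.
 * A saturated or already-zero counter is left alone, so the object is
 * leaked rather than freed while something may still point at it. */
static bool demo_refcount_dec_and_test(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX || old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

A plain atomic counter that wraps from UINT_MAX to 0 would let a later put free an object that still has users. Saturating trades that use-after-free for a leak.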
28 Feb, 2017
3 commits
-
Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba
-
Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba
-
Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba
17 Feb, 2017
3 commits
-
Added but never used.
Reviewed-by: Liu Bo
Signed-off-by: David Sterba
-
write_all_supers and write_ctree_super are almost equal, the parameter
'trans' is unused so we can drop it and have just one helper.

Reviewed-by: Liu Bo
Signed-off-by: David Sterba
-
The quota status used to be tracked as a variable, so the mutex was
needed (until "Btrfs: add a flags field to btrfs_fs_info" afcdd129e05a9).
Since the status is a bit modified atomically and we don't hold the
mutex beyond the check, we can drop it.

Signed-off-by: David Sterba
14 Feb, 2017
3 commits
-
Once a qgroup limit is exceeded, it's impossible to restore normal
operation to the subvolume without modifying the limit or removing
the subvolume. This is a surprising situation for many users used
to the typical workflow with quotas on other file systems where it's
possible to remove files until the used space is back under the limit.

When we go to unlink a file and start the transaction, we'll hit
the qgroup limit while trying to reserve space for the items we'll
modify while removing the file. We discussed last month how best
to handle this situation and agreed that there is no perfect solution.
The best principle-of-least-surprise solution is to handle it similarly
to how we already handle ENOSPC when unlinking, which is to allow
the operation to succeed with the expectation that it will ultimately
release space under most circumstances.

This patch modifies the transaction start path to select whether to
honor the qgroups limits. btrfs_start_transaction_fallback_global_rsv
is the only caller that skips enforcement. The reservation and tracking
still happens normally -- it just skips the enforcement step.

Signed-off-by: Jeff Mahoney
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba
-
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.

Signed-off-by: Nikolay Borisov
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba
-
This replaces the ACCESS_ONCE macro with the corresponding
READ_ONCE/WRITE_ONCE macros.

Signed-off-by: Seraphime Kirkovski
Reviewed-by: David Sterba
Signed-off-by: David Sterba
06 Dec, 2016
9 commits
-
Now we only use the root parameter to print the root objectid in
a tracepoint. We can use the root parameter from the transaction
handle for that. It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
btrfs_write_and_wait_marked_extents and btrfs_sync_log both call
btrfs_wait_marked_extents, which provides a core loop and then handles
errors differently based on whether it's a log root or not.

This means that btrfs_write_and_wait_marked_extents needs to take a root
because btrfs_wait_marked_extents requires one, even though it's only
used to determine whether the root is a log root. The log root code
won't ever call into the transaction commit code using a log root, so we
can factor out the core loop and provide the error handling appropriate
to each waiter in new routines. This allows us to eventually remove
the root argument from btrfs_commit_transaction, and as a result,
btrfs_end_transaction.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer. Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
This results in btrfs_assert_delayed_root_empty and
btrfs_destroy_delayed_inode taking an fs_info instead of a root.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable. This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
We track the node sizes per-root, but they never vary from the values
in the superblock. This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
There are many functions that are always called with the same root
argument. Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
There are 11 functions that accept a root parameter and immediately
overwrite it. We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
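The root-to-fs_info conversions in this series share one shape, sketched below with hypothetical demo_* types and a made-up header size. None of this is the real btrfs code. It only shows a helper that used a root purely to reach fs_info, next to the converted form that takes fs_info directly.

```c
/* Hypothetical, heavily reduced versions of the two structures. */
struct demo_fs_info {
	unsigned int nodesize;
};

struct demo_root {
	struct demo_fs_info *fs_info;
};

/* Made-up header size, for illustration only. */
#define DEMO_HEADER_SIZE 101u

/* Before: the root parameter is used only to reach fs_info. */
static unsigned int demo_leaf_data_size_old(const struct demo_root *root)
{
	return root->fs_info->nodesize - DEMO_HEADER_SIZE;
}

/* After: the helper states its real dependency. */
static unsigned int demo_leaf_data_size(const struct demo_fs_info *fs_info)
{
	return fs_info->nodesize - DEMO_HEADER_SIZE;
}
```

Callers that already hold a root pass root->fs_info, and callers without any particular root no longer have to invent one.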
12 Oct, 2016
1 commit
-
Pull btrfs updates from Chris Mason:
"This is a big variety of fixes and cleanups.

Liu Bo continues to fixup fuzzer related problems, and some of Josef's
cleanups are prep for his bigger extent buffer changes (slated for
v4.10)"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (39 commits)
Revert "btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs"
Btrfs: remove unnecessary btrfs_mark_buffer_dirty in split_leaf
Btrfs: don't BUG() during drop snapshot
btrfs: fix btrfs_no_printk stub helper
Btrfs: memset to avoid stale content in btree leaf
btrfs: parent_start initialization cleanup
btrfs: Remove already completed TODO comment
btrfs: Do not reassign count in btrfs_run_delayed_refs
btrfs: fix a possible umount deadlock
Btrfs: fix memory leak in do_walk_down
btrfs: btrfs_debug should consume fs_info when DEBUG is not defined
btrfs: convert send's verbose_printk to btrfs_debug
btrfs: convert pr_* to btrfs_* where possible
btrfs: convert printk(KERN_* to use pr_* calls
btrfs: unsplit printed strings
btrfs: clean the old superblocks before freeing the device
Btrfs: kill BUG_ON in run_delayed_tree_ref
Btrfs: don't leak reloc root nodes on error
btrfs: squash lines for simple wrapper functions
Btrfs: improve check_node to avoid reading corrupted nodes
...
28 Sep, 2016
1 commit
-
current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro
27 Sep, 2016
4 commits
-
For many printks, we want to know which file system issued the message.
This patch converts most pr_* calls to use the btrfs_* versions instead.
In some cases, this means adding plumbing to allow call sites access to
an fs_info pointer.

fs/btrfs/check-integrity.c is left alone for another day.
Signed-off-by: Jeff Mahoney
Reviewed-by: David Sterba
Signed-off-by: David Sterba
-
This patch converts printk(KERN_* style messages to use the pr_* versions.
One side effect is that anything that was KERN_DEBUG is now automatically
a dynamic debug message.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
CodingStyle chapter 2:
"[...] never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."

This patch unsplits user-visible strings.
Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba
-
Since we could get errors from the concurrent aborted transaction,
the check of this BUG_ON in start_transaction is not true any more.

Say, while flushing free space cache inode's dirty pages,
btrfs_finish_ordered_io
 -> btrfs_join_transaction_nolock
      (the transaction has been aborted.)
      -> BUG_ON(type == TRANS_JOIN_NOLOCK);

Signed-off-by: Liu Bo
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba
26 Sep, 2016
1 commit
-
We have a lot of random ints in btrfs_fs_info that can be put into flags. This
is mostly equivalent with the exception of how we deal with quota going on or
off, now instead we set a flag when we are turning it on or off and deal with
that appropriately, rather than just having a pending state that the current
quota_enabled gets set to. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: David Sterba
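The move from separate ints to bits in one flags word, including the "enabling/disabling" treatment of quota, can be sketched in userspace C. The flag names and helpers below are hypothetical and use C11 atomics in place of the kernel's set_bit()/test_bit() family. They illustrate the pattern rather than the actual btrfs_fs_info layout.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers inside a single flags word. */
enum {
	DEMO_FS_QUOTA_ENABLED = 0,
	DEMO_FS_QUOTA_ENABLING = 1,
	DEMO_FS_QUOTA_DISABLING = 2,
};

struct demo_fs_info {
	atomic_ulong flags;
};

static void demo_set_bit(int nr, atomic_ulong *flags)
{
	atomic_fetch_or(flags, 1UL << nr);
}

static void demo_clear_bit(int nr, atomic_ulong *flags)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

static bool demo_test_bit(int nr, atomic_ulong *flags)
{
	return (atomic_load(flags) >> nr) & 1UL;
}

/* Atomically clear the bit and report whether it was set before. */
static bool demo_test_and_clear_bit(int nr, atomic_ulong *flags)
{
	unsigned long old = atomic_fetch_and(flags, ~(1UL << nr));

	return (old >> nr) & 1UL;
}

/* At a safe point (for example commit time), turn a pending request
 * into the real state instead of copying a separate pending variable. */
static void demo_apply_pending_quota(struct demo_fs_info *fs)
{
	if (demo_test_and_clear_bit(DEMO_FS_QUOTA_ENABLING, &fs->flags))
		demo_set_bit(DEMO_FS_QUOTA_ENABLED, &fs->flags);
	if (demo_test_and_clear_bit(DEMO_FS_QUOTA_DISABLING, &fs->flags))
		demo_clear_bit(DEMO_FS_QUOTA_ENABLED, &fs->flags);
}
```

Because each bit is read and modified atomically, a simple status check needs no extra lock around it.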
25 Aug, 2016
1 commit
-
When running fstests generic/068, sometimes we got below deadlock:
xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080
ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
Call Trace:
[] schedule+0x35/0x80
[] rwsem_down_read_failed+0xf2/0x140
[] ? __filemap_fdatawrite_range+0xd1/0x100
[] call_rwsem_down_read_failed+0x18/0x30
[] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
[] percpu_down_read+0x35/0x50
[] __sb_start_write+0x2c/0x40
[] start_transaction+0x2a5/0x4d0 [btrfs]
[] btrfs_join_transaction+0x17/0x20 [btrfs]
[] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
[] evict+0xba/0x1a0
[] iput+0x196/0x200
[] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
[] btrfs_commit_transaction+0x928/0xa80 [btrfs]
[] btrfs_freeze+0x30/0x40 [btrfs]
[] freeze_super+0xf0/0x190
[] do_vfs_ioctl+0x4a5/0x5c0
[] ? do_audit_syscall_entry+0x66/0x70
[] ? syscall_trace_enter_phase1+0x11f/0x140
[] SyS_ioctl+0x79/0x90
[] do_syscall_64+0x62/0x110
[] entry_SYSCALL64_slow_path+0x25/0x25

From this warning, freeze_super() already holds SB_FREEZE_FS, but
btrfs_freeze() will call btrfs_commit_transaction() again, if
btrfs_commit_transaction() finds that it has delayed iputs to handle,
it'll start_transaction(), which will try to get SB_FREEZE_FS lock
again, then deadlock occurs.

The root cause is that in btrfs, sync_filesystem(sb) does not make
sure all metadata is updated. There may still be some code adding
delayed iputs, see below sample race window:

                CPU1                      |            CPU2
|-> freeze_super()                        |
|-> sync_filesystem(sb);                  |
|                                         |-> cleaner_kthread()
|                                         | |-> btrfs_delete_unused_bgs()
|                                         | |-> btrfs_remove_chunk()
|                                         | |-> btrfs_remove_block_group()
|                                         | |-> btrfs_add_delayed_iput()
|                                         |
|-> sb->s_writers.frozen = SB_FREEZE_FS;  |
|-> sb_wait_write(sb, SB_FREEZE_FS);      |
| acquire SB_FREEZE_FS lock.              |
|                                         |
|-> btrfs_freeze()                        |
|-> btrfs_commit_transaction()            |
|-> btrfs_run_delayed_iputs()             |
| will handle delayed iputs,              |
| that means start_transaction()          |
| will be called, which will try          |
| to get SB_FREEZE_FS lock.               |

To fix this issue, introduce a "int fs_frozen" to record internally whether
fs has been frozen. If fs has been frozen, we can not handle delayed iputs.

Signed-off-by: Wang Xiaoguang
Reviewed-by: David Sterba
[ add comment to btrfs_freeze ]
Signed-off-by: David Sterba
Signed-off-by: Chris Mason
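The fix described above, skipping delayed iputs while the filesystem is frozen, can be modeled with a tiny state machine. Everything here is a hypothetical simplification: demo_fs, its counters and the demo_* functions stand in for the real freeze, commit and iput paths, and "starting a transaction" is reduced to incrementing a counter.

```c
/* Hypothetical model of the relevant filesystem state. */
struct demo_fs {
	int fs_frozen;            /* set while the fs is frozen */
	int pending_iputs;        /* delayed iputs waiting to run */
	int transactions_started; /* how often an iput started a transaction */
};

/* Each delayed iput may need its own transaction, which is what would
 * block on the freeze lock in the reported deadlock. */
static void demo_run_delayed_iputs(struct demo_fs *fs)
{
	while (fs->pending_iputs > 0) {
		fs->transactions_started++;
		fs->pending_iputs--;
	}
}

/* Commit path: only run delayed iputs when the fs is not frozen. */
static void demo_commit_transaction(struct demo_fs *fs)
{
	if (!fs->fs_frozen)
		demo_run_delayed_iputs(fs);
}

static void demo_freeze(struct demo_fs *fs)
{
	fs->fs_frozen = 1;
	demo_commit_transaction(fs);
}

static void demo_unfreeze(struct demo_fs *fs)
{
	fs->fs_frozen = 0;
}
```

In this model the iputs queued during the race window simply stay queued across the freeze and are handled by a later commit after unfreezing.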
26 Jul, 2016
1 commit
-
__btrfs_abort_transaction doesn't use its root parameter except to
obtain an fs_info pointer. We can obtain that from trans->root->fs_info
for now and from trans->fs_info in a later patch.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba