Eric Lee / smarc-fsl-linux-kernel

06 Dec, 2018

1 commit

85df1f9f8 btrfs: release metadata before running delayed refs

We want to release the unused reservation we have since it refills the
delayed refs reserve, which will make everything go smoother when
running the delayed refs if we're short on our reservation.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Omar Sandoval
Reviewed-by: Liu Bo
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin

Josef Bacik
2018-12-06 02:41:25 +0800

14 Nov, 2018

1 commit

87d7ea688 btrfs: don't run delayed_iputs in commit

commit 30928e9baac238a7330085a1c5747f0b5df444b4 upstream.

This could result in a really bad case where we do something like

evict
  evict_refill_and_join
    btrfs_commit_transaction
      btrfs_run_delayed_iputs
        evict
          evict_refill_and_join
            btrfs_commit_transaction
... forever

We have plenty of other places where we run delayed iputs that are much
safer, let those do the work.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana
Signed-off-by: Josef Bacik
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Josef Bacik
2018-11-14 03:15:17 +0800

30 May, 2018

1 commit

204bfcda8 btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled

[ Upstream commit 4d31778aa2fa342f5f92ca4025b293a1729161d1 ]

When multiple pending snapshots referring to the same source subvolume
are executed, enabled quota will cause root item corruption, where root
items are using old bytenr (no backref in extent tree).

This can be triggered by fstests btrfs/152.

The cause is that when the source subvolume is still dirty, the extra
commit (a simplified transaction commit) in qgroup_account_snapshot() can
skip dirty roots not recorded in the current transaction, leaving the root
item of the source subvolume not updated.

Fix it by forcing the source subvolume to be recorded in the current
transaction before the qgroup sub-transaction commit.

Reported-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Qu Wenruo
2018-05-30 13:52:26 +0800

19 Mar, 2018

1 commit

2b0509fa4 Revert "btrfs: use proper endianness accessors for super_copy"

This reverts commit 3c181c12c431fe33b669410d663beb9cceefcd1b as it
causes breakage on big endian systems with btrfs images.

Reported-by: Christoph Biedl
Cc: Anand Jain
Cc: Liu Bo
Cc: David Sterba
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2018-03-19 15:42:47 +0800

09 Mar, 2018

1 commit

eae6179f5 btrfs: use proper endianness accessors for super_copy

commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value. This was missing in update_super_roots and in sysfs readers.

Moving a filesystem between hosts of opposite endianness will report bogus
numbers in sysfs, and mount may fail as the root will not be restored
correctly. If the filesystem is always used on hosts of the same
endianness, this will not be a problem.

Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.
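
The accessor idea can be illustrated outside the kernel. This is a minimal user-space sketch, not the btrfs code: `struct disk_super`, `super_generation()` and `set_super_generation()` are hypothetical stand-ins for a byte copy of an on-disk little-endian structure and its accessors. Because the field is only ever touched through the helpers, the result is the same on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Hypothetical stand-in for a byte copy of an on-disk structure. */
struct disk_super {
	uint8_t generation[8];	/* stored little-endian on disk */
};

/* Decode 8 little-endian bytes into a host-order value. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Encode a host-order value as 8 little-endian bytes. */
static void set_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Accessors: callers never read or write the raw bytes directly. */
static uint64_t super_generation(const struct disk_super *s)
{
	return get_le64(s->generation);
}

static void set_super_generation(struct disk_super *s, uint64_t gen)
{
	set_le64(s->generation, gen);
}
```

Reading `generation` by casting the byte array to a native integer would give the right answer only on a little-endian host, which is the class of bug the commit message describes.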

CC: stable@vger.kernel.org
Fixes: df93589a17378 ("btrfs: export more from FS_INFO to sysfs")
Signed-off-by: Anand Jain
Reviewed-by: Liu Bo
Reviewed-by: David Sterba
[ update changelog ]
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Anand Jain
2018-03-09 14:41:05 +0800

30 Jun, 2017

2 commits

6374e57ad btrfs: fix integer overflow in calc_reclaim_items_nr

Dave Jones hit a WARN_ON(nr < 0) in btrfs_wait_ordered_roots() with
v4.12-rc6. This was because commit 70e7af244 made it possible for
calc_reclaim_items_nr() to return a negative number. It's not really a
bug in that commit, it just didn't go far enough down the stack to find
all the possible 64->32 bit overflows.

This switches calc_reclaim_items_nr() to return a u64 and changes everyone
that uses the results of that math to u64 as well.
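
The 64->32 bit truncation can be reproduced in user space. This is a minimal sketch under stated assumptions, not the kernel function: `items_for_reclaim_wide()`, `items_for_reclaim_narrow()` and the 64 KiB `ITEM_BYTES` value are hypothetical and only illustrate how a count that fits in 64 bits can turn negative once it is stored in a 32-bit signed integer.

```c
#include <stdint.h>

/* Hypothetical per-item metadata size, chosen only for illustration. */
#define ITEM_BYTES 65536ULL

/* Wide version: the item count stays in 64 bits end to end. */
static uint64_t items_for_reclaim_wide(uint64_t to_reclaim)
{
	uint64_t nr = to_reclaim / ITEM_BYTES;

	return nr ? nr : 1;
}

/*
 * Narrow version: the same count squeezed into 32 signed bits.
 * The conversion of an out-of-range value is implementation-defined;
 * on the usual two's complement targets the upper bits are dropped,
 * so a large count can come out negative.
 */
static int32_t items_for_reclaim_narrow(uint64_t to_reclaim)
{
	return (int32_t)(uint32_t)items_for_reclaim_wide(to_reclaim);
}
```

With these assumed sizes, a reclaim target of 2^47 bytes yields 2^31 items, which is representable as a u64 but not as a non-negative 32-bit signed value.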

Reported-by: Dave Jones
Fixes: 70e7af2 ("Btrfs: fix delalloc accounting leak caused by u32 overflow")
Signed-off-by: Chris Mason
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Chris Mason
2017-06-30 02:17:02 +0800
d1b8b94a2 btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function

Quite a lot of qgroup corruption happens because
btrfs_qgroup_prepare_account_extents() is called at the wrong time.

Since the safest time to call it is just before
btrfs_qgroup_account_extents(), there is no need to keep these 2
functions separate.

Merging them makes the code cleaner and less bug-prone.

Signed-off-by: Qu Wenruo
[ changelog and comment adjustments ]
Signed-off-by: David Sterba

Qu Wenruo
2017-06-30 02:17:02 +0800

20 Jun, 2017

3 commits

fac03c8da btrfs: move fs_info::fs_frozen to the flags

We can keep the state among the other fs_info flags, there's no reason
why fs_frozen would need to be separate.

Reviewed-by: Nikolay Borisov
Signed-off-by: David Sterba

David Sterba
2017-06-20 20:22:42 +0800
4b5faeac4 btrfs: use generic slab for btrfs_transaction

Observing the number of slab objects of btrfs_transaction, there's just
one active on an almost quiescent filesystem, and the number of objects
goes to about ten when sync is in progress. Then the number goes down to
1. This matches the expectations of the transaction lifetime.

For such use the separate slab cache is not justified, as we do not
reuse objects frequently. For the shortlived transaction, the generic
slab (size 512) should be ok. We can optimistically expect that the 512
slabs are not all used (fragmentation) and there are free slots to take
when we do the allocation, compared to potentially allocating a whole new
page for the separate slab.

We'll lose the stats about the object use, which could be added later if
we really need them.

Signed-off-by: David Sterba

David Sterba
2017-06-20 00:26:01 +0800
c6100a4b4 Btrfs: replace tree->mapping with tree->private_data

For extent_io tree's we have carried the address_mapping of the inode
around in the io tree in order to pull the inode back out for calling
into various tree ops hooks. This works fine when everything that has
an extent_io_tree has an inode. But we are going to remove the
btree_inode, so we need to change this. Instead just have a generic
void * for private data that we can initialize with, and have all the
tree ops use that instead. This had a lot of cascading changes but
should be relatively straightforward.
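
The shape of that change can be sketched as follows. All names here (`sketch_io_tree`, `sketch_tree_ops`, `sketch_inode`, `sketch_inode_hook`) are hypothetical and much simpler than the real extent_io structures; the point is only that the tree carries an opaque `void *` and each hook casts it back to whatever its owner registered.

```c
#include <stddef.h>

/* Hooks receive the opaque pointer instead of an address_space. */
struct sketch_tree_ops {
	int (*hook)(void *private_data, unsigned long start);
};

struct sketch_io_tree {
	void *private_data;	/* owner-defined; need not be an inode */
	const struct sketch_tree_ops *ops;
};

/* One possible owner type. Another tree could register something else. */
struct sketch_inode {
	unsigned long last_start;
};

static int sketch_inode_hook(void *private_data, unsigned long start)
{
	struct sketch_inode *inode = private_data;

	inode->last_start = start;
	return 0;
}

static void sketch_io_tree_init(struct sketch_io_tree *tree,
				const struct sketch_tree_ops *ops,
				void *private_data)
{
	tree->ops = ops;
	tree->private_data = private_data;
}

/* The tree code never interprets private_data itself. */
static int sketch_run_hook(struct sketch_io_tree *tree, unsigned long start)
{
	if (!tree->ops || !tree->ops->hook)
		return 0;
	return tree->ops->hook(tree->private_data, start);
}
```

A tree without an inode behind it can pass a different structure, or NULL with no ops, without the tree code changing.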

Signed-off-by: Josef Bacik
Reviewed-by: Chandan Rajendra
Reviewed-by: David Sterba
[ minor reordering of the callback prototypes ]
Signed-off-by: David Sterba

Josef Bacik
2017-06-20 00:25:58 +0800

18 Apr, 2017

3 commits

82bafb38c btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option

[BUG]
The easiest way to reproduce the bug is:
------
# mkfs.btrfs -f $dev -n 16K
# mount $dev $mnt -o inode_cache
# btrfs quota enable $mnt
# btrfs quota rescan -w $mnt
# btrfs qgroup show $mnt
qgroupid rfer excl
-------- ---- ----
0/5 32.00KiB 32.00KiB
^^ Twice the correct value
------

The fstests btrfs qgroup test group can easily detect these mismatches
when run with the inode_cache mount option, although some of the failures
are false alerts because old test cases use fixed golden output. New test
cases will use "btrfs check" to detect qgroup mismatches.

[CAUSE]
The inode_cache mount option makes commit_fs_roots() call
btrfs_save_ino_cache() to update fs/subvol trees, which generates new
delayed refs.

However, we call btrfs_qgroup_prepare_account_extents() too early, before
commit_fs_roots().
As a result, "old_roots" for the newly generated extents is always NULL.
In the extent-freeing case, this leaves both new_roots and old_roots
empty, while the correct old_roots should not be empty.
This causes qgroup numbers not to be decreased correctly.

[FIX]
Move the call to btrfs_qgroup_prepare_account_extents() to just before
btrfs_qgroup_account_extents(), and add the needed delayed_refs handler,
so that qgroup can handle the inode_cache mount option correctly.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Qu Wenruo
2017-04-18 20:07:26 +0800
f486135eb btrfs: remove unused qgroup members from btrfs_trans_handle

The members have been effectively unused since "Btrfs: rework qgroup
accounting" (fcebe4562dec83b3), there's no substitute for
assert_qgroups_uptodate so it's removed as well.

Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

David Sterba
2017-04-18 20:07:25 +0800
9b64f57dd btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
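
The overflow protection can be illustrated with a saturating counter. This is a user-space sketch built on C11 atomics, not the kernel's refcount_t implementation; `sketch_refcount` and its helpers are hypothetical names. Once the counter reaches its maximum it sticks there, so the object is leaked instead of being freed while references still exist.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_SATURATED UINT_MAX

struct sketch_refcount {
	atomic_uint refs;
};

static void sketch_ref_init(struct sketch_refcount *r, unsigned int n)
{
	atomic_init(&r->refs, n);
}

/* Increment, but stick at the saturation value instead of wrapping to 0. */
static void sketch_ref_inc(struct sketch_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == SKETCH_SATURATED)
			return;	/* leak rather than risk a use-after-free */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Returns true only when the caller dropped the last reference. */
static bool sketch_ref_dec_and_test(struct sketch_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == SKETCH_SATURATED || old == 0)
			return false;	/* saturated or already released */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

	return old == 1;
}
```

A plain atomic counter would wrap from its maximum back to zero on one more increment, and a later decrement could then free an object that is still in use.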

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David Sterba

Elena Reshetova
2017-04-18 20:07:23 +0800

28 Feb, 2017

3 commits

6ef06d279 btrfs: Make btrfs_i_size_write take btrfs_inode

Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba

Nikolay Borisov
2017-02-28 18:30:06 +0800
877574e25 btrfs: Make btrfs_set_inode_index take btrfs_inode

Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba

Nikolay Borisov
2017-02-28 18:30:06 +0800
8e7611cf3 btrfs: Make btrfs_insert_dir_item take btrfs_inode

Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba

Nikolay Borisov
2017-02-28 18:30:06 +0800

17 Feb, 2017

3 commits

8b74c03e3 btrfs: remove unused parameter from btrfs_prepare_extent_commit

Added but never used.

Reviewed-by: Liu Bo
Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:52 +0800
eece6a9cf btrfs: merge two superblock writing helpers

write_all_supers and write_ctree_super are almost equal, the parameter
'trans' is unused so we can drop it and have just one helper.

Reviewed-by: Liu Bo
Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:51 +0800
9ea6e2b54 btrfs: remove unnecessary mutex lock in qgroup_account_snapshot

The quota status used to be tracked as a variable, so the mutex was
needed (until "Btrfs: add a flags field to btrfs_fs_info" afcdd129e05a9).
Since the status is a bit modified atomically and we don't hold the
mutex beyond the check, we can drop it.

Signed-off-by: David Sterba

David Sterba
2017-02-17 19:03:50 +0800

14 Feb, 2017

3 commits

003d7c59e btrfs: allow unlink to exceed subvolume quota

Once a qgroup limit is exceeded, it's impossible to restore normal
operation to the subvolume without modifying the limit or removing
the subvolume. This is a surprising situation for many users used
to the typical workflow with quotas on other file systems where it's
possible to remove files until the used space is back under the limit.

When we go to unlink a file and start the transaction, we'll hit
the qgroup limit while trying to reserve space for the items we'll
modify while removing the file. We discussed last month how best
to handle this situation and agreed that there is no perfect solution.
The best principle-of-least-surprise solution is to handle it similarly
to how we already handle ENOSPC when unlinking, which is to allow
the operation to succeed with the expectation that it will ultimately
release space under most circumstances.

This patch modifies the transaction start path to select whether to
honor the qgroups limits. btrfs_start_transaction_fallback_global_rsv
is the only caller that skips enforcement. The reservation and tracking
still happens normally -- it just skips the enforcement step.
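
The split between tracking and enforcement can be sketched like this. The names `sketch_qgroup` and `sketch_qgroup_reserve()` and the `enforce` parameter are hypothetical and far simpler than the real reservation path; they only show that the accounting runs in both cases while the limit check is optional.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_qgroup {
	uint64_t limit;		/* maximum bytes allowed */
	uint64_t reserved;	/* bytes currently accounted */
};

/* Returns 0 on success, -1 when enforcement rejects the reservation. */
static int sketch_qgroup_reserve(struct sketch_qgroup *qg, uint64_t bytes,
				 bool enforce)
{
	if (enforce && qg->reserved + bytes > qg->limit)
		return -1;

	/* Tracking happens either way; only the limit check is skipped. */
	qg->reserved += bytes;
	return 0;
}
```

In this sketch an unlink-style caller would pass `enforce = false`, so the operation can proceed over the limit on the expectation that it frees space afterwards.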

Signed-off-by: Jeff Mahoney
Reviewed-by: Qu Wenruo
Signed-off-by: David Sterba

Jeff Mahoney
2017-02-14 22:50:59 +0800
4a0cc7ca6 btrfs: Make btrfs_ino take a struct btrfs_inode

Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.

Signed-off-by: Nikolay Borisov
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba

Nikolay Borisov
2017-02-14 22:50:51 +0800
20c7bcec6 Btrfs: ACCESS_ONCE cleanup

This replaces the ACCESS_ONCE macro with the corresponding
READ_ONCE/WRITE_ONCE macros.

Signed-off-by: Seraphime Kirkovski
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Seraphime Kirkovski
2017-02-14 22:50:50 +0800

06 Dec, 2016

9 commits

3a45bb207 btrfs: remove root parameter from transaction commit/end routines

Now we only use the root parameter to print the root objectid in
a tracepoint. We can use the root parameter from the transaction
handle for that. It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:07:00 +0800
bf89d38fe btrfs: split btrfs_wait_marked_extents into normal and tree log functions

btrfs_write_and_wait_marked_extents and btrfs_sync_log both call
btrfs_wait_marked_extents, which provides a core loop and then handles
errors differently based on whether it's a log root or not.

This means that btrfs_write_and_wait_marked_extents needs to take a root
because btrfs_wait_marked_extents requires one, even though it's only
used to determine whether the root is a log root. The log root code
won't ever call into the transaction commit code using a log root, so we
can factor out the core loop and provide the error handling appropriate
to each waiter in new routines. This allows us to eventually remove
the root argument from btrfs_commit_transaction, and as a result,
btrfs_end_transaction.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:07:00 +0800
2ff7e61e0 btrfs: take an fs_info directly when the root is not used otherwise

There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer. Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:59 +0800
ccdf9b305 btrfs: root->fs_info cleanup, access fs_info->delayed_root directly

This results in btrfs_assert_delayed_root_empty and
btrfs_destroy_delayed_inode taking an fs_info instead of a root.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:59 +0800
0b246afa6 btrfs: root->fs_info cleanup, add fs_info convenience variables

In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable. This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:59 +0800
27965b6c2 btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:58 +0800
da17066c4 btrfs: pull node/sector/stripe sizes out of root and into fs_info

We track the node sizes per-root, but they never vary from the values
in the superblock. This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:58 +0800
6bccf3ab1 btrfs: call functions that always use the same root with fs_info instead

There are many functions that are always called with the same root
argument. Rather than passing the same root every time, we can
pass an fs_info pointer instead and have the function get the root
pointer itself.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:57 +0800
5b4aacefb btrfs: call functions that overwrite their root parameter with fs_info

There are 11 functions that accept a root parameter and immediately
overwrite it. We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-12-06 23:06:57 +0800

12 Oct, 2016

1 commit

f29135b54 Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs updates from Chris Mason:
"This is a big variety of fixes and cleanups.

Liu Bo continues to fixup fuzzer related problems, and some of Josef's
cleanups are prep for his bigger extent buffer changes (slated for
v4.10)"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (39 commits)
Revert "btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs"
Btrfs: remove unnecessary btrfs_mark_buffer_dirty in split_leaf
Btrfs: don't BUG() during drop snapshot
btrfs: fix btrfs_no_printk stub helper
Btrfs: memset to avoid stale content in btree leaf
btrfs: parent_start initialization cleanup
btrfs: Remove already completed TODO comment
btrfs: Do not reassign count in btrfs_run_delayed_refs
btrfs: fix a possible umount deadlock
Btrfs: fix memory leak in do_walk_down
btrfs: btrfs_debug should consume fs_info when DEBUG is not defined
btrfs: convert send's verbose_printk to btrfs_debug
btrfs: convert pr_* to btrfs_* where possible
btrfs: convert printk(KERN_* to use pr_* calls
btrfs: unsplit printed strings
btrfs: clean the old superblocks before freeing the device
Btrfs: kill BUG_ON in run_delayed_tree_ref
Btrfs: don't leak reloc root nodes on error
btrfs: squash lines for simple wrapper functions
Btrfs: improve check_node to avoid reading corrupted nodes
...

Linus Torvalds
2016-10-12 02:23:06 +0800

28 Sep, 2016

1 commit

c2050a454 fs: Replace current_fs_time() with current_time()

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:22 +0800

27 Sep, 2016

4 commits

ab8d0fc48 btrfs: convert pr_* to btrfs_* where possible

For many printks, we want to know which file system issued the message.

This patch converts most pr_* calls to use the btrfs_* versions instead.
In some cases, this means adding plumbing to allow call sites access to
an fs_info pointer.

fs/btrfs/check-integrity.c is left alone for another day.

Signed-off-by: Jeff Mahoney
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Jeff Mahoney
2016-09-27 01:37:04 +0800
62e855771 btrfs: convert printk(KERN_* to use pr_* calls

This patch converts printk(KERN_* style messages to use the pr_* versions.

One side effect is that anything that was KERN_DEBUG is now automatically
a dynamic debug message.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-09-27 00:08:44 +0800
5d163e0e6 btrfs: unsplit printed strings

CodingStyle chapter 2:
"[...] never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."

This patch unsplits user-visible strings.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-09-27 00:08:44 +0800
a43f7f820 Btrfs: remove BUG_ON in start_transaction

Since we can get errors from a concurrently aborted transaction,
the condition checked by this BUG_ON in start_transaction no longer holds.

For example, while flushing the free space cache inode's dirty pages:
btrfs_finish_ordered_io
 -> btrfs_join_transaction_nolock
      (the transaction has been aborted.)
      -> BUG_ON(type == TRANS_JOIN_NOLOCK);

Signed-off-by: Liu Bo
Reviewed-by: Josef Bacik
Signed-off-by: David Sterba

Liu Bo
2016-09-27 00:04:01 +0800

26 Sep, 2016

1 commit

afcdd129e Btrfs: add a flags field to btrfs_fs_info

We have a lot of random ints in btrfs_fs_info that can be put into flags. This
is mostly equivalent with the exception of how we deal with quota going on or
off, now instead we set a flag when we are turning it on or off and deal with
that appropriately, rather than just having a pending state that the current
quota_enabled gets set to. Thanks,
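
Folding separate ints into one flags word can be sketched with C11 atomics. The names below (`sketch_fs_info`, `SKETCH_FS_QUOTA_ENABLED` and the helper functions) are hypothetical, and the kernel uses its own set_bit()/test_bit() style helpers rather than this code; the sketch only shows one word holding several independent states that are each updated atomically.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; each replaces what used to be its own int. */
enum sketch_fs_flag {
	SKETCH_FS_QUOTA_ENABLED,
	SKETCH_FS_QUOTA_ENABLING,
	SKETCH_FS_FROZEN,
};

struct sketch_fs_info {
	atomic_ulong flags;
};

static void sketch_set_flag(struct sketch_fs_info *fs, enum sketch_fs_flag bit)
{
	atomic_fetch_or(&fs->flags, 1UL << bit);
}

static void sketch_clear_flag(struct sketch_fs_info *fs,
			      enum sketch_fs_flag bit)
{
	atomic_fetch_and(&fs->flags, ~(1UL << bit));
}

static bool sketch_test_flag(struct sketch_fs_info *fs,
			     enum sketch_fs_flag bit)
{
	return (atomic_load(&fs->flags) & (1UL << bit)) != 0;
}
```

Because each bit is read and modified atomically, a single check of one flag does not need a mutex around it.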

Signed-off-by: Josef Bacik
Signed-off-by: David Sterba

Josef Bacik
2016-09-26 23:59:49 +0800

25 Aug, 2016

1 commit

9e7cc91a6 btrfs: fix fsfreeze hang caused by delayed iputs deal

When running fstests generic/068, we sometimes hit the deadlock below:
xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080
ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
Call Trace:
schedule+0x35/0x80
rwsem_down_read_failed+0xf2/0x140
? __filemap_fdatawrite_range+0xd1/0x100
call_rwsem_down_read_failed+0x18/0x30
? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
percpu_down_read+0x35/0x50
__sb_start_write+0x2c/0x40
start_transaction+0x2a5/0x4d0 [btrfs]
btrfs_join_transaction+0x17/0x20 [btrfs]
btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
evict+0xba/0x1a0
iput+0x196/0x200
btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
btrfs_commit_transaction+0x928/0xa80 [btrfs]
btrfs_freeze+0x30/0x40 [btrfs]
freeze_super+0xf0/0x190
do_vfs_ioctl+0x4a5/0x5c0
? do_audit_syscall_entry+0x66/0x70
? syscall_trace_enter_phase1+0x11f/0x140
SyS_ioctl+0x79/0x90
do_syscall_64+0x62/0x110
entry_SYSCALL64_slow_path+0x25/0x25

From this trace, freeze_super() already holds SB_FREEZE_FS, but
btrfs_freeze() then calls btrfs_commit_transaction(). If
btrfs_commit_transaction() finds that it has delayed iputs to handle,
it calls start_transaction(), which tries to take the SB_FREEZE_FS lock
again, and a deadlock occurs.

The root cause is that in btrfs, sync_filesystem(sb) does not make
sure all metadata is updated. Some code may still add delayed iputs
afterwards; see the sample race window below:

                     CPU1                       |         CPU2
|-> freeze_super()                              |
    |-> sync_filesystem(sb);                    |
    |                                           |-> cleaner_kthread()
    |                                           |   |-> btrfs_delete_unused_bgs()
    |                                           |       |-> btrfs_remove_chunk()
    |                                           |           |-> btrfs_remove_block_group()
    |                                           |               |-> btrfs_add_delayed_iput()
    |                                           |
    |-> sb->s_writers.frozen = SB_FREEZE_FS;    |
    |-> sb_wait_write(sb, SB_FREEZE_FS);        |
    |   acquire SB_FREEZE_FS lock.              |
    |                                           |
    |-> btrfs_freeze()                          |
        |-> btrfs_commit_transaction()          |
            |-> btrfs_run_delayed_iputs()       |
            |   will handle delayed iputs,      |
            |   that means start_transaction()  |
            |   will be called, which will try  |
            |   to get SB_FREEZE_FS lock.       |

To fix this issue, introduce an "int fs_frozen" to record internally whether
the fs has been frozen. If the fs has been frozen, we cannot handle delayed iputs.
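
The guard can be sketched in a few lines. `sketch_fs` and `sketch_commit_transaction()` are hypothetical stand-ins, not the btrfs functions; the sketch only shows the commit path leaving delayed iputs queued while the frozen state is set, so that it never tries to start a new transaction under the freeze lock.

```c
#include <stdbool.h>

struct sketch_fs {
	bool frozen;		/* set by the freeze path, cleared on unfreeze */
	int delayed_iputs;	/* queued final inode puts */
};

/* Returns how many delayed iputs this commit processed. */
static int sketch_commit_transaction(struct sketch_fs *fs)
{
	int ran = 0;

	/* ... the commit work itself would happen here ... */

	/*
	 * Running an iput may need a new transaction, which would block
	 * on the freeze lock that the freezer already holds.
	 */
	if (!fs->frozen) {
		ran = fs->delayed_iputs;
		fs->delayed_iputs = 0;
	}
	return ran;
}
```

The queued iputs are not lost in this sketch: they stay pending and are processed by the first commit that runs after the frozen state is cleared.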

Signed-off-by: Wang Xiaoguang
Reviewed-by: David Sterba
[ add comment to btrfs_freeze ]
Signed-off-by: David Sterba

Signed-off-by: Chris Mason

Wang Xiaoguang
2016-08-25 18:58:26 +0800

26 Jul, 2016

1 commit

66642832f btrfs: btrfs_abort_transaction, drop root parameter

__btrfs_abort_transaction doesn't use its root parameter except to
obtain an fs_info pointer. We can obtain that from trans->root->fs_info
for now and from trans->fs_info in a later patch.

Signed-off-by: Jeff Mahoney
Signed-off-by: David Sterba

Jeff Mahoney
2016-07-26 19:54:26 +0800