
30 Nov, 2016

38 commits

  • David Sterba
     
  • This issue was found when I tried to delete a heavily reflinked file.
    When deleting such files, other transaction operations do not get a
    chance to make progress; for example, start_transaction() blocks in
    wait_current_trans(root) for a long time and sometimes even triggers
    soft lockups. The time taken to delete such a heavily reflinked file is
    also very large, often hundreds of seconds. Using perf top, it reports:

    PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
    ---------------------------------------------------------------------------------------
    84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80
    11.02% [kernel] [k] delay_tsc
    0.79% [kernel] [k] _raw_spin_unlock_irq
    0.78% [kernel] [k] _raw_spin_unlock_irqrestore
    0.45% [kernel] [k] do_raw_spin_lock
    0.18% [kernel] [k] __slab_alloc
    It seems __btrfs_run_delayed_refs() takes most of the CPU time. After
    some debugging I found that select_delayed_ref() is causing the issue:
    in our case the delayed head is full of BTRFS_DROP_DELAYED_REF nodes,
    but select_delayed_ref() first iterates the whole node list trying to
    find BTRFS_ADD_DELAYED_REF nodes, which is a disaster in this case and
    wastes a lot of time.

    To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head;
    in select_delayed_ref(), if this list is not empty, we can use its nodes
    directly. With this patch, deleting the same file takes only about 10~15
    seconds. Now perf top reports:

    PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
    ----------------------------------------------------------------------------------------
    20.74% [kernel] [k] _raw_spin_unlock_irqrestore
    16.33% [kernel] [k] __slab_alloc
    5.41% [kernel] [k] lock_acquired
    4.42% [kernel] [k] lock_acquire
    4.05% [kernel] [k] lock_release
    3.37% [kernel] [k] _raw_spin_unlock_irq

    This patch also helps for normal files; at least we no longer need to
    iterate the whole list to find BTRFS_ADD_DELAYED_REF nodes.

    Signed-off-by: Wang Xiaoguang
    Reviewed-by: Liu Bo
    Tested-by: Holger Hoffstätte
    Signed-off-by: David Sterba

    Wang Xiaoguang
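    To make the effect concrete, below is a minimal userspace C sketch of the
    idea behind the ref_add_list change above. It is not the kernel code:
    ref_head, ref_node, queue_ref and the select_ref_* helpers are simplified
    stand-ins for the real btrfs structures. With a head full of DROP refs,
    the old selection walks the whole list before giving up, while the
    dedicated add list answers in constant time.

    /* Simplified model of keeping ADD refs on their own list. */
    #include <stdio.h>
    #include <stdlib.h>

    enum ref_action { REF_ADD, REF_DROP };

    struct ref_node {
            enum ref_action action;
            struct ref_node *next;          /* link on the list of all refs */
            struct ref_node *add_next;      /* link on the ADD-only list    */
    };

    struct ref_head {
            struct ref_node *all_refs;      /* every queued ref             */
            struct ref_node *add_refs;      /* REF_ADD refs only            */
    };

    static void queue_ref(struct ref_head *head, enum ref_action action)
    {
            struct ref_node *node = calloc(1, sizeof(*node));

            if (!node)
                    exit(1);
            node->action = action;
            node->next = head->all_refs;
            head->all_refs = node;
            if (action == REF_ADD) {        /* maintain the extra list */
                    node->add_next = head->add_refs;
                    head->add_refs = node;
            }
    }

    /* Old behaviour: scan every ref looking for an ADD ref, O(n) per pick. */
    static struct ref_node *select_ref_slow(struct ref_head *head)
    {
            struct ref_node *node;

            for (node = head->all_refs; node; node = node->next)
                    if (node->action == REF_ADD)
                            return node;
            return head->all_refs;          /* fall back to a DROP ref */
    }

    /* New behaviour: ADD refs are immediately at hand, O(1) per pick. */
    static struct ref_node *select_ref_fast(struct ref_head *head)
    {
            return head->add_refs ? head->add_refs : head->all_refs;
    }

    int main(void)
    {
            struct ref_head head = { 0 };
            long i;

            for (i = 0; i < 1000000; i++)       /* heavily reflinked file:   */
                    queue_ref(&head, REF_DROP); /* the head is all DROP refs */

            printf("slow pick: %p\n", (void *)select_ref_slow(&head));
            printf("fast pick: %p\n", (void *)select_ref_fast(&head));
            return 0;
    }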
     
  • Commit 62b99540a1d91e464 (btrfs: relocation: Fix leaking qgroups numbers
    on data extents) only partly fixes the problem.

    The previous fix traces all new data extents at transaction commit time
    when balance finishes.

    However, balance is not done in one large transaction; every path
    replacement can happen in its own transaction, which makes the fix
    useless if a transaction commits during relocation.

    For example:
    relocate_block_group()
    |-merge_reloc_roots()
    | |- merge_reloc_root()
    | |- btrfs_start_transaction()
    Signed-off-by: Qu Wenruo
    Reviewed-and-Tested-by: Goldwyn Rodrigues
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Move account_shared_subtree() to qgroup.c and rename it to
    btrfs_qgroup_trace_subtree().

    Do the same thing for account_leaf_items() and rename it to
    btrfs_qgroup_trace_leaf_items().

    Since all these functions are only used by qgroups, it is more
    appropriate to move them to qgroup.c and export them from there.

    Signed-off-by: Qu Wenruo
    Reviewed-and-Tested-by: Goldwyn Rodrigues
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
    btrfs_qgroup_trace_extent(_nolock), according to the new
    reserve/trace/account naming schema.

    Signed-off-by: Qu Wenruo
    Reviewed-and-Tested-by: Goldwyn Rodrigues
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Add an explanation of how btrfs qgroups work.

    Qgroup operation is split into 3 main phases:
    1) Reserve
    Ensure the qgroup doesn't exceed its limit

    2) Trace
    Inform qgroup which extents to trace

    3) Account
    Calculate the qgroup number change for each traced extent.

    This should save quite some time for new developers.

    Signed-off-by: Qu Wenruo
    Reviewed-by: Goldwyn Rodrigues
    Signed-off-by: David Sterba

    Qu Wenruo
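    As a rough mental model of the three phases, here is a toy userspace C
    sketch. It is not btrfs code: the qgroup struct and the qgroup_reserve,
    qgroup_trace and qgroup_account helpers below are invented purely to
    illustrate the ordering of reserve, trace and account.

    #include <stdbool.h>
    #include <stdio.h>

    struct qgroup {
            long limit;     /* configured quota limit            */
            long reserved;  /* phase 1: space promised to writes */
            long used;      /* phase 3: space actually accounted */
    };

    /* Phase 1: refuse the write up front if it would exceed the limit. */
    static bool qgroup_reserve(struct qgroup *qg, long bytes)
    {
            if (qg->used + qg->reserved + bytes > qg->limit)
                    return false;
            qg->reserved += bytes;
            return true;
    }

    /* Phase 2: record which extent was touched; numbers are not final yet. */
    static void qgroup_trace(long bytenr, long bytes)
    {
            printf("traced extent at %ld, %ld bytes\n", bytenr, bytes);
    }

    /* Phase 3: at commit time, turn traced extents into final usage. */
    static void qgroup_account(struct qgroup *qg, long bytes)
    {
            qg->reserved -= bytes;
            qg->used += bytes;
    }

    int main(void)
    {
            struct qgroup qg = { .limit = 8192 };

            if (qgroup_reserve(&qg, 4096)) {
                    qgroup_trace(1048576, 4096);
                    qgroup_account(&qg, 4096);
            }
            printf("used %ld of %ld, reserved %ld\n",
                   qg.used, qg.limit, qg.reserved);
            return 0;
    }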
     
  • And remove the bogus check for a NULL return value from kmap, which
    can't happen. While we're at it: I don't think that kmapping up to 256
    pages will work without deadlocks on highmem machines; a better idea
    would be to use vm_map_ram to map all of them into a single virtual
    address range. Incidentally, that would also simplify the code a lot.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba

    Christoph Hellwig
     
  • Rework the loop a little bit to use the generic bio_for_each_segment_all
    helper for iterating over the bio.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba

    Christoph Hellwig
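    For readers unfamiliar with the helper, here is a small sketch of the
    iteration pattern, assuming a 4.x-era kernel where
    bio_for_each_segment_all() takes (bvec, bio, index); count_bio_bytes()
    is a made-up example, not a btrfs function.

    #include <linux/bio.h>

    static unsigned int count_bio_bytes(struct bio *bio)
    {
            struct bio_vec *bvec;
            unsigned int bytes = 0;
            int i;

            /* Visit every segment the bio carries, one bio_vec at a time. */
            bio_for_each_segment_all(bvec, bio, i)
                    bytes += bvec->bv_len;

            return bytes;
    }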
     
  • Use the bvec offset and len members to prepare for multipage bvecs.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Sterba

    Christoph Hellwig
     
  • Instead of using bi_vcnt to calculate it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba

    Christoph Hellwig
     
  • Use bio_for_each_segment_all to iterate over the segments instead.
    This requires a bit of reshuffling so that we only look up the ordered
    item once inside the bio_for_each_segment_all loop.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba

    Christoph Hellwig
     
  • Just use bio_for_each_segment_all to iterate over all segments.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba

    Christoph Hellwig
     
  • Just use bio_for_each_segment_all to iterate over all segments.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: David Sterba

    Christoph Hellwig
     
  • Pass the full bio to the decompression routines and use bio iterators
    to iterate over the data in the bio.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: David Sterba

    Christoph Hellwig
     
  • This fixes the WARN_ON on BTRFS_I(inode)->reserved_extents in
    btrfs_destroy_inode and the WARN_ON on nonzero delalloc bytes on umount
    with qgroups enabled.

    I was able to reproduce this by setting up a small (~500kb) quota limit
    and writing a file one byte at a time until I hit the limit. The warnings
    would all hit on umount.

    The root cause is that we would reserve a block-sized range in both
    the reservation and the quota in btrfs_check_data_free_space, but if we
    encountered a problem (like e.g. EDQUOT), we would only release the single
    byte in the qgroup reservation. That caused an iotree state split, which
    increased the number of outstanding extents, in turn disallowing releasing
    the metadata reservation.

    Signed-off-by: Jeff Mahoney
    Reviewed-by: Qu Wenruo
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • At this point we will have dropped extent entries from the file, so if we fail
    to insert the new hole entries then we are leaving the fs in a corrupt state
    (albeit an easily fixed one). Abort the transaction if this happens so we can
    avoid corrupting the fs. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     
  • In order to do hole punching we have a block reserve to hold the reservation we
    need to drop the extents in our range. Since we could end up dropping a lot of
    extents we set rsv->failfast so we can just loop around again and drop the
    remainder of the range. Unfortunately we unconditionally fill the hole extents
    in and start from the last extent we encountered, which we may or may not have
    dropped. So this can result in overlapping file extent entries, which can be
    tripped over in a variety of ways, either by hitting BUG_ON(!ret) in
    fill_holes() after the search, or in btrfs_set_item_key_safe() in
    btrfs_drop_extent() at a later time by an unrelated task. Fix this by only
    setting drop_end to the last extent we did actually drop. This way our holes
    are filled in properly for the range that we did drop, and the rest of the range
    that remains to be dropped is actually dropped. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
     
  • If we process the last item in the leaf and hit an I/O error while
    reading the next leaf, we return -EIO without having adjusted the
    position. Since we have emitted dirents, getdents() will return
    the byte count to the user instead of the error. Subsequent callers
    will emit the last successful dirent again, and return -EIO again,
    with the same result. Callers loop forever.

    Instead, if we always increment ctx->pos after emitting or skipping
    the dirent, we'll be sure that we won't hit the same one again. When
    we go to process the next leaf, we won't have emitted any dirents
    and the -EIO will be returned to the user properly. We also don't
    need to track if we've emitted a dirent already or if we've changed
    the position yet.

    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Jeff Mahoney
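    A toy userspace C sketch of the position-advance rule described above.
    It is not the btrfs code: dir_ctx, emit_dirent, read_next_leaf and
    iterate_dir are invented for illustration, and the error is simply
    returned rather than translated into a byte count as getdents() does.

    #include <stdbool.h>
    #include <stdio.h>

    struct dir_ctx {
            long pos;       /* resume point handed back to the caller */
    };

    static bool emit_dirent(struct dir_ctx *ctx, long index)
    {
            printf("entry %ld\n", index);
            return true;
    }

    /* Pretend the leaf holding entries past index 2 is unreadable. */
    static int read_next_leaf(long index)
    {
            return index > 2 ? -5 /* -EIO */ : 0;
    }

    static int iterate_dir(struct dir_ctx *ctx, long nr_entries)
    {
            long index;

            for (index = ctx->pos; index < nr_entries; index++) {
                    int ret = read_next_leaf(index);

                    if (ret)        /* nothing from this leaf was emitted */
                            return ret;
                    if (!emit_dirent(ctx, index))
                            return 0;
                    ctx->pos = index + 1;   /* never revisit this entry */
            }
            return 0;
    }

    int main(void)
    {
            struct dir_ctx ctx = { .pos = 0 };
            int ret = iterate_dir(&ctx, 10);

            /* Entries 0..2 are emitted once; a retry resumes at pos 3 and
             * sees the error again instead of replaying old entries. */
            printf("ret=%d, resume at pos=%ld\n", ret, ctx.pos);
            return 0;
    }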
     
  • Commit 3de4586c527 (Btrfs: Allow subvolumes and snapshots anywhere
    in the directory tree) introduced the current system of placing
    snapshots in the directory tree. It also introduced the behavior of
    creating the snapshot and then creating the directory entries for it.

    We've kept this code around for compatibility reasons, but it turns
    out that no file systems with the old tree_root based snapshots can
    be mounted on newer (>= 2009) kernels anyway. About a month after the
    above commit, commit 2a7108ad89e (Btrfs: rev the disk format for the
    inode compat and csum selection changes) landed, changing the superblock
    magic number.

    As a result, we know that we'll never encounter tree_root-based dirents
    or have to deal with skipping our own snapshot dirents. Since that
    also means that we're now only iterating over DIR_INDEX items, which only
    contain one directory entry per leaf item, we don't need to loop over
    the leaf item contents anymore either.

    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • If zlib_inflateInit2 fails, the input page is never unmapped.
    Add a call to kunmap when it fails.

    Signed-off-by: Nick Terrell
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nick Terrell
     
  • The balance status item contains the currently known filter values, but the
    stripes filter was unintentionally not among them. This means that an
    interrupted and automatically restarted balance does not apply the
    stripes filter.

    Fixes: dee32d0ac3719ef8d640efaf0884111df444730f
    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: David Sterba

    David Sterba
     
  • 'btrfs_iget()' can not return NULL, so this test can be removed.

    Signed-off-by: Christophe JAILLET
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Christophe JAILLET
     
  • The csum member of struct btrfs_super_block is an array of u8, so it
    makes sense that btrfs_csum_final() is also declared to accept u8 *.
    I changed the declaration of the function from
    void btrfs_csum_final(u32 crc, char *result); to
    void btrfs_csum_final(u32 crc, u8 *result);

    Signed-off-by: Domagoj Tršan
    [ changed cast to u8 at several call sites ]
    Signed-off-by: David Sterba

    Domagoj Tršan
     
  • If we have

    |0--hole--4095||4096--preallocate--12287|

    instead of using the preallocated space, an 8K direct write will just
    create a new 8K extent and it'll end up with

    |0--new extent--8191||8192--preallocate--12287|

    This is because we find a hole em (extent map) and then go on to create
    a new 8K extent directly without adjusting @len.

    Signed-off-by: Liu Bo
    Reviewed-by: Chris Mason
    Signed-off-by: David Sterba

    Liu Bo
     
  • There is no need to call kfree() if memdup_user() fails, as no memory
    was allocated and the error-valued pointer should be returned directly.

    Signed-off-by: Shailendra Verma
    [ edit subject ]
    Signed-off-by: David Sterba

    Shailendra Verma
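    A sketch of the intended pattern, assuming kernel context; copy_args()
    is a made-up example rather than the function touched by this patch.

    #include <linux/err.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/uaccess.h>

    static void *copy_args(const void __user *uptr, size_t len)
    {
            void *args = memdup_user(uptr, len);

            /*
             * On failure memdup_user() leaves nothing allocated and returns
             * an error-valued pointer: pass it up as-is, no kfree() needed.
             */
            if (IS_ERR(args))
                    return args;

            /* ... use args ...; the caller kfree()s it when done. */
            return args;
    }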
     
  • copy_extent_buffer is suitable for copying between buffers at an
    arbitrary offset and deals with page boundaries. This is not necessary
    when doing a full extent_buffer-to-extent_buffer copy, where we can
    utilize the copy_page helper instead.

    Signed-off-by: David Sterba

    David Sterba
     
  • The only memset we do is to 0, so sink the parameter to the function and
    simplify all calls. Rename the function to reflect the behaviour.

    Signed-off-by: David Sterba

    David Sterba
     
  • copy_page is usually optimized and can be faster than memcpy.

    Signed-off-by: David Sterba

    David Sterba
     
  • Signed-off-by: David Sterba

    David Sterba
     
  • The fsid and chunk tree uuid are always located in the first page, so
    we don't need to use write_extent_buffer.

    Signed-off-by: David Sterba

    David Sterba
     
  • '__bdev' has never been used since
    0b86a832a1f38abec695864ec2eaedc9d2383f1b (2008).

    Signed-off-by: David Sterba

    David Sterba
     
  • Over time, the function has shrunk to the point that it just calls
    find_extent_buffer, merely passing the parameters through.

    Signed-off-by: David Sterba

    David Sterba
     
  • We dereference fs_info several times; besides, post-mount functions
    should never see a NULL fs_info.

    Signed-off-by: David Sterba

    David Sterba
     
  • The lock is held, we make the same lookup that previously failed with
    EEXIST, and we don't insert NULL pointers.

    Signed-off-by: David Sterba

    David Sterba
     
  • Originally, the eb and start were passed separately in case eb is NULL.
    Since readahead was refactored in 4.6, this is no longer the case and we
    can get rid of the parameter.

    Signed-off-by: David Sterba

    David Sterba
     
  • 'start' is not used since "btrfs: reada: Pass reada_extent into
    __readahead_hook directly" (6e39dbe8b9e55280c).

    Signed-off-by: David Sterba

    David Sterba
     
  • We can't touch the eb directly in case the function is called with a
    non-zero error, so we can read the eb level when needed.

    Signed-off-by: David Sterba

    David Sterba
     
  • The helpers are not meant to be generic and the name is misleading.
    Convert them to static inlines for type checking.

    Signed-off-by: David Sterba

    David Sterba
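    As an illustration of the idiom (the actual helpers are not named in this
    log; example_info and free_example_info below are invented), replacing a
    #define with a static inline gains compiler type checking of the argument:

    #include <linux/slab.h>

    struct example_info {
            int refs;
    };

    /* A macro version, e.g. "#define free_info(p) kfree(p)", would accept
     * any pointer without complaint. */
    static inline void free_example_info(struct example_info *info)
    {
            kfree(info);    /* only a struct example_info * is accepted now */
    }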