27 Jul, 2020
6 commits
-
There's no cleanup that occurs so we can simply return 0 directly.
Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
Reviewed-by: Johannes Thumshirn
Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
While at it use the opportunity to simplify find_logical_bio_stripe by
reducing the scope of 'stripe_start' variable and squash the
sector-to-bytes conversion on one line.Reviewed-by: Johannes Thumshirn
Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
Unify the style in the file such that return value of bio_list_pop is
assigned directly in the while loop. This is in line with the rest of
the kernel.Reviewed-by: Johannes Thumshirn
Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
The merging logic is always executed if the current stripe's device
is not missing. So there's no point in duplicating the check. Simply
remove it, while at it reduce the scope of the 'last_end' variable.
If the current stripe's device is missing we fail the stripe early on.Reviewed-by: Johannes Thumshirn
Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
Instead of recording stripe_index and using that to access correct
btrfs_device from btrfs_bio::stripes record the btrfs_device in
btrfs_io_bio. This will enable endio handlers to increment device
error counters on checksum errors.Reviewed-by: Johannes Thumshirn
Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba
24 Mar, 2020
2 commits
-
The struct_size macro does the same calculation and is safe regarding
overflows. Though we're not expecting them to happen, use the helper for
clarity.Reviewed-by: Johannes Thumshirn
Signed-off-by: David Sterba -
There are temporary variables tracking the index of P and Q stripes, but
none of them is really used as such, merely for determining if the Q
stripe is present. This leads to compiler warnings with
-Wunused-but-set-variable and has been reported several times.fs/btrfs/raid56.c: In function ‘finish_rmw’:
fs/btrfs/raid56.c:1199:6: warning: variable ‘p_stripe’ set but not used [-Wunused-but-set-variable]
1199 | int p_stripe = -1;
| ^~~~~~~~
fs/btrfs/raid56.c: In function ‘finish_parity_scrub’:
fs/btrfs/raid56.c:2356:6: warning: variable ‘p_stripe’ set but not used [-Wunused-but-set-variable]
2356 | int p_stripe = -1;
| ^~~~~~~~Replace the two variables with one that has a clear meaning and also get
rid of the warnings. The logic that verifies that there are only 2
valid cases is unchanged.Reviewed-by: Johannes Thumshirn
Signed-off-by: David Sterba
18 Nov, 2019
3 commits
-
In lock_stripe_add() we're caching the bucket for the stripe hash table
just for a single call to dereference the stripe hash.If we just directly call rbio_bucket() we can safe the pointless local
variable.Also move the dereferencing of the stripe hash outside of the variable
declaration block to not break over the 80 characters limit.Reviewed-by: Nikolay Borisov
Signed-off-by: Johannes Thumshirn
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
In lock_stripe_add() we're traversing the stripe hash list and check if
the current list element's raid_map equals is equal to the raid bio's
raid_map. If both are equal we continue processing.If we'd check for inequality instead of equality we can reduce one level
of indentation.Reviewed-by: Nikolay Borisov
Signed-off-by: Johannes Thumshirn
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
Commit 9e0af2376434 ("Btrfs: fix task hang under heavy compressed
write") worked around the issue that a recycled work item could get a
false dependency on the original work item due to how the workqueue code
guarantees non-reentrancy. It did so by giving different work functions
to different types of work.However, the fixes in the previous few patches are more complete, as
they prevent a work item from being recycled at all (except for a tiny
window that the kernel workqueue code handles for us). This obsoletes
the previous fix, so we don't need the unique helpers for correctness.
The only other reason to keep them would be so they show up in stack
traces, but they always seem to be optimized to a tail call, so they
don't show up anyways. So, let's just get rid of the extra indirection.While we're here, rename normal_work_helper() to the more informative
btrfs_work_helper().Reviewed-by: Nikolay Borisov
Reviewed-by: Filipe Manana
Signed-off-by: Omar Sandoval
Reviewed-by: David Sterba
Signed-off-by: David Sterba
09 Sep, 2019
1 commit
-
Reviewed-by: Johannes Thumshirn
Signed-off-by: David Sterba
30 Apr, 2019
1 commit
-
We only have two callers that need the integer loop iterator, and they
can easily maintain it themselves.Suggested-by: Matthew Wilcox
Reviewed-by: Johannes Thumshirn
Acked-by: David Sterba
Reviewed-by: Hannes Reinecke
Acked-by: Coly Li
Reviewed-by: Matthew Wilcox
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
27 Mar, 2019
1 commit
-
Pull btrfs fixes from David Sterba:
- fsync fixes: i_size for truncate vs fsync, dio vs buffered during
snapshotting, remove complicated but incomplete assertion- removed excessive warnigs, misreported device stats updates
- fix raid56 page mapping for 32bit arch
- fixes reported by static analyzer
* tag 'for-5.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix assertion failure on fsync with NO_HOLES enabled
btrfs: Avoid possible qgroup_rsv_size overflow in btrfs_calculate_inode_block_rsv_size
btrfs: Fix bound checking in qgroup_trace_new_subtree_blocks
btrfs: raid56: properly unmap parity page in finish_parity_scrub()
btrfs: don't report readahead errors and don't update statistics
Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes
btrfs: remove WARN_ON in log_dir_items
Btrfs: fix incorrect file size after shrinking truncate and fsync
19 Mar, 2019
1 commit
-
Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
a reference counter bug on i386, i.e.:[ 157.662401] kernel BUG at mm/highmem.c:349!
[ 157.666725] invalid opcode: 0000 [#1] SMP PTIThe reason is that kunmap(p_page) was completely left out, so we never
did an unmap for the p_page and the loop unmapping the rbio page was
iterating over the wrong number of stripes: unmapping should be done
with nr_data instead of rbio->real_stripes.Test case to reproduce the bug:
- create a raid5 btrfs filesystem:
# mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde- mount it:
# mount /dev/sdb /mnt- run btrfs scrub in a loop:
# while :; do btrfs scrub start -BR /mnt; doneBugLink: https://bugs.launchpad.net/bugs/1812845
Fixes: 5a6ac9eacb49 ("Btrfs, raid56: support parity scrub on raid56")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn
Signed-off-by: Andrea Righi
Reviewed-by: David Sterba
Signed-off-by: David Sterba
15 Feb, 2019
1 commit
-
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.Reviewed-by: Omar Sandoval
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
17 Dec, 2018
1 commit
-
The typos accumulate over time so once in a while time they get fixed in
a large patch.Signed-off-by: Andrea Gelmini
Reviewed-by: David Sterba
Signed-off-by: David Sterba
06 Aug, 2018
9 commits
-
Add fall-back code to catch failure of full_stripe_write. Proper error
handling from inside run_plug would need more code restructuring as it's
called at arbitrary points by io scheduler.Signed-off-by: David Sterba
-
There's only one call site of the unlocked helper so it can be folded
into the caller.Signed-off-by: David Sterba
-
Signed-off-by: David Sterba
-
Signed-off-by: David Sterba
-
Signed-off-by: David Sterba
-
Add helper that schedules a given function to run on the rmw workqueue.
This will replace several standalone helpers.Signed-off-by: David Sterba
-
The helper is trivial and marked as deprecated.
Signed-off-by: David Sterba
-
Remove includes if none of the interfaces and exports is used in the
given source file.Signed-off-by: David Sterba
-
Use the helper that's possibly optimized for full page copies.
Signed-off-by: David Sterba
30 May, 2018
1 commit
-
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the working buffers during regular init, instead of using stack
space. This refactors the allocation code a bit to make it easier
to review.[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook
Reviewed-by: David Sterba
Signed-off-by: David Sterba
12 Apr, 2018
1 commit
-
Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.Signed-off-by: David Sterba
31 Mar, 2018
1 commit
-
Rebuild on missing device is as same as recover, after it's done, rbio
has data which is consistent with on-disk data, so it can be cached to
avoid further reads.Signed-off-by: Liu Bo
Signed-off-by: Liu Bo
Signed-off-by: David Sterba
26 Mar, 2018
1 commit
-
async_missing_raid56() is identical to async_read_rebuild().
Signed-off-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
15 Mar, 2018
1 commit
-
On detaching of a disk which is a part of a RAID6 filesystem, the
following kernel OOPS may happen:[63122.680461] BTRFS error (device sdo): bdev /dev/sdo errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
[63122.719584] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo
[63122.719587] BTRFS error (device sdo): bdev /dev/sdo errs: wr 1, rd 0, flush 1, corrupt 0, gen 0
[63122.803516] BTRFS warning (device sdo): lost page write due to IO error on /dev/sdo
[63122.803519] BTRFS error (device sdo): bdev /dev/sdo errs: wr 2, rd 0, flush 1, corrupt 0, gen 0
[63122.863902] BTRFS critical (device sdo): fatal error on device /dev/sdo
[63122.935338] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[63122.946554] IP: fail_bio_stripe+0x58/0xa0 [btrfs]
[63122.958185] PGD 9ecda067 P4D 9ecda067 PUD b2b37067 PMD 0
[63122.971202] Oops: 0000 [#1] SMP
[63123.006760] CPU: 0 PID: 3979 Comm: kworker/u8:9 Tainted: G W 4.14.2-16-scst34x+ #8
[63123.007091] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[63123.007402] Workqueue: btrfs-worker btrfs_worker_helper [btrfs]
[63123.007595] task: ffff880036ea4040 task.stack: ffffc90006384000
[63123.007796] RIP: 0010:fail_bio_stripe+0x58/0xa0 [btrfs]
[63123.007968] RSP: 0018:ffffc90006387ad8 EFLAGS: 00010287
[63123.008140] RAX: 0000000000000002 RBX: ffff88004beaa0b8 RCX: ffff8800b2bd5690
[63123.008359] RDX: 0000000000000000 RSI: ffff88007bb43500 RDI: ffff88004beaa000
[63123.008621] RBP: ffffc90006387ae8 R08: 0000000099100000 R09: ffff8800b2bd5600
[63123.008840] R10: 0000000000000004 R11: 0000000000010000 R12: ffff88007bb43500
[63123.009059] R13: 00000000fffffffb R14: ffff880036fc5180 R15: 0000000000000004
[63123.009278] FS: 0000000000000000(0000) GS:ffff8800b7000000(0000) knlGS:0000000000000000
[63123.009564] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63123.009748] CR2: 0000000000000080 CR3: 00000000b0866000 CR4: 00000000000406f0
[63123.009969] Call Trace:
[63123.010085] raid_write_end_io+0x7e/0x80 [btrfs]
[63123.010251] bio_endio+0xa1/0x120
[63123.010378] generic_make_request+0x218/0x270
[63123.010921] submit_bio+0x66/0x130
[63123.011073] finish_rmw+0x3fc/0x5b0 [btrfs]
[63123.011245] full_stripe_write+0x96/0xc0 [btrfs]
[63123.011428] raid56_parity_write+0x117/0x170 [btrfs]
[63123.011604] btrfs_map_bio+0x2ec/0x320 [btrfs]
[63123.011759] ? ___cache_free+0x1c5/0x300
[63123.011909] __btrfs_submit_bio_done+0x26/0x50 [btrfs]
[63123.012087] run_one_async_done+0x9c/0xc0 [btrfs]
[63123.012257] normal_work_helper+0x19e/0x300 [btrfs]
[63123.012429] btrfs_worker_helper+0x12/0x20 [btrfs]
[63123.012656] process_one_work+0x14d/0x350
[63123.012888] worker_thread+0x4d/0x3a0
[63123.013026] ? _raw_spin_unlock_irqrestore+0x15/0x20
[63123.013192] kthread+0x109/0x140
[63123.013315] ? process_scheduled_works+0x40/0x40
[63123.013472] ? kthread_stop+0x110/0x110
[63123.013610] ret_from_fork+0x25/0x30
[63123.014469] RIP: fail_bio_stripe+0x58/0xa0 [btrfs] RSP: ffffc90006387ad8
[63123.014678] CR2: 0000000000000080
[63123.016590] ---[ end trace a295ea7259c17880 ]—This is reproducible in a cycle, where a series of writes is followed by
SCSI device delete command. The test may take up to few minutes.Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
[ no signed-off-by provided ]
Author: Dmitriy Gorokh
Reviewed-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
22 Jan, 2018
8 commits
-
Before rbio_orig_end_io() goes to free rbio, rbio may get merged with
more bios from other rbios and rbio->bio_list becomes non-empty,
in that case, these newly merged bios don't end properly.Once unlock_stripe() is done, rbio->bio_list will not be updated any
more and we can call bio_endio() on all queued bios.It should only happen in error-out cases, the normal path of recover
and full stripe write have already set RBIO_RMW_LOCKED_BIT to disable
merge before doing IO, so rbio_orig_end_io() called by them doesn't
have the above issue.Reported-by: Jérôme Carretero
Signed-off-by: Liu Bo
Signed-off-by: David Sterba -
Since raid6 recover tries all possible combinations of failed stripes,
- when raid6 rebuild algorithm is used, i.e. raid6_datap_recov() and
raid6_2data_recov(), it may change the in-memory content of failed
stripes, if such a raid bio is cached, a later raid write rmw or recover
can steal @stripe_pages from it instead of reading from disks, such that
it carries the wrong content to do write rmw or recovery and ends up
with corruption or recovery failures.- when raid5 rebuild algorithm is used, i.e. xor, raid bio can be cached
because the only failed stripe which contains @rbio->bio_pages gets
modified, others remain the same so that their in-memory content is
consistent with their on-disk content.This adds a check to skip caching rbio if using raid6 recover.
Signed-off-by: Liu Bo
Signed-off-by: David Sterba -
Bio iterated by set_bio_pages_uptodate() is raid56 internal one, so it
will never be a BIO_CLONED bio, and since this is called by end_io
functions, bio->bi_iter.bi_size is zero, we mustn't use
bio_for_each_segment() as that is a no-op if bi_size is zero.Fixes: 6592e58c6b68e61f003a01ba29a3716e7e2e9484 ("Btrfs: fix write corruption due to bio cloning on raid5/6")
Cc: # v4.12-rc6+
Signed-off-by: Liu Bo
Signed-off-by: David Sterba -
Since fail stripe index in rbio would be used to decide which
algorithm reconstruction would be run, we cannot merge rbios if
their's fail striped indexes are different, otherwise, one of the two
reconstructions would fail.Signed-off-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
Given the above
'
if (last->operation != cur->operation)
return 0;
',
it's guaranteed that two operations are same.Signed-off-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
There is a scenario that can end up with rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content, and users will think of this as data loss.
More seriouly, if the loss happens on some important internal btree
roots, it could refuse to mount.This extends btrfs to do more retries and each retry fails only one
stripe. Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.The worst case is to retry as many times as the number of raid6 disks,
but given the fact that such a scenario is really rare in practice,
it's still acceptable.Signed-off-by: Liu Bo
Signed-off-by: David Sterba -
In fact nobody is waiting on @wait's waitqueue, it can be safely
removed.Signed-off-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba -
The defined wait is not used anywhere.
Signed-off-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
30 Oct, 2017
1 commit
-
The local bio_list may have pending bios when doing cleanup, it can
end up with memory leak if they don't get freed.Signed-off-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba