Eric Lee / smarc-fsl-linux-kernel

06 Dec, 2018

2 commits

9f268b5cf btrfs: tree-checker: Verify block_group_item ... Browse Code »

commit fce466eab7ac6baa9d2dcd88abcf945be3d4a089 upstream.

A crafted image with invalid block group items could make free space cache
code to cause panic.

We could detect such invalid block group item by checking:
1) Item size
Known fixed value.
2) Block group size (key.offset)
We have an upper limit on block group item (10G)
3) Chunk objectid
Known fixed value.
4) Type
Only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA.
No more than 1 bit set for profile type.
5) Used space
No more than the block group size.

This should allow btrfs to detect and refuse to mount the crafted image.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849
Reported-by: Xu Wen
Signed-off-by: Qu Wenruo
Reviewed-by: Gu Jinxiang
Reviewed-by: Nikolay Borisov
Tested-by: Gu Jinxiang
Reviewed-by: David Sterba
Signed-off-by: David Sterba
[bwh: Backported to 4.14:
- In check_leaf_item(), pass root->fs_info to check_block_group_item()
- Adjust context]
Signed-off-by: Ben Hutchings
Signed-off-by: Sasha Levin

Qu Wenruo
2018-12-06 02:41:12 +0800
f7eef132c btrfs: validate type when reading a chunk ... Browse Code »

commit 315409b0098fb2651d86553f0436b70502b29bb2 upstream.

Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199839, with an
image that has an invalid chunk type but does not return an error.

Add chunk type check in btrfs_check_chunk_valid, to detect the wrong
type combinations.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199839
Reported-by: Xu Wen
Reviewed-by: Qu Wenruo
Signed-off-by: Gu Jinxiang
Signed-off-by: David Sterba
Signed-off-by: Ben Hutchings
Signed-off-by: Sasha Levin

Gu Jinxiang
2018-12-06 02:41:11 +0800

10 Oct, 2018

1 commit

4e380c50c btrfs: btrfs_shrink_device should call commit transaction at the end ... Browse Code »

[ Upstream commit 801660b040d132f67fac6a95910ad307c5929b49 ]

Test case btrfs/164 reports use-after-free:

[ 6712.084324] general protection fault: 0000 [#1] PREEMPT SMP
..
[ 6712.195423] btrfs_update_commit_device_size+0x75/0xf0 [btrfs]
[ 6712.201424] btrfs_commit_transaction+0x57d/0xa90 [btrfs]
[ 6712.206999] btrfs_rm_device+0x627/0x850 [btrfs]
[ 6712.211800] btrfs_ioctl+0x2b03/0x3120 [btrfs]

Reason for this is that btrfs_shrink_device adds the resized device to
the fs_devices::resized_devices after it has called the last commit
transaction.

So the list fs_devices::resized_devices is not empty when
btrfs_shrink_device returns. Now the parent function
btrfs_rm_device calls:

btrfs_close_bdev(device);
call_rcu(&device->rcu, free_device_rcu);

and then does the transactio ncommit. It goes through the
fs_devices::resized_devices in btrfs_update_commit_device_size and
leads to use-after-free.

Fix this by making sure btrfs_shrink_device calls the last needed
btrfs_commit_transaction before the return. This is consistent with what
the grow counterpart does and this makes sure the on-disk state is
persistent when the function returns.

Reported-by: Lu Fengqi
Tested-by: Lu Fengqi
Signed-off-by: Anand Jain
Reviewed-by: David Sterba
[ update changelog ]
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Anand Jain
2018-10-10 14:54:19 +0800

15 Sep, 2018

1 commit

145b1f56b btrfs: Exit gracefully when chunk map cannot be inserted to the tree ... Browse Code »

[ Upstream commit 64f64f43c89aca1782aa672e0586f6903c5d8979 ]

It's entirely possible that a crafted btrfs image contains overlapping
chunks.

Although we can't detect such problem by tree-checker, it's not a
catastrophic problem, current extent map can already detect such problem
and return -EEXIST.

We just only need to exit gracefully and fail the mount.

Reported-by: Xu Wen
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200409
Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Qu Wenruo
2018-09-15 15:45:32 +0800

21 Jun, 2018

2 commits

7ab8fc065 Btrfs: make raid6 rebuild retry more ... Browse Code »

[ Upstream commit 8810f7517a3bc4ca2d41d022446d3f5fd6b77c09 ]

There is a scenario that can end up with rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,

- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,

however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content, and users will think of this as data loss.
More seriouly, if the loss happens on some important internal btree
roots, it could refuse to mount.

This extends btrfs to do more retries and each retry fails only one
stripe. Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.

The worst case is to retry as many times as the number of raid6 disks,
but given the fact that such a scenario is really rare in practice,
it's still acceptable.

Signed-off-by: Liu Bo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Liu Bo
2018-06-21 03:03:02 +0800
db5f02cc7 Revert "Btrfs: fix scrub to repair raid6 corruption" ... Browse Code »

This reverts commit d91bb7c6988bd6450284c762b33f2e1ea3fe7c97.

This commit used an incorrect log message.

Signed-off-by: Sasha Levin
Reported-by: Ben Hutchings
Signed-off-by: Greg Kroah-Hartman

Sasha Levin
2018-06-21 03:03:01 +0800

23 May, 2018

1 commit

1d16f615b btrfs: fix crash when trying to resume balance without the resume flag ... Browse Code »

commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream.

We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
only, which isn't called during the remount. So when resuming from
the paused balance we hit the bug:

kernel: kernel BUG at fs/btrfs/volumes.c:3890!
::
kernel: balance_kthread+0x51/0x60 [btrfs]
kernel: kthread+0x111/0x130
::
kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8

Reproducer:
On a mounted filesystem:

btrfs balance start --full-balance /btrfs
btrfs balance pause /btrfs
mount -o remount,ro /dev/sdb /btrfs
mount -o remount,rw /dev/sdb /btrfs

To fix this set the BTRFS_BALANCE_RESUME flag in
btrfs_resume_balance_async().

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Anand Jain
2018-05-23 00:54:01 +0800

26 Apr, 2018

1 commit

d91bb7c69 Btrfs: fix scrub to repair raid6 corruption ... Browse Code »

[ Upstream commit 762221f095e3932669093466aaf4b85ed9ad2ac1 ]

The raid6 corruption is that,
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,

- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,

however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content.

We've fixed normal reads to rebuild raid6 correctly with more retries
in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
scrub to do the exactly same rebuild process.

[1]: https://patchwork.kernel.org/patch/10091755/

Signed-off-by: Liu Bo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Liu Bo
2018-04-26 17:02:09 +0800

21 Mar, 2018

3 commits

1a8902505 btrfs: Fix memory barriers usage with device stats counters ... Browse Code »

commit 9deae9689231964972a94bb56a79b669f9d47ac1 upstream.

Commit addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev
stats is cleared") reworked the way device stats changes are tracked. A
new atomic dev_stats_ccnt counter was introduced which is incremented
every time any of the device stats counters are changed. This serves as
a flag whether there are any pending stats changes. However, this patch
only partially implemented the correct memory barriers necessary:

- It only ordered the stores to the counters but not the reads e.g.
btrfs_run_dev_stats
- It completely omitted any comments documenting the intended design and
how the memory barriers pair with each-other

This patch provides the necessary comments as well as adds a missing
smp_rmb in btrfs_run_dev_stats. Furthermore since dev_stats_cnt is only
a snapshot at best there was no point in reading the counter twice -
once in btrfs_dev_stats_dirty and then again when assigning stats_cnt.
Just collapse both reads into 1.

Fixes: addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared")
Signed-off-by: Nikolay Borisov
Reviewed-by: Mathieu Desnoyers
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Nikolay Borisov
2018-03-21 19:06:44 +0800
cb6945546 btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device ... Browse Code »

commit fd649f10c3d21ee9d7542c609f29978bdf73ab94 upstream.

Commit 4fde46f0cc71 ("Btrfs: free the stale device") introduced
btrfs_free_stale_device which iterates the device lists for all
registered btrfs filesystems and deletes those devices which aren't
mounted. In a btrfs_devices structure has only 1 device attached to it
and it is unused then btrfs_free_stale_devices will proceed to also free
the btrfs_fs_devices struct itself. Currently this leads to a use after
free since list_for_each_entry will try to perform a check on the
already freed memory to see if it has to terminate the loop.

The fix is to use 'break' when we know we are freeing the current
fs_devs.

Fixes: 4fde46f0cc71 ("Btrfs: free the stale device")
Signed-off-by: Nikolay Borisov
Reviewed-by: Anand Jain
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Nikolay Borisov
2018-03-21 19:06:44 +0800
0136bd723 btrfs: alloc_chunk: fix DUP stripe size handling ... Browse Code »

commit 92e222df7b8f05c565009c7383321b593eca488b upstream.

In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.

The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.

Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the amount as intended, the check will reduce
the stripe_size to max_chunk_size if the actual correct to be used
stripe_size is more than half the amount of max_chunk_size.

The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.

Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.

Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.

Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.

The real problem was already introduced with the rest of the code in
73c5de0051.

The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.

Reported-by: Naohiro Aota
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg
Reviewed-by: David Sterba
[ update comment ]
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman

Hans van Kranenburg
2018-03-21 19:06:44 +0800

03 Mar, 2018

1 commit

d4727e485 btrfs: Fix flush bio leak ... Browse Code »

[ Upstream commit beed9263f4000c48a5c48912f26576f6fa091181 ]

Commit e0ae99941423 ("btrfs: preallocate device flush bio") reworked
the way the flush bio is allocated and used. Concretely it allocates
the bio in __alloc_device and then re-uses it multiple times with a
very simple endio routine that just calls complete() without consuming
a reference. Allocated bios by default come with a ref count of 1,
which is then consumed by the endio routine (or not, in which case they
should be bio_put by the caller). The way the impleementation works now
is that the flush bio has a refcount of 2 and we only ever bio_put it
once, leaving it to hang indefinitely. Fix this by removing the extra
bio_get in __alloc_device.

Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
Signed-off-by: Nikolay Borisov
Reviewed-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Nikolay Borisov
2018-03-03 17:24:32 +0800

04 Feb, 2018

1 commit

9a8215c0a btrfs: Fix transaction abort during failure in btrfs_rm_dev_item ... Browse Code »

[ Upstream commit 5e9f2ad5b2904a7e81df6d9a3dbef29478952eac ]

btrfs_rm_dev_item calls several function under an active transaction,
however it fails to abort it if an error happens. Fix this by adding
explicit btrfs_abort_transaction/btrfs_end_transaction calls.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Nikolay Borisov
2018-02-04 00:38:54 +0800

20 Dec, 2017

2 commits

da76a65a0 btrfs: undo writable superblocke when sprouting fails ... Browse Code »

[ Upstream commit 0af2c4bf5a012a40a2f9230458087d7f068339d0 ]

When new device is being added to seed FS, seed FS is marked writable,
but when we fail to bring in the new device, we missed to undo the
writable part. This patch fixes it.

Signed-off-by: Anand Jain
Reviewed-by: Nikolay Borisov
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Anand Jain
2017-12-20 17:10:30 +0800
4bcbfac98 btrfs: fix false EIO for missing device ... Browse Code »

[ Upstream commit 102ed2c5ff932439bbbe74c7bd63e6d5baa9f732 ]

When one of the device is missing, bbio_error() takes care of setting
the error status. And if its only IO that is pending in that stripe, it
fails to check the status of the other IO at %bbio_error before setting
the error %bi_status for the %orig_bio. Fix this by checking if
%bbio->error has exceeded the %bbio->max_errors.

Reproducer as below fdatasync error is seen intermittently.

mount -o degraded /dev/sdc /btrfs
dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync

dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error

The reason for the intermittences of the problem is because
the following conditions have to be met, which depends on timing:
In btrfs_map_bio()
- the RAID1 the missing device has to be at %dev_nr = 1
In bbio_error()
. before bbio_error() is called the bio of the not-missing
device at %dev_nr = 0 must be completed so that the below
condition is true
if (atomic_dec_and_test(&bbio->stripes_pending)) {

Signed-off-by: Anand Jain
Reviewed-by: Liu Bo
Signed-off-by: David Sterba
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Anand Jain
2017-12-20 17:10:30 +0800

30 Sep, 2017

1 commit

5ba88cd6e Merge branch 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs fixes from David Sterba:
"We've collected a bunch of isolated fixes, for crashes, user-visible
behaviour or missing bits from other subsystem cleanups from the past.

The overall number is not small but I was not able to make it
significantly smaller. Most of the patches are supposed to go to
stable"

* 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: log csums for all modified extents
Btrfs: fix unexpected result when dio reading corrupted blocks
btrfs: Report error on removing qgroup if del_qgroup_item fails
Btrfs: skip checksum when reading compressed data if some IO have failed
Btrfs: fix kernel oops while reading compressed data
Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block
Btrfs: do not backup tree roots when fsync
btrfs: remove BTRFS_FS_QUOTA_DISABLING flag
btrfs: propagate error to btrfs_cmp_data_prepare caller
btrfs: prevent to set invalid default subvolid
Btrfs: send: fix error number for unknown inode types
btrfs: fix NULL pointer dereference from free_reloc_roots()
btrfs: finish ordered extent cleaning if no progress is found
btrfs: clear ordered flag on cleaning up ordered extents
Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO
Btrfs: do not reset bio->bi_ops while writing bio
Btrfs: use the new helper wbc_to_write_flags

Linus Torvalds
2017-09-30 03:57:35 +0800

26 Sep, 2017

1 commit

bd7d63c2c Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block ... Browse Code »

This seems to be a leftover of commit cf8cddd38bab ("btrfs: don't
abuse REQ_OP_* flags for btrfs_map_block").

It should use btrfs_op() helper to provide one of 'enum btrfs_map_op'
types.

Fixes: cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
Signed-off-by: Liu Bo
Reviewed-by: Satoru Takeuchi
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Liu Bo
2017-09-26 20:53:17 +0800

15 Sep, 2017

1 commit

0f0d12728 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).

This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like

list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

sed -i -e 's/\/SB_RDONLY/g' \
-e 's/\/SB_NOSUID/g' \
-e 's/\/SB_NODEV/g' \
-e 's/\/SB_NOEXEC/g' \
-e 's/\/SB_SYNCHRONOUS/g' \
-e 's/\/SB_MANDLOCK/g' \
-e 's/\/SB_DIRSYNC/g' \
-e 's/\/SB_NOATIME/g' \
-e 's/\/SB_NODIRATIME/g' \
-e 's/\/SB_SILENT/g' \
-e 's/\/SB_POSIXACL/g' \
-e 's/\/SB_KERNMOUNT/g' \
-e 's/\/SB_I_VERSION/g' \
-e 's/\/SB_LAZYTIME/g' \
$list

and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

Linus Torvalds
2017-09-15 09:54:01 +0800

10 Sep, 2017

1 commit

66ba772ee Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux ... Browse Code »

Pull btrfs updates from David Sterba:
"The changes range through all types: cleanups, core chagnes, sanity
checks, fixes, other user visible changes, detailed list below:

- deprecated: user transaction ioctl

- mount option ssd does not change allocation alignments

- degraded read-write mount is allowed if all the raid profile
constraints are met, now based on more accurate check

- defrag: do not reset compression afterwards; the NOCOMPRESS flag
can be now overriden by defrag

- prep work for better extent reference tracking (related to the
qgroup slowness with balance)

- prep work for compression heuristics

- memory allocation reductions (may help latencies on a loaded
system)

- better accounting for io waiting states

- error handling improvements (removed BUGs)

- added more sanity checks for shared refs

- fix readdir vs pagefault deadlock under some circumstances

- fix for 'no-hole' mode, certain combination of compressed and
inline extents

- send: fix emission of invalid clone operations

- fixup file mode if setting acls fail

- more fixes from fuzzing

- oher cleanups"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (104 commits)
btrfs: submit superblock io with REQ_META and REQ_PRIO
btrfs: remove unnecessary memory barrier in btrfs_direct_IO
btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent
btrfs: Remove chunk_objectid parameter of btrfs_alloc_dev_extent
btrfs: pass fs_info to btrfs_del_root instead of tree_root
Btrfs: add one more sanity check for shared ref type
Btrfs: remove BUG_ON in __add_tree_block
Btrfs: remove BUG() in add_data_reference
Btrfs: remove BUG() in print_extent_item
Btrfs: remove BUG() in btrfs_extent_inline_ref_size
Btrfs: convert to use btrfs_get_extent_inline_ref_type
Btrfs: add a helper to retrive extent inline ref type
btrfs: scrub: simplify scrub worker initialization
btrfs: scrub: clean up division in scrub_find_csum
btrfs: scrub: clean up division in __scrub_mark_bitmap
btrfs: scrub: use bool for flush_all_writes
btrfs: preserve i_mode if __btrfs_set_acl() fails
btrfs: Remove extraneous chunk_objectid variable
btrfs: Remove chunk_objectid argument from btrfs_make_block_group
btrfs: Remove extra parentheses from condition in copy_items()
...

Linus Torvalds
2017-09-10 04:27:51 +0800

08 Sep, 2017

1 commit

a0725ab0c Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:

- Fix for a registration race in loop, from Anton Volkov.

- Overflow complaint fix from Arnd for DAC960.

- Series of drbd changes from the usual suspects.

- Conversion of the stec/skd driver to blk-mq. From Bart.

- A few BFQ improvements/fixes from Paolo.

- CFQ improvement from Ritesh, allowing idling for group idle.

- A few fixes found by Dan's smatch, courtesy of Dan.

- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.

- A few nbd fixes from Josef.

- Support for cgroup info in blktrace, from Shaohua.

- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.

- Various corner cases and error handling fixes from Weiping Zhang.

- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.

- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.

- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...

Linus Torvalds
2017-09-08 02:59:42 +0800

24 Aug, 2017

2 commits

58efbc9f5 Btrfs: fix blk_status_t/errno confusion ... Browse Code »

This fixes several instances of blk_status_t and bare errno ints being
mixed up, some of which are real bugs.

In the normal case, 0 matches BLK_STS_OK, so we don't observe any
effects of the missing conversion, but in case of errors or passes
through the repair/retry paths, the errors get mixed up.

The changes were identified using 'sparse', we don't have reports of the
buggy behaviour.

Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t")
Signed-off-by: Omar Sandoval
Reviewed-by: Liu Bo
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Omar Sandoval
2017-08-24 23:19:02 +0800
74d46992e block: replace bi_bdev with a gendisk pointer and partitions index ... Browse Code »

This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2017-08-24 02:49:55 +0800

22 Aug, 2017

2 commits

b5d9071c4 btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent ... Browse Code »

Currently this function is always called with the object id of the root
key of the chunk_tree, which is always BTRFS_CHUNK_TREE_OBJECTID. So
let's subsume it straight into the function itself. No functional
change.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-22 00:30:30 +0800
0ca00afb2 btrfs: Remove chunk_objectid parameter of btrfs_alloc_dev_extent ... Browse Code »

THe function is always called with chunk_objectid set to
BTRFS_FIRST_CHUNK_TREE_OBJECTID. Let's collapse the parameter in the
function itself. No functional changes.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-22 00:30:16 +0800

21 Aug, 2017

3 commits

408fbf19a btrfs: Remove extraneous chunk_objectid variable ... Browse Code »

BTRFS_FIRST_CHUNK_TREE_OBJECTIS id the only objectid being used in the
chunk_tree. So remove a variable which is always set to that value and collapse
its usage in callees which are passed this variable. No functional changes

Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-21 23:47:42 +0800
0174484d6 btrfs: Remove chunk_objectid argument from btrfs_make_block_group ... Browse Code »

btrfs_make_block_group is always called with chunk_objectid set to
BTRFS_FIRST_CHUNK_TREE_OBJECTID. There's no reason why this behavior will
change anytime soon, so let's remove the argument and decrease the cognitive
load when reading the code path. No functional change

Signed-off-by: Nikolay Borisov
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-21 23:47:42 +0800
0ce1dd2a4 btrfs: Remove redundant setting of uuid in btrfs_block_header ... Browse Code »

btrfs_alloc_dev_extent currently unconditionally sets the uuid in the
leaf block header the function is working with. This is unnecessary
since this operation is peformed by the core btree handling code
(splitting a node, allocating a new btree block etc). So let's remove
it.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-21 23:47:42 +0800

18 Aug, 2017

2 commits

db7c942ce btrfs: Remove unused sectorsize variable from struct map_lookup ... Browse Code »

This variable was added in 1abe9b8a138c ("Btrfs: add initial tracepointi
support for btrfs"), yet it never really got used, only assigned to. So
let's remove it.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-18 22:36:29 +0800
44880fdc9 btrfs: use appropriate define for the fsid ... Browse Code »

Though BTRFS_FSID_SIZE and BTRFS_UUID_SIZE are of the same size, we
should use the matching constant for the fsid buffer.

Signed-off-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Anand Jain
2017-08-18 22:36:29 +0800

16 Aug, 2017

11 commits

9f6d25103 btrfs: use named constant for bdev blocksize ... Browse Code »

Superblock is read and written using buffer heads, we need to set the
bdev blocksize. The magic constant has been hardcoded in several places,
so replace it with a named constant.

Signed-off-by: David Sterba

David Sterba
2017-08-16 22:12:04 +0800
35c70103a btrfs: refactor find_device helper ... Browse Code »

Polish the helper:
* drop underscores, no special meaning here
* pass fs_devices, as this is what the API implements
* drop noinline, no apparent reason for such simple helper
* constify uuid
* add comment

Signed-off-by: David Sterba

David Sterba
2017-08-16 22:12:04 +0800
2dfeca9bf btrfs: merge alloc_device helpers ... Browse Code »

There are two helpers called in chain from one location, we can merge the
functionaliy.

Originally, alloc_fs_devices could fill the device uuid randomly if we
we didn't give the uuid buffer. This happens for seed devices but the
fsid is generated in btrfs_prepare_sprout, so we can remove it.

Signed-off-by: David Sterba

David Sterba
2017-08-16 22:12:04 +0800
e4ff5fb5d btrfs: Remove unused parameters from volume.c functions ... Browse Code »

This also adjusts the respective callers in other files. Those were
found with -Wunused-parameter.

btrfs_full_stripe_len's mapping_tree - introduced by 53b381b3abeb
("Btrfs: RAID5 and RAID6") but it was never really used even in that
commit

btrfs_is_parity_mirror's mirror_num - same as above

chunk_drange_filter's chunk_offset - introduced by 94e60d5a5c4b ("Btrfs:
devid subset filter") and never used.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-16 22:12:03 +0800
110840bb6 btrfs: Remove unused variables ... Browse Code »

clear_super - usage was removed in commit cea67ab92d3d ("btrfs: clean
the old superblocks before freeing the device") but that change forgot
to remove the actual variable.

max_key - commit 6174d3cb43aa ("Btrfs: remove unused max_key arg from
btrfs_search_forward") removed the max_key parameter but it forgot to
remove references from callers.

stripe_len - this one was added by e06cd3dd7cea ("Btrfs: add validadtion
checks for chunk loading") but even then it wasn't used.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-16 22:12:03 +0800
500ceed80 btrfs: Remove find_raid56_stripe_len ... Browse Code »

find_raid56_stripe_len statically returns SZ_64K which equals BTRFS_STRIPE_LEN.
It's sole caller is __btrfs_alloc_chunk and it assigns the return value to ai
variable which is already set to BTRFS_STRIPE_LEN. So remove the function
invocation altogether and remove the function itself. Also remove the variable
since it's only aliasing BTRFS_STRIPE_LEN and use the define directly. Use
the occassion to simplify the rounding down of stripe_size now that the value
we want it to align is a power of 2.

Signed-off-by: Nikolay Borisov
Reviewed-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-16 22:12:03 +0800
c55024514 btrfs: Enhance message when a device is missing during mount ... Browse Code »

For a missing device, btrfs will just refuse to mount with almost
meaningless kernel message like:

BTRFS info (device vdb6): disk space caching is enabled
BTRFS info (device vdb6): has skinny extents
BTRFS error (device vdb6): failed to read the system array: -5
BTRFS error (device vdb6): open_ctree failed

This patch will print a new message about the missing device:

BTRFS info (device vdb6): disk space caching is enabled
BTRFS info (device vdb6): has skinny extents
BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
BTRFS error (device vdb6): failed to read the system array: -5
BTRFS error (device vdb6): open_ctree failed

Signed-off-by: Qu Wenruo
Reviewed-by: Anand Jain
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Qu Wenruo
2017-08-16 22:12:02 +0800
bc3cce237 btrfs: Cleanup num_tolerated_disk_barrier_failures ... Browse Code »

As we use per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use.

We can now remove it.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Qu Wenruo
2017-08-16 22:12:02 +0800
21634a19f btrfs: Introduce a function to check if all chunks a OK for degraded rw mount ... Browse Code »

Introduce a new function, btrfs_check_rw_degradable(), to check if all
chunks in btrfs is OK for degraded rw mount.

It provides the new basis for accurate btrfs mount/remount and even
runtime degraded mount check other than old one-size-fit-all method.

Btrfs currently uses num_tolerated_disk_barrier_failures to do global
check for tolerated missing device.

Although the one-size-fit-all solution is quite safe, it's too strict
if data and metadata has different duplication level.

For example, if one use Single data and RAID1 metadata for 2 disks, it
means any missing device will make the fs unable to be degraded
mounted.

But in fact, some times all single chunks may be in the existing
device and in that case, we should allow it to be rw degraded mounted.

Such case can be easily reproduced using the following script:
# mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
# wipefs -f /dev/sdc
# mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only in sdb, so in fact it should allow degraded mount.

This patchset will introduce a new per-chunk degradable check for
btrfs, allow above case to succeed, and it's quite small anyway.

Signed-off-by: Qu Wenruo
Signed-off-by: Anand Jain
Reviewed-by: David Sterba
[ copied text from cover letter with more details about the problem being
solved ]
Signed-off-by: David Sterba

Qu Wenruo
2017-08-16 22:12:02 +0800
69f03f137 btrfs: Prevent possible ERR_PTR() dereference ... Browse Code »

In btrfs_full_stripe_len/btrfs_is_parity_mirror we have similar code which
gets the chunk map for a particular range via get_chunk_map. However,
get_chunk_map can return an ERR_PTR value and while the 2 callers do catch
this with a WARN_ON they then proceed to indiscriminately dereference the
extent map. This of course leads to a crash. Fix the offenders by making the
dereference conditional on IS_ERR.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-16 22:12:02 +0800
f148ef4d3 btrfs: Be explicit about usage of min() ... Browse Code »

__btrfs_alloc_chunk contains code which boils down to:

ndevs = min(ndevs, devs_max)

It's conditional upon devs_max not being 0. However, it cannot really be 0
since it's always set to either BTRFS_MAX_DEVS_SYS_CHUNK or
BTRFS_MAX_DEVS(fs_info->chunk_root). So eliminate the condition check and use
min explicitly. This has no functional changes.

Signed-off-by: Nikolay Borisov
Reviewed-by: David Sterba
Signed-off-by: David Sterba

Nikolay Borisov
2017-08-16 20:19:52 +0800