17 Jan, 2021
1 commit
-
commit aebf5db917055b38f4945ed6d621d9f07a44ff30 upstream.
Make sure that bdgrab() is done on the 'block_device' instance before
referring to it, to avoid a use-after-free.
Cc:
Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
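The fix follows the usual rule of taking a reference before using a shared object. A minimal userspace sketch of that rule, assuming a plain single-threaded counter and hypothetical names (struct bdev_obj, obj_grab(), obj_put()); the kernel's bdgrab()/bdput() are not reproduced here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a block_device. */
struct bdev_obj {
	int refcnt;
	int minor;
};

static struct bdev_obj *obj_alloc(int minor)
{
	struct bdev_obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->refcnt = 1;		/* reference owned by the creator */
	o->minor = minor;
	return o;
}

/* Take an extra reference; only legal while some reference is held. */
static struct bdev_obj *obj_grab(struct bdev_obj *o)
{
	assert(o->refcnt > 0);
	o->refcnt++;
	return o;
}

/* Drop a reference; returns true if this freed the object. */
static bool obj_put(struct bdev_obj *o)
{
	assert(o->refcnt > 0);
	if (--o->refcnt == 0) {
		free(o);
		return true;
	}
	return false;
}

/*
 * Grab before use: the creator drops its reference while we are still
 * working with the object, yet the object stays valid until our put.
 */
static int read_minor_after_owner_put(void)
{
	struct bdev_obj *owner = obj_alloc(7);
	struct bdev_obj *mine;
	int minor;

	if (!owner)
		return -1;
	mine = obj_grab(owner);		/* refcnt 2 */
	obj_put(owner);			/* refcnt 1, object still alive */
	minor = mine->minor;		/* safe: we hold a reference */
	obj_put(mine);			/* last reference: object freed */
	return minor;
}
```

The grab has to happen while the caller can still prove the object is alive (here, through the owner's reference); a kernel version would also need an atomic counter.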
13 Nov, 2020
1 commit
-
Return whether the function ended up sending a uevent.
Cc: stable@vger.kernel.org # v5.9
Signed-off-by: Christoph Hellwig
Reviewed-by: Petr Vorel
Signed-off-by: Jens Axboe
06 Oct, 2020
1 commit
-
All remaining callers of bdget() outside of fs/block_dev.c want to get a
reference to the struct block_device for a given struct hd_struct. Add
a helper just for that and then mark bdget static.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
25 Sep, 2020
1 commit
-
No need to go through the hd_struct to find the partition number.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
24 Sep, 2020
2 commits
-
Use blkdev_get_by_dev instead of open coding it using bdget_disk +
blkdev_get, and split the code to read the partition table into a
separate helper to make it a little more obvious.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
We can only scan for partitions on the whole disk, so move the flag
from struct block_device to struct gendisk.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
10 Sep, 2020
1 commit
-
Like check_disk_changed, except that it does not call ->revalidate_disk
but leaves that to the caller.
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
02 Sep, 2020
5 commits
-
Only virtio_blk and xen-blkfront set the revalidate argument to true,
and neither implements the ->revalidate_disk method. So switch
to the helper that just updates the size instead.
Signed-off-by: Christoph Hellwig
Reviewed-by: Josef Bacik
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
-
Replace bd_invalidate with a new BDEV_NEED_PART_SCAN flag in a bd_flags
variable to better describe the condition.
Signed-off-by: Christoph Hellwig
Reviewed-by: Josef Bacik
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
We can trivially derive the gendisk from the hd_struct.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Use early returns and goto-based unwinding to simplify the flow a bit.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
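Early returns and goto-based unwinding are a common kernel C idiom for error handling. A generic sketch, not the code this commit touched: the three heap buffers, get_buf()/put_buf() and the fail_at failure-injection parameter are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Buffers currently allocated, so a caller can check for leaks. */
static int live_buffers;

/* Hypothetical allocator that fails on request (fail != 0). */
static void *get_buf(int fail)
{
	void *p;

	if (fail)
		return NULL;
	p = malloc(64);
	if (p)
		live_buffers++;
	return p;
}

static void put_buf(void *p)
{
	assert(p != NULL);	/* unwinding only releases what was acquired */
	free(p);
	live_buffers--;
}

/*
 * Acquire three buffers in order. fail_at (1..3) makes that step fail,
 * 0 means no injected failure. Each label releases exactly the
 * buffers acquired before the failing step, in reverse order.
 */
static int setup(int fail_at)
{
	void *a, *b, *c;
	int ret = -ENOMEM;

	a = get_buf(fail_at == 1);
	if (!a)
		return -ENOMEM;		/* nothing to undo: early return */
	b = get_buf(fail_at == 2);
	if (!b)
		goto out_put_a;
	c = get_buf(fail_at == 3);
	if (!c)
		goto out_put_b;

	/* ... use a, b and c; this sketch just releases them ... */
	ret = 0;
	put_buf(c);
out_put_b:
	put_buf(b);
out_put_a:
	put_buf(a);
	return ret;
}
```

On success the code falls through the labels, so the release path is shared with the error paths; returning early on success instead is equally common and mostly a style choice.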
01 Aug, 2020
1 commit
-
Drop the repeated word "to" in multiple places.
Signed-off-by: Randy Dunlap
Cc: Jens Axboe
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe
18 Jul, 2020
1 commit
-
In order to improve consistency and usability in cgroup stat accounting,
we would like to support the root cgroup's io.stat.

Since the root cgroup has processes doing io even if the system has no
explicitly created cgroups, we need to be careful to avoid overhead in
that case. For that reason, the rstat algorithms don't handle the root
cgroup, so just turning the file on wouldn't give correct statistics.

To get around this, we simulate flushing the iostat struct by filling it
out directly from global disk stats. The result is a root cgroup io.stat
file consistent with both /proc/diskstats and io.stat.

Note that in order to collect the disk stats, we needed to iterate over
devices. To facilitate that, we had to change the linkage of disk_type
to external so that it can be used from blk-cgroup.c to iterate over
disks.
Suggested-by: Tejun Heo
Signed-off-by: Boris Burkov
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
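The "simulated flush" described above amounts to copying each disk's global counters into the root cgroup's per-device iostat record. A minimal sketch, assuming hypothetical struct and function names and only three operation types; the real blk-cgroup code and its locking are not reproduced, and the sector-to-byte shift assumes the kernel's 512-byte sector unit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_READ, OP_WRITE, OP_DISCARD, NR_OPS };

/* Hypothetical stand-in for one disk's whole-device counters. */
struct disk_global_stats {
	uint64_t ios[NR_OPS];
	uint64_t sectors[NR_OPS];	/* 512-byte units */
};

/* Hypothetical stand-in for the root cgroup's iostat for one device. */
struct root_iostat {
	uint64_t ios[NR_OPS];
	uint64_t bytes[NR_OPS];
};

/* Fill one device's root iostat directly from the disk's global stats. */
static void fill_root_iostat(struct root_iostat *out,
			     const struct disk_global_stats *d)
{
	for (int op = 0; op < NR_OPS; op++) {
		out->ios[op] = d->ios[op];
		out->bytes[op] = d->sectors[op] << 9;	/* sectors to bytes */
	}
}

/* Iterate over all disks, producing one root iostat entry per disk. */
static void fill_root_iostats(struct root_iostat out[],
			      const struct disk_global_stats disks[], size_t n)
{
	assert(n == 0 || (out != NULL && disks != NULL));
	for (size_t i = 0; i < n; i++)
		fill_root_iostat(&out[i], &disks[i]);
}
```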
09 Jul, 2020
1 commit
-
md is the last driver using the legacy media_changed method. Switch
it over to the (not so) new ->clear_events approach, which also removes the
need for the ->revalidate_disk method.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
24 Jun, 2020
3 commits
-
Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") merged in
v4.12 moved the work behind blk_release_queue() into a workqueue after a
splat floated around which indicated some work on blk_release_queue()
could sleep in blk_exit_rl(). This splat would be possible when a driver
called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue()
as its final call) from an atomic context.

blk_put_queue() decrements the refcount for the request_queue kobject, and
upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() has
since been removed by commit db6d99523560 ("block: remove request_list code")
in v5.0, we reserve the right to be able to sleep within
blk_release_queue() context.

The last reference to the request_queue must not be dropped from atomic
context. *When* the last reference to the request_queue reaches 0 varies,
so let's take the opportunity to document when that is expected to
happen, and also document the context of the related calls as best as
possible, so we can avoid future issues, in the hope that the
synchronous request_queue removal sticks.

We revert to synchronous request_queue removal because asynchronous
removal creates a regression with expected userspace interaction with
several drivers. An example is removing the loopback driver: one
uses ioctls from userspace to do so, and upon a successful return
one expects the device to be removed. Likewise, if one races to add another
device, the new one may not be added, as the old one is still being removed.
This was expected behavior before, and it now fails as the device is still
present and busy. Moving to asynchronous request_queue removal could have
broken many scripts which relied on the removal having completed if
there was no error. Document this expectation as well so that this
doesn't regress userspace again.

Using asynchronous request_queue removal, however, has helped us find
other bugs. In the future we can test what could break with this
arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.

While at it, update the docs with the context expectations for the
request_queue / gendisk refcount decrement, and make these
expectations explicit by using might_sleep().
Fixes: dc9edc44de6c ("block: Fix a blk_exit_rl() regression")
Suggested-by: Nicolai Stange
Signed-off-by: Luis Chamberlain
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Cc: Bart Van Assche
Cc: Omar Sandoval
Cc: Hannes Reinecke
Cc: Nicolai Stange
Cc: Greg Kroah-Hartman
Cc: Michal Hocko
Cc: yu kuai
Signed-off-by: Jens Axboe
-
Let us clarify the context under which the helpers to increment the
refcount for the gendisk and request_queue can be called. We
make this explicit in the places where we may sleep with might_sleep().

We don't address the decrement context yet, as that needs some extra
work and fixes, but it will be addressed in the next patch.
Signed-off-by: Luis Chamberlain
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe
-
This adds documentation for the gendisk / request_queue refcount
helpers.
Signed-off-by: Luis Chamberlain
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe
27 May, 2020
2 commits
-
The RCU lock is required only in disk_map_sector_rcu() to look up the
partition. After that, the request holds a reference to the related hd_struct.

Replace get_cpu() with preempt_disable(), since the returned cpu index is unused.
[hch: rebased]
Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
percpu variables have a perfectly fine working stub implementation
for UP kernels, so use that.
Signed-off-by: Christoph Hellwig
Reviewed-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe
19 May, 2020
2 commits
-
part_inc_in_flight and part_dec_in_flight only have one caller each, and
those callers are purely for bio based drivers. Merge each function into
its only caller, and remove the superfluous blk-mq checks.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Don't bother to call part_in_flight / part_in_flight_rw on blk-mq
devices, just call the blk-mq versions directly.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
13 May, 2020
3 commits
-
The gendisk can't go away while there is IO activity, so don't hold
part0's refcount in the IO path.
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Cc: Yufen Yu
Cc: Christoph Hellwig
Cc: Hou Tao
Signed-off-by: Jens Axboe
-
The 'nr_sects_seq' seqcount is only needed on 32-bit SMP,
so define it only for 32-bit SMP.
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Cc: Yufen Yu
Cc: Christoph Hellwig
Cc: Hou Tao
Signed-off-by: Jens Axboe
-
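For the 'nr_sects_seq' commit above: a seqcount lets a reader detect that it raced with a writer and retry. On a 32-bit CPU a 64-bit sector count is accessed as two 32-bit halves, so without it a reader could combine the low half of a new value with the high half of an old one. A minimal userspace sketch with C11 atomics, assuming a single serialized writer and hypothetical names (struct nr_sects_sketch, nr_sects_write(), nr_sects_read()); it builds unconditionally so it runs on any host, and it is not the kernel's seqcount_t API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit sector count kept as two 32-bit halves, as a
 * 32-bit CPU would have to access it.
 */
struct nr_sects_sketch {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

static void nr_sects_init(struct nr_sects_sketch *p, uint64_t v)
{
	atomic_init(&p->seq, 0u);
	atomic_init(&p->lo, (uint32_t)v);
	atomic_init(&p->hi, (uint32_t)(v >> 32));
}

/* Writer: callers must serialize writers against each other. */
static void nr_sects_write(struct nr_sects_sketch *p, uint64_t v)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1u) == 0);	/* no other writer may be active */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&p->hi, (uint32_t)(v >> 32), memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Reader: retry until both halves were read without a concurrent write. */
static uint64_t nr_sects_read(struct nr_sects_sketch *p)
{
	unsigned int s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((s1 & 1u) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}
```

On a 64-bit build the value fits in one machine word, so the retry loop buys nothing there, which matches the commit's reasoning for keeping the counter only on 32-bit SMP.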
delete_partition() clears the cached last_lookup partition. However, the
.last_lookup cache may be overwritten by an IO path after it is cleared
in delete_partition(). Another IO path may then use the cached partition
that is being deleted after hd_struct_free() is called, triggering a
use-after-free on the cached partition.

Fix the issue with the following approach:
1) always get the partition's refcount via hd_struct_try_get() before
setting .last_lookup
2) move clearing .last_lookup from delete_partition() to hd_struct_free(),
which is the release handler of the partition's percpu-refcount, so that no
IO path can cache a partition that is being deleted via .last_lookup.

This is an alternative to Yufen's patch [1], which adds overhead
in the fast path through an indirect lookup that may introduce an extra
cacheline in the IO path. This patch instead relies on percpu-refcount's
protection, which makes it easier to understand and verify.

[1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t
Reported-by: Yufen Yu
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Cc: Christoph Hellwig
Cc: Hou Tao
Signed-off-by: Jens Axboe
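The two rules in the commit above (take a reference before caching, clear the cache only from the release handler) can be sketched single-threaded in userspace. Assumptions: RCU and the percpu-refcount are replaced by a plain counter plus a live flag, memory is never actually freed, and all names (struct part, struct disk, part_try_get(), map_sector(), delete_part()) are hypothetical rather than the kernel's disk_map_sector_rcu()/hd_struct_try_get():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PARTS 4

struct part {
	unsigned long start, nr_sects;
	int refs;	/* starts at 1: the partition table's reference */
	bool live;	/* cleared when deletion starts */
};

struct disk {
	struct part *parts[NR_PARTS];
	struct part *last_lookup;	/* lookup cache */
};

/* Fails once deletion has started, like a killed refcount. */
static bool part_try_get(struct part *p)
{
	if (!p->live)
		return false;
	p->refs++;
	return true;
}

/* Release handler: the only place the cache is cleared. */
static void part_release(struct disk *d, struct part *p)
{
	if (d->last_lookup == p)
		d->last_lookup = NULL;
	/* ... the partition would be freed here ... */
}

static void part_put(struct disk *d, struct part *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		part_release(d, p);
}

static bool sector_in_part(const struct part *p, unsigned long sector)
{
	return sector >= p->start && sector - p->start < p->nr_sects;
}

/* Returns a referenced partition, or NULL for the whole disk. */
static struct part *map_sector(struct disk *d, unsigned long sector)
{
	struct part *p = d->last_lookup;

	if (p && sector_in_part(p, sector) && part_try_get(p))
		return p;
	for (int i = 0; i < NR_PARTS; i++) {
		p = d->parts[i];
		if (p && sector_in_part(p, sector)) {
			if (!part_try_get(p))
				break;		/* being deleted: don't cache it */
			d->last_lookup = p;	/* cache only after taking a ref */
			return p;
		}
	}
	return NULL;
}

static void delete_part(struct disk *d, int i)
{
	struct part *p = d->parts[i];

	d->parts[i] = NULL;
	p->live = false;
	part_put(d, p);		/* drop the table's reference */
}
```

In the kernel the lookup runs under rcu_read_lock(), which this sketch does not model.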
10 May, 2020
1 commit
-
Split out a new bdi_set_owner helper to set the owner, and move the policy
for creating the bdi name back into genhd.c, where it belongs.
Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Greg Kroah-Hartman
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe
21 Apr, 2020
3 commits
-
invalidate_partition and bdev_unhash_inode are always paired, and
invalidate_partition already does an icache lookup for the block device
inode. Piggyback on that to remove the inode from the hash.
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
-
invalidate_partition is only used in genhd.c, so mark it static. Also
drop the return value given that it is always ignored.
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
-
All callers have the hd_struct at hand, so pass it instead of performing
another lookup.
Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe
31 Mar, 2020
1 commit
-
Pull block updates from Jens Axboe:
- Online capacity resizing (Balbir)
- Number of hardware queue change fixes (Bart)
- null_blk fault injection addition (Bart)
- Cleanup of queue allocation, unifying the node/no-node API
  (Christoph)
- Cleanup of genhd, moving code to where it makes sense (Christoph)
- Cleanup of the partition handling code (Christoph)
- disk stat fixes/improvements (Konstantin)
- BFQ improvements (Paolo)
- Various fixes and improvements
* tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits)
block: return NULL in blk_alloc_queue() on error
block: move bio_map_* to blk-map.c
Revert "blkdev: check for valid request queue before issuing flush"
block: simplify queue allocation
bcache: pass the make_request methods to blk_queue_make_request
null_blk: use blk_mq_init_queue_data
block: add a blk_mq_init_queue_data helper
block: move the ->devnode callback to struct block_device_operations
block: move the part_stat* helpers from genhd.h to a new header
block: move block layer internals out of include/linux/genhd.h
block: move guard_bio_eod to bio.c
block: unexport get_gendisk
block: unexport disk_map_sector_rcu
block: unexport disk_get_part
block: mark part_in_flight and part_in_flight_rw static
block: mark block_depr static
block: factor out requeue handling from dispatch code
block/diskstats: replace time_in_queue with sum of request times
block/diskstats: accumulate all per-cpu counters in one pass
block/diskstats: more accurate approximation of io_ticks for slow disks
...
27 Mar, 2020
1 commit
-
There really isn't any good reason to stash a method directly into
struct gendisk. Move it together with the other block device
operations.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
25 Mar, 2020
7 commits
-
get_gendisk is not used by any modular code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
disk_map_sector_rcu is not used by any modular code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
disk_get_part is not used by any modular code.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Column "time_in_queue" in diskstats is supposed to show the total waiting time
of all requests, i.e. its value should equal the sum of the times from the
other columns. But this is not true, because column "time_in_queue" is counted
separately in jiffies rather than in nanoseconds like the other times.

This patch removes the redundant counter for "time_in_queue" and shows the
total time of read, write, discard and flush requests.
Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe
-
Reading /proc/diskstats iterates over all cpus once for each field.
It's faster to sum all fields in one pass.

Hammering /proc/diskstats with fio shows a 2x performance improvement:

  fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
  --size=1k --bs=1k --fallocate=none --create_on_open=1 \
  --time_based=1 --runtime=10 --invalidate=0 --group_report

            JOBS=1    JOBS=10
  Before:   7k iops   64k iops
  After:    18k iops  120k iops

This also makes the code more compact:

  add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
  Function              old     new   delta
  part_stat_read_all      -     194    +194
  diskstats_show       1344     631    -713
  part_stat_show       1219     392    -827
  Total: Before=14966947, After=14965601, chg -0.01%
Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Jens Axboe
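The two diskstats commits above can be sketched together: each CPU's counters are visited once and every field is accumulated in that single pass, and time_in_queue is then derived from the per-operation times instead of being counted separately. A minimal userspace sketch; the stat groups follow the read/write/discard/flush split in the commit text, but the struct layout, the helper names and the locally defined NSEC_PER_MSEC are hypothetical, and the real part_stat_read_all() is not reproduced:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSEC_PER_MSEC 1000000ULL

/* Operation groups named after the commit text. */
enum stat_group { STAT_READ, STAT_WRITE, STAT_DISCARD, STAT_FLUSH,
		  NR_STAT_GROUPS };

/* Hypothetical per-CPU counter block (layout simplified). */
struct disk_stats {
	uint64_t nsecs[NR_STAT_GROUPS];
	unsigned long ios[NR_STAT_GROUPS];
	unsigned long sectors[NR_STAT_GROUPS];
};

/*
 * One pass over the CPUs: each CPU's block is visited once and all of
 * its fields are accumulated, instead of one full CPU walk per field.
 */
static void stat_read_all(const struct disk_stats percpu[], int nr_cpus,
			  struct disk_stats *sum)
{
	assert(nr_cpus >= 0);
	memset(sum, 0, sizeof(*sum));
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		for (int g = 0; g < NR_STAT_GROUPS; g++) {
			sum->nsecs[g] += percpu[cpu].nsecs[g];
			sum->ios[g] += percpu[cpu].ios[g];
			sum->sectors[g] += percpu[cpu].sectors[g];
		}
	}
}

/* time_in_queue: sum of the per-group times, in milliseconds. */
static uint64_t time_in_queue_ms(const struct disk_stats *s)
{
	return (s->nsecs[STAT_READ] + s->nsecs[STAT_WRITE] +
		s->nsecs[STAT_DISCARD] + s->nsecs[STAT_FLUSH]) / NSEC_PER_MSEC;
}
```

Summing in nanoseconds and converting once at the end keeps time_in_queue consistent with the per-group time columns, which is the inconsistency the time_in_queue commit describes.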
24 Mar, 2020
2 commits
-
Move the sysfs _show methods that are used both on the full disk and
partition nodes to genhd.c instead of hiding them in the partitioning
code. Also move the declaration for these methods to block/blk.h so
that we don't expose them to drivers.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
These functions aren't really related to partition support, so move them
to a more suitable place.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe