25 Mar, 2020

3 commits

  • Column "time_in_queue" in diskstats is supposed to show the total waiting
    time of all requests, i.e. its value should equal the sum of the times in
    the other columns. But this does not hold, because "time_in_queue" is
    counted separately in jiffies rather than in nanoseconds like the other
    times.

    This patch removes the redundant counter for "time_in_queue" and instead
    shows the total time of read, write, discard and flush requests.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     
  • Reading /proc/diskstats iterates over all CPUs once for each field being
    summed. It is faster to sum all fields in a single pass.
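    The shape of the change can be sketched with a toy model (not the
    kernel's actual per-cpu machinery):

```c
#include <stdint.h>

#define NR_FIELDS 4
#define NR_CPUS   8

/* Toy stand-in for the per-cpu disk statistics. */
struct disk_stats {
    uint64_t f[NR_FIELDS];
};

/* Before: one full walk over the per-cpu copies for every field.
 * After: a single walk that accumulates all fields at once. */
static void stat_read_all(const struct disk_stats per_cpu[NR_CPUS],
                          struct disk_stats *sum)
{
    for (int i = 0; i < NR_FIELDS; i++)
        sum->f[i] = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        for (int i = 0; i < NR_FIELDS; i++)
            sum->f[i] += per_cpu[cpu].f[i];
}
```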

    Hammering /proc/diskstats with fio shows 2x performance improvement:

    fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
    --size=1k --bs=1k --fallocate=none --create_on_open=1 \
    --time_based=1 --runtime=10 --invalidate=0 --group_reporting

                JOBS=1     JOBS=10
    Before:    7k iops    64k iops
    After:    18k iops   120k iops

    The code is also more compact this way:

    add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
    Function                 old     new   delta
    part_stat_read_all         -     194    +194
    diskstats_show          1344     631    -713
    part_stat_show          1219     392    -827
    Total: Before=14966947, After=14965601, chg -0.01%

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     
  • Currently io_ticks is approximated by adding one at each start and end of
    a request if the jiffies counter has changed. This works perfectly for
    requests shorter than a jiffy, or when at least one request starts or
    ends in every jiffy.

    But if the disk executes just one request at a time and each request is
    longer than two jiffies, only the first and last jiffies of each request
    are accounted.

    The fix is simple: at the end of a request, add to io_ticks the jiffies
    that passed since the last update, rather than just one jiffy.
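    A toy model of the two accounting schemes (not kernel code; one request
    at a time, back to back, durations in whole jiffies) shows the
    undercount:

```c
/* Old scheme: add one tick at each start/end where the jiffy counter
 * changed since the last update. */
static int old_ticks(int nreq, int len)
{
    int now = 0, stamp = 0, ticks = 0;

    for (int r = 0; r < nreq; r++) {
        if (now != stamp) { ticks++; stamp = now; }  /* request start */
        now += len;                                  /* request runs  */
        if (now != stamp) { ticks++; stamp = now; }  /* request end   */
    }
    return ticks;
}

/* New scheme: at completion, add all jiffies passed since the last
 * update. */
static int new_ticks(int nreq, int len)
{
    int now = 0, stamp = 0, ticks = 0;

    for (int r = 0; r < nreq; r++) {
        now += len;
        ticks += now - stamp;
        stamp = now;
    }
    return ticks;
}
```

    With 12-jiffy requests the old scheme counts roughly one tick per
    request (about 1/12 of the busy time, matching the ~8% "%util" below),
    while the new scheme counts all of it.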

    Example: a common HDD executes random 4k read requests in around 12ms
    each.

    fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
    iostat -x 10 sdb

    Note the change in iostat's "%util", 8,43% -> 99,99%, before/after the
    patch:

    Before:

    Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43

    After:

    Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99

    Now io_ticks does not lose time between the start and end of requests,
    but for queue depth > 1 some I/O time between adjacent starts might
    still be lost.

    For load estimation "%util" is not as useful as the average queue
    length, but it clearly shows how often the disk queue is completely
    empty.

    Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     

24 Mar, 2020

21 commits


22 Mar, 2020

5 commits

  • In bfq_pd_offline(), the function bfq_flush_idle_tree() is invoked to
    flush the rb tree that contains all idle entities belonging to the pd
    (cgroup) being destroyed. In particular, bfq_flush_idle_tree() is
    invoked before bfq_reparent_active_queues(). Yet the latter may happen
    to add some entities to the idle tree. It happens if, in some of the
    calls to bfq_bfqq_move() performed by bfq_reparent_active_queues(),
    the queue to move is empty and gets expired.

    This commit simply reverses the invocation order between
    bfq_flush_idle_tree() and bfq_reparent_active_queues().

    Tested-by: cki-project@redhat.com
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • bfq_reparent_leaf_entity() reparents the input leaf entity (a leaf
    entity represents just a bfq_queue in an entity tree). Yet, the input
    entity is guaranteed to always be a leaf entity only in two-level
    entity trees. In this respect, because of the error fixed by
    commit 14afc5936197 ("block, bfq: fix overwrite of bfq_group pointer
    in bfq_find_set_group()"), all (wrongly collapsed) entity trees used to
    have only two levels. After that commit, this no longer holds.

    This commit fixes this problem by modifying
    bfq_reparent_leaf_entity(), so that it searches an active leaf entity
    down the path that stems from the input entity. Such a leaf entity is
    guaranteed to exist when bfq_reparent_leaf_entity() is invoked.

    Tested-by: cki-project@redhat.com
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
    goal of this put is to release a process reference to a bfq_queue. But
    process-reference releases may also trigger extra operations, and, to
    this end, are handled through bfq_release_process_ref(). So turn the
    invocation of bfq_put_queue() into an invocation of
    bfq_release_process_ref().

    Tested-by: cki-project@redhat.com
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Commit ecedd3d7e199 ("block, bfq: get extra ref to prevent a queue
    from being freed during a group move") gets an extra reference to a
    bfq_queue before possibly deactivating it (temporarily), in
    bfq_bfqq_move(). This prevents the bfq_queue from disappearing before
    being reactivated in its new group.

    Yet, the bfq_queue may also be expired (i.e., its service may be
    stopped) before the bfq_queue is deactivated, and such an expiration
    may also lead to a premature freeing. This commit fixes the issue by
    simply moving earlier the getting of the extra reference already
    introduced by commit ecedd3d7e199 ("block, bfq: get extra ref to
    prevent a queue from being freed during a group move").

    Reported-by: cki-project@redhat.com
    Tested-by: cki-project@redhat.com
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • In bfq_idle_slice_timer(), bfqq = bfqd->in_service_queue is read
    outside the bfqd->lock critical section. The bfqq, even if non-NULL
    in bfq_idle_slice_timer(), may be freed after being passed to
    bfq_idle_slice_timer_body(), so we may access freed memory.

    In addition, since the bfqq may be subject to a race, we should
    first check whether bfqq is still in service before doing anything
    with it in bfq_idle_slice_timer_body(). If a racing bfqq is no
    longer in service, it has already been expired through
    __bfq_bfqq_expire(), and its wait_request flag has been cleared in
    __bfq_bfqd_reset_in_service(). So there is no need to re-clear the
    wait_request flag of a bfqq that is not in service.
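    The resulting pattern can be sketched with a toy model (simplified
    types; the real code takes bfqd->lock around the whole body, elided
    here as comments):

```c
#include <stdbool.h>
#include <stddef.h>

struct bfqq { bool wait_request; };
struct bfqd { struct bfqq *in_service_queue; };

/* The timer body receives bfqd, re-reads in_service_queue (in the real
 * code: under bfqd->lock), and clears wait_request only if the queue is
 * still in service. */
static bool idle_slice_timer_body(struct bfqd *d)
{
    bool handled = false;

    /* spin_lock_irqsave(&d->lock, flags) in the real code */
    struct bfqq *q = d->in_service_queue;   /* read inside the lock */
    if (q && q->wait_request) {             /* still in service?    */
        q->wait_request = false;
        handled = true;
    }
    /* spin_unlock_irqrestore(&d->lock, flags) */
    return handled;
}
```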

    KASAN log is given as follows:
    [13058.354613] ==================================================================
    [13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
    [13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
    [13058.354646]
    [13058.354655] CPU: 96 PID: 19767 Comm: fork13
    [13058.354661] Call trace:
    [13058.354667] dump_backtrace+0x0/0x310
    [13058.354672] show_stack+0x28/0x38
    [13058.354681] dump_stack+0xd8/0x108
    [13058.354687] print_address_description+0x68/0x2d0
    [13058.354690] kasan_report+0x124/0x2e0
    [13058.354697] __asan_load8+0x88/0xb0
    [13058.354702] bfq_idle_slice_timer+0xac/0x290
    [13058.354707] __hrtimer_run_queues+0x298/0x8b8
    [13058.354710] hrtimer_interrupt+0x1b8/0x678
    [13058.354716] arch_timer_handler_phys+0x4c/0x78
    [13058.354722] handle_percpu_devid_irq+0xf0/0x558
    [13058.354731] generic_handle_irq+0x50/0x70
    [13058.354735] __handle_domain_irq+0x94/0x110
    [13058.354739] gic_handle_irq+0x8c/0x1b0
    [13058.354742] el1_irq+0xb8/0x140
    [13058.354748] do_wp_page+0x260/0xe28
    [13058.354752] __handle_mm_fault+0x8ec/0x9b0
    [13058.354756] handle_mm_fault+0x280/0x460
    [13058.354762] do_page_fault+0x3ec/0x890
    [13058.354765] do_mem_abort+0xc0/0x1b0
    [13058.354768] el0_da+0x24/0x28
    [13058.354770]
    [13058.354773] Allocated by task 19731:
    [13058.354780] kasan_kmalloc+0xe0/0x190
    [13058.354784] kasan_slab_alloc+0x14/0x20
    [13058.354788] kmem_cache_alloc_node+0x130/0x440
    [13058.354793] bfq_get_queue+0x138/0x858
    [13058.354797] bfq_get_bfqq_handle_split+0xd4/0x328
    [13058.354801] bfq_init_rq+0x1f4/0x1180
    [13058.354806] bfq_insert_requests+0x264/0x1c98
    [13058.354811] blk_mq_sched_insert_requests+0x1c4/0x488
    [13058.354818] blk_mq_flush_plug_list+0x2d4/0x6e0
    [13058.354826] blk_flush_plug_list+0x230/0x548
    [13058.354830] blk_finish_plug+0x60/0x80
    [13058.354838] read_pages+0xec/0x2c0
    [13058.354842] __do_page_cache_readahead+0x374/0x438
    [13058.354846] ondemand_readahead+0x24c/0x6b0
    [13058.354851] page_cache_sync_readahead+0x17c/0x2f8
    [13058.354858] generic_file_buffered_read+0x588/0xc58
    [13058.354862] generic_file_read_iter+0x1b4/0x278
    [13058.354965] ext4_file_read_iter+0xa8/0x1d8 [ext4]
    [13058.354972] __vfs_read+0x238/0x320
    [13058.354976] vfs_read+0xbc/0x1c0
    [13058.354980] ksys_read+0xdc/0x1b8
    [13058.354984] __arm64_sys_read+0x50/0x60
    [13058.354990] el0_svc_common+0xb4/0x1d8
    [13058.354994] el0_svc_handler+0x50/0xa8
    [13058.354998] el0_svc+0x8/0xc
    [13058.354999]
    [13058.355001] Freed by task 19731:
    [13058.355007] __kasan_slab_free+0x120/0x228
    [13058.355010] kasan_slab_free+0x10/0x18
    [13058.355014] kmem_cache_free+0x288/0x3f0
    [13058.355018] bfq_put_queue+0x134/0x208
    [13058.355022] bfq_exit_icq_bfqq+0x164/0x348
    [13058.355026] bfq_exit_icq+0x28/0x40
    [13058.355030] ioc_exit_icq+0xa0/0x150
    [13058.355035] put_io_context_active+0x250/0x438
    [13058.355038] exit_io_context+0xd0/0x138
    [13058.355045] do_exit+0x734/0xc58
    [13058.355050] do_group_exit+0x78/0x220
    [13058.355054] __wake_up_parent+0x0/0x50
    [13058.355058] el0_svc_common+0xb4/0x1d8
    [13058.355062] el0_svc_handler+0x50/0xa8
    [13058.355066] el0_svc+0x8/0xc
    [13058.355067]
    [13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
    [13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
    [13058.355077] The buggy address belongs to the page:
    [13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
    [13058.366175] flags: 0x2ffffe0000008100(slab|head)
    [13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
    [13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
    [13058.370789] page dumped because: kasan: bad access detected
    [13058.370791]
    [13058.370792] Memory state around the buggy address:
    [13058.370797] ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
    [13058.370801] ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [13058.370808] ^
    [13058.370811] ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [13058.370815] ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    [13058.370817] ==================================================================
    [13058.370820] Disabling lock debugging due to kernel taint

    Here, we pass bfqd directly to bfq_idle_slice_timer_body().
    --
    V2->V3: rewrite the comment as suggested by Paolo Valente
    V1->V2: add one comment, and add Fixes and Reported-by tag.

    Fixes: aee69d78d ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
    Acked-by: Paolo Valente
    Reported-by: Wang Wang
    Signed-off-by: Zhiqiang Liu
    Signed-off-by: Feilong Lin
    Signed-off-by: Jens Axboe

    Zhiqiang Liu
     

19 Mar, 2020

5 commits


18 Mar, 2020

2 commits

  • submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which may
    take a long time to complete. As Salman mentioned, a 4K BLKSECDISCARD
    can take up to 100 seconds on some devices. Also, any block I/O
    operation issued after the BLKSECDISCARD is submitted may be affected
    by the hung-task timeout as well.

    Another report is that a task hang can be observed when running mkfs
    over raid10, which has a small max discard sectors limit because of
    its chunk size.

    So prevent the hung-task check from firing by taking the same approach
    used in blk_execute_rq(): the wake-up interval is set to half the
    hung-task timer period, which keeps the overhead low enough.
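    The idea can be sketched with a toy model (hypothetical helper; the
    real code repeatedly sleeps in a timed completion wait instead of one
    unbounded wait, so the hung-task detector never sees a task blocked
    for longer than half its timeout):

```c
/* Toy model: count how many timed wake-ups a long I/O causes when the
 * wait interval is half the hung-task timeout (durations in seconds). */
#define HUNG_TASK_TIMEOUT 120   /* typical default */

static int waits_needed(int io_duration)
{
    int interval = HUNG_TASK_TIMEOUT / 2;   /* wake-up period */
    int waited = 0, wakeups = 0;

    while (waited < io_duration) {  /* timed wait expires, I/O not done */
        waited += interval;
        wakeups++;                  /* wake up briefly, sleep again */
    }
    return wakeups;
}
```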

    Cc: Salman Qazi
    Cc: Jesse Barnes
    Cc: Bart Van Assche
    Link: https://lkml.org/lkml/2020/2/12/1193
    Reported-by: Salman Qazi
    Reviewed-by: Jesse Barnes
    Reviewed-by: Salman Qazi
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Historically we only set the capacity to zero for devices that support
    partitions (independent of whether partitions are actually created).
    Doing that is rather inconsistent, but changing it broke legacy udisks
    polling for legacy ide-cdrom devices. Use a crude check for devices
    that are either non-removable or partitionable to get sane behavior
    for most devices while not breaking userspace for this particular
    setup.

    Fixes: a1548b674403 ("block: move rescan_partitions to fs/block_dev.c")
    Reported-by: He Zhe
    Signed-off-by: Christoph Hellwig
    Tested-by: He Zhe
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Mar, 2020

4 commits

  • Check for overflow in addition before checking for end-of-block-device.

    Steps to reproduce:

    #define _GNU_SOURCE 1
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/types.h>
    #include <unistd.h>

    typedef unsigned long long __u64;

    struct blk_zone_range {
            __u64 sector;
            __u64 nr_sectors;
    };

    #define BLKRESETZONE _IOW(0x12, 131, struct blk_zone_range)

    int main(void)
    {
            int fd = open("/dev/nullb0", O_RDWR|O_DIRECT);
            struct blk_zone_range zr = {4096, 0xfffffffffffff000ULL};
            ioctl(fd, BLKRESETZONE, &zr);
            return 0;
    }

    BUG: KASAN: null-ptr-deref in submit_bio_wait+0x74/0xe0
    Write of size 8 at addr 0000000000000040 by task a.out/1590

    CPU: 8 PID: 1590 Comm: a.out Not tainted 5.6.0-rc1-00019-g359c92c02bfa #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
    Call Trace:
    dump_stack+0x76/0xa0
    __kasan_report.cold+0x5/0x3e
    kasan_report+0xe/0x20
    submit_bio_wait+0x74/0xe0
    blkdev_zone_mgmt+0x26f/0x2a0
    blkdev_zone_mgmt_ioctl+0x14b/0x1b0
    blkdev_ioctl+0xb28/0xe60
    block_ioctl+0x69/0x80
    ksys_ioctl+0x3af/0xa50
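    The fix's ordering can be sketched as follows (hypothetical helper
    name; with 64-bit sector counts, sector + nr_sectors can wrap around,
    so the wrap must be tested before the capacity comparison):

```c
#include <stdbool.h>
#include <stdint.h>

/* Validate a zone management range against the device capacity,
 * checking for unsigned overflow in the addition first. */
static bool zone_range_valid(uint64_t sector, uint64_t nr_sectors,
                             uint64_t capacity)
{
    if (nr_sectors == 0)
        return false;
    if (sector + nr_sectors < sector)   /* overflow check comes first */
        return false;
    return sector + nr_sectors <= capacity;
}
```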

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alexey Dobriyan (SK hynix)
    Signed-off-by: Jens Axboe

    Alexey Dobriyan
     
  • Acked-by: Tejun Heo
    Signed-off-by: Weiping Zhang
    Signed-off-by: Jens Axboe

    Weiping Zhang
     
  • This patch changes the check condition for the validity/authentication
    of the session.

    1. The Host Session Number (HSN) in the response should match the HSN
       for the session.
    2. The TPER Session Number (TSN) can never be less than 4096 for a
       regular session.

    Reference:
    Section 3.2.2.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Opal_SSC_Application_Note_1-00_1-00-Final.pdf
    Section 3.3.7.1.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Architecture_Core_Spec_v2.01_r1.00.pdf
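    The two conditions amount to a check like this (hypothetical function
    and constant names, not the driver's actual identifiers):

```c
#include <stdbool.h>
#include <stdint.h>

#define RSVD_TPER_SESSION_NUM 4096  /* TSNs below this are reserved */

/* A regular session's response is valid only if its HSN matches the
 * HSN we opened the session with, and its TSN is outside the range
 * reserved by the TCG core spec. */
static bool session_valid(uint32_t resp_hsn, uint32_t session_hsn,
                          uint32_t tsn)
{
    return resp_hsn == session_hsn && tsn >= RSVD_TPER_SESSION_NUM;
}
```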

    Co-developed-by: Andrzej Jakowski
    Signed-off-by: Andrzej Jakowski
    Signed-off-by: Revanth Rajashekar
    Signed-off-by: Jens Axboe

    Revanth Rajashekar
     
  • The kernel documentation includes a brief section about genhd
    capabilities, but it turns out that the only documented
    capability (GENHD_FL_MEDIA_CHANGE_NOTIFY) isn't used any more.

    This patch removes that flag, and documents the rest, based on my
    understanding of the current uses of these flags in the kernel. The
    documentation is kept in the header file, alongside the declarations,
    in the hope that it will be kept up-to-date in future; the kernel
    documentation is changed to include the documentation generated from
    the header file.

    Because the ultimate goal is to provide some end-user
    documentation (or end-administrator documentation), the comments are
    perhaps more user-oriented than might be expected. Since the values
    are shown to users in hexadecimal, the documentation lists them in
    hexadecimal, and the constant declarations are adjusted to match.

    Reviewed-by: Matthew Wilcox (Oracle)
    Signed-off-by: Stephen Kitt
    Signed-off-by: Jens Axboe

    Stephen Kitt