Eric Lee / smarc-fsl-linux-kernel

23 Sep, 2022

1 commit

5f285e4c4 block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait ... Browse Code »

[ Upstream commit 56f99b8d06ef1ed1c9730948f9f05ac2b930a20b ]

Today blk_queue_enter() and __bio_queue_enter() return -EBUSY for the
nowait code path. This is not correct: they should return -EAGAIN
instead.

This problem was detected by fio. The following command exposed the
above problem:

t/io_uring -p0 -d128 -b4096 -s32 -c32 -F1 -B0 -R0 -X1 -n24 -P1 -u1 -O0 /dev/ng0n1

By applying the patch, the retry case is handled correctly in the slow
path.

Signed-off-by: Stefan Roesch
Fixes: bfd343aa1718 ("blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set")
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Stefan Roesch
2022-09-23 20:15:48 +0800

23 Mar, 2022

1 commit

d4ad8736a block: release rq qos structures for queue without disk ... Browse Code »

commit daaca3522a8e67c46e39ef09c1d542e866f85f3b upstream.

blkcg_init_queue() may add rq qos structures to request queue, previously
blk_cleanup_queue() calls rq_qos_exit() to release them, but commit
8e141f9eb803 ("block: drain file system I/O on del_gendisk")
moves rq_qos_exit() into del_gendisk(), so memory leak is caused
because queues may not have disk, such as un-present scsi luns, nvme
admin queue, ...

Fixes the issue by adding rq_qos_exit() to blk_cleanup_queue() back.

BTW, v5.18 won't need this patch any more since we move
blkcg_init_queue()/blkcg_exit_queue() into disk allocation/release
handler, and patches have been in for-5.18/block.

Cc: Christoph Hellwig
Cc: stable@vger.kernel.org
Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk")
Reported-by: syzbot+b42749a851a47a0f581b@syzkaller.appspotmail.com
Signed-off-by: Ming Lei
Reviewed-by: Bart Van Assche
Reviewed-by: Christoph Hellwig
Link: https://lore.kernel.org/r/20220314043018.177141-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Ming Lei
2022-03-23 16:16:41 +0800

23 Feb, 2022

1 commit

e65450a12 block: fix surprise removal for drivers calling blk_set_queue_dying ... Browse Code »

commit 7a5428dcb7902700b830e912feee4e845df7c019 upstream.

Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9eb803 that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.

Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.

Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk")
Reported-by: Markus Blöchl
Signed-off-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Christoph Hellwig
2022-02-23 19:03:15 +0800

02 Feb, 2022

1 commit

4cca3e3ef block: add bio_start_io_acct_time() to control start_time ... Browse Code »

commit e45c47d1f94e0cc7b6b079fdb4bcce2995e2adc4 upstream.

bio_start_io_acct_time() interface is like bio_start_io_acct() that
allows start_time to be passed in. This gives drivers the ability to
defer starting accounting until after IO is issued (but possibily not
entirely due to bio splitting).

Reviewed-by: Christoph Hellwig
Signed-off-by: Mike Snitzer
Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.com
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Mike Snitzer
2022-02-02 00:27:03 +0800

01 Dec, 2021

1 commit

e03513f58 blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() ... Browse Code »

commit 2a19b28f7929866e1cec92a3619f4de9f2d20005 upstream.

For avoiding to slow down queue destroy, we don't call
blk_mq_quiesce_queue() in blk_cleanup_queue(), instead of delaying to
cancel dispatch work in blk_release_queue().

However, this way has caused kernel oops[1], reported by Changhui. The log
shows that scsi_device can be freed before running blk_release_queue(),
which is expected too since scsi_device is released after the scsi disk
is closed and the scsi_device is removed.

Fixes the issue by canceling blk-mq dispatch work in both blk_cleanup_queue()
and disk_release():

1) when disk_release() is run, the disk has been closed, and any sync
dispatch activities have been done, so canceling dispatch work is enough to
quiesce filesystem I/O dispatch activity.

2) in blk_cleanup_queue(), we only focus on passthrough request, and
passthrough request is always explicitly allocated & freed by
its caller, so once queue is frozen, all sync dispatch activity
for passthrough request has been done, then it is enough to just cancel
dispatch work for avoiding any dispatch activity.

[1] kernel panic log
[12622.769416] BUG: kernel NULL pointer dereference, address: 0000000000000300
[12622.777186] #PF: supervisor read access in kernel mode
[12622.782918] #PF: error_code(0x0000) - not-present page
[12622.788649] PGD 0 P4D 0
[12622.791474] Oops: 0000 [#1] PREEMPT SMP PTI
[12622.796138] CPU: 10 PID: 744 Comm: kworker/10:1H Kdump: loaded Not tainted 5.15.0+ #1
[12622.804877] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[12622.813321] Workqueue: kblockd blk_mq_run_work_fn
[12622.818572] RIP: 0010:sbitmap_get+0x75/0x190
[12622.823336] Code: 85 80 00 00 00 41 8b 57 08 85 d2 0f 84 b1 00 00 00 45 31 e4 48 63 cd 48 8d 1c 49 48 c1 e3 06 49 03 5f 10 4c 8d 6b 40 83 f0 01 8b 33 44 89 f2 4c 89 ef 0f b6 c8 e8 fa f3 ff ff 83 f8 ff 75 58
[12622.844290] RSP: 0018:ffffb00a446dbd40 EFLAGS: 00010202
[12622.850120] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000004
[12622.858082] RDX: 0000000000000006 RSI: 0000000000000082 RDI: ffffa0b7a2dfe030
[12622.866042] RBP: 0000000000000004 R08: 0000000000000001 R09: ffffa0b742721334
[12622.874003] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
[12622.881964] R13: 0000000000000340 R14: 0000000000000000 R15: ffffa0b7a2dfe030
[12622.889926] FS: 0000000000000000(0000) GS:ffffa0baafb40000(0000) knlGS:0000000000000000
[12622.898956] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12622.905367] CR2: 0000000000000300 CR3: 0000000641210001 CR4: 00000000001706e0
[12622.913328] Call Trace:
[12622.916055]
[12622.918394] scsi_mq_get_budget+0x1a/0x110
[12622.922969] __blk_mq_do_dispatch_sched+0x1d4/0x320
[12622.928404] ? pick_next_task_fair+0x39/0x390
[12622.933268] __blk_mq_sched_dispatch_requests+0xf4/0x140
[12622.939194] blk_mq_sched_dispatch_requests+0x30/0x60
[12622.944829] __blk_mq_run_hw_queue+0x30/0xa0
[12622.949593] process_one_work+0x1e8/0x3c0
[12622.954059] worker_thread+0x50/0x3b0
[12622.958144] ? rescuer_thread+0x370/0x370
[12622.962616] kthread+0x158/0x180
[12622.966218] ? set_kthread_struct+0x40/0x40
[12622.970884] ret_from_fork+0x22/0x30
[12622.974875]
[12622.977309] Modules linked in: scsi_debug rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc dm_multipath intel_rapl_msr intel_rapl_common dell_wmi_descriptor sb_edac rfkill video x86_pkg_temp_thermal intel_powerclamp dcdbas coretemp kvm_intel kvm mgag200 irqbypass i2c_algo_bit rapl drm_kms_helper ipmi_ssif intel_cstate intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr cec mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg ixgbe ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata megaraid_sas ghash_clmulni_intel tg3 wdat_wdt mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]

Reported-by: ChanghuiZhong
Cc: Christoph Hellwig
Cc: "Martin K. Petersen"
Cc: Bart Van Assche
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei
Link: https://lore.kernel.org/r/20211116014343.610501-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Ming Lei
2021-12-01 16:04:56 +0800

25 Nov, 2021

1 commit

04096d1b6 blkcg: Remove extra blkcg_bio_issue_init ... Browse Code »

[ Upstream commit b781d8db580c058ecd54ed7d5dde7f8270b25f5b ]

KASAN reports a use-after-free report when doing block test:

==================================================================
[10050.967049] BUG: KASAN: use-after-free in
submit_bio_checks+0x1539/0x1550

[10050.977638] Call Trace:
[10050.978190] dump_stack+0x9b/0xce
[10050.979674] print_address_description.constprop.6+0x3e/0x60
[10050.983510] kasan_report.cold.9+0x22/0x3a
[10050.986089] submit_bio_checks+0x1539/0x1550
[10050.989576] submit_bio_noacct+0x83/0xc80
[10050.993714] submit_bio+0xa7/0x330
[10050.994435] mpage_readahead+0x380/0x500
[10050.998009] read_pages+0x1c1/0xbf0
[10051.002057] page_cache_ra_unbounded+0x4c2/0x6f0
[10051.007413] do_page_cache_ra+0xda/0x110
[10051.008207] force_page_cache_ra+0x23d/0x3d0
[10051.009087] page_cache_sync_ra+0xca/0x300
[10051.009970] generic_file_buffered_read+0xbea/0x2130
[10051.012685] generic_file_read_iter+0x315/0x490
[10051.014472] blkdev_read_iter+0x113/0x1b0
[10051.015300] aio_read+0x2ad/0x450
[10051.023786] io_submit_one+0xc8e/0x1d60
[10051.029855] __se_sys_io_submit+0x125/0x350
[10051.033442] do_syscall_64+0x2d/0x40
[10051.034156] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[10051.048733] Allocated by task 18598:
[10051.049482] kasan_save_stack+0x19/0x40
[10051.050263] __kasan_kmalloc.constprop.1+0xc1/0xd0
[10051.051230] kmem_cache_alloc+0x146/0x440
[10051.052060] mempool_alloc+0x125/0x2f0
[10051.052818] bio_alloc_bioset+0x353/0x590
[10051.053658] mpage_alloc+0x3b/0x240
[10051.054382] do_mpage_readpage+0xddf/0x1ef0
[10051.055250] mpage_readahead+0x264/0x500
[10051.056060] read_pages+0x1c1/0xbf0
[10051.056758] page_cache_ra_unbounded+0x4c2/0x6f0
[10051.057702] do_page_cache_ra+0xda/0x110
[10051.058511] force_page_cache_ra+0x23d/0x3d0
[10051.059373] page_cache_sync_ra+0xca/0x300
[10051.060198] generic_file_buffered_read+0xbea/0x2130
[10051.061195] generic_file_read_iter+0x315/0x490
[10051.062189] blkdev_read_iter+0x113/0x1b0
[10051.063015] aio_read+0x2ad/0x450
[10051.063686] io_submit_one+0xc8e/0x1d60
[10051.064467] __se_sys_io_submit+0x125/0x350
[10051.065318] do_syscall_64+0x2d/0x40
[10051.066082] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[10051.067455] Freed by task 13307:
[10051.068136] kasan_save_stack+0x19/0x40
[10051.068931] kasan_set_track+0x1c/0x30
[10051.069726] kasan_set_free_info+0x1b/0x30
[10051.070621] __kasan_slab_free+0x111/0x160
[10051.071480] kmem_cache_free+0x94/0x460
[10051.072256] mempool_free+0xd6/0x320
[10051.072985] bio_free+0xe0/0x130
[10051.073630] bio_put+0xab/0xe0
[10051.074252] bio_endio+0x3a6/0x5d0
[10051.074984] blk_update_request+0x590/0x1370
[10051.075870] scsi_end_request+0x7d/0x400
[10051.076667] scsi_io_completion+0x1aa/0xe50
[10051.077503] scsi_softirq_done+0x11b/0x240
[10051.078344] blk_mq_complete_request+0xd4/0x120
[10051.079275] scsi_mq_done+0xf0/0x200
[10051.080036] virtscsi_vq_done+0xbc/0x150
[10051.080850] vring_interrupt+0x179/0x390
[10051.081650] __handle_irq_event_percpu+0xf7/0x490
[10051.082626] handle_irq_event_percpu+0x7b/0x160
[10051.083527] handle_irq_event+0xcc/0x170
[10051.084297] handle_edge_irq+0x215/0xb20
[10051.085122] asm_call_irq_on_stack+0xf/0x20
[10051.085986] common_interrupt+0xae/0x120
[10051.086830] asm_common_interrupt+0x1e/0x40

==================================================================

Bio will be checked at beginning of submit_bio_noacct(). If bio needs
to be throttled, it will start the timer and stop submit bio directly.
Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires.
But in the current process, if bio is throttled, it will still set bio
issue->value by blkcg_bio_issue_init(). This is redundant and may cause
the above use-after-free.

CPU0 CPU1
submit_bio
submit_bio_noacct
submit_bio_checks
blk_throtl_bio()
pending_timer
blk_throtl_dispatch_work_fn
submit_bio_noacct() value will be set
here

bio_endio()
bio_put()
bio_free()
Link: https://lore.kernel.org/r/20211112093354.3581504-1-qiulaibin@huawei.com
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

Laibin Qiu
2021-11-25 16:48:32 +0800

16 Oct, 2021

4 commits

8e141f9eb block: drain file system I/O on del_gendisk ... Browse Code »

Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called. This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.

Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
Reported-by: Ming Lei
Signed-off-by: Christoph Hellwig
Tested-by: Darrick J. Wong
Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.de
Tested-by: Yi Zhang
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-10-16 11:02:50 +0800
a6741536f block: split bio_queue_enter from blk_queue_enter ... Browse Code »

To prepare for fixing a gendisk shutdown race, open code the
blk_queue_enter logic in bio_queue_enter. This also removes the
pointless flags translation.

Signed-off-by: Christoph Hellwig
Tested-by: Darrick J. Wong
Link: https://lore.kernel.org/r/20210929071241.934472-4-hch@lst.de
Tested-by: Yi Zhang
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-10-16 11:02:47 +0800
1f14a0989 block: factor out a blk_try_enter_queue helper ... Browse Code »

Factor out the code to try to get q_usage_counter without blocking into
a separate helper. Both to improve code readability and to prepare for
splitting bio_queue_enter from blk_queue_enter.

Signed-off-by: Christoph Hellwig
Tested-by: Darrick J. Wong
Link: https://lore.kernel.org/r/20210929071241.934472-3-hch@lst.de
Tested-by: Yi Zhang
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-10-16 11:02:44 +0800
cc9c884dd block: call submit_bio_checks under q_usage_counter ... Browse Code »

Ensure all bios check the current values of the queue under freeze
protection, i.e. to make sure the zero capacity set by del_gendisk
is actually seen before dispatching to the driver.

Signed-off-by: Christoph Hellwig
Link: https://lore.kernel.org/r/20210929071241.934472-2-hch@lst.de
Tested-by: Yi Zhang
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-10-16 11:02:36 +0800

31 Aug, 2021

2 commits

3b629f8d6 Merge tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block ... Browse Code »

Pull support for struct bio recycling from Jens Axboe:
"This adds bio recycling support for polled IO, allowing quick reuse of
a bio for high IOPS scenarios via a percpu bio_set list.

It's good for almost a 10% improvement in performance, bumping our
per-core IO limit from ~3.2M IOPS to ~3.5M IOPS"

* tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block:
bio: improve kerneldoc documentation for bio_alloc_kiocb()
block: provide bio_clear_hipri() helper
block: use the percpu bio cache in __blkdev_direct_IO
io_uring: enable use of bio alloc cache
block: clear BIO_PERCPU_CACHE flag if polling isn't supported
bio: add allocation cache abstraction
fs: add kiocb alloc cache flag
bio: optimize initialization of a bio

Linus Torvalds
2021-08-31 10:30:30 +0800
679369114 Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block updates from Jens Axboe:
"Nothing major in here - lots of good cleanups and tech debt handling,
which is also evident in the diffstats. In particular:

- Add disk sequence numbers (Matteo)

- Discard merge fix (Ming)

- Relax disk zoned reporting restrictions (Niklas)

- Bio error handling zoned leak fix (Pavel)

- Start of proper add_disk() error handling (Luis, Christoph)

- blk crypto fix (Eric)

- Non-standard GPT location support (Dmitry)

- IO priority improvements and cleanups (Damien)o

- blk-throtl improvements (Chunguang)

- diskstats_show() stack reduction (Abd-Alrhman)

- Loop scheduler selection (Bart)

- Switch block layer to use kmap_local_page() (Christoph)

- Remove obsolete disk_name helper (Christoph)

- block_device refcounting improvements (Christoph)

- Ensure gendisk always has a request queue reference (Christoph)

- Misc fixes/cleanups (Shaokun, Oliver, Guoqing)"

* tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits)
sg: pass the device name to blk_trace_setup
block, bfq: cleanup the repeated declaration
blk-crypto: fix check for too-large dun_bytes
blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
block: mark blkdev_fsync static
block: refine the disk_live check in del_gendisk
mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA
mmc: block: Support alternative_gpt_sector() operation
partitions/efi: Support non-standard GPT location
block: Add alternative_gpt_sector() operation
bio: fix page leak bio_add_hw_page failure
block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT
block: remove a pointless call to MINOR() in device_add_disk
null_blk: add error handling support for add_disk()
virtio_blk: add error handling support for add_disk()
block: add error handling for device_add_disk / add_disk
block: return errors from disk_alloc_events
block: return errors from blk_integrity_add
block: call blk_register_queue earlier in device_add_disk
...

Linus Torvalds
2021-08-31 09:52:11 +0800

24 Aug, 2021

2 commits

270a1c913 block: provide bio_clear_hipri() helper ... Browse Code »

Any case that turns off REQ_HIPRI must also clear BIO_PERCPU_CACHE,
as non-polled IO may complete through hard/soft IRQ and hence isn't
safe for our polled bio alloc cache.

Provide a helper that does just that, and use it in the merging code as
well if we split a bio and turn off polling.

Fixes: be863b9e4348 ("block: clear BIO_PERCPU_CACHE flag if polling isn't supported")
Reported-by: Keith Busch
Signed-off-by: Jens Axboe

Jens Axboe
2021-08-24 03:45:15 +0800
be863b9e4 block: clear BIO_PERCPU_CACHE flag if polling isn't supported ... Browse Code »

The bio alloc cache relies on the fact that a polled bio will complete
in process context, clear the cacheable flag if we disable polling
for a given bio.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jens Axboe
2021-08-24 03:44:51 +0800

17 Aug, 2021

1 commit

c2da19ed5 blk-mq: fix kernel panic during iterating over flush request ... Browse Code »

For fixing use-after-free during iterating over requests, we grabbed
request's refcount before calling ->fn in commit 2e315dc07df0 ("blk-mq:
grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter").
Turns out this way may cause kernel panic when iterating over one flush
request:

1) old flush request's tag is just released, and this tag is reused by
one new request, but ->rqs[] isn't updated yet

2) the flush request can be re-used for submitting one new flush command,
so blk_rq_init() is called at the same time

3) meantime blk_mq_queue_tag_busy_iter() is called, and old flush request
is retrieved from ->rqs[tag]; when blk_mq_put_rq_ref() is called,
flush_rq->end_io may not be updated yet, so NULL pointer dereference
is triggered in blk_mq_put_rq_ref().

Fix the issue by calling refcount_set(&flush_rq->ref, 1) after
flush_rq->end_io is set. So far the only other caller of blk_rq_init() is
scsi_ioctl_reset() in which the request doesn't enter block IO stack and
the request reference count isn't used, so the change is safe.

Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Reported-by: "Blank-Burian, Markus, Dr."
Tested-by: "Blank-Burian, Markus, Dr."
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Reviewed-by: John Garry
Link: https://lore.kernel.org/r/20210811142624.618598-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe

Ming Lei
2021-08-17 22:33:32 +0800

10 Aug, 2021

2 commits

edb0872f4 block: move the bdi from the request_queue to the gendisk ... Browse Code »

The backing device information only makes sense for file system I/O,
and thus belongs into the gendisk and not the lower level request_queue
structure. Move it there.

Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-08-10 01:53:23 +0800
5ed964f8e mm: hide laptop_mode_wb_timer entirely behind the BDI API ... Browse Code »

Don't leak the detaіls of the timer into the block layer, instead
initialize the timer in bdi_alloc and delete it in bdi_unregister.
Note that this means the timer is initialized (but not armed) for
non-block queues as well now.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Johannes Thumshirn
Link: https://lore.kernel.org/r/20210809141744.1203023-2-hch@lst.de
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-08-10 01:52:28 +0800

07 Jul, 2021

1 commit

d80c228d4 block: fix the problem of io_ticks becoming smaller ... Browse Code »

On the IO submission path, blk_account_io_start() may interrupt
the system interruption. When the interruption returns, the value
of part->stamp may have been updated by other cores, so the time
value collected before the interruption may be less than part->
stamp. So when this happens, we should do nothing to make io_ticks
more accurate? For kernels less than 5.0, this may cause io_ticks
to become smaller, which in turn may cause abnormal ioutil values.

Signed-off-by: Chunguang Xu
Reviewed-by: Christoph Hellwig
Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.com
Signed-off-by: Jens Axboe

Chunguang Xu
2021-07-07 20:43:20 +0800

01 Jul, 2021

1 commit

da6269da4 block: remove REQ_OP_SCSI_{IN,OUT} ... Browse Code »

With the legacy IDE driver gone drivers now use either REQ_OP_DRV_*
or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests
into a single one.

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-07-01 05:34:19 +0800

04 Jun, 2021

1 commit

7cc2623d1 block: Update blk_update_request() documentation ... Browse Code »

Although the original intent was to use blk_update_request() in stacking
block drivers only, it is used much more widely today. Reflect this in the
documentation block above this function. See also:
* commit 32fab448e5e8 ("block: add request update interface").
* commit 2e60e02297cf ("block: clean up request completion API").
* commit ed6565e73424 ("block: handle partial completions for special
payload requests").

Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Hannes Reinecke
Signed-off-by: Bart Van Assche
Link: https://lore.kernel.org/r/20210519175226.8853-1-bvanassche@acm.org
Signed-off-by: Jens Axboe

Bart Van Assche
2021-06-04 04:37:24 +0800

01 Jun, 2021

1 commit

da7ba7296 block: unexport blk_alloc_queue ... Browse Code »

blk_alloc_queue is just an internal helper now, unexport it and remove
it from the public header.

Signed-off-by: Christoph Hellwig
Reviewed-by: Ulf Hansson
Link: https://lore.kernel.org/r/20210521055116.1053587-27-hch@lst.de
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-06-01 21:42:24 +0800

24 May, 2021

1 commit

3af3d772f block_dump: remove block_dump feature ... Browse Code »

We have already delete block_dump feature in mark_inode_dirty() because
it can be replaced by tracepoints, now we also remove the part in
submit_bio() for the same reason. The part of block dump feature in
submit_bio() dump the write process, write region and sectors on the
target disk into kernel message. it can be replaced by
block_bio_queue tracepoint in submit_bio_checks(), so we do not need
block_dump anymore, remove the whole block_dump feature.

Signed-off-by: zhangyi (F)
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Link: https://lore.kernel.org/r/20210313030146.2882027-3-yi.zhang@huawei.com
Signed-off-by: Jens Axboe

zhangyi (F)
2021-05-24 20:47:21 +0800

06 Apr, 2021

1 commit

9bb33f24a block: refactor the bounce buffering code ... Browse Code »

Get rid of all the PFN arithmetics and just use an enum for the two
remaining options, and use PageHighMem for the actual bounce decision.

Add a fast path to entirely avoid the call for the common case of a queue
not using the legacy bouncing code.

Signed-off-by: Christoph Hellwig
Acked-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Link: https://lore.kernel.org/r/20210331073001.46776-8-hch@lst.de
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-04-06 23:28:17 +0800

22 Feb, 2021

1 commit

b357e4a69 block: get rid of the trace rq insert wrapper ... Browse Code »

Get rid of the wrapper for trace_block_rq_insert() and call the function
directly.

Signed-off-by: Chaitanya Kulkarni
Reviewed-by: Johannes Thumshirn
Reviewed-by: Damien Le Moal
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Chaitanya Kulkarni
2021-02-22 21:37:41 +0800

26 Jan, 2021

1 commit

3a905c37c block: skip bio_check_eod for partition-remapped bios ... Browse Code »

When an already remapped bio is resubmitted (e.g. by blk_queue_split),
bio_check_eod will compare the remapped bi_sector against the size
of the partition, leading to spurious I/O failures.

Skip the EOD check in this case.

Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
Reported-by: Jens Axboe
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-01-26 02:41:34 +0800

25 Jan, 2021

7 commits

c495a1767 block: don't pass BIOSET_NEED_BVECS for q->bio_split ... Browse Code »

q->bio_split is only used by bio_split() for fast cloning bio, and no
need to allocate bvecs, so remove this flag.

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Reviewed-by: Pavel Begunkov
Tested-by: Pavel Begunkov
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Ming Lei
2021-01-25 12:22:45 +0800
0b6e522cd blk-mq: use ->bi_bdev for I/O accounting ... Browse Code »

Remove the reverse map from a sector to a partition for I/O accounting by
simply using ->bi_bdev.

Signed-off-by: Christoph Hellwig
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-01-25 09:17:20 +0800
99dfc43ec block: use ->bi_bdev for bio based I/O accounting ... Browse Code »

Rework the I/O accounting for bio based drivers to use ->bi_bdev. This
means all drivers can now simply use bio_start_io_acct to start
accounting, and it will take partitions into account automatically. To
end I/O account either bio_end_io_acct can be used if the driver never
remaps I/O to a different device, or bio_end_io_acct_remapped if the
driver did remap the I/O.

Signed-off-by: Christoph Hellwig
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-01-25 09:17:20 +0800
30c5d3456 block: do not reassig ->bi_bdev when partition remapping ... Browse Code »

There is no good reason to reassign ->bi_bdev when remapping the
partition-relative block number to the device wide one, as all the
information required by the drivers comes from the gendisk anyway.

Keeping the original ->bi_bdev alive will allow to greatly simplify
the partition-away I/O accounting.

Signed-off-by: Christoph Hellwig
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-01-25 09:17:20 +0800
2f9f6221b block: simplify submit_bio_checks a bit ... Browse Code »

Merge a few checks for whole devices vs partitions to streamline the
sanity checks.

Signed-off-by: Christoph Hellwig
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-01-25 09:17:20 +0800
309dca309 block: store a block_device pointer in struct bio ... Browse Code »

Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device. From that the gendisk can be trivially
accessed with an extra indirection, but it also allows to directly
look up all information related to partition remapping.

Signed-off-by: Christoph Hellwig
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-01-25 09:17:20 +0800
52f019d43 block: add a hard-readonly flag to struct gendisk ... Browse Code »

Commit 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading
partition") addressed a long-standing problem with user read-only
policy being overridden as a result of a device-initiated revalidate.
The commit has since been reverted due to a regression that left some
USB devices read-only indefinitely.

To fix the underlying problems with revalidate we need to keep track
of hardware state and user policy separately.

The gendisk has been updated to reflect the current hardware state set
by the device driver. This is done to allow returning the device to
the hardware state once the user clears the BLKROSET flag.

The resulting semantics are as follows:

- If BLKROSET sets a given partition read-only, that partition will
remain read-only even if the underlying storage stack initiates a
revalidate. However, the BLKRRPART ioctl will cause the partition
table to be dropped and any user policy on partitions will be lost.

- If BLKROSET has not been set, both the whole disk device and any
partitions will reflect the current write-protect state of the
underlying device.

Based on a patch from Martin K. Petersen .

Reported-by: Oleksii Kurochko
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
Signed-off-by: Christoph Hellwig
Reviewed-by: Ming Lei
Reviewed-by: Martin K. Petersen
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Christoph Hellwig
2021-01-25 09:15:57 +0800

02 Jan, 2021

1 commit

eda809aef Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi ... Browse Code »

Pull SCSI fixes from James Bottomley:
"This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).

The big core two fixes are for power management ("block: Do not accept
any requests while suspended" and "block: Fix a race in the runtime
power management code") which finally sorts out the resume problems
we've occasionally been having.

To make the resume fix, there are seven necessary precursors which
effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
in block is automatically a power management exempt one.

All of the non-PM preempt cases are removed except for the one in the
SCSI Parallel Interface (spi) domain validation which is a genuine
case where we have to run requests at high priority to validate the
bus so this becomes an autopm get/put protected request"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
scsi: cxgb4i: Fix TLS dependency
scsi: ufs: Un-inline ufshcd_vops_device_reset function
scsi: ufs: Re-enable WriteBooster after device reset
scsi: ufs-mediatek: Use correct path to fix compile error
scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
scsi: block: Do not accept any requests while suspended
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: block: Fix a race in the runtime power management code
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-mediatek: Keep VCC always-on for specific devices
scsi: ufs: Allow regulators being always-on
scsi: ufs: Clear UAC for RPMB after ufshcd resets
...

Linus Torvalds
2021-01-02 04:58:07 +0800

10 Dec, 2020

3 commits

52abca64f scsi: block: Do not accept any requests while suspended ... Browse Code »

blk_queue_enter() accepts BLK_MQ_REQ_PM requests independent of the runtime
power management state. Now that SCSI domain validation no longer depends
on this behavior, modify the behavior of blk_queue_enter() as follows:

- Do not accept any requests while suspended.

- Only process power management requests while suspending or resuming.

Submitting BLK_MQ_REQ_PM requests to a device that is runtime suspended
causes runtime-suspended devices not to resume as they should. The request
which should cause a runtime resume instead gets issued directly, without
resuming the device first. Of course the device can't handle it properly,
the I/O fails, and the device remains suspended.

The problem is fixed by checking that the queue's runtime-PM status isn't
RPM_SUSPENDED before allowing a request to be issued, and queuing a
runtime-resume request if it is. In particular, the inline
blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and the
code is unified by merging the surrounding checks into the routine. If the
queue isn't set up for runtime PM, or there currently is no restriction on
allowed requests, the request is allowed. Likewise if the BLK_MQ_REQ_PM
flag is set and the status isn't RPM_SUSPENDED. Otherwise a runtime resume
is queued and the request is blocked until conditions are more suitable.

[ bvanassche: modified commit message and removed Cc: stable because
without the previous patches from this series this patch would break
parallel SCSI domain validation + introduced queue_rpm_status() ]

Link: https://lore.kernel.org/r/20201209052951.16136-9-bvanassche@acm.org
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Can Guo
Cc: Stanley Chu
Cc: Ming Lei
Cc: Rafael J. Wysocki
Reported-and-tested-by: Martin Kepplinger
Reviewed-by: Hannes Reinecke
Reviewed-by: Can Guo
Signed-off-by: Alan Stern
Signed-off-by: Bart Van Assche
Signed-off-by: Martin K. Petersen

Alan Stern
2020-12-10 00:41:42 +0800
a4d34da71 scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT ... Browse Code »

Remove flag RQF_PREEMPT and BLK_MQ_REQ_PREEMPT since these are no longer
used by any kernel code.

Link: https://lore.kernel.org/r/20201209052951.16136-8-bvanassche@acm.org
Cc: Can Guo
Cc: Stanley Chu
Cc: Alan Stern
Cc: Ming Lei
Cc: Rafael J. Wysocki
Cc: Martin Kepplinger
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Jens Axboe
Reviewed-by: Can Guo
Signed-off-by: Bart Van Assche
Signed-off-by: Martin K. Petersen

Bart Van Assche
2020-12-10 00:41:42 +0800
0854bcdcd scsi: block: Introduce BLK_MQ_REQ_PM ... Browse Code »

Introduce the BLK_MQ_REQ_PM flag. This flag makes the request allocation
functions set RQF_PM. This is the first step towards removing
BLK_MQ_REQ_PREEMPT.

Link: https://lore.kernel.org/r/20201209052951.16136-3-bvanassche@acm.org
Cc: Alan Stern
Cc: Stanley Chu
Cc: Ming Lei
Cc: Rafael J. Wysocki
Cc: Can Guo
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Jens Axboe
Reviewed-by: Can Guo
Signed-off-by: Bart Van Assche
Signed-off-by: Martin K. Petersen

Bart Van Assche
2020-12-10 00:41:41 +0800

05 Dec, 2020

2 commits

1c02fca62 block: remove the request_queue argument to the block_bio_remap tracepoint ... Browse Code »

The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig
Reviewed-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Reviewed-by: Chaitanya Kulkarni
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-12-05 00:42:00 +0800
e8a676d61 block: simplify and extend the block_bio_merge tracepoint class ... Browse Code »

The block_bio_merge tracepoint class can be reused for most bio-based
tracepoints. For that it just needs to lose the superfluous q and rq
parameters.

Signed-off-by: Christoph Hellwig
Reviewed-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Reviewed-by: Chaitanya Kulkarni
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-12-05 00:42:00 +0800

02 Dec, 2020

2 commits

8446fe925 block: switch partition lookup to use struct block_device ... Browse Code »

Use struct block_device to lookup partitions on a disk. This removes
all usage of struct hd_struct from the I/O path.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Hannes Reinecke
Acked-by: Coly Li [bcache]
Acked-by: Chao Yu [f2fs]
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-12-02 05:53:40 +0800
cb8432d65 block: allocate struct hd_struct as part of struct bdev_inode ... Browse Code »

Allocate hd_struct together with struct block_device to pre-load
the lifetime rule changes in preparation of merging the two structures.

Note that part0 was previously embedded into struct gendisk, but is
a separate allocation now, and already points to the block_device instead
of the hd_struct. The lifetime of struct gendisk is still controlled by
the struct device embedded in the part0 hd_struct.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jan Kara
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-12-02 05:53:40 +0800