10 Jul, 2021

1 commit

  • Pull more block updates from Jens Axboe:
    "A combination of changes that ended up depending on both the driver
    and core branch (and/or the IDE removal), and a few late-arriving
    fixes. In detail:

    - Fix io ticks wrap-around issue (Chunguang)

    - nvme-tcp sock locking fix (Maurizio)

    - s390-dasd fixes (Kees, Christoph)

    - blk_execute_rq polling support (Keith)

    - blk-cgroup RCU iteration fix (Yu)

    - nbd backend ID addition (Prasanna)

    - Partition deletion fix (Yufen)

    - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)

    - Removal of now dead block request types due to IDE removal
    (Christoph)

    - Loop probing and control device cleanups (Christoph)

    - Device uevent fix (Christoph)

    - Misc cleanups/fixes (Tetsuo, Christoph)"

    * tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
    blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
    block: fix the problem of io_ticks becoming smaller
    nvme-tcp: can't set sk_user_data without write_lock
    loop: remove unused variable in loop_set_status()
    block: remove the bdgrab in blk_drop_partitions
    block: grab a device refcount in disk_uevent
    s390/dasd: Avoid field over-reading memcpy()
    dasd: unexport dasd_set_target_state
    block: check disk exist before trying to add partition
    ubd: remove dead code in ubd_setup_common
    nvme: use return value from blk_execute_rq()
    block: return errors from blk_execute_rq()
    nvme: use blk_execute_rq() for passthrough commands
    block: support polling through blk_execute_rq
    block: remove REQ_OP_SCSI_{IN,OUT}
    block: mark blk_mq_init_queue_data static
    loop: rewrite loop_exit using idr_for_each_entry
    loop: split loop_lookup
    loop: don't allow deleting an unspecified loop device
    loop: move loop_ctl_mutex locking into loop_add
    ...

    Linus Torvalds
     

07 Jul, 2021

2 commits

  • We run a test that creates millions of cgroups and blkgs, and then
    triggers blkg_destroy_all(). blkg_destroy_all() holds the spin lock
    for a long time in such a situation, so release the lock after each
    batch of blkgs is destroyed.

    blkcg_activate_policy() and blkcg_deactivate_policy() might have the
    same problem; however, as they are basically only called from module
    init/exit paths, let's leave them alone for now.
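
    A minimal sketch of the batching pattern, assuming the batch-size
    constant and the 5.14-era field names (illustrative, not the verbatim
    patch):

    #define BLKG_DESTROY_BATCH_SIZE  64

    static void blkg_destroy_all(struct request_queue *q)
    {
        struct blkcg_gq *blkg, *n;
        int count = BLKG_DESTROY_BATCH_SIZE;

    restart:
        spin_lock_irq(&q->queue_lock);
        list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) {
            struct blkcg *blkcg = blkg->blkcg;

            spin_lock(&blkcg->lock);
            blkg_destroy(blkg);
            spin_unlock(&blkcg->lock);

            /* drop the queue lock periodically so others can take it */
            if (!(--count)) {
                count = BLKG_DESTROY_BATCH_SIZE;
                spin_unlock_irq(&q->queue_lock);
                cond_resched();
                goto restart;
            }
        }
        q->root_blkg = NULL;
        spin_unlock_irq(&q->queue_lock);
    }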

    Signed-off-by: Yu Kuai
    Acked-by: Tejun Heo
    Link: https://lore.kernel.org/r/20210707015649.1929797-1-yukuai3@huawei.com
    Signed-off-by: Jens Axboe

    Yu Kuai
     
  • On the I/O submission path, blk_account_io_start() may be interrupted
    by a system interrupt. When the interrupt returns, the value of
    part->stamp may have been updated by another core, so the time value
    sampled before the interrupt can be less than part->stamp. When this
    happens we should do nothing, so that io_ticks stays accurate. For
    kernels older than 5.0, this may cause io_ticks to become smaller,
    which in turn may cause abnormal ioutil values.
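
    A sketch of the guard, assuming the 5.13-era shape of
    update_io_ticks(); the essential change is comparing with time_after()
    instead of a plain inequality, so a stale 'now' sampled before an
    interrupt can never move the stamp backwards:

    static void update_io_ticks(struct block_device *part,
                                unsigned long now, bool end)
    {
        unsigned long stamp;
    again:
        stamp = READ_ONCE(part->bd_stamp);
        /* only move the stamp forward; an older 'now' is stale */
        if (unlikely(time_after(now, stamp))) {
            if (likely(cmpxchg(&part->bd_stamp, stamp, now) == stamp))
                __part_stat_add(part, io_ticks, end ? now - stamp : 1);
        }
        if (part->bd_partno) {
            part = bdev_whole(part);
            goto again;
        }
    }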

    Signed-off-by: Chunguang Xu
    Reviewed-by: Christoph Hellwig
    Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.com
    Signed-off-by: Jens Axboe

    Chunguang Xu
     

05 Jul, 2021

1 commit

  • Commit d2bcbeab4200 ("scsi: blkcg: Add app identifier support for
    blkcg") introduced an FC_APPID config option under SCSI. However, the
    added config option is not used anywhere. Simply remove it.

    The block layer BLK_CGROUP_FC_APPID config option is what actually
    controls whether the application ID code should be built or not. Make
    this option dependent on NVMe over FC since that is currently the only
    transport which supports the capability.

    Fixes: d2bcbeab4200 ("scsi: blkcg: Add app identifier support for blkcg")
    Reported-by: Linus Torvalds
    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Martin K. Petersen
     

03 Jul, 2021

2 commits

  • Pull SCSI updates from James Bottomley:
    "This series consists of the usual driver updates (ufs, ibmvfc,
    megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
    elx and mpi3mr being new drivers.

    The major core change is a rework to drop the status byte handling
    macros and the old bit shifted definitions and the rest of the updates
    are minor fixes"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits)
    scsi: aha1740: Avoid over-read of sense buffer
    scsi: arcmsr: Avoid over-read of sense buffer
    scsi: ips: Avoid over-read of sense buffer
    scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe()
    scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame()
    scsi: elx: libefc: Fix less than zero comparison of a unsigned int
    scsi: elx: efct: Fix pointer error checking in debugfs init
    scsi: elx: efct: Fix is_originator return code type
    scsi: elx: efct: Fix link error for _bad_cmpxchg
    scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel()
    scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session()
    scsi: elx: efct: Fix error handling in efct_hw_init()
    scsi: elx: efct: Remove redundant initialization of variable lun
    scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected"
    scsi: lpfc: Fix build error in lpfc_scsi.c
    scsi: target: iscsi: Remove redundant continue statement
    scsi: qla4xxx: Remove redundant continue statement
    scsi: ppa: Switch to use module_parport_driver()
    scsi: imm: Switch to use module_parport_driver()
    scsi: mpt3sas: Fix error return value in _scsih_expander_add()
    ...

    Linus Torvalds
     
  • Pull asm/unaligned.h unification from Arnd Bergmann:
    "Unify asm/unaligned.h around struct helper

    The get_unaligned()/put_unaligned() helpers are traditionally
    architecture specific, with the two main variants being the
    "access-ok.h" version that assumes unaligned pointer accesses always
    work on a particular architecture, and the "le-struct.h" version that
    casts the data to a byte aligned type before dereferencing, for
    architectures that cannot always do unaligned accesses in hardware.

    Based on the discussion linked below, it appears that the access-ok
    version is not reliable on any architecture, while the struct version
    probably has no downsides. This series changes the code to use the
    same implementation on all architectures, addressing the few
    exceptions separately"
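
    The struct variant boils down to a byte-aligned packed-struct access;
    roughly, the unified helpers look like this (sketch of the resulting
    asm-generic/unaligned.h):

    #define __get_unaligned_t(type, ptr) ({                                \
        const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
        __pptr->x;                                                         \
    })

    #define __put_unaligned_t(type, val, ptr) do {                         \
        struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);       \
        __pptr->x = (val);                                                 \
    } while (0)

    #define get_unaligned(ptr)      __get_unaligned_t(typeof(*(ptr)), (ptr))
    #define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))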

    Link: https://lore.kernel.org/lkml/75d07691-1e4f-741f-9852-38c0b4f520bc@synopsys.com/
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
    Link: https://lore.kernel.org/lkml/20210507220813.365382-14-arnd@kernel.org/
    Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2
    Link: https://lore.kernel.org/lkml/CAHk-=whGObOKruA_bU3aPGZfoDqZM1_9wBkwREp0H0FgR-90uQ@mail.gmail.com/

    * tag 'asm-generic-unaligned-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
    asm-generic: simplify asm/unaligned.h
    asm-generic: uaccess: 1-byte access is always aligned
    netpoll: avoid put_unaligned() on single character
    mwifiex: re-fix for unaligned accesses
    apparmor: use get_unaligned() only for multi-byte words
    partitions: msdos: fix one-byte get_unaligned()
    asm-generic: unaligned always use struct helpers
    asm-generic: unaligned: remove byteshift helpers
    powerpc: use linux/unaligned/le_struct.h on LE power7
    m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
    sh: remove unaligned access for sh4a
    openrisc: always use unaligned-struct header
    asm-generic: use asm-generic/unaligned.h for most architectures

    Linus Torvalds
     

01 Jul, 2021

8 commits

  • If the disk has been deleted, we should return failure for the
    BLKPG_DEL_PARTITION ioctl. Otherwise, /sys/class/block may be left
    with invalid symlinks. The race is as follows:

    blkdev_open                 del_gendisk
                                  disk->flags &= ~GENHD_FL_UP;
                                  blk_drop_partitions
    blkpg_ioctl
      bdev_add_partition
        add_partition
          device_add
            device_add_class_symlinks

    The ioctl may call add_partition after del_gendisk() has already
    dropped the partitions; stale symlinks are then created.
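
    A sketch of the check closing the race, assuming the GENHD_FL_UP flag
    and disk->open_mutex of that kernel (the elided body stands for the
    existing overlap checks and add_partition() call):

    int bdev_add_partition(struct block_device *bdev, int partno,
                           sector_t start, sector_t length)
    {
        struct gendisk *disk = bdev->bd_disk;
        int ret;

        mutex_lock(&disk->open_mutex);
        if (!(disk->flags & GENHD_FL_UP)) {
            ret = -ENXIO;   /* disk deleted: refuse to create symlinks */
            goto out;
        }
        /* ... overlap checks and add_partition() ... */
        ret = 0;
    out:
        mutex_unlock(&disk->open_mutex);
        return ret;
    }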

    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Yufen Yu
    Link: https://lore.kernel.org/r/20210610023241.3646241-1-yuyufen@huawei.com
    Signed-off-by: Jens Axboe

    Yufen Yu
     
  • Pull device mapper updates from Mike Snitzer:

    - Various DM persistent-data library improvements and fixes that
    benefit both the DM thinp and cache targets.

    - A few small DM kcopyd efficiency improvements.

    - Significant zoned related block core, DM core and DM zoned target
    changes that culminate with adding zoned append emulation (which is
    required to properly fix DM crypt's zoned support).

    - Various DM writecache target changes that improve efficiency. Adds an
    optional "metadata_only" feature that only promotes bios flagged with
    REQ_META. But the most significant improvement is writecache's
    ability to pause writeback, for a configurable time, if/when the
    working set is larger than the cache (and the cache is full) -- this
    ensures performance is no worse than the slower origin device.

    * tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
    dm writecache: make writeback pause configurable
    dm writecache: pause writeback if cache full and origin being written directly
    dm io tracker: factor out IO tracker
    dm btree remove: assign new_root only when removal succeeds
    dm zone: fix dm_revalidate_zones() memory allocation
    dm ps io affinity: remove redundant continue statement
    dm writecache: add optional "metadata_only" parameter
    dm writecache: add "cleaner" and "max_age" to Documentation
    dm writecache: write at least 4k when committing
    dm writecache: flush origin device when writing and cache is full
    dm writecache: have ssd writeback wait if the kcopyd workqueue is busy
    dm writecache: use list_move instead of list_del/list_add in writecache_writeback()
    dm writecache: commit just one block, not a full page
    dm writecache: remove unused gfp_t argument from wc_add_block()
    dm crypt: Fix zoned block device support
    dm: introduce zone append emulation
    dm: rearrange core declarations for extended use from dm-zone.c
    block: introduce BIO_ZONE_WRITE_LOCKED bio flag
    block: introduce bio zone helpers
    block: improve handling of all zones reset operation
    ...

    Linus Torvalds
     
  • The synchronous blk_execute_rq() has not provided a way for its
    callers to know whether its request was successful. Return the
    blk_status_t result of the request.
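
    A hedged usage sketch, assuming the 5.14 signature
    blk_execute_rq(disk, rq, at_head):

    blk_status_t status;

    status = blk_execute_rq(disk, rq, at_head);
    if (status != BLK_STS_OK)
        return blk_status_to_errno(status);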

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Keith Busch
    Reviewed-by: Chaitanya Kulkarni
    Link: https://lore.kernel.org/r/20210610214437.641245-4-kbusch@kernel.org
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • Poll for completions if the request's hctx is a polling type.
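
    A sketch of the polling path inside the synchronous wait, assuming
    helpers along these lines (names as in the 5.14 patch, best effort):

    static bool blk_rq_is_poll(struct request *rq)
    {
        return rq->mq_hctx && rq->mq_hctx->type == HCTX_TYPE_POLL;
    }

    static void blk_rq_poll_completion(struct request *rq,
                                       struct completion *wait)
    {
        do {
            blk_poll(rq->q, request_to_qc_t(rq->mq_hctx, rq), true);
            cond_resched();
        } while (!completion_done(wait));
    }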

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Keith Busch
    Reviewed-by: Chaitanya Kulkarni
    Link: https://lore.kernel.org/r/20210610214437.641245-2-kbusch@kernel.org
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • With the legacy IDE driver gone, drivers now use either REQ_OP_DRV_*
    or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests
    into a single one.
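
    After the unification, passthrough detection reduces to the two
    driver-private ops; a sketch, assuming the 5.14 helpers in
    blk_types.h and blkdev.h:

    static inline bool blk_op_is_passthrough(unsigned int op)
    {
        op &= REQ_OP_MASK;
        return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
    }

    static inline bool blk_rq_is_passthrough(struct request *rq)
    {
        return blk_op_is_passthrough(req_op(rq));
    }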

    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • All driver uses are gone now.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210624081012.256464-1-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Pull block driver updates from Jens Axboe:
    "Pretty calm round, mostly just NVMe and a bit of MD:

    - NVMe updates (via Christoph)
    - improve the APST configuration algorithm (Alexey Bogoslavsky)
    - look for StorageD3Enable on companion ACPI device
    (Mario Limonciello)
    - allow selecting the network interface for TCP connections
    (Martin Belanger)
    - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King,
    Christoph)
    - move the ACPI StorageD3 code to drivers/acpi/ and add quirks
    for certain AMD CPUs (Mario Limonciello)
    - zoned device support for nvmet (Chaitanya Kulkarni)
    - fix the rules for changing the serial number in nvmet
    (Noam Gottlieb)
    - various small fixes and cleanups (Dan Carpenter, JK Kim,
    Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert
    Uytterhoeven, Daniel Wagner)

    - MD updates (via Song)
    - iostats rewrite (Guoqing Jiang)
    - raid5 lock contention optimization (Gal Ofri)

    - Fall through warning fix (Gustavo)

    - Misc fixes (Gustavo, Jiapeng)"

    * tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits)
    nvmet: use NVMET_MAX_NAMESPACES to set nn value
    loop: Fix missing discard support when using LOOP_CONFIGURE
    nvme.h: add missing nvme_lba_range_type endianness annotations
    nvme: remove zeroout memset call for struct
    nvme-pci: remove zeroout memset call for struct
    nvmet: remove zeroout memset call for struct
    nvmet: add ZBD over ZNS backend support
    nvmet: add Command Set Identifier support
    nvmet: add nvmet_req_bio put helper for backends
    nvmet: add req cns error complete helper
    block: export blk_next_bio()
    nvmet: remove local variable
    nvmet: use nvme status value directly
    nvmet: use u32 type for the local variable nsid
    nvmet: use u32 for nvmet_subsys max_nsid
    nvmet: use req->cmd directly in file-ns fast path
    nvmet: use req->cmd directly in bdev-ns fast path
    nvmet: make ver stable once connection established
    nvmet: allow mn change if subsys not discovered
    nvmet: make sn stable once connection was established
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - disk events cleanup (Christoph)

    - gendisk and request queue allocation simplifications (Christoph)

    - bdev_disk_changed cleanups (Christoph)

    - IO priority improvements (Bart)

    - Chained bio completion trace fix (Edward)

    - blk-wbt fixes (Jan)

    - blk-wbt enable/disable fix (Zhang)

    - Scheduler dispatch improvements (Jan, Ming)

    - Shared tagset scheduler improvements (John)

    - BFQ updates (Paolo, Luca, Pietro)

    - BFQ lock inversion fix (Jan)

    - Documentation improvements (Kir)

    - CLONE_IO block cgroup fix (Tejun)

    - Removal of the ancient and deprecated block dump feature (zhangyi)

    - Discard merge fix (Ming)

    - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas,
    Yang)

    * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits)
    block: fix discard request merge
    block/mq-deadline: Remove a WARN_ON_ONCE() call
    blk-mq: update hctx->dispatch_busy in case of real scheduler
    blk: Fix lock inversion between ioc lock and bfqd lock
    bfq: Remove merged request already in bfq_requests_merged()
    block: pass a gendisk to bdev_disk_changed
    block: move bdev_disk_changed
    block: add the events* attributes to disk_attrs
    block: move the disk events code to a separate file
    block: fix trace completion for chained bio
    block/partitions/msdos: Fix typo inidicator -> indicator
    block, bfq: reset waker pointer with shared queues
    block, bfq: check waker only for queues with no in-flight I/O
    block, bfq: avoid delayed merge of async queues
    block, bfq: boost throughput by extending queue-merging times
    block, bfq: consider also creation time in delayed stable merge
    block, bfq: fix delayed stable merge check
    block, bfq: let also stably merged queues enjoy weight raising
    blk-wbt: make sure throttle is enabled properly
    blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
    ...

    Linus Torvalds
     

29 Jun, 2021

1 commit

  • ll_new_hw_segment() is reached only in the case of a single-range
    discard merge, and we don't actually have a max discard segment size
    limit, so it is wrong to run the following check:

    if (req->nr_phys_segments + nr_phys_segs > blk_rq_get_max_segments(req))

    The condition always holds, since req->nr_phys_segments is initialized
    to one, the bio's segment count is still 1, and
    blk_rq_get_max_segments(req) is 1 too, so the merge is always
    rejected.

    Fix the issue by skipping the check and bypassing the calculation of
    the discard request's nr_phys_segments.
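
    A sketch of the resulting function, with the discard bypass placed
    before the segment check (assuming the 5.14 shape of
    ll_new_hw_segment()):

    static inline int ll_new_hw_segment(struct request *req, struct bio *bio,
                                        unsigned int nr_phys_segs)
    {
        if (!blk_integrity_merge_bio(req->q, req, bio))
            goto no_merge;

        /* a discard merge never adds a segment, so skip the check */
        if (req_op(req) == REQ_OP_DISCARD)
            return 1;

        if (req->nr_phys_segments + nr_phys_segs >
            blk_rq_get_max_segments(req))
            goto no_merge;
        /* ... */
    }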

    Based on analysis from Wang Shanker.

    Cc: Christoph Hellwig
    Reported-by: Wang Shanker
    Signed-off-by: Ming Lei
    Link: https://lore.kernel.org/r/20210628023312.1903255-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe

    Ming Lei
     

28 Jun, 2021

1 commit

  • The purpose of the WARN_ON_ONCE() statement in dd_insert_request() is to
    verify that dd_prepare_request() cleared rq->elv.priv[0]. Since
    dd_prepare_request() is called during request initialization but not if a
    request is requeued, a warning is triggered if a request is requeued. Fix
    this by removing the WARN_ON_ONCE() statement. This patch suppresses the
    following kernel warning:

    WARNING: CPU: 28 PID: 432 at block/mq-deadline-main.c:740 dd_insert_request+0x4d4/0x5b0
    Workqueue: kblockd blk_mq_requeue_work
    Call Trace:
    dd_insert_requests+0xfa/0x130
    blk_mq_sched_insert_request+0x22c/0x240
    blk_mq_requeue_work+0x21c/0x2d0
    process_one_work+0x4c2/0xa70
    worker_thread+0x2e5/0x6d0
    kthread+0x21c/0x250
    ret_from_fork+0x1f/0x30

    Reported-by: Sachin Sant
    Fixes: 08a9ad8bf607 ("block/mq-deadline: Add cgroup support")
    Signed-off-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210627211112.12720-1-bvanassche@acm.org
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

25 Jun, 2021

7 commits

  • Commit 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
    started to support batched I/O dispatch by using hctx->dispatch_busy.

    However, that commit did not change blk_mq_update_dispatch_busy() to
    update hctx->dispatch_busy when a real scheduler is attached, so fix
    the issue by updating hctx->dispatch_busy in that case as well.
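
    The fix essentially removes an early return; a sketch of the pre-fix
    function (assumed shape), with the removed lines marked '-':

    static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx,
                                            bool busy)
    {
        unsigned int ewma;

    -   if (hctx->queue->elevator)
    -       return;

        ewma = hctx->dispatch_busy;
        /* ... EWMA update of hctx->dispatch_busy ... */
    }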

    Reported-by: Jan Kara
    Reviewed-by: Jan Kara
    Fixes: 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
    Signed-off-by: Ming Lei
    Link: https://lore.kernel.org/r/20210625020248.1630497-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Lockdep complains about lock inversion between ioc->lock and bfqd->lock:

    bfqd -> ioc:
    put_io_context+0x33/0x90 -> ioc->lock grabbed
    blk_mq_free_request+0x51/0x140
    blk_put_request+0xe/0x10
    blk_attempt_req_merge+0x1d/0x30
    elv_attempt_insert_merge+0x56/0xa0
    blk_mq_sched_try_insert_merge+0x4b/0x60
    bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
    blk_mq_sched_insert_requests+0xd6/0x2b0
    blk_mq_flush_plug_list+0x154/0x280
    blk_finish_plug+0x40/0x60
    ext4_writepages+0x696/0x1320
    do_writepages+0x1c/0x80
    __filemap_fdatawrite_range+0xd7/0x120
    sync_file_range+0xac/0xf0

    ioc -> bfqd:
    bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
    put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
    exit_io_context+0x48/0x50
    do_exit+0x7e9/0xdd0
    do_group_exit+0x54/0xc0

    To avoid this inversion, we change blk_mq_sched_try_insert_merge() to
    not free the merged request but rather leave that up to the caller,
    similarly to blk_mq_sched_try_merge(). And in bfq_insert_requests()
    we make sure to free all the merged requests after dropping
    bfqd->lock.
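
    A sketch of the resulting pattern on the bfq insert path, assuming
    the new list out-parameter and the blk_mq_free_requests() helper
    added by this change:

    LIST_HEAD(free);

    spin_lock_irq(&bfqd->lock);
    if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
        spin_unlock_irq(&bfqd->lock);
        /* freeing may take ioc->lock: only do it after bfqd->lock */
        blk_mq_free_requests(&free);
        return;
    }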

    Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
    Reviewed-by: Ming Lei
    Acked-by: Paolo Valente
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20210623093634.27879-3-jack@suse.cz
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • Currently, bfq does very little in bfq_requests_merged() and handles all
    the request cleanup in bfq_finish_requeue_request() called from
    blk_mq_free_request(). That is currently safe only because
    blk_mq_free_request() is called shortly after bfq_requests_merged()
    while bfqd->lock is still held. However to fix a lock inversion between
    bfqd->lock and ioc->lock, we need to call blk_mq_free_request() after
    dropping bfqd->lock. That would mean that already merged request could
    be seen by other processes inside bfq queues and possibly dispatched to
    the device which is wrong. So move cleanup of the request from
    bfq_finish_requeue_request() to bfq_requests_merged().

    Acked-by: Paolo Valente
    Signed-off-by: Jan Kara
    Link: https://lore.kernel.org/r/20210623093634.27879-2-jack@suse.cz
    Signed-off-by: Jens Axboe

    Jan Kara
     
  • bdev_disk_changed can only operate on whole devices. Make that clear
    by passing a gendisk instead of the struct block_device.

    Signed-off-by: Christoph Hellwig
    Link: https://lore.kernel.org/r/20210624123240.441814-3-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Move bdev_disk_changed to block/partitions/core.c, together with the
    rest of the partition scanning code.

    Signed-off-by: Christoph Hellwig
    Link: https://lore.kernel.org/r/20210624123240.441814-2-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Add the events attributes to the disk_attrs array, which ensures they
    are created by the driver core when the device is created, rather
    than being added after the device has been registered, which is racy
    with respect to uevents and requires more boilerplate code.
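
    Schematically, the change just lists the event attributes in the
    static array (sketch; surrounding entries elided):

    static struct attribute *disk_attrs[] = {
        /* ... existing attributes ... */
        &dev_attr_events.attr,
        &dev_attr_events_async.attr,
        &dev_attr_events_poll_msecs.attr,
        NULL
    };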

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Link: https://lore.kernel.org/r/20210624073843.251178-3-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Move the code for handling disk events from genhd.c into a new file
    as it isn't very related to the rest of the file while at the same
    time requiring lots of forward declarations.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Link: https://lore.kernel.org/r/20210624073843.251178-2-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Jun, 2021

1 commit

  • For a chained bio, trace_block_bio_complete in bio_endio is currently
    called only once, for the parent bio, after all chained bios have
    completed. However, the sector and size of the parent bio are
    modified in bio_split, so the size and sector of the complete events
    might not match the queue events in blktrace.

    The original fix for bio completion tracing ("block: trace completion
    of all bios.") wanted multiple complete events to correspond to one
    queue event, but missed this case.

    The issue can be reproduced by an md/raid5 read with a bio crossing
    chunks.

    To fix this, move the completion trace into the loop so that it is
    called for every chained bio.
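
    Schematically, the trace emission moves inside the chain-walking loop
    of bio_endio() (elided sketch):

    void bio_endio(struct bio *bio)
    {
    again:
        if (!bio_remaining_done(bio))
            return;
        /* ... */

        /*
         * Trace the completion of every bio in the chain while its own
         * bi_iter still matches the corresponding queue event.
         */
        if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
            trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
            bio_clear_flag(bio, BIO_TRACE_COMPLETION);
        }

        if (bio->bi_end_io == bio_chain_endio) {
            bio = __bio_chain_endio(bio);
            goto again;
        }
        /* ... */
    }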

    Fixes: fbbaf700e7b1 ("block: trace completion of all bios.")
    Reviewed-by: Wade Liang
    Reviewed-by: BingJing Chang
    Signed-off-by: Edward Hsieh
    Reviewed-by: Christoph Hellwig
    Link: https://lore.kernel.org/r/20210624123030.27014-1-edwardh@synology.com
    Signed-off-by: Jens Axboe

    Edward Hsieh
     

22 Jun, 2021

14 commits

  • Just a fix for a small typo in msdos_partition().

    Signed-off-by: Thomas Bracht Laumann Jespersen
    Link: https://lore.kernel.org/r/20210619195130.19348-1-t@laumann.xyz
    Signed-off-by: Jens Axboe

    Thomas Bracht Laumann Jespersen
     
  • Commit 85686d0dc194 ("block, bfq: keep shared queues out of the waker
    mechanism") leaves shared bfq_queues out of the waker-detection
    mechanism. It attains this goal by not updating the pointer
    last_completed_rq_bfqq, if the last request completed belongs to a
    shared bfq_queue (so that the pointer will not point to the shared
    bfq_queue).

    Yet this has a side effect: the pointer last_completed_rq_bfqq keeps
    pointing, deceptively, to a bfq_queue that actually is not the last
    one to have had a request completed. As a consequence, such a
    bfq_queue may deceptively be considered as a waker of some bfq_queue,
    even of some shared bfq_queue.

    To address this issue, reset last_completed_rq_bfqq if the last
    request completed belongs to a shared queue.
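
    A sketch of the reset on request completion, assuming bfq's coop flag
    marks shared queues:

    /* in bfq_completed_request() */
    if (!bfq_bfqq_coop(bfqq))
        bfqd->last_completed_rq_bfqq = bfqq;
    else
        /* a shared queue completed last: don't let it pose as a waker */
        bfqd->last_completed_rq_bfqq = NULL;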

    Fixes: 85686d0dc194 ("block, bfq: keep shared queues out of the waker mechanism")
    Signed-off-by: Paolo Valente
    Link: https://lore.kernel.org/r/20210619140948.98712-8-paolo.valente@linaro.org
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Consider two bfq_queues, say Q1 and Q2, with Q2 empty. If a request of
    Q1 gets completed shortly before a new request arrives for Q2, then
    BFQ flags Q1 as a candidate waker for Q2. Yet, the arrival of this new
    request may have a different cause, in the following case. If also Q2
    has requests in flight while waiting for the arrival of a new request,
    then the completion of its own requests may be the actual cause of the
    awakening of the process that sends I/O to Q2. So Q1 may be flagged
    wrongly as a candidate waker.

    This commit avoids this deceptive flagging, by disabling
    candidate-waker flagging for Q2, if Q2 has in-flight I/O.

    Signed-off-by: Paolo Valente
    Link: https://lore.kernel.org/r/20210619140948.98712-7-paolo.valente@linaro.org
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created
    queues"), BFQ may schedule a merge between a newly created sync
    bfq_queue, say Q2, and the last sync bfq_queue created, say Q1. To this
    goal, BFQ stores the address of Q1 in the field bic->stable_merge_bfqq
    of the bic associated with Q2. So, when the time for the possible merge
    arrives, BFQ knows which bfq_queue to merge Q2 with. In particular,
    BFQ checks for possible merges on request arrivals.

    Yet the same bic may also be associated with an async bfq_queue, say
    Q3. So, if a request for Q3 arrives, then the above check may happen
    to be executed while the bfq_queue at hand is Q3, instead of Q2. In
    this case, Q1 happens to be merged with an async bfq_queue. This is
    not only a conceptual mistake, because async queues are to be kept out
    of queue merging, but also a bug that leads to inconsistent states.

    This commit simply filters async queues out of delayed merges.

    Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
    Tested-by: Holger Hoffstätte
    Signed-off-by: Paolo Valente
    Link: https://lore.kernel.org/r/20210619140948.98712-6-paolo.valente@linaro.org
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • One of the methods with which bfq boosts throughput is merging queues.
    One of the merging variants in bfq is the stable merge. This
    mechanism is activated between two queues only if they are created
    within a certain maximum time T1 of each other. Merging can happen
    soon or be delayed. In the latter case, before merging, bfq needs to
    evaluate a throughput-boost parameter that indicates whether the
    queue would reach a high throughput if served alone. Merging occurs
    when this throughput boost is not high enough. In particular, this
    parameter is evaluated, and late merging may occur, only after at
    least a time T2 has passed since the creation of the queue.

    Currently T1 and T2 are set to 180ms and 200ms, respectively. With
    these values the merge rarely happens, because the windows are too
    short. This results in a noticeable lowering of the overall
    throughput with some workloads (see the example below).

    This commit introduces two constants, bfq_activation_stable_merging
    and bfq_late_stable_merging, in order to increase the duration of T1
    and T2. Both the stable-merging activation time and the late-merging
    time are set to 600ms. This value has been experimentally evaluated
    using the sqlite benchmark in the Phoronix Test Suite on an HDD. The
    duration of the benchmark before this fix was 111.02s, while now it
    has reached 97.02s, a better result than that of all the other
    schedulers.
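
    A sketch of the two new constants (values in milliseconds, as
    described above):

    /* both the activation and the late-merging windows become 600 ms */
    static const unsigned long bfq_activation_stable_merging = 600;
    static const unsigned long bfq_late_stable_merging = 600;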

    Signed-off-by: Pietro Pedroni
    Signed-off-by: Paolo Valente
    Link: https://lore.kernel.org/r/20210619140948.98712-5-paolo.valente@linaro.org
    Signed-off-by: Jens Axboe

    Pietro Pedroni
     
  • Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created
    queues"), BFQ may schedule a merge between a newly created sync
    bfq_queue and the last sync bfq_queue created. Such a merging is not
    performed immediately, because BFQ needs first to find out whether the
    newly created queue actually reaches a higher throughput if not merged
    at all (and in that case BFQ will not perform any stable merging). To
    check that, a little time must be waited after the creation of the new
    queue, so that some I/O can flow in the queue, and statistics on such
    I/O can be computed.

    Yet, to evaluate the above waiting time, the last split time is
    considered as start time, instead of the creation time of the
    queue. This is a mistake, because considering the split time is
    correct only in the following scenario.

    The queue undergoes a non-stable merge on the arrival of its very
    first I/O request, due to close I/O with some other queue. While the
    queue is merged for close I/O, stable merging is not considered. Yet
    the queue may then happen to be split, if the close I/O finishes (or
    happens to be a false positive). From this time on, the queue can
    again be considered for stable merging. But, again, a little time must
    elapse, to let some new I/O flow in the queue and to get updated
    statistics. To wait for this time, the split time is to be taken into
    account.

    Yet, if the queue does not undergo a non-stable merge on the arrival
    of its very first request, then BFQ immediately checks whether the
    stable merge is to be performed. It happens because the split time for
    a queue is initialized to minus infinity when the queue is created.

    This commit fixes this mistake by adding the missing condition. Now
    the check for delayed stable merge is performed only after a little
    time has elapsed not only since the last queue split, but also since
    the creation of the queue.

    Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
    Signed-off-by: Paolo Valente
    Link: https://lore.kernel.org/r/20210619140948.98712-4-paolo.valente@linaro.org
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • When attempting to schedule a merge of a given bfq_queue with the currently
    in-service bfq_queue or with a cooperating bfq_queue among the scheduled
    bfq_queues, delayed stable merge is checked for rotational or non-queueing
    devs. For this stable merge to be performed, some conditions must be met.
    If the current bfq_queue underwent a split from some merged bfq_queue,
    one of these conditions is that at least two hundred milliseconds
    must have elapsed since the split; if no split ever occurred, the
    condition is trivially met.

    Unfortunately, by mistake, time_is_after_jiffies() was written instead of
    time_is_before_jiffies() for this check, verifying that less than two
    hundred milliseconds have elapsed instead of verifying that at least two
    hundred milliseconds have elapsed.

    Fix this issue by replacing time_is_after_jiffies() with
    time_is_before_jiffies().
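
    The change is a one-line polarity fix; a sketch, assuming the split
    timestamp lives in bfqq->split_time (removed line marked '-', added
    line marked '+'):

    -   if (time_is_after_jiffies(bfqq->split_time +
    -                             msecs_to_jiffies(200)))
    +   if (time_is_before_jiffies(bfqq->split_time +
    +                              msecs_to_jiffies(200)))
            /* ... consider the delayed stable merge ... */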

    Signed-off-by: Luca Mariotti
    Signed-off-by: Paolo Valente
    Signed-off-by: Pietro Pedroni
    Link: https://lore.kernel.org/r/20210619140948.98712-3-paolo.valente@linaro.org
    Signed-off-by: Jens Axboe

    Luca Mariotti
     
  • Merged bfq_queues are kept out of weight-raising (low-latency)
    mechanisms. The reason is that these queues are usually created for
    non-interactive and non-soft-real-time tasks. Yet this is not the case
    for stably-merged queues. These queues are merged just because they
    are created shortly after each other. So they may easily serve the I/O
    of an interactive or soft-real time application, if the application
    happens to spawn multiple processes.

    To address this issue, this commit lets stably-merged queues, too,
    enjoy weight raising.

    Signed-off-by: Paolo Valente
    Link: https://lore.kernel.org/r/20210619140948.98712-2-paolo.valente@linaro.org
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • After commit a79050434b45 ("blk-rq-qos: refactor out common elements of
    blk-wbt"), once throttling was disabled by wbt_disable_default() it
    could not be enabled again. Fix this by setting enable_state back to
    WBT_STATE_ON_DEFAULT.

    Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt")
    Signed-off-by: Zhang Yi
    Link: https://lore.kernel.org/r/20210619093700.920393-3-yi.zhang@huawei.com
    Signed-off-by: Jens Axboe

    Zhang Yi
     
  • Currently we disable wbt by simply zeroing out rwb->wb_normal in
    wbt_disable_default() when switching the elevator to bfq. But this is
    not safe, because the check becomes a false positive if the queue
    depth changes. If it becomes a false positive between wbt_wait() and
    wbt_track() while submitting a write request, rqw->inflight is
    dropped to -1 in wbt_done(), which ends up triggering an IO hang. Fix
    this issue by introducing a new state that means wbt was explicitly
    disabled.
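
    A sketch of the new state and the resulting check, assuming the 5.14
    blk-wbt names (illustrative):

    enum {
        WBT_STATE_ON_DEFAULT    = 1,    /* on by default */
        WBT_STATE_ON_MANUAL     = 2,    /* enabled manually via sysfs */
        WBT_STATE_OFF_DEFAULT,          /* disabled by default, e.g. bfq */
    };

    static inline bool rwb_enabled(struct rq_wb *rwb)
    {
        /* the explicit state is immune to queue-depth changes */
        return rwb && rwb->enable_state != WBT_STATE_OFF_DEFAULT &&
               rwb->wb_normal != 0;
    }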

    Fixes: a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt")
    Signed-off-by: Zhang Yi
    Link: https://lore.kernel.org/r/20210619093700.920393-2-yi.zhang@huawei.com
    Signed-off-by: Jens Axboe

    Zhang Yi
     
  • While one or more requests with a certain I/O priority are pending, do
    not dispatch lower-priority requests. Dispatch lower-priority requests
    anyway after the "aging" time has expired.

    This patch has been tested as follows:

    modprobe scsi_debug ndelay=1000000 max_queue=16 &&
    sd='' &&
    while [ -z "$sd" ]; do
      sd=/dev/$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
    done &&
    echo $((100*1000)) > /sys/block/$sd/queue/iosched/aging_expire &&
    cd /sys/fs/cgroup/blkio/ &&
    echo $$ >cgroup.procs &&
    echo restrict-to-be >blkio.prio.class &&
    mkdir -p hipri &&
    cd hipri &&
    echo none-to-rt >blkio.prio.class &&
    { max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/low-pri.txt & } &&
    echo $$ >cgroup.procs &&
    max-iops -a1 -d32 -j1 -e mq-deadline $sd >& ~/hi-pri.txt

    Result:
    * 11000 IOPS for the high-priority job
    * 40 IOPS for the low-priority job

    If the aging expiry time is changed from 100s to 0, the IOPS results
    change to 6712 and 6796 IOPS.

    The max-iops script is a script that runs fio with the following arguments:
    --bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
    --norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
    --iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
    --iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
    --filename=${positional_argument_1}

    Reviewed-by: Damien Le Moal
    Cc: Hannes Reinecke
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Johannes Thumshirn
    Cc: Himanshu Madhani
    Signed-off-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210618004456.7280-17-bvanassche@acm.org
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Maintain statistics per cgroup and export these to user space. These
    statistics are essential for verifying whether the proper I/O priorities
    have been assigned to requests. An example of the statistics data with
    this patch applied:

    $ cat /sys/fs/cgroup/io.stat
    11:2 rbytes=0 wbytes=0 rios=3 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0
    8:32 rbytes=2142720 wbytes=0 rios=105 wios=0 dbytes=0 dios=0 [NONE] dispatched=0 inserted=0 merged=171 [RT] dispatched=0 inserted=0 merged=0 [BE] dispatched=0 inserted=0 merged=0 [IDLE] dispatched=0 inserted=0 merged=0

    Cc: Damien Le Moal
    Cc: Hannes Reinecke
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Johannes Thumshirn
    Cc: Himanshu Madhani
    Signed-off-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210618004456.7280-16-bvanassche@acm.org
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Track I/O statistics per I/O priority and export these statistics to
    debugfs. These statistics help developers of the deadline scheduler.

    Cc: Damien Le Moal
    Cc: Hannes Reinecke
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Johannes Thumshirn
    Cc: Himanshu Madhani
    Signed-off-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210618004456.7280-15-bvanassche@acm.org
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Maintain one dispatch list and one FIFO list per I/O priority class: RT, BE
    and IDLE. Maintain statistics for each priority level. Split the debugfs
    attributes per priority level as follows:

    $ ls /sys/kernel/debug/block/.../sched/
    async_depth  dispatch2        read_next_rq      write2_fifo_list
    batching     read0_fifo_list  starved           write_next_rq
    dispatch0    read1_fifo_list  write0_fifo_list
    dispatch1    read2_fifo_list  write1_fifo_list

    Cc: Damien Le Moal
    Cc: Hannes Reinecke
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Johannes Thumshirn
    Cc: Himanshu Madhani
    Signed-off-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210618004456.7280-14-bvanassche@acm.org
    Signed-off-by: Jens Axboe

    Bart Van Assche