05 Dec, 2020

2 commits

  • …evice-mapper/linux-dm

    Pull device mapper fixes from Mike Snitzer:

    - Fix DM's bio splitting changes that were made during v5.9. This
    restores splitting in terms of varied per-target ti->max_io_len
    rather than using block core's single stacked 'chunk_sectors' limit.

    - Like DM crypt, update DM integrity to not use crypto drivers that
    have CRYPTO_ALG_ALLOCATES_MEMORY set.

    - Fix DM writecache target's argument parsing and status display.

    - Remove needless BUG() from dm writecache's persistent_memory_claim()

    - Remove old gcc workaround in DM cache target's block_div() for ARM
    link errors now that gcc >= 4.9 is required.

    - Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range.

    - Remove old, and now frowned upon, BUG_ON(in_interrupt()) in
    dm_table_event().

    - Remove invalid sparse annotations from dm_prepare_ioctl() and
    dm_unprepare_ioctl().

    * tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm: remove invalid sparse __acquires and __releases annotations
    dm: fix double RCU unlock in dm_dax_zero_page_range() error path
    dm: fix IO splitting
    dm writecache: remove BUG() and fail gracefully instead
    dm table: Remove BUG_ON(in_interrupt())
    dm: fix bug with RCU locking in dm_blk_report_zones
    Revert "dm cache: fix arm link errors with inline"
    dm writecache: fix the maximum number of arguments
    dm writecache: advance the number of arguments when reporting max_age
    dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY

    Linus Torvalds
     
  • Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account
    for target-specific splitting") caused a couple of regressions:
    1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
    chunk_sectors must reflect the most limited of all devices in the
    IO stack.
    2) DM targets that set max_io_len but that do _not_ provide an
    .iterate_devices method no longer had their IO split properly.

    And commit 5091cdec56fa ("dm: change max_io_len() to use
    blk_max_size_offset()") also caused a regression where DM no longer
    supported varied (per-target) IO splitting. The implication is the
    potential for severely reduced performance for IO stacks that use a DM
    target like dm-cache to hide performance limitations of a slower
    device (e.g. one that requires 4K IO splitting).

    Coming full circle: fix all these issues by no longer stacking
    chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
    adding an optional chunk_sectors override argument to
    blk_max_size_offset(), and updating DM's max_io_len() to pass
    ti->max_io_len to its blk_max_size_offset() call.

    Passing in an optional chunk_sectors override to blk_max_size_offset()
    allows for code reuse of block's centralized calculation of the max IO
    size based on the provided offset and split boundary (a sketch of this
    calculation follows this entry).

    Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
    Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()")
    Cc: stable@vger.kernel.org
    Reported-by: John Dorminy
    Reported-by: Bruce Johnston
    Reported-by: Kirill Tkhai
    Reviewed-by: John Dorminy
    Signed-off-by: Mike Snitzer
    Reviewed-by: Jens Axboe

    Mike Snitzer
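
    The boundary calculation that blk_max_size_offset() centralizes is easy to
    show in isolation. Below is a minimal, self-contained userspace C sketch of
    that calculation for a power-of-2 split boundary; the helper name and the
    8-sector boundary are illustrative only, not the kernel implementation.

      #include <stdio.h>

      /*
       * Sketch of a blk_max_size_offset()-style calculation: given an IO start
       * offset (in sectors) and a per-target split boundary ("chunk_sectors",
       * e.g. ti->max_io_len), return how many sectors fit before the next
       * boundary.  Assumes the boundary is a power of 2.
       */
      static unsigned int max_io_sectors_at(unsigned long long offset,
                                            unsigned int chunk_sectors)
      {
              return chunk_sectors - (offset & (chunk_sectors - 1));
      }

      int main(void)
      {
              /* A target that requires 4K (8-sector) splitting. */
              unsigned int chunk = 8;
              unsigned long long offsets[] = { 0, 5, 8, 1000, 1003 };

              for (int i = 0; i < 5; i++)
                      printf("offset %llu -> at most %u sectors before split\n",
                             offsets[i], max_io_sectors_at(offsets[i], chunk));
              return 0;
      }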
     

02 Dec, 2020

1 commit

  • Commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
    chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
    reflect the most limited of all devices in the IO stack.

    Otherwise malformed IO may result. E.g.: prior to this fix,
    ->chunk_sectors = lcm_not_zero(8, 128) would result in
    blk_max_size_offset() splitting IO at 128 sectors rather than the
    required more restrictive 8 sectors.

    And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
    non-power-of-2"), care must be taken to properly stack chunk_sectors to
    be compatible with the possibility that a non-power-of-2 chunk_sectors
    may be stacked. This is why gcd() is used instead of reverting to
    min_not_zero() (a worked example follows this entry).

    Fixes: 22ada802ede8 ("block: use lcm_not_zero() when stacking chunk_sectors")
    Fixes: 07d098e6bbad ("block: allow 'chunk_sectors' to be non-power-of-2")
    Reported-by: John Dorminy
    Reported-by: Bruce Johnston
    Signed-off-by: Mike Snitzer
    Reviewed-by: John Dorminy
    Cc: stable@vger.kernel.org
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer
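
    To see why gcd() rather than lcm_not_zero() (or a plain min_not_zero())
    yields a boundary every stacked device can honour, here is a small
    self-contained C sketch reproducing the 8/128 example above. It only
    illustrates the arithmetic; it is not the kernel's stacking code.

      #include <stdio.h>

      /* Euclid's algorithm, standing in for the kernel's gcd() helper. */
      static unsigned int gcd(unsigned int a, unsigned int b)
      {
              while (b) {
                      unsigned int t = a % b;
                      a = b;
                      b = t;
              }
              return a;
      }

      static unsigned int lcm(unsigned int a, unsigned int b)
      {
              return a / gcd(a, b) * b;
      }

      int main(void)
      {
              unsigned int bottom = 8, top = 128; /* stacked chunk_sectors */

              /* lcm(8, 128) = 128: IO would only be split every 128 sectors,
               * violating the lower device's 8-sector requirement. */
              printf("lcm(8, 128)  = %u\n", lcm(bottom, top));

              /* gcd(8, 128) = 8: every boundary satisfies both devices, and
               * this also behaves for non-power-of-2 values, e.g. below. */
              printf("gcd(8, 128)  = %u\n", gcd(bottom, top));
              printf("gcd(24, 128) = %u\n", gcd(24, 128));
              return 0;
      }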
     

21 Nov, 2020

1 commit

  • If there is only one keyslot, then blk_ksm_init() computes
    slot_hashtable_size=1 and log_slot_ht_size=0. This causes
    blk_ksm_find_keyslot() to crash later because it uses
    hash_ptr(key, log_slot_ht_size) to find the hash bucket containing the
    key, and hash_ptr() doesn't support the bits == 0 case.

    Fix this by making the hash table always have at least 2 buckets (see
    the sketch after this entry).

    Tested by running:

    kvm-xfstests -c ext4 -g encrypt -m inlinecrypt \
    -o blk-crypto-fallback.num_keyslots=1

    Fixes: 1b2628397058 ("block: Keyslot Manager for Inline Encryption")
    Signed-off-by: Eric Biggers
    Signed-off-by: Jens Axboe

    Eric Biggers
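
    The sizing problem is plain integer math: with a single keyslot the table
    ends up with one bucket and zero hash bits, which hash_ptr() cannot handle.
    A minimal userspace sketch of the sizing and of the "at least 2 buckets"
    fix follows; the helpers stand in for the kernel's roundup_pow_of_two()
    and ilog2() and are not the blk-crypto code itself.

      #include <stdio.h>

      static unsigned int roundup_pow2(unsigned int x)
      {
              unsigned int p = 1;

              while (p < x)
                      p <<= 1;
              return p;
      }

      static unsigned int log2_uint(unsigned int x)
      {
              unsigned int bits = 0;

              while (x > 1) {
                      x >>= 1;
                      bits++;
              }
              return bits;
      }

      int main(void)
      {
              for (unsigned int num_slots = 1; num_slots <= 4; num_slots++) {
                      /* Old sizing: 1 slot -> 1 bucket -> 0 hash bits. */
                      unsigned int old_size = roundup_pow2(num_slots);
                      /* Fixed sizing: never fewer than 2 buckets, so
                       * hash_ptr(key, bits) always sees bits >= 1. */
                      unsigned int new_size = old_size < 2 ? 2 : old_size;

                      printf("slots=%u old:size=%u,bits=%u new:size=%u,bits=%u\n",
                             num_slots, old_size, log2_uint(old_size),
                             new_size, log2_uint(new_size));
              }
              return 0;
      }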
     

14 Nov, 2020

1 commit

    To avoid a use-after-free on the flush request, we call its .end_io()
    from both the timeout code path and __blk_mq_end_request().

    When the flush request's ref hasn't dropped to zero, it is still in
    use, so we can't mark it as IDLE; fix this by only marking it IDLE
    once its refcount actually drops to zero (see the sketch after this
    entry).

    Fixes: 65ff5cd04551 ("blk-mq: mark flush request as IDLE in flush_end_io()")
    Signed-off-by: Ming Lei
    Cc: Yi Zhang
    Signed-off-by: Jens Axboe

    Ming Lei
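
    The fix amounts to tying the IDLE transition to the final reference drop.
    A minimal self-contained C sketch of that pattern (the struct, states and
    helper below are illustrative, not the blk-mq implementation):

      #include <stdatomic.h>
      #include <stdio.h>

      enum rq_state { RQ_IN_FLIGHT, RQ_COMPLETE, RQ_IDLE };

      struct flush_rq {
              atomic_int ref;         /* one ref per path using the request */
              enum rq_state state;
      };

      /* Called from both the timeout path and the normal completion path. */
      static void flush_end_io(struct flush_rq *rq)
      {
              rq->state = RQ_COMPLETE;

              /* Only the caller dropping the last reference may mark the
               * request IDLE; anyone else must leave it alone because the
               * request is still in use. */
              if (atomic_fetch_sub(&rq->ref, 1) == 1)
                      rq->state = RQ_IDLE;
      }

      int main(void)
      {
              struct flush_rq rq = { .ref = 2, .state = RQ_IN_FLIGHT };

              flush_end_io(&rq);  /* ref 2 -> 1: stays COMPLETE, still used */
              printf("after first drop: state=%d\n", rq.state);

              flush_end_io(&rq);  /* ref 1 -> 0: now safe to mark IDLE */
              printf("after final drop: state=%d\n", rq.state);
              return 0;
      }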
     

30 Oct, 2020

1 commit

  • Mark the flush request as IDLE in its .end_io(), aligning it with how
    normal requests behave. The flush request stays in the in-flight tags
    if we're not using an IO scheduler, so we need to change its state to
    IDLE. Otherwise, we will hang in blk_mq_tagset_wait_completed_request()
    during error recovery because the flush request's state is kept as
    COMPLETED (see the sketch after this entry).

    Reported-by: Yi Zhang
    Signed-off-by: Ming Lei
    Tested-by: Yi Zhang
    Cc: Chao Leng
    Cc: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Ming Lei
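
    The hang itself is a simple wait-loop property: error recovery keeps
    waiting while any in-flight tag still looks COMPLETED, so a flush request
    stuck in that state blocks recovery forever. A tiny illustrative C sketch
    of that waiting condition (not the blk-mq tag iteration code):

      #include <stdbool.h>
      #include <stdio.h>

      enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

      /* Sketch of the condition blk_mq_tagset_wait_completed_request()
       * polls on: keep waiting while any request is still COMPLETE. */
      static bool any_completed(const enum rq_state *rqs, int n)
      {
              for (int i = 0; i < n; i++)
                      if (rqs[i] == RQ_COMPLETE)
                              return true;
              return false;
      }

      int main(void)
      {
              /* Without the fix, the flush request never leaves COMPLETE,
               * so this stays true and error recovery spins forever. */
              enum rq_state rqs[] = { RQ_IDLE, RQ_COMPLETE };

              printf("keep waiting: %s\n", any_completed(rqs, 2) ? "yes" : "no");

              rqs[1] = RQ_IDLE;  /* with the fix: marked IDLE in .end_io() */
              printf("keep waiting: %s\n", any_completed(rqs, 2) ? "yes" : "no");
              return 0;
      }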
     

28 Oct, 2020

1 commit

  • When the bio's size reaches max_append_sectors, bio_add_hw_page() returns
    0 and __bio_iov_append_get_pages() then returns -EINVAL. This is an
    expected result of building a bio small enough not to be split in the IO
    path. However, the iov_iter is not advanced in this case, causing the
    same pages to be filled for the bio again and again.

    Fix the case by properly advancing the iov_iter for the already
    processed pages (see the sketch after this entry).

    Fixes: 0512a75b98f8 ("block: Introduce REQ_OP_ZONE_APPEND")
    Cc: stable@vger.kernel.org # 5.8+
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Naohiro Aota
    Signed-off-by: Jens Axboe

    Naohiro Aota
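
    The failure mode is a classic "consume without advancing the cursor" loop.
    The self-contained C sketch below illustrates the idea: unless the iterator
    is advanced by the bytes already added to the bio, the next round starts
    from the same pages again. The struct and names are illustrative, not the
    bio/iov_iter code.

      #include <stddef.h>
      #include <stdio.h>

      struct iter {
              size_t pos;     /* bytes of the user buffer consumed so far */
              size_t len;     /* total bytes described by the iterator */
      };

      /* Pretend the hardware limit allows at most 'limit' bytes per bio,
       * e.g. capped by max_append_sectors for zone append. */
      static size_t fill_bio(struct iter *it, size_t limit)
      {
              size_t remaining = it->len - it->pos;
              size_t added = remaining < limit ? remaining : limit;

              printf("adding %zu bytes at offset %zu\n", added, it->pos);

              /* The fix: advance the iterator by what was actually consumed.
               * Without this, every call re-adds the same range forever. */
              it->pos += added;
              return added;
      }

      int main(void)
      {
              struct iter it = { .pos = 0, .len = 10000 };

              while (it.pos < it.len)
                      fill_bio(&it, 4096);
              return 0;
      }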
     

26 Oct, 2020

2 commits

  • Similarly to commit 457e490f2b741 ("blkcg: allocate struct blkcg_gq
    outside request queue spinlock"), blkg_create() can also trigger
    occasional -ENOMEM failures at the radix insertion because any
    allocation inside blkg_create() has to be non-blocking, making it more
    likely to fail. This causes trouble for userspace tools trying to
    configure IO weights, which then need to deal with this condition.

    This patch reduces the occurrence of -ENOMEMs on this path by preloading
    the radix tree element in a GFP_KERNEL context, such that the later
    non-blocking insertion is guaranteed not to fail (see the sketch after
    this entry).

    A similar solution exists in blkcg_init_queue for the same situation.

    Acked-by: Tejun Heo
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Jens Axboe

    Gabriel Krisman Bertazi
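
    The pattern used here is the standard radix-tree preload idiom: allocate
    the tree node up front in a context that may sleep, then do the insertion
    atomically under the queue lock. The kernel-context sketch below shows only
    that call sequence, with the real blkg_create() details and error handling
    omitted; treat it as an illustration of the idiom rather than the patch.

      #include <linux/gfp.h>
      #include <linux/radix-tree.h>
      #include <linux/spinlock.h>

      static int insert_blkg_preloaded(struct radix_tree_root *root,
                                       unsigned long index, void *blkg,
                                       spinlock_t *lock)
      {
              int ret;

              /* May sleep: preallocates radix-tree nodes with GFP_KERNEL and
               * disables preemption on success. */
              ret = radix_tree_preload(GFP_KERNEL);
              if (ret)
                      return ret;

              spin_lock_irq(lock);
              /* Cannot fail with -ENOMEM now: nodes were preallocated above. */
              ret = radix_tree_insert(root, index, blkg);
              spin_unlock_irq(lock);

              /* Re-enables preemption and frees unused preallocated nodes. */
              radix_tree_preload_end();
              return ret;
      }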
     
  • If the new_blkg allocation races with a blk_policy change and
    blkg_lookup_check() fails, new_blkg is leaked (see the sketch after
    this entry).

    Acked-by: Tejun Heo
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Jens Axboe

    Gabriel Krisman Bertazi
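
    The fix is the usual "allocate before the lock, free on the bail-out path"
    shape. A small self-contained C sketch of that shape, with hypothetical
    names standing in for new_blkg and blkg_lookup_check():

      #include <stdlib.h>

      struct blkg_stub { int weight; };

      /* Stand-in for the re-check done under the queue lock failing after a
       * racing blk_policy change. */
      static int lookup_check_failed(void)
      {
              return 1;
      }

      static int create_blkg(void)
      {
              /* Allocated up front, outside the lock, where sleeping is OK. */
              struct blkg_stub *new_blkg = malloc(sizeof(*new_blkg));

              if (!new_blkg)
                      return -1;

              if (lookup_check_failed()) {
                      /* The fix: release the preallocated object on this
                       * path instead of returning and leaking it. */
                      free(new_blkg);
                      return -1;
              }

              /* ... insert and use new_blkg ... */
              return 0;
      }

      int main(void)
      {
              return create_blkg() ? 1 : 0;
      }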
     

25 Oct, 2020

1 commit

  • Pull block fixes from Jens Axboe:

    - NVMe pull request from Christoph
        - rdma error handling fixes (Chao Leng)
        - fc error handling and reconnect fixes (James Smart)
        - fix the qid displace when tracing ioctl command (Keith Busch)
        - don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
        - fix MTDT for passthru (Logan Gunthorpe)
        - blacklist Write Same on more devices (Kai-Heng Feng)
        - fix an uninitialized work struct (zhenwei pi)

    - lightnvm out-of-bounds fix (Colin)

    - SG allocation leak fix (Doug)

    - rnbd fixes (Gioh, Guoqing, Jack)

    - zone error translation fixes (Keith)

    - kerneldoc markup fix (Mauro)

    - zram lockdep fix (Peter)

    - Kill unused io_context members (Yufen)

    - NUMA memory allocation cleanup (Xianting)

    - NBD config wakeup fix (Xiubo)

    * tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
    block: blk-mq: fix a kernel-doc markup
    nvme-fc: shorten reconnect delay if possible for FC
    nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
    nvme-fc: fix error loop in create_hw_io_queues
    nvme-fc: fix io timeout to abort I/O
    null_blk: use zone status for max active/open
    nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
    nvmet: cleanup nvmet_passthru_map_sg()
    nvmet: limit passthru MTDS by BIO_MAX_PAGES
    nvmet: fix uninitialized work for zero kato
    nvme-pci: disable Write Zeroes on Sandisk Skyhawk
    nvme: use queuedata for nvme_req_qid
    nvme-rdma: fix crash due to incorrect cqe
    nvme-rdma: fix crash when connect rejected
    block: remove unused members for io_context
    blk-mq: remove the calling of local_memory_node()
    zram: Fix __zram_bvec_{read,write}() locking order
    skd_main: remove unused including
    sgl_alloc_order: fix memory leak
    lightnvm: fix out-of-bounds write to array devices->info[]
    ...

    Linus Torvalds
     

20 Oct, 2020

1 commit

  • We don't need to check whether the node is a memoryless NUMA node before
    calling the allocator interface. SLUB (and SLAB, SLOB) relies on the
    page allocator to pick a node. The page allocator should deal with
    memoryless nodes just fine: it has zonelists constructed for each
    possible node, and it will automatically fall back to a node which is
    closest to the requested node, as long as __GFP_THISNODE is not
    enforced.

    The code comment in kmem_cache_alloc_node() for SLAB also notes this:
    * Fallback to other node is possible if __GFP_THISNODE is not set.

    blk-mq code doesn't set __GFP_THISNODE, so we can remove the call to
    local_memory_node() (see the sketch after this entry).

    Signed-off-by: Xianting Tian
    Signed-off-by: Jens Axboe

    Xianting Tian
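
    In practice this means blk-mq can pass its preferred node straight to the
    allocator and let the page allocator handle any fallback. A kernel-context
    sketch of that simplified call (illustrative only; the surrounding blk-mq
    code is omitted):

      #include <linux/slab.h>

      /* Allocate on a preferred node without translating it through
       * local_memory_node() first.  Because __GFP_THISNODE is not set, the
       * page allocator falls back to the nearest node with memory if 'node'
       * happens to be memoryless. */
      static void *alloc_hctx_data(size_t size, int node)
      {
              return kzalloc_node(size, GFP_KERNEL, node);
      }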
     

15 Oct, 2020

3 commits

  • Fix this warning:

    ./block/bio.c:1098: WARNING: Inline emphasis start-string without end-string.

    The problem is that *iter is not valid markup; it appears to be a typo
    for @iter (a kernel-doc example follows this entry).

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
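
    For reference, kernel-doc refers to parameters as @name, while a leading
    '*' is reStructuredText emphasis and must be paired. A short illustrative
    kernel-doc block in that style (the function name is made up for the
    example):

      /**
       * copy_from_iter_example - sketch of kernel-doc parameter markup
       * @buf:  destination buffer
       * @len:  number of bytes to copy
       * @iter: source iterator; written as "@iter", not "*iter", so that
       *        kernel-doc does not see an unterminated emphasis start-string
       *
       * Return: the number of bytes copied.
       */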
     
  • Using "@bio's parent" causes the following warning:
    ./block/bio.c:10: WARNING: Inline emphasis start-string without end-string.

    The main problem here is that this would be converted by kernel-doc
    into:

    **bio**'s parent

    which is not valid notation. It would be possible to use this
    kernel-doc markup instead:

    ``bio's`` parent

    Yet, here, it is probably simpler to just use alternative language:

    the parent of @bio

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Improve DM core's bio splitting to use blk_max_size_offset(). Also
    fix bio splitting for bios that were deferred to the worker thread
    due to a DM device being suspended.

    - Remove DM core's special handling of NVMe devices now that block core
    has internalized efficiencies drivers previously needed to be
    concerned about (via now removed direct_make_request).

    - Fix request-based DM to not bounce through indirect dm_submit_bio;
    instead have block core make direct call to blk_mq_submit_bio().

    - Various DM core cleanups to simplify and improve code.

    - Update DM crypt to not use drivers that set
    CRYPTO_ALG_ALLOCATES_MEMORY.

    - Fix DM raid's raid1 and raid10 discard limits for the purposes of
    linux-stable. But then remove DM raid's discard limits settings now
    that MD raid can efficiently handle large discards.

    - A couple small cleanups across various targets.

    * tag 'for-5.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm: fix request-based DM to not bounce through indirect dm_submit_bio
    dm: remove special-casing of bio-based immutable singleton target on NVMe
    dm: export dm_copy_name_and_uuid
    dm: fix comment in __dm_suspend()
    dm: fold dm_process_bio() into dm_submit_bio()
    dm: fix missing imposition of queue_limits from dm_wq_work() thread
    dm snap persistent: simplify area_io()
    dm thin metadata: Remove unused local variable when create thin and snap
    dm raid: remove unnecessary discard limits for raid10
    dm raid: fix discard limits for raid1 and raid10
    dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
    dm: use dm_table_get_device_name() where appropriate in targets
    dm table: make 'struct dm_table' definition accessible to all of DM core
    dm: eliminate need for start_io_acct() forward declaration
    dm: simplify __process_abnormal_io()
    dm: push use of on-stack flush_bio down to __send_empty_flush()
    dm: optimize max_io_len() by inlining max_io_len_target_boundary()
    dm: push md->immutable_target optimization down to __process_bio()
    dm: change max_io_len() to use blk_max_size_offset()
    dm table: stack 'chunk_sectors' limit to account for target-specific splitting

    Linus Torvalds
     

14 Oct, 2020

3 commits

  • A zoned device with limited resources to open or activate zones may
    return an error when the host exceeds those limits. The same command may
    be successful if retried later, but the host needs to wait for specific
    zone states before it should expect a retry to succeed. Have the block
    layer provide an appropriate status for these conditions so applications
    can distinguish this error for special handling (see the sketch after
    this entry).

    Cc: linux-api@vger.kernel.org
    Cc: Niklas Cassel
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Keith Busch
    Signed-off-by: Jens Axboe

    Keith Busch
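
    As a rough illustration of how such a status would be consumed, here is a
    kernel-flavoured sketch of a check that treats a zone-resource error as
    retryable. The status names below (BLK_STS_ZONE_OPEN_RESOURCE and
    BLK_STS_ZONE_ACTIVE_RESOURCE) are my reading of what this change adds and
    should be treated as an assumption; the helpers themselves are
    hypothetical.

      #include <linux/blk_types.h>

      /* Assumed names for the zone-resource statuses added by this change. */
      static bool zone_resource_error(blk_status_t status)
      {
              /* The device ran out of open/active zone resources; the same
               * command may succeed once other zones are closed/finished. */
              return status == BLK_STS_ZONE_OPEN_RESOURCE ||
                     status == BLK_STS_ZONE_ACTIVE_RESOURCE;
      }

      /* Hypothetical policy: requeue instead of failing the request. */
      static bool should_requeue(blk_status_t status)
      {
              return zone_resource_error(status);
      }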
     
  • Pull block driver updates from Jens Axboe:
    "Here are the driver updates for 5.10.

    A few SCSI updates in here too, in coordination with Martin as they
    depend on core block changes for the shared tag bitmap.

    This contains:

    - NVMe pull requests via Christoph:
        - fix keep alive timer modification (Amit Engel)
        - order the PCI ID list more sensibly (Andy Shevchenko)
        - cleanup the open by controller helper (Chaitanya Kulkarni)
        - use an xarray for the CSE log lookup (Chaitanya Kulkarni)
        - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
        - fix nvme_ns_report_zones (Christoph Hellwig)
        - add a sanity check to nvmet-fc (James Smart)
        - fix interrupt allocation when too many polled queues are
          specified (Jeffle Xu)
        - small nvmet-tcp optimization (Mark Wunderlich)
        - fix a controller refcount leak on init failure (Chaitanya
          Kulkarni)
        - misc cleanups (Chaitanya Kulkarni)
        - major refactoring of the scanning code (Christoph Hellwig)

    - MD updates via Song:
        - Bug fixes in bitmap code, from Zhao Heming
        - Fix a work queue check, from Guoqing Jiang
        - Fix raid5 oops with reshape, from Song Liu
        - Clean up unused code, from Jason Yan
        - Discard improvements, from Xiao Ni
        - raid5/6 page offset support, from Yufen Yu

    - Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
    Hannes)

    - null_blk open/active zone limit support (Niklas)

    - Set of bcache updates (Coly, Dongsheng, Qinglang)"

    * tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
    md/raid5: fix oops during stripe resizing
    md/bitmap: fix memory leak of temporary bitmap
    md: fix the checking of wrong work queue
    md/bitmap: md_bitmap_get_counter returns wrong blocks
    md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
    md/raid0: remove unused function is_io_in_chunk_boundary()
    nvme-core: remove extra condition for vwc
    nvme-core: remove extra variable
    nvme: remove nvme_identify_ns_list
    nvme: refactor nvme_validate_ns
    nvme: move nvme_validate_ns
    nvme: query namespace identifiers before adding the namespace
    nvme: revalidate zone bitmaps in nvme_update_ns_info
    nvme: remove nvme_update_formats
    nvme: update the known admin effects
    nvme: set the queue limits in nvme_update_ns_info
    nvme: remove the 0 lba_shift check in nvme_update_ns_info
    nvme: clean up the check for too large logic block sizes
    nvme: freeze the queue over ->lba_shift updates
    nvme: factor out a nvme_configure_metadata helper
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

13 Oct, 2020

1 commit

  • Pull compat iovec cleanups from Al Viro:
    "Christoph's series around import_iovec() and compat variant thereof"

    * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    security/keys: remove compat_keyctl_instantiate_key_iov
    mm: remove compat_process_vm_{readv,writev}
    fs: remove compat_sys_vmsplice
    fs: remove the compat readv/writev syscalls
    fs: remove various compat readv/writev helpers
    iov_iter: transparently handle compat iovecs in import_iovec
    iov_iter: refactor rw_copy_check_uvector and import_iovec
    iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c
    compat.h: fix a spelling error in

    Linus Torvalds
     

09 Oct, 2020

2 commits

  • Pull block fixes from Jens Axboe:
    "A few fixes that should go into this release:

    - NVMe controller error path reference fix (Chaitanya)

    - Fix regression with IBM partitions on non-dasd devices (Christoph)

    - Fix a missing clear in the compat CDROM packet structure (Peilin)"

    * tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block:
    partitions/ibm: fix non-DASD devices
    nvme-core: put ctrl ref when module ref get fail
    block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()

    Linus Torvalds
     
  • syzbot is reporting an unkillable task [1], as the caller fails to
    handle a corrupted filesystem image which attempts to access beyond
    the end of the device. While we need to fix the caller, flooding the
    console with handle_bad_sector() messages is unlikely to be useful
    (a ratelimiting sketch follows this entry).

    [1] https://syzkaller.appspot.com/bug?id=f1f49fb971d7a3e01bd8ab8cff2ff4572ccf3092

    Signed-off-by: Tetsuo Handa
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tetsuo Handa
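
    Rate limiting a noisy diagnostic like this is normally just a matter of
    using the ratelimited printk helpers. A kernel-context sketch of the idea
    (the message text and wrapper function are illustrative, not the exact
    patch):

      #include <linux/kernel.h>
      #include <linux/printk.h>

      /* Emit the bad-sector warning through the ratelimited helper so a
       * corrupted image cannot flood the console with it. */
      static void report_bad_sector(const char *devname,
                                    unsigned long long want,
                                    unsigned long long capacity)
      {
              pr_warn_ratelimited("%s: attempt to access beyond end of device, want=%llu, limit=%llu\n",
                                  devname, want, capacity);
      }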
     
