13 Sep, 2021
1 commit
-
blk-mq can't run allocating driver tag and updating ->rqs[tag]
atomically, meantime blk-mq doesn't clear ->rqs[tag] after the driver
tag is released.So there is chance to iterating over one stale request just after the
tag is allocated and before updating ->rqs[tag].scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi
in-flight requests after scsi host is blocked, so no new scsi command can
be marked as SCMD_STATE_INFLIGHT. However, driver tag allocation still can
be run by blk-mq core. One request is marked as SCMD_STATE_INFLIGHT,
but this request may have been kept in another slot of ->rqs[], meantime
the slot can be allocated out but ->rqs[] isn't updated yet. Then this
in-flight request is counted twice as SCMD_STATE_INFLIGHT. This way causes
trouble in handling scsi error.Fixes the issue by not iterating over stale request.
Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen"
Reported-by: luojiaxing
Signed-off-by: Ming Lei
Link: https://lore.kernel.org/r/20210906065003.439019-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe
24 May, 2021
4 commits
-
The tags used for an IO scheduler are currently per hctx.
As such, when q->nr_hw_queues grows, so does the request queue total IO
scheduler tag depth.This may cause problems for SCSI MQ HBAs whose total driver depth is
fixed.Ming and Yanhui report higher CPU usage and lower throughput in scenarios
where the fixed total driver tag depth is appreciably lower than the total
scheduler tag depth:
https://lore.kernel.org/linux-block/440dfcfc-1a2c-bd98-1161-cec4d78c6dfc@huawei.com/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76bIn that scenario, since the scheduler tag is got first, much contention
is introduced since a driver tag may not be available after we have got
the sched tag.Improve this scenario by introducing request queue-wide tags for when
a tagset-wide sbitmap is used. The static sched requests are still
allocated per hctx, as requests are initialised per hctx, as in
blk_mq_init_request(..., hctx_idx, ...) ->
set->ops->init_request(.., hctx_idx, ...).For simplicity of resizing the request queue sbitmap when updating the
request queue depth, just init at the max possible size, so we don't need
to deal with the possibly with swapping out a new sbitmap for old if
we need to grow.Signed-off-by: John Garry
Reviewed-by: Ming Lei
Link: https://lore.kernel.org/r/1620907258-30910-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe -
The tag allocation code to alloc the sbitmap pairs is common for regular
bitmaps tags and shared sbitmap, so refactor into a common function.Also remove superfluous "flags" argument from blk_mq_init_shared_sbitmap().
Signed-off-by: John Garry
Reviewed-by: Ming Lei
Link: https://lore.kernel.org/r/1620907258-30910-2-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe -
refcount_inc_not_zero() in bt_tags_iter() still may read one freed
request.Fix the issue by the following approach:
1) hold a per-tags spinlock when reading ->rqs[tag] and calling
refcount_inc_not_zero in bt_tags_iter()2) clearing stale request referred via ->rqs[tag] before freeing
request pool, the per-tags spinlock is held for clearing stale
->rq[tag]So after we cleared stale requests, bt_tags_iter() won't observe
freed request any more, also the clearing will wait for pending
request reference.The idea of clearing ->rqs[] is borrowed from John Garry's previous
patch and one recent David's patch.Tested-by: John Garry
Reviewed-by: David Jeffery
Reviewed-by: Bart Van Assche
Signed-off-by: Ming Lei
Link: https://lore.kernel.org/r/20210511152236.763464-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe -
Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(), and
this way will prevent the request from being re-used when ->fn is
running. The approach is same as what we do during handling timeout.Fix request use-after-free(UAF) related with completion race or queue
releasing:- If one rq is referred before rq->q is frozen, then queue won't be
frozen before the request is released during iteration.- If one rq is referred after rq->q is frozen, refcount_inc_not_zero()
will return false, and we won't iterate over this request.However, still one request UAF not covered: refcount_inc_not_zero() may
read one freed request, and it will be handled in next patch.Tested-by: John Garry
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Ming Lei
Link: https://lore.kernel.org/r/20210511152236.763464-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe
06 Apr, 2021
1 commit
-
Signed-off-by: Nikolay Borisov
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Reviewed-by: Himanshu Madhani
Link: https://lore.kernel.org/r/20210311081713.2763171-1-nborisov@suse.com
Signed-off-by: Jens Axboe
26 Mar, 2021
1 commit
-
Sentence reconstruction for better readability.
Signed-off-by: Bhaskar Chowdhury
Signed-off-by: Jens Axboe
29 Sep, 2020
1 commit
-
'f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with
blk_mq_queue_tag_busy_iter")' introduce a bug what we may sleep between
rcu lock. Then '530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter
callbacks")' fix it by get request_queue's ref. And 'a9a808084d6a ("block:
Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues()")'
remove the synchronize_rcu in __blk_mq_update_nr_hw_queues. We need
update the confused comments in blk_mq_queue_tag_busy_iter.Signed-off-by: yangerkun
Signed-off-by: Jens Axboe
11 Sep, 2020
1 commit
-
NVMe shares tagset between fabric queue and admin queue or between
connect_q and NS queue, so hctx_may_queue() can be called to allocate
request for these queues.Tags can be reserved in these tagset. Before error recovery, there is
often lots of in-flight requests which can't be completed, and new
reserved request may be needed in error recovery path. However,
hctx_may_queue() can always return false because there is too many
in-flight requests which can't be completed during error handling.
Finally, nothing can proceed.Fix this issue by always allowing reserved tag allocation in
hctx_may_queue(). This is reasonable because reserved tags are supposed
to always be available.Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Cc: David Milburn
Cc: Ewan D. Milne
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
04 Sep, 2020
5 commits
-
For when using a shared sbitmap, no longer should the number of active
request queues per hctx be relied on for when judging how to share the tag
bitmap.Instead maintain the number of active request queues per tag_set, and make
the judgement based on that.Originally-from: Kashyap Desai
Signed-off-by: John Garry
Tested-by: Don Brace #SCSI resv cmds patches used
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe -
Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown. That problem is solved in commit bf0beec0607d
("blk-mq: drain I/O when all CPUs in a hctx are offline").However, to take advantage of that blk-mq feature, the HBA HW queuess are
required to be mapped to that of the blk-mq hctx's; to do that, the HBA HW
queues need to be exposed to the upper layer.In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.However another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e092ef ("scsi:
core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy
counter was removed, which would stop the LLDD being sent more than
.can_queue commands; however, it should still be ensured that the block
layer does not issue more than .can_queue commands to the Scsi host.To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
are still allocated per hctx; the reason for this is that if tags and
requests were only allocated for a single hctx - like hctx0 - it may break
block drivers which expect a request be associated with a specific hctx,
i.e. not always hctx0. This will introduce extra memory usage.This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144beSigned-off-by: John Garry
Tested-by: Don Brace #SCSI resv cmds patches used
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe -
Introduce pointers for the blk_mq_tags regular and reserved bitmap tags,
with the goal of later being able to use a common shared tag bitmap across
all HW contexts in a set.Signed-off-by: John Garry
Tested-by: Don Brace #SCSI resv cmds patches used
Tested-by: Douglas Gilbert
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Pass hctx/tagset flags argument down to blk_mq_init_tags() and
blk_mq_free_tags() for selective init/free.For now, make it include the alloc policy flag, which can be evaluated
when needed (in blk_mq_init_tags()).Signed-off-by: John Garry
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe -
Since the tags are allocated in blk_mq_init_tags(), it's better practice
to free in that same function upon error, rather than a callee which is to
init the bitmap tags (blk_mq_init_tags()).[jpg: Split from an earlier patch with a new commit message]
Signed-off-by: Hannes Reinecke
Signed-off-by: John Garry
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe
01 Jul, 2020
1 commit
-
blk_mq_get_driver_tag() is only used by blk-mq.c and is supposed to
stay in blk-mq.c, so move it and preparing for cleanup code of
get/put driver tag.Meantime hctx_may_queue() is moved to header file and it is fine
since it is defined as inline always.No functional change.
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe
29 Jun, 2020
1 commit
-
Just check for a non-NULL elevator directly to make the code more clear.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
15 Jun, 2020
1 commit
-
The blk_mq_all_tag_iter() is a void function, thus remove
the redundant 'return' statement in this function.Signed-off-by: Baolin Wang
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
07 Jun, 2020
2 commits
-
blk_mq_all_tag_iter() is added to iterate all requests, so we should
fetch the request from ->static_rqs][] instead of ->rqs[] which is for
holding in-flight request only.Fix it by adding flag of BT_TAG_ITER_STATIC_RQS.
Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Signed-off-by: Ming Lei
Tested-by: John Garry
Cc: Dongli Zhang
Cc: Hannes Reinecke
Cc: Daniel Wagner
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe -
Allocation of the driver tag in the case of using a scheduler shares very
little code with the "normal" tag allocation. Split out a new helper to
streamline this path, and untangle it from the complex normal tag
allocation.This way also avoids to fail driver tag allocation because of inactive hctx
during cpu hotplug, and fixes potential hang risk.Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Tested-by: John Garry
Cc: Dongli Zhang
Cc: Hannes Reinecke
Cc: Daniel Wagner
Signed-off-by: Jens Axboe
30 May, 2020
4 commits
-
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup
up queue mapping. Thomas mentioned the following point[1]:"That was the constraint of managed interrupts from the very beginning:
The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again."However, current blk-mq implementation doesn't quiesce hw queue before
the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is a
cpuhp state handled after the CPU is down, so there isn't any chance to
quiesce the hctx before shutting down the CPU.Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
where the last CPU goes away, and wait for completion of in-flight
requests. This guarantees that there is no inflight I/O before shutting
down the managed IRQ.Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need
to wait for completion of in-flight requests from these drivers to avoid
a potential dead-lock. It is safe to do this for stacking drivers as those
do not use interrupts at all and their I/O completions are triggered by
underlying devices I/O completion.[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[hch: different retry mechanism, merged two patches, minor cleanups]
Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe -
Add a new blk_mq_all_tag_iter function to iterate over all allocated
scheduler tags and driver tags. This is more flexible than the existing
blk_mq_all_tag_busy_iter function as it allows the callers to do whatever
they want on allocated request instead of being limited to started
requests.It will be used to implement draining allocated requests on specified
hctx in this patchset.[hch: switch from the two booleans to a more readable flags field and
consolidate the tags iter functions]Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Daniel Wagner
Reviewed-by: Bart van Assche
Signed-off-by: Jens Axboe -
Replace various magic -1 constants for tags with BLK_MQ_NO_TAG.
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Reviewed-by: Bart Van Assche
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe -
To prepare for wider use of this constant give it a more applicable name.
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Reviewed-by: Bart Van Assche
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe
27 Feb, 2020
1 commit
-
The struct blk_mq_hw_ctx pointer argument in blk_mq_put_tag(),
blk_mq_poll_nsecs(), and blk_mq_poll_hybrid_sleep() is unused, so remove
it.Overall obj code size shows a minor reduction, before:
text data bss dec hex filename
27306 1312 0 28618 6fca block/blk-mq.o
4303 272 0 4575 11df block/blk-mq-tag.oafter:
27282 1312 0 28594 6fb2 block/blk-mq.o
4311 272 0 4583 11e7 block/blk-mq-tag.oReviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Signed-off-by: John Garry
--
This minor patch had been carried as part of the blk-mq shared tags RFC,
I'd rather not carry it anymore as it required rebasing, so now or never..
Signed-off-by: Jens Axboe
14 Nov, 2019
1 commit
-
These functions are not referenced, so delete them.
Signed-off-by: John Garry
Signed-off-by: Jens Axboe
05 Aug, 2019
1 commit
-
blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn. The current queue freeze interface can't provide the synchonization
because aborted requests stay at blk-mq queues during EH.In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().Cc: Max Gurtovoy
Cc: Sagi Grimberg
Cc: Keith Busch
Cc: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
03 Jul, 2019
1 commit
-
No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
on preemption being disabled for its correctness. Since removing the CPU
preemption calls does not measurably affect performance, simplify the
blk-mq code by removing the blk_mq_put_ctx() function and also by not
disabling preemption in blk_mq_get_ctx().Cc: Hannes Reinecke
Cc: Omar Sandoval
Reviewed-by: Christoph Hellwig
Reviewed-by: Ming Lei
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe
01 May, 2019
1 commit
-
Various block layer files do not have any licensing information at all.
Add SPDX tags for the default kernel GPLv2 license to those.Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
01 Feb, 2019
1 commit
-
Currently, the queue mapping result is saved in a two-dimensional
array. In the hot path, to get a hctx, we need do following:q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]]
This isn't very efficient. We could save the queue mapping result into
ctx directly with different hctx type, like,ctx->hctxs[type]
Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe
01 Dec, 2018
1 commit
-
Even if we have no waiters on any of the sbitmap_queue wait states, we
still have to loop every entry to check. We do this for every IO, so
the cost adds up.Shift a bit of the cost to the slow path, when we actually have waiters.
Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maintain
an internal count of how many are currently active. Then we can simply
check this count in sbq_wake_ptr() and not have to loop if we don't
have any sleepers.Convert the two users of sbitmap with waiting, blk-mq-tag and iSCSI.
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe
09 Nov, 2018
2 commits
-
Document the fact that the strategy function passed in can
control whether to continue iterating or not.Suggested-by: Bart Van Assche
Signed-off-by: Jens Axboe -
We have this functionality in sbitmap, but we don't export it in
blk-mq for users of the tags busy iteration. This can be useful
for stopping the iteration, if the caller doesn't need to find
more requests.Reviewed-by: Mike Snitzer
Signed-off-by: Jens Axboe
08 Nov, 2018
3 commits
-
We call blk_mq_map_queue() a lot, at least two times for each
request per IO, sometimes more. Since we now have an indirect
call as well in that function. cache the mapping so we don't
have to re-call blk_mq_map_queue() for the same request
multiple times.Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Prep patch for being able to place request based not just on
CPU location, but also on the type of request.Reviewed-by: Hannes Reinecke
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe -
It's now unused, kill it.
Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe
01 Oct, 2018
1 commit
-
Merge -rc6 in, for two reasons:
1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so
they aren't in the 4.20 branch.Signed-off-by: Jens Axboe
* tag 'v4.19-rc6': (780 commits)
Linux 4.19-rc6
MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
cpufreq: qcom-kryo: Fix section annotations
perf/core: Add sanity check to deal with pinned event failure
xen/blkfront: correct purging of persistent grants
Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
selftests/powerpc: Fix Makefiles for headers_install change
blk-mq: I/O and timer unplugs are inverted in blktrace
dax: Fix deadlock in dax_lock_mapping_entry()
x86/boot: Fix kexec booting failure in the SEV bit detection code
bcache: add separate workqueue for journal_write to avoid deadlock
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
xen/blkfront: When purging persistent grants, keep them in the buffer
clocksource/drivers/timer-atmel-pit: Properly handle error cases
block: fix deadline elevator drain for zoned block devices
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
...Signed-off-by: Jens Axboe
26 Sep, 2018
1 commit
-
A recent commit runs tag iterator callbacks under the rcu read lock,
but existing callbacks do not satisfy the non-blocking requirement.
The commit intended to prevent an iterator from accessing a queue that's
being modified. This patch fixes the original issue by taking a queue
reference instead of reading it, which allows callbacks to make blocking
calls.Fixes: f5bbbbe4d6357 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")
Acked-by: Jianchao Wang
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe
22 Sep, 2018
1 commit
-
Make it easier to understand the purpose of the functions that iterate
over requests by documenting their purpose. Fix several minor spelling
and grammer mistakes in comments in these functions.Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Jianchao Wang
Cc: Hannes Reinecke
Signed-off-by: Jens Axboe
23 Aug, 2018
1 commit
-
Pull more block updates from Jens Axboe:
- Set of bcache fixes and changes (Coly)
- The flush warn fix (me)
- Small series of BFQ fixes (Paolo)
- wbt hang fix (Ming)
- blktrace fix (Steven)
- blk-mq hardware queue count update fix (Jianchao)
- Various little fixes
* tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits)
block/DAC960.c: make some arrays static const, shrinks object size
blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter
blk-mq: init hctx sched after update ctx and hctx mapping
block: remove duplicate initialization
tracing/blktrace: Fix to allow setting same value
pktcdvd: fix setting of 'ret' error return for a few cases
block: change return type to bool
block, bfq: return nbytes and not zero from struct cftype .write() method
block, bfq: improve code of bfq_bfqq_charge_time
block, bfq: reduce write overcharge
block, bfq: always update the budget of an entity when needed
block, bfq: readd missing reset of parent-entity service
blk-wbt: fix IO hang in wbt_wait()
block: don't warn for flush on read-only device
bcache: add the missing comments for smp_mb()/smp_wmb()
bcache: remove unnecessary space before ioctl function pointer arguments
bcache: add missing SPDX header
bcache: move open brace at end of function definitions to next line
bcache: add static const prefix to char * array declarations
bcache: fix code comments style
...
21 Aug, 2018
1 commit
-
For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to
account the inflight requests. It will access the queue_hw_ctx and
nr_hw_queues w/o any protection. When updating nr_hw_queues and
blk_mq_in_flight/rw occur concurrently, panic comes up.Before update nr_hw_queues, the q will be frozen. So we could use
q_usage_counter to avoid the race. percpu_ref_is_zero is used here
so that we will not miss any in-flight request. The access to
nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
under rcu critical section, __blk_mq_update_nr_hw_queues could use
synchronize_rcu to ensure the zeroed q_usage_counter to be globally
visible.Signed-off-by: Jianchao Wang
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe