Eric Lee / smarc-fsl-linux-kernel

29 Sep, 2020

1 commit

76cffccd6 block-mq: fix comments in blk_mq_queue_tag_busy_iter

'f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with
blk_mq_queue_tag_busy_iter")' introduced a bug in which we may sleep
inside an RCU read-side critical section. Then '530ca2c9bd69 ("blk-mq:
Allow blocking queue tag iter callbacks")' fixed it by taking a reference
on the request_queue. And 'a9a808084d6a ("block: Remove the
synchronize_rcu() call from __blk_mq_update_nr_hw_queues()")' removed the
synchronize_rcu() call in __blk_mq_update_nr_hw_queues(). We need to
update the now-confusing comments in blk_mq_queue_tag_busy_iter().

Signed-off-by: yangerkun
Signed-off-by: Jens Axboe

yangerkun
2020-09-29 22:11:00 +0800

11 Sep, 2020

1 commit

285008501 blk-mq: always allow reserved allocation in hctx_may_queue

NVMe shares tagset between fabric queue and admin queue or between
connect_q and NS queue, so hctx_may_queue() can be called to allocate
request for these queues.

Tags can be reserved in these tagsets. Before error recovery, there are
often many in-flight requests which can't be completed, and a new
reserved request may be needed in the error recovery path. However,
hctx_may_queue() can always return false because there are too many
in-flight requests which can't be completed during error handling.
As a result, nothing can proceed.

Fix this issue by always allowing reserved tag allocation in
hctx_may_queue(). This is reasonable because reserved tags are supposed
to always be available.
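
The idea can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the `model_*` names, the struct layout, and the minimum share of 4 are assumptions made for illustration. It shows a fair-share check that can refuse ordinary allocations, and a wrapper that lets reserved allocations bypass it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one hardware queue that shares a tag set. */
struct model_hctx {
	unsigned int depth;     /* total tags in the shared set */
	unsigned int users;     /* active queues sharing the set */
	unsigned int nr_active; /* requests this queue has in flight */
};

/* Fair-share check: each user may hold roughly depth / users tags. */
static bool model_may_queue(const struct model_hctx *h)
{
	unsigned int share;

	assert(h->nr_active <= h->depth);
	if (h->users == 0)
		return true;
	/* Round up, and never throttle below a small minimum (assumed: 4). */
	share = (h->depth + h->users - 1) / h->users;
	if (share < 4)
		share = 4;
	return h->nr_active < share;
}

/* Reserved allocations skip the fair-share check entirely. */
static bool model_can_allocate(const struct model_hctx *h, bool reserved)
{
	if (reserved)
		return true;
	return model_may_queue(h);
}
```

With 32 tags shared by 4 users, a queue that already holds 8 requests is refused a normal tag in this model but may still take a reserved one, which is what keeps an error recovery path moving.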

Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Cc: David Milburn
Cc: Ewan D. Milne
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2020-09-11 19:26:19 +0800

04 Sep, 2020

5 commits

f1b49fdc1 blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap

When using a shared sbitmap, the number of active request queues per
hctx should no longer be relied on when judging how to share the tag
bitmap.

Instead maintain the number of active request queues per tag_set, and make
the judgement based on that.

Originally-from: Kashyap Desai
Signed-off-by: John Garry
Tested-by: Don Brace #SCSI resv cmds patches used
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe

John Garry
2020-09-04 05:20:47 +0800

32bc15afe blk-mq: Facilitate a shared sbitmap per tagset

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.

In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown. That problem is solved in commit bf0beec0607d
("blk-mq: drain I/O when all CPUs in a hctx are offline").

However, to take advantage of that blk-mq feature, the HBA HW queues are
required to be mapped to that of the blk-mq hctx's; to do that, the HBA HW
queues need to be exposed to the upper layer.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

However another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e092ef ("scsi:
core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy
counter was removed, which would stop the LLDD being sent more than
.can_queue commands; however, it should still be ensured that the block
layer does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
are still allocated per hctx; the reason for this is that if tags and
requests were only allocated for a single hctx - like hctx0 - it may break
block drivers which expect a request be associated with a specific hctx,
i.e. not always hctx0. This will introduce extra memory usage.

This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Signed-off-by: John Garry
Tested-by: Don Brace #SCSI resv cmds patches used
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe

John Garry
2020-09-04 05:20:47 +0800

222a5ae03 blk-mq: Use pointers for blk_mq_tags bitmap tags

Introduce pointers for the blk_mq_tags regular and reserved bitmap tags,
with the goal of later being able to use a common shared tag bitmap across
all HW contexts in a set.

Signed-off-by: John Garry
Tested-by: Don Brace #SCSI resv cmds patches used
Tested-by: Douglas Gilbert
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

John Garry
2020-09-04 05:20:47 +0800

1c0706a70 blk-mq: Pass flags for tag init/free

Pass hctx/tagset flags argument down to blk_mq_init_tags() and
blk_mq_free_tags() for selective init/free.

For now, make it include the alloc policy flag, which can be evaluated
when needed (in blk_mq_init_tags()).
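
As a rough illustration of carrying an allocation policy inside a flags word, the sketch below packs and unpacks a small policy field. The macro names, the bit position (8), and the field width (1 bit) are assumptions for this example and are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical layout: a 1-bit policy field starting at bit 8 of the flags. */
#define MODEL_ALLOC_POLICY_START_BIT 8
#define MODEL_ALLOC_POLICY_BITS      1

enum model_alloc_policy {
	MODEL_ALLOC_FIFO = 0,
	MODEL_ALLOC_RR   = 1,
};

/* Encode a policy into its position inside the flags word. */
static unsigned int model_policy_to_flags(enum model_alloc_policy policy)
{
	assert((unsigned int)policy < (1U << MODEL_ALLOC_POLICY_BITS));
	return (unsigned int)policy << MODEL_ALLOC_POLICY_START_BIT;
}

/* Decode the policy only where it is needed, e.g. in a tag init routine. */
static enum model_alloc_policy model_flags_to_policy(unsigned int flags)
{
	unsigned int mask = (1U << MODEL_ALLOC_POLICY_BITS) - 1;

	return (enum model_alloc_policy)
		((flags >> MODEL_ALLOC_POLICY_START_BIT) & mask);
}
```

Passing the whole flags word down means callees can later look at other bits as well, without another signature change.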

Signed-off-by: John Garry
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe

John Garry
2020-09-04 05:20:46 +0800

4d063237b blk-mq: Free tags in blk_mq_init_tags() upon error

Since the tags are allocated in blk_mq_init_tags(), it's better practice
to free them in that same function upon error, rather than in a callee
whose job is to init the bitmap tags (blk_mq_init_bitmap_tags()).

[jpg: Split from an earlier patch with a new commit message]

Signed-off-by: Hannes Reinecke
Signed-off-by: John Garry
Tested-by: Douglas Gilbert
Signed-off-by: Jens Axboe

Hannes Reinecke
2020-09-04 05:20:46 +0800

01 Jul, 2020

1 commit

570e9b73b blk-mq: move blk_mq_get_driver_tag into blk-mq.c

blk_mq_get_driver_tag() is only used by blk-mq.c and is supposed to
stay in blk-mq.c, so move it there in preparation for cleaning up the
get/put driver tag code.

Meanwhile, hctx_may_queue() is moved to a header file, which is fine
since it is always defined as inline.

No functional change.

Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe

Ming Lei
2020-07-01 02:57:59 +0800

29 Jun, 2020

1 commit

42fdc5e49 blk-mq: remove the BLK_MQ_REQ_INTERNAL flag

Just check for a non-NULL elevator directly to make the code more clear.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-06-29 23:56:18 +0800

15 Jun, 2020

1 commit

a8a5e383c blk-mq: Remove redundant 'return' statement

The blk_mq_all_tag_iter() is a void function, thus remove
the redundant 'return' statement in this function.

Signed-off-by: Baolin Wang
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe

Baolin Wang
2020-06-15 22:34:43 +0800

07 Jun, 2020

2 commits

22f614bc0 blk-mq: fix blk_mq_all_tag_iter

blk_mq_all_tag_iter() is added to iterate all requests, so we should
fetch the request from ->static_rqs[] instead of ->rqs[], which only
holds in-flight requests.

Fix it by adding the BT_TAG_ITER_STATIC_RQS flag.

Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Signed-off-by: Ming Lei
Tested-by: John Garry
Cc: Dongli Zhang
Cc: Hannes Reinecke
Cc: Daniel Wagner
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe

Ming Lei
2020-06-07 22:56:50 +0800

d94ecfc39 blk-mq: split out a __blk_mq_get_driver_tag helper

Allocation of the driver tag in the case of using a scheduler shares very
little code with the "normal" tag allocation. Split out a new helper to
streamline this path, and untangle it from the complex normal tag
allocation.

This also avoids failing driver tag allocation because of an inactive hctx
during CPU hotplug, and fixes a potential hang.

Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Tested-by: John Garry
Cc: Dongli Zhang
Cc: Hannes Reinecke
Cc: Daniel Wagner
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-06-07 22:56:50 +0800

30 May, 2020

4 commits

bf0beec06 blk-mq: drain I/O when all CPUs in a hctx are offline

Most blk-mq drivers depend on managed IRQ's auto-affinity to set up
queue mapping. Thomas mentioned the following point[1]:

"That was the constraint of managed interrupts from the very beginning:

The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again."

However, current blk-mq implementation doesn't quiesce hw queue before
the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is a
cpuhp state handled after the CPU is down, so there isn't any chance to
quiesce the hctx before shutting down the CPU.

Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
where the last CPU goes away, and wait for completion of in-flight
requests. This guarantees that there is no inflight I/O before shutting
down the managed IRQ.

Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need
to wait for completion of in-flight requests from these drivers to avoid
a potential dead-lock. It is safe to do this for stacking drivers as those
do not use interrupts at all and their I/O completions are triggered by
underlying devices I/O completion.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

[hch: different retry mechanism, merged two patches, minor cleanups]

Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe

Ming Lei
2020-05-30 00:23:25 +0800

602380d28 blk-mq: add blk_mq_all_tag_iter

Add a new blk_mq_all_tag_iter function to iterate over all allocated
scheduler tags and driver tags. This is more flexible than the existing
blk_mq_all_tag_busy_iter function as it allows the callers to do whatever
they want on allocated request instead of being limited to started
requests.

It will be used to implement draining allocated requests on specified
hctx in this patchset.

[hch: switch from the two booleans to a more readable flags field and
consolidate the tags iter functions]

Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Daniel Wagner
Reviewed-by: Bart van Assche
Signed-off-by: Jens Axboe

Ming Lei
2020-05-30 00:23:25 +0800

766473681 blk-mq: use BLK_MQ_NO_TAG in more places

Replace various magic -1 constants for tags with BLK_MQ_NO_TAG.
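
A small userspace sketch of why a named sentinel reads better than a bare -1 follows. `MODEL_NO_TAG` and `model_get_tag()` are hypothetical names for this example; only the idea of a named "no tag" constant comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no tag could be allocated". */
#define MODEL_NO_TAG (~0U)

/* Return the first free tag and mark it busy, or MODEL_NO_TAG if full. */
static unsigned int model_get_tag(bool *busy, unsigned int depth)
{
	assert(busy != NULL);
	for (unsigned int tag = 0; tag < depth; tag++) {
		if (!busy[tag]) {
			busy[tag] = true;
			return tag;
		}
	}
	return MODEL_NO_TAG;
}
```

A caller then compares against `MODEL_NO_TAG` instead of a magic number, so the intent stays visible at every call site.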

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Reviewed-by: Bart Van Assche
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-30 00:23:25 +0800

419c3d5e8 blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG

To prepare for wider use of this constant give it a more applicable name.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Reviewed-by: Bart Van Assche
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-05-30 00:23:25 +0800

27 Feb, 2020

1 commit

cae740a04 blk-mq: Remove some unused function arguments

The struct blk_mq_hw_ctx pointer argument in blk_mq_put_tag(),
blk_mq_poll_nsecs(), and blk_mq_poll_hybrid_sleep() is unused, so remove
it.

Overall obj code size shows a minor reduction, before:
text data bss dec hex filename
27306 1312 0 28618 6fca block/blk-mq.o
4303 272 0 4575 11df block/blk-mq-tag.o

after:
27282 1312 0 28594 6fb2 block/blk-mq.o
4311 272 0 4583 11e7 block/blk-mq-tag.o

Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Signed-off-by: John Garry
--
This minor patch had been carried as part of the blk-mq shared tags RFC,
I'd rather not carry it anymore as it required rebasing, so now or never..
Signed-off-by: Jens Axboe

John Garry
2020-02-27 01:34:41 +0800

14 Nov, 2019

1 commit

cb711b91a blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue()

These functions are not referenced, so delete them.

Signed-off-by: John Garry
Signed-off-by: Jens Axboe

John Garry
2019-11-14 03:50:38 +0800

05 Aug, 2019

1 commit

f9934a80f blk-mq: introduce blk_mq_tagset_wait_completed_request()

blk-mq may schedule a call to the queue's complete function on a remote
CPU via IPI, but doesn't provide any way to synchronize the request's
complete fn. The current queue freeze interface can't provide the
synchronization because aborted requests stay in blk-mq queues during EH.

In some drivers' EH (such as NVMe), a hardware queue's resources may be
freed and re-allocated. If the completed request's complete fn finally runs
after the hardware queue's resources are released, a kernel crash will be
triggered.

Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().

Cc: Max Gurtovoy
Cc: Sagi Grimberg
Cc: Keith Busch
Cc: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2019-08-05 11:41:29 +0800

03 Jul, 2019

1 commit

c05f42206 blk-mq: remove blk_mq_put_ctx()

No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
on preemption being disabled for its correctness. Since removing the CPU
preemption calls does not measurably affect performance, simplify the
blk-mq code by removing the blk_mq_put_ctx() function and also by not
disabling preemption in blk_mq_get_ctx().

Cc: Hannes Reinecke
Cc: Omar Sandoval
Reviewed-by: Christoph Hellwig
Reviewed-by: Ming Lei
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2019-07-03 11:03:27 +0800

01 May, 2019

1 commit

3dcf60bcb block: add SPDX tags to block layer files missing licensing information

Various block layer files do not have any licensing information at all.
Add SPDX tags for the default kernel GPLv2 license to those.

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-05-01 06:12:03 +0800

01 Feb, 2019

1 commit

8ccdf4a37 blk-mq: save queue mapping result into ctx directly

Currently, the queue mapping result is saved in a two-dimensional
array. In the hot path, to get a hctx, we need to do the following:

q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]]

This isn't very efficient. We could save the queue mapping result into
ctx directly with different hctx type, like,

ctx->hctxs[type]
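
The difference between the two lookups can be sketched in a self-contained userspace model. All `model_*` and `MODEL_*` names, the array sizes, and the struct layouts are assumptions for illustration; only the two lookup expressions mirror the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS      4
#define MODEL_NR_HW_QUEUES 2

enum model_hctx_type {
	MODEL_HCTX_DEFAULT,
	MODEL_HCTX_READ,
	MODEL_HCTX_MAX_TYPES,
};

struct model_hctx {
	unsigned int queue_num;
};

/* Per-CPU software context with the mapping result cached per type. */
struct model_ctx {
	unsigned int cpu;
	struct model_hctx *hctxs[MODEL_HCTX_MAX_TYPES];
};

struct model_queue {
	struct model_hctx hw[MODEL_NR_HW_QUEUES];
	struct model_hctx *queue_hw_ctx[MODEL_NR_HW_QUEUES];
	unsigned int mq_map[MODEL_HCTX_MAX_TYPES][MODEL_NR_CPUS];
	struct model_ctx ctxs[MODEL_NR_CPUS];
};

/* Old hot path: two dependent array lookups per request. */
static struct model_hctx *model_map_slow(struct model_queue *q,
					 unsigned int type, unsigned int cpu)
{
	assert(type < MODEL_HCTX_MAX_TYPES && cpu < MODEL_NR_CPUS);
	return q->queue_hw_ctx[q->mq_map[type][cpu]];
}

/* Done once, whenever the mapping changes: cache the result in each ctx. */
static void model_cache_mapping(struct model_queue *q)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		q->ctxs[cpu].cpu = cpu;
		for (unsigned int t = 0; t < MODEL_HCTX_MAX_TYPES; t++)
			q->ctxs[cpu].hctxs[t] = model_map_slow(q, t, cpu);
	}
}

/* New hot path: a single lookup through the ctx. */
static struct model_hctx *model_map_fast(const struct model_ctx *ctx,
					 unsigned int type)
{
	assert(type < MODEL_HCTX_MAX_TYPES);
	return ctx->hctxs[type];
}
```

The trade-off in this model is that the cache has to be refreshed whenever the mapping is rebuilt, in exchange for a cheaper lookup on every request.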

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe

Jianchao Wang
2019-02-01 23:33:04 +0800

01 Dec, 2018

1 commit

5d2ee7122 sbitmap: optimize wakeup check

Even if we have no waiters on any of the sbitmap_queue wait states, we
still have to loop every entry to check. We do this for every IO, so
the cost adds up.

Shift a bit of the cost to the slow path, when we actually have waiters.
Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maintain
an internal count of how many are currently active. Then we can simply
check this count in sbq_wake_ptr() and not have to loop if we don't
have any sleepers.
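
A minimal single-threaded model of this optimization is sketched below. The `model_*` names and struct fields are hypothetical, and the per-queue sleeper count stands in for a real wait queue; the point is only that the wake-up path can return early when the shared counter is zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_WAIT_QUEUES 8

struct model_wait_state {
	int nr_sleepers; /* stand-in for a real wait queue */
};

struct model_sbq {
	atomic_int ws_active; /* waiters across all wait states */
	struct model_wait_state ws[MODEL_WAIT_QUEUES];
};

static void model_sbq_init(struct model_sbq *sbq)
{
	atomic_init(&sbq->ws_active, 0);
	for (size_t i = 0; i < MODEL_WAIT_QUEUES; i++)
		sbq->ws[i].nr_sleepers = 0;
}

/* Wrapper around "prepare to wait": also bump the shared counter. */
static void model_prepare_to_wait(struct model_sbq *sbq,
				  struct model_wait_state *ws)
{
	atomic_fetch_add(&sbq->ws_active, 1);
	ws->nr_sleepers++;
}

/* Wrapper around "finish wait": drop the shared counter again. */
static void model_finish_wait(struct model_sbq *sbq,
			      struct model_wait_state *ws)
{
	assert(ws->nr_sleepers > 0);
	ws->nr_sleepers--;
	atomic_fetch_sub(&sbq->ws_active, 1);
}

/* Wake-up check: skip the scan entirely when nobody is waiting. */
static struct model_wait_state *model_wake_ptr(struct model_sbq *sbq)
{
	if (atomic_load(&sbq->ws_active) == 0)
		return NULL; /* fast path taken on most I/O completions */

	for (size_t i = 0; i < MODEL_WAIT_QUEUES; i++) {
		if (sbq->ws[i].nr_sleepers > 0)
			return &sbq->ws[i];
	}
	return NULL;
}
```

The cost of maintaining the counter is paid only by tasks that actually sleep, while every completion without waiters saves the loop.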

Convert the two users of sbitmap with waiting, blk-mq-tag and iSCSI.

Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-12-01 05:48:04 +0800

09 Nov, 2018

2 commits

ab11fe5af blk-mq-tag: document tag iteration helper return value

Document the fact that the strategy function passed in can
control whether to continue iterating or not.

Suggested-by: Bart Van Assche
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-09 02:09:50 +0800

7baa85727 blk-mq-tag: change busy_iter_fn to return whether to continue or not

We have this functionality in sbitmap, but we don't export it in
blk-mq for users of the tags busy iteration. This can be useful
for stopping the iteration, if the caller doesn't need to find
more requests.
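
The pattern of a callback that controls iteration can be sketched as follows. This is a userspace illustration with hypothetical `model_*` names, not the blk-mq signature: the callback returns true to keep going and false to stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback returns true to continue iterating, false to stop early. */
typedef bool (*model_busy_iter_fn)(unsigned int tag, void *priv);

/* Visit busy tags in order; return how many times the callback ran. */
static unsigned int model_busy_iter(const bool *busy, unsigned int depth,
				    model_busy_iter_fn fn, void *priv)
{
	unsigned int calls = 0;

	assert(fn != NULL);
	for (unsigned int tag = 0; tag < depth; tag++) {
		if (!busy[tag])
			continue;
		calls++;
		if (!fn(tag, priv))
			break;
	}
	return calls;
}

/* Example callback: record the first busy tag, then stop. */
static bool model_find_first(unsigned int tag, void *priv)
{
	*(unsigned int *)priv = tag;
	return false;
}
```

A caller that only needs to know whether any request is busy can stop after the first hit instead of walking the whole tag space.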

Reviewed-by: Mike Snitzer
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-09 01:24:07 +0800

08 Nov, 2018

3 commits

ea4f995ee blk-mq: cache request hardware queue mapping

We call blk_mq_map_queue() a lot, at least two times for each
request per IO, sometimes more. Since we now have an indirect
call as well in that function, cache the mapping so we don't
have to re-call blk_mq_map_queue() for the same request
multiple times.

Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:44:59 +0800

f9afca4d3 blk-mq: pass in request/bio flags to queue mapping

Prep patch for being able to place request based not just on
CPU location, but also on the type of request.

Reviewed-by: Hannes Reinecke
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:44:59 +0800

7ca019264 block: remove legacy rq tagging

It's now unused, kill it.

Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:42:32 +0800

01 Oct, 2018

1 commit

c0aac682f Merge tag 'v4.19-rc6' into for-4.20/block

Merge -rc6 in, for two reasons:

1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so
they aren't in the 4.20 branch.

Signed-off-by: Jens Axboe

* tag 'v4.19-rc6': (780 commits)
Linux 4.19-rc6
MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
cpufreq: qcom-kryo: Fix section annotations
perf/core: Add sanity check to deal with pinned event failure
xen/blkfront: correct purging of persistent grants
Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
selftests/powerpc: Fix Makefiles for headers_install change
blk-mq: I/O and timer unplugs are inverted in blktrace
dax: Fix deadlock in dax_lock_mapping_entry()
x86/boot: Fix kexec booting failure in the SEV bit detection code
bcache: add separate workqueue for journal_write to avoid deadlock
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
xen/blkfront: When purging persistent grants, keep them in the buffer
clocksource/drivers/timer-atmel-pit: Properly handle error cases
block: fix deadline elevator drain for zoned block devices
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
...

Signed-off-by: Jens Axboe

Jens Axboe
2018-10-01 22:58:57 +0800

26 Sep, 2018

1 commit

530ca2c9b blk-mq: Allow blocking queue tag iter callbacks

A recent commit runs tag iterator callbacks under the RCU read lock,
but existing callbacks do not satisfy the non-blocking requirement.
The commit intended to prevent an iterator from accessing a queue that's
being modified. This patch fixes the original issue by taking a queue
reference instead of holding the RCU read lock, which allows callbacks to
make blocking calls.
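
The approach of pinning the queue with a reference can be modeled in userspace as below. This is a sketch under stated assumptions: `model_queue`, its plain atomic counter, and the helper names are hypothetical stand-ins, not the kernel's per-CPU reference API. The key property is that a "try get" fails once the count has dropped to zero, so the iterator backs off instead of touching a queue that is being modified.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_queue {
	atomic_uint usage; /* zero means frozen: no new users allowed */
};

/* Take a reference only if the count has not already reached zero. */
static bool model_queue_tryget(struct model_queue *q)
{
	unsigned int old = atomic_load(&q->usage);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&q->usage, &old, old + 1))
			return true;
	}
	return false;
}

static void model_queue_put(struct model_queue *q)
{
	unsigned int old = atomic_fetch_sub(&q->usage, 1);

	assert(old > 0);
}

typedef void (*model_iter_fn)(void *priv);

/* Run fn while holding a reference; fn is free to block. */
static bool model_queue_iter(struct model_queue *q, model_iter_fn fn,
			     void *priv)
{
	if (!model_queue_tryget(q))
		return false; /* queue is frozen, skip the iteration */
	fn(priv);
	model_queue_put(q);
	return true;
}

/* Example callback: count how many times it was invoked. */
static void model_count_call(void *priv)
{
	(*(int *)priv)++;
}
```

Unlike a read-side critical section, holding a counted reference places no restriction on sleeping inside the callback.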

Fixes: f5bbbbe4d6357 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")
Acked-by: Jianchao Wang
Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe

Keith Busch
2018-09-26 10:17:59 +0800

22 Sep, 2018

1 commit

c7b1bf5cc blk-mq: Document the functions that iterate over requests

Make it easier to understand the purpose of the functions that iterate
over requests by documenting their purpose. Fix several minor spelling
and grammar mistakes in comments in these functions.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Jianchao Wang
Cc: Hannes Reinecke
Signed-off-by: Jens Axboe

Bart Van Assche
2018-09-22 10:30:22 +0800

23 Aug, 2018

1 commit

5bed49adf Merge tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:

- Set of bcache fixes and changes (Coly)

- The flush warn fix (me)

- Small series of BFQ fixes (Paolo)

- wbt hang fix (Ming)

- blktrace fix (Steven)

- blk-mq hardware queue count update fix (Jianchao)

- Various little fixes

* tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits)
block/DAC960.c: make some arrays static const, shrinks object size
blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter
blk-mq: init hctx sched after update ctx and hctx mapping
block: remove duplicate initialization
tracing/blktrace: Fix to allow setting same value
pktcdvd: fix setting of 'ret' error return for a few cases
block: change return type to bool
block, bfq: return nbytes and not zero from struct cftype .write() method
block, bfq: improve code of bfq_bfqq_charge_time
block, bfq: reduce write overcharge
block, bfq: always update the budget of an entity when needed
block, bfq: readd missing reset of parent-entity service
blk-wbt: fix IO hang in wbt_wait()
block: don't warn for flush on read-only device
bcache: add the missing comments for smp_mb()/smp_wmb()
bcache: remove unnecessary space before ioctl function pointer arguments
bcache: add missing SPDX header
bcache: move open brace at end of function definitions to next line
bcache: add static const prefix to char * array declarations
bcache: fix code comments style
...

Linus Torvalds
2018-08-23 04:38:05 +0800

21 Aug, 2018

1 commit

f5bbbbe4d blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter

For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to
account the inflight requests. It will access the queue_hw_ctx and
nr_hw_queues w/o any protection. When updating nr_hw_queues and
blk_mq_in_flight/rw occur concurrently, panic comes up.

Before updating nr_hw_queues, the queue will be frozen. So we could use
q_usage_counter to avoid the race. percpu_ref_is_zero is used here
so that we will not miss any in-flight request. The access to
nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
under rcu critical section, __blk_mq_update_nr_hw_queues could use
synchronize_rcu to ensure the zeroed q_usage_counter to be globally
visible.

Signed-off-by: Jianchao Wang
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe

Jianchao Wang
2018-08-21 23:02:56 +0800

15 Aug, 2018

1 commit

73ba2fb33 Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
"First pull request for this merge window, there will also be a
followup request with some stragglers.

This pull request contains:

- Fix for a thundering herd issue in the wbt block code (Anchal
Agarwal)

- A few NVMe pull requests:
    * Improved tracepoints (Keith)
    * Larger inline data support for RDMA (Steve Wise)
    * RDMA setup/teardown fixes (Sagi)
    * Effects log support for NVMe target (Chaitanya Kulkarni)
    * Buffered IO support for NVMe target (Chaitanya Kulkarni)
    * TP4004 (ANA) support (Christoph)
    * Various NVMe fixes

- Block io-latency controller support. Much needed support for
properly containing block devices. (Josef)

- Series improving how we handle sense information on the stack
(Kees)

- Lightnvm fixes and updates/improvements (Matias/Javier et al)

- Zoned device support for null_blk (Matias)

- AIX partition fixes (Mauricio Faria de Oliveira)

- DIF checksum code made generic (Max Gurtovoy)

- Add support for discard in iostats (Michael Callahan / Tejun)

- Set of updates for BFQ (Paolo)

- Removal of async write support for bsg (Christoph)

- Bio page dirtying and clone fixups (Christoph)

- Set of bcache fix/changes (via Coly)

- Series improving blk-mq queue setup/teardown speed (Ming)

- Series improving merging performance on blk-mq (Ming)

- Lots of other fixes and cleanups from a slew of folks"

* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
blkcg: Make blkg_root_lookup() work for queues in bypass mode
bcache: fix error setting writeback_rate through sysfs interface
null_blk: add lock drop/acquire annotation
Blk-throttle: reduce tail io latency when iops limit is enforced
block: paride: pd: mark expected switch fall-throughs
block: Ensure that a request queue is dissociated from the cgroup controller
block: Introduce blk_exit_queue()
blkcg: Introduce blkg_root_lookup()
block: Remove two superfluous #include directives
blk-mq: count the hctx as active before allocating tag
block: bvec_nr_vecs() returns value for wrong slab
bcache: trivial - remove tailing backslash in macro BTREE_FLAG
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
bcache: set max writeback rate when I/O request is idle
bcache: add code comments for bset.c
bcache: fix mistaken comments in request.c
bcache: fix mistaken code comments in bcache.h
bcache: add a comment in super.c
bcache: avoid unncessary cache prefetch bch_btree_node_get()
bcache: display rate debug parameters to 0 when writeback is not running
...

Linus Torvalds
2018-08-15 01:23:25 +0800

09 Aug, 2018

1 commit

d263ed992 blk-mq: count the hctx as active before allocating tag

Currently, we count the hctx as active only after successfully allocating
a driver tag. If a previously inactive hctx tries to get a tag for the
first time, it may fail and need to wait. However, due to the stale
tags->active_queues, the other shared-tag users are still able to
occupy all driver tags while someone is waiting for a tag.
Consequently, even if the previously inactive hctx is woken up, it
still may not be able to get a tag and could be starved.

To fix it, count the hctx as active before trying to allocate a driver
tag; then, while it is waiting for the tag, the other shared-tag users
will reserve budget for it.

Reviewed-by: Ming Lei
Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe

Jianchao Wang
2018-08-09 22:34:17 +0800

03 Aug, 2018

2 commits

2d5ba0e2d blk-mq: fix blk_mq_tagset_busy_iter

Commit d250bf4e776ff09d5 ("blk-mq: only iterate over inflight requests
in blk_mq_tagset_busy_iter") uses 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'
to replace 'blk_mq_request_started(req)'. This is wrong, and causes
many test systems to hang during boot.

Fix the issue by using blk_mq_request_started(req) inside bt_tags_iter().

Fixes: d250bf4e776ff09d5 ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter")
Cc: Josef Bacik
Cc: Christoph Hellwig
Cc: Guenter Roeck
Cc: Mark Brown
Cc: Matt Hart
Cc: Johannes Thumshirn
Cc: John Garry
Cc: Hannes Reinecke
Cc: "Martin K. Petersen"
Cc: James Bottomley
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Bart Van Assche
Tested-by: Guenter Roeck
Reported-by: Mark Brown
Reported-by: Guenter Roeck
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2018-08-03 04:47:20 +0800

75d6e175f blk-mq: fix updating tags depth

The 'nr' passed from userspace represents the total depth. Meanwhile,
inside 'struct blk_mq_tags', 'nr_tags' stores the total tag depth,
and 'nr_reserved_tags' stores the reserved part.

There are two issues in blk_mq_tag_update_depth() now:

1) when growing tags, we should use the passed 'nr' and keep the
number of reserved tags unchanged.

2) the passed 'nr' should be checked against 'tags->nr_tags', instead
of against the size of the normal (non-reserved) part.

This patch fixes the above two cases, and avoids a kernel crash caused
by incorrectly resizing the sbitmap queue.
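
The depth arithmetic described here can be sketched as a small userspace model. The struct, the `bitmap_depth` field, and `model_update_depth()` are hypothetical names for illustration, and reallocation on growth is reduced to updating two numbers; the sketch only shows which quantity is compared against which.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_tags {
	unsigned int nr_tags;          /* total depth, reserved included */
	unsigned int nr_reserved_tags; /* reserved part of nr_tags */
	unsigned int bitmap_depth;     /* depth of the normal tag bitmap */
};

/* nr is the requested TOTAL depth. Returns 0 or a negative errno. */
static int model_update_depth(struct model_tags *tags, unsigned int nr,
			      bool can_grow)
{
	assert(tags->nr_reserved_tags <= tags->nr_tags);

	/* At least one non-reserved tag must remain. */
	if (nr <= tags->nr_reserved_tags)
		return -EINVAL;

	/* Compare against the total depth, not the normal part only. */
	if (nr > tags->nr_tags) {
		if (!can_grow)
			return -EINVAL;
		/* Grow to the requested total; reserved count is unchanged. */
		tags->nr_tags = nr;
		tags->bitmap_depth = nr - tags->nr_reserved_tags;
		return 0;
	}

	/* Shrink in place: only the normal bitmap is resized. */
	tags->bitmap_depth = nr - tags->nr_reserved_tags;
	return 0;
}
```

In this model a request for 32 total tags with 2 reserved leaves 30 normal tags, and a request at or below the reserved count is rejected rather than producing a zero or negative bitmap depth.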

Cc: "Ewan D. Milne"
Cc: Christoph Hellwig
Cc: Bart Van Assche
Cc: Omar Sandoval
Tested-by: Marco Patalano
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2018-08-03 04:41:58 +0800

14 Jun, 2018

1 commit

e6c3456aa blk-mq: remove blk_mq_tagset_iter

Unused now that nvme stopped using it.

Signed-off-by: Christoph Hellwig
Reviewed-by: Jens Axboe

Christoph Hellwig
2018-06-14 23:01:45 +0800

31 May, 2018

1 commit

d250bf4e7 blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter

We already check for started commands in all callbacks, but we should
also protect against already completed commands. Do this by taking
the checks to common code.

Acked-by: Josef Bacik
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-31 01:31:34 +0800

25 May, 2018

1 commit

e6fc46498 blk-mq: avoid starving tag allocation after allocating process migrates

When the allocation process is scheduled back and the mapped hw queue is
changed, fake one extra wake up on previous queue for compensating wake
up miss, so other allocations on the previous queue won't be starved.

This patch fixes one request allocation hang issue, which can be
triggered easily in case of very low nr_request.

The race is as follows:

1) 2 hw queues, nr_requests are 2, and wake_batch is one

2) there are 3 waiters on hw queue 0

3) two in-flight requests in hw queue 0 are completed, and only two
of the 3 waiters are woken up because of wake_batch, but both of these
waiters can be scheduled to another CPU and so switch to hw
queue 1

4) then the 3rd waiter will wait forever, since no in-flight request
is in hw queue 0 any more.

5) this patch fixes it by the fake wakeup when waiter is scheduled to
another hw queue

Cc:
Reviewed-by: Omar Sandoval
Signed-off-by: Ming Lei

Modified commit message to make it clearer, and make it apply on
top of the 4.18 branch.

Signed-off-by: Jens Axboe

Ming Lei
2018-05-25 01:00:39 +0800