04 May, 2019
4 commits
-
In the normal queue cleanup path, hctx is released after the request queue
is freed; see blk_mq_release().

However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because
of hw queue shrinking. This can easily cause use-after-free, because
one implicit rule is that it is safe to call almost all block layer
APIs if the request queue is alive; one hctx may be retrieved by an
API, then the hctx can be freed by blk_mq_update_nr_hw_queues(), and
finally a use-after-free is triggered.

Fix this issue by always freeing the hctx after releasing the request queue.
If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce
a per-queue list to hold them, then try to reuse these hctxs if the numa
node matches; a sketch of the scheme follows this entry.

Cc: Dongli Zhang
Cc: James Smart
Cc: Bart Van Assche
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: James E. J. Bottomley
Reviewed-by: Hannes Reinecke
Tested-by: James Smart
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
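A minimal sketch of the reuse scheme (the field names unused_hctx_list and
unused_hctx_lock, and the blk_mq_alloc_hctx() fallback from the next patch,
are assumptions for illustration):

    struct blk_mq_hw_ctx *hctx = NULL, *tmp;

    /* reuse path: prefer a parked hctx whose numa node matches */
    spin_lock(&q->unused_hctx_lock);
    list_for_each_entry(tmp, &q->unused_hctx_list, hctx_list) {
        if (tmp->numa_node == node) {
            hctx = tmp;
            break;
        }
    }
    if (hctx)
        list_del_init(&hctx->hctx_list);
    spin_unlock(&q->unused_hctx_lock);

    if (!hctx)
        hctx = blk_mq_alloc_hctx(q, set, node);
-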
Split blk_mq_alloc_and_init_hctx into two parts: blk_mq_alloc_hctx()
for allocating all hctx resources, and blk_mq_init_hctx() for
initializing the hctx, which serves as the counterpart of
blk_mq_exit_hctx().

Cc: Dongli Zhang
Cc: James Smart
Cc: Bart Van Assche
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: James E. J. Bottomley
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Tested-by: James Smart
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2
("blk-mq: Fix a use-after-free") fixes this issue exactly.However, that commit introduces another issue. Before 45a9c9d909b2,
we are allowed to run queue during cleaning up queue if the queue's
kobj refcount is held. After that commit, queue can't be run during
queue cleaning up, otherwise oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().

We have invented ways to address this kind of issue before, such as:
8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
c2856ae2f315 ("blk-mq: quiesce queue before freeing queue")

But these still can't cover all cases; recently James reported another such
kind of issue:

https://marc.info/?l=linux-scsi&m=155389088124782&w=2
This issue can be quite hard to address with the previous approaches, given
scsi_run_queue() may run requeues for other LUNs.

Fix the above issue by freeing the hctx's resources in its release handler;
this is safe because tags aren't needed for freeing such hctx resources.
This approach follows the typical design pattern for a kobject's release
handler.
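A rough sketch of the pattern (based on the blk-mq sysfs release hook; the
exact set of fields moved is illustrative):

    static void blk_mq_hw_sysfs_release(struct kobject *kobj)
    {
        struct blk_mq_hw_ctx *hctx = container_of(kobj,
                        struct blk_mq_hw_ctx, kobj);

        /* none of this needs tags, so it is safe to defer it from
         * blk_mq_free_queue() to the final kobject reference drop */
        free_cpumask_var(hctx->cpumask);
        kfree(hctx->ctxs);
        kfree(hctx);
    }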
Cc: Dongli Zhang
Cc: James Smart
Cc: Bart Van Assche
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: James E. J. Bottomley
Reported-by: James Smart
Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Tested-by: James Smart
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
With holding queue's kobject refcount, it is safe for driver
to schedule requeue. However, blk_mq_kick_requeue_list() may
be called after blk_sync_queue() is done because of concurrent
requeue activities, then requeue work may not be completed when
freeing the queue, and a kernel oops is triggered.

So move the cancellation of requeue_work into blk_mq_release() to avoid
the race between requeue and freeing the queue; see the sketch after this
entry.

Cc: Dongli Zhang
Cc: James Smart
Cc: Bart Van Assche
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen
Cc: Christoph Hellwig
Cc: James E. J. Bottomley
Reviewed-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Tested-by: James Smart
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
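A minimal sketch of the move (the queue's requeue_work is delayed work,
hence the _sync cancel; surrounding teardown is elided):

    void blk_mq_release(struct request_queue *q)
    {
        /* requeue can be kicked concurrently right up until the last
         * queue reference drops, so only cancel it here rather than
         * in blk_sync_queue() */
        cancel_delayed_work_sync(&q->requeue_work);
        ...
    }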
03 May, 2019
1 commit
-
The comment was out of date.
Reviewed-by: Bart Van Assche
Signed-off-by: Raul E Rangel
Signed-off-by: Jens Axboe
01 May, 2019
1 commit
-
Various block layer files do not have any licensing information at all.
Add SPDX tags for the default kernel GPLv2 license to those; the exact
tag is shown after this entry.

Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
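The tag in question is the standard one-line kernel SPDX header placed at
the top of each affected .c file:

    // SPDX-License-Identifier: GPL-2.0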
14 Apr, 2019
1 commit
-
A previous commit moved the shallow depth and BFQ depth map calculations
to be done at init time, moving it outside of the hotter IO path. This
potentially causes hangs if the user changes the depth of the scheduler
map by writing to the 'nr_requests' sysfs file for that device.

Add a blk-mq-sched hook that allows blk-mq to inform the scheduler if
the depth changes, so that the scheduler can update its internal state;
a sketch of such a hook follows this entry.

Tested-by: Kai Krakow
Reported-by: Paolo Valente
Fixes: f0635b8a416e ("bfq: calculate shallow depths at init time")
Signed-off-by: Jens Axboe
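A sketch of what such a hook can look like (the callback name
depth_updated and its exact signature are assumptions for illustration):

    /* new elevator callback, invoked when nr_requests changes */
    struct elevator_mq_ops {
        ...
        void (*depth_updated)(struct blk_mq_hw_ctx *hctx);
        ...
    };

    /* in blk_mq_update_nr_requests(), after resizing the tags,
     * let the scheduler recompute its internal depths: */
    if (q->elevator && q->elevator->type->ops.depth_updated)
        q->elevator->type->ops.depth_updated(hctx);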
10 Apr, 2019
1 commit
-
NVMe's error handler follows the typical steps of tearing down
hardware to recover the controller:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
mark the request as abort
blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request()
may run q->mq_ops->complete(rq) remotely and asynchronously, and
->complete(rq) may run after #4.

This patch introduces blk_mq_complete_request_sync() to fix the
above race; a usage sketch follows this entry.

Cc: Sagi Grimberg
Cc: Bart Van Assche
Cc: James Smart
Cc: linux-nvme@lists.infradead.org
Reviewed-by: Keith Busch
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
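A sketch of how the synchronous variant slots into step #3, loosely
modeled on an NVMe-style cancel callback (the status code and the tags
handle are assumptions for illustration):

    static bool cancel_request(struct request *req, void *data, bool reserved)
    {
        /* mark the request as aborted ... */
        nvme_req(req)->status = NVME_SC_ABORT_REQ;
        /* ... and complete it; the _sync variant guarantees that
         * ->complete(rq) has finished before returning, so step #4
         * can no longer race with a remote completion */
        blk_mq_complete_request_sync(req);
        return true;
    }

    blk_mq_tagset_busy_iter(tags, cancel_request, NULL);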
05 Apr, 2019
1 commit
-
blk_mq_try_issue_directly() can return BLK_STS*_RESOURCE for requests that
have been queued. If that happens when blk_mq_try_issue_directly() is called
by the dm-mpath driver then dm-mpath will try to resubmit a request that is
already queued and a kernel crash follows. Since it is nontrivial to fix
blk_mq_request_issue_directly(), revert the blk_mq_request_issue_directly()
changes that went into kernel v5.0.

This patch reverts the following commits:
* d6a51a97c0b2 ("blk-mq: replace and kill blk_mq_request_issue_directly") # v5.0.
* 5b7a6f128aad ("blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests") # v5.0.
* 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0.

Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Jianchao Wang
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: James Smart
Cc: Dongli Zhang
Cc: Laurence Oberman
Cc:
Reported-by: Laurence Oberman
Tested-by: Laurence Oberman
Fixes: 7f556a44e61d ("blk-mq: refactor the code of issue request directly") # v5.0.
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe
04 Apr, 2019
1 commit
-
We would never be able to sort the list if we first reset plug->rq_count,
which is used in the conditional check later; see the sketch after this
entry.

Fixes: ce5b009cff19 ("block: improve logic around when to sort a plug list")
Reviewed-by: Ming Lei
Signed-off-by: Dongli Zhang
Signed-off-by: Jens Axboe
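The shape of the fix, as a sketch of blk_mq_flush_plug_list() (simplified;
plug_rq_cmp is the existing sort comparator):

    list_splice_init(&plug->mq_list, &list);

    /* rq_count must still be valid for this check ... */
    if (plug->rq_count > 2 && plug->multiple_queues)
        list_sort(NULL, &list, plug_rq_cmp);

    /* ... so only reset it after the sort decision */
    plug->rq_count = 0;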
02 Apr, 2019
2 commits
-
For now, we only trace plug for single-queue devices or drivers that
provide .commit_rqs, and do not trace plug for multi-queue devices.
But unplug events will still be recorded when blk_mq_flush_plug_list()
is called, so the trace events are asymmetrical: there is an unplug
without a plug.

This patch adds plug and unplug tracing for multi-queue devices in
blk_mq_make_request(). After that, we can accurately trace plug and
unplug for multiple queues.

Reviewed-by: Christoph Hellwig
Signed-off-by: Yufen Yu
Signed-off-by: Jens Axboe
-
A bare kfree() of hctx->fq leaks the hctx->fq->flush_rq field; free it
with blk_free_flush_queue() instead.
Reviewed-by: Ming Lei
Signed-off-by: Shenghui Wang
Signed-off-by: Jens Axboe
26 Mar, 2019
1 commit
-
We now wrap sbitmap waitqueues in an active counter, so we can avoid
iterating wakeups unless we have waiters there. This works as long as
everyone that manipulates the waitqueues uses the proper helpers. For
the tag wait case for shared tags, however, we add ourselves to the
waitqueue without incrementing/decrementing the ->ws_active count. This
means that wakeups can take a long time to happen.

Fix this by manually doing the inc/dec as needed for the wait queue
handling; see the sketch after this entry.

Reported-by: Michael Leun
Tested-by: Michael Leun
Cc: stable@vger.kernel.org
Reviewed-by: Omar Sandoval
Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
Signed-off-by: Jens Axboe
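A sketch of the manual accounting in blk_mq_mark_tag_wait() (field names
follow the sbitmap code; locking is elided):

    struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;

    /* we add ourselves with add_wait_queue() directly, so account
     * for it in ws_active the way the sbitmap helpers would */
    atomic_inc(&sbq->ws_active);
    add_wait_queue(&ws->wait, wait);

    /* and on the removal side: */
    list_del_init(&wait->entry);
    atomic_dec(&sbq->ws_active);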
25 Mar, 2019
1 commit
-
For now, blk_mq_hctx_has_pending() checks whether any of the ctx,
hctx->dispatch, or the io scheduler has pending work. So update the
comment accordingly.

Signed-off-by: Yufen Yu
Signed-off-by: Jens Axboe
21 Mar, 2019
2 commits
-
This function is not used outside the block layer core. Hence unexport it.
Cc: Christoph Hellwig
Cc: Ming Lei
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe
-
q->poll_nsec == -1 means doing classic poll, not hybrid poll.
We introduce a new flag, BLK_MQ_POLL_CLASSIC, to replace -1, which
makes the code much easier to read; a sketch follows this entry.

Additionally, since val is an int obtained with kstrtoint(), val can be
a negative value other than -1, so return -EINVAL for that case.

Thanks to Damien Le Moal for some good suggestions.
Reviewed-by: Damien Le Moal
Signed-off-by: Yufen Yu
Signed-off-by: Jens Axboe
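A sketch of the resulting sysfs store logic (simplified from the
description above; the constant stands in for the old literal -1):

    /* BLK_MQ_POLL_CLASSIC (-1): classic poll, >= 0: hybrid poll delay */
    err = kstrtoint(page, 10, &val);
    if (err < 0)
        return err;

    if (val == BLK_MQ_POLL_CLASSIC)
        q->poll_nsec = BLK_MQ_POLL_CLASSIC;
    else if (val >= 0)
        q->poll_nsec = val * 1000;
    else
        return -EINVAL;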
18 Mar, 2019
1 commit
-
Let blk_mq_mark_tag_wait() use blk_mq_sched_mark_restart_hctx()
to set BLK_MQ_S_SCHED_RESTART.

Signed-off-by: Yufen Yu
Signed-off-by: Jens Axboe
10 Mar, 2019
1 commit
-
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
hisi_sas, target/iscsi and target/core.

Additionally Christoph refactored gdth as part of the dma changes. The
major mid-layer change this time is the removal of bidi commands and
with them the whole of the osd/exofs driver and filesystem. This is a
major simplification for block and mq in particular"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits)
scsi: cxgb4i: validate tcp sequence number only if chip version pf
scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
scsi: mpt3sas: Add missing breaks in switch statements
scsi: aacraid: Fix missing break in switch statement
scsi: kill command serial number
scsi: csiostor: drop serial_number usage
scsi: mvumi: use request tag instead of serial_number
scsi: dpt_i2o: remove serial number usage
scsi: st: osst: Remove negative constant left-shifts
scsi: ufs-bsg: Allow reading descriptors
scsi: ufs: Allow reading descriptor via raw upiu
scsi: ufs-bsg: Change the calling convention for write descriptor
scsi: ufs: Remove unused device quirks
Revert "scsi: ufs: disable vccq if it's not needed by UFS device"
scsi: megaraid_sas: Remove a bunch of set but not used variables
scsi: clean obsolete return values of eh_timed_out
scsi: sd: Optimal I/O size should be a multiple of physical block size
scsi: MAINTAINERS: SCSI initiator and target tweaks
scsi: fcoe: make use of fip_mode enum complete
...
09 Mar, 2019
1 commit
-
Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we
finally have Ming's multi-page bvec support merged. Apart from that,
this pull request contains:

- Small series that avoids quiescing the queue for sysfs changes that
  match what we currently have (Aleksei)

- Series of bcache fixes (via Coly)

- Series of lightnvm fixes (via Mathias)

- NVMe pull request from Christoph. Nothing major, just SPDX/license
  cleanups, RR mp policy (Hannes), and little fixes (Bart,
  Chaitanya).

- BFQ series (Paolo)

- Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
  for the fast path (Jianchao)

- fops->iopoll() added for async IO polling, this is a feature that
  the upcoming io_uring interface will use (Christoph, me)

- Partition scan loop fixes (Dongli)

- mtip32xx conversion from managed resource API (Christoph)

- cdrom registration race fix (Guenter)

- MD pull from Song, two minor fixes.

- Various documentation fixes (Marcos)

- Multi-page bvec feature. This brings a lot of nice improvements
  with it, like more efficient splitting, larger IOs can be supported
  without growing the bvec table size, and so on. (Ming)

- Various little fixes to core and drivers"
* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
block: fix updating bio's front segment size
block: Replace function name in string with __func__
nbd: propagate genlmsg_reply return code
floppy: remove set but not used variable 'q'
null_blk: fix checking for REQ_FUA
block: fix NULL pointer dereference in register_disk
fs: fix guard_bio_eod to check for real EOD errors
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
block: optimize bvec iteration in bvec_iter_advance
block: introduce mp_bvec_for_each_page() for iterating over page
block: optimize blk_bio_segment_split for single-page bvec
block: optimize __blk_segment_map_sg() for single-page bvec
block: introduce bvec_nth_page()
iomap: wire up the iopoll method
block: add bio_set_polled() helper
block: wire up block device iopoll method
fs: add an iopoll method to struct file_operations
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
loop: do not print warn message if partition scan is successful
block: bounce: make sure that bvec table is updated
...
01 Mar, 2019
1 commit
-
Replace set->map[0] with set->map[HCTX_TYPE_DEFAULT] to avoid hardcoding.
Signed-off-by: Dongli Zhang
Signed-off-by: Jens Axboe
15 Feb, 2019
1 commit
-
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"),
physical segment number is mainly figured out in blk_queue_split() for
the fast path, and the BIO_SEG_VALID flag is set there too.

Now only blk_recount_segments() and blk_recalc_rq_segments() use this
flag.

Basically blk_recount_segments() is bypassed in the fast path given
BIO_SEG_VALID is set in blk_queue_split().

As for the other user, blk_recalc_rq_segments():

- it runs in the partial completion branch of blk_update_request, which
  is an unusual case

- it runs in blk_cloned_rq_check_limits(), which is still not a big
  problem if the flag is killed, since dm-rq is the only user.

Multi-page bvec is enabled now, and not doing S/G merging is rather
pointless with the current setup of the I/O path, as it isn't going to
save a significant amount of cycles.

Reviewed-by: Christoph Hellwig
Reviewed-by: Omar Sandoval
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
12 Feb, 2019
1 commit
-
On requeue, if RQF_DONTPREP is set, the rq already contains driver-
specific data, so insert it into the hctx dispatch list to avoid any
merge. Take scsi as an example; here is the trace event log (no
io scheduler, because RQF_STARTED would prevent merging):

kworker/0:1H-339 [000] ...1 2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987 [000] .... 2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987 [000] ...2 2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
kworker/0:1H-339 [000] .... 2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996 [000] ..s1 2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996 [000] .Ns1 2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
kworker/0:1H-339 [000] ...1 2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
kworker/0:1H-339 [000] ...1 2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986 [000] ..s1 2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]

(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
the sdb only contained the part of (32768 + 8), then only that part
was completed. The lucky thing was that scsi_io_completion detected
it and requeued the remaining part. So we didn't get corrupted data.
However, the requeue of (32776 + 8) is not expected; see the sketch after
this entry.

Suggested-by: Jens Axboe
Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe
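The shape of the fix in the requeue path, as a sketch (both insert helpers
exist in blk-mq; surrounding context is elided):

    /* in blk_mq_requeue_work(), for each requeued request: */
    if (rq->rq_flags & RQF_DONTPREP)
        /* driver-specific data is still attached: go straight to
         * the hctx dispatch list so the request is never merged */
        blk_mq_request_bypass_insert(rq, false);
    else
        blk_mq_sched_insert_request(rq, true, false, false);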
09 Feb, 2019
1 commit
-
There's no reason to freeze the queue and set the nr_requests value
if the current value is the same; a sketch of the early-out follows
this entry.

Signed-off-by: Aleksei Zakharov
Signed-off-by: Jens Axboe
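A minimal sketch of the early-out, assuming it sits in the nr_requests
sysfs store path:

    if (nr == q->nr_requests)
        return count;   /* same value: skip the freeze/unfreeze cycle */

    err = blk_mq_update_nr_requests(q, nr);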
06 Feb, 2019
2 commits
-
Unused now, and another field in struct request bites the dust.
Signed-off-by: Christoph Hellwig
Acked-by: Jens Axboe
Signed-off-by: Martin K. Petersen
-
No users left.
Signed-off-by: Christoph Hellwig
Acked-by: Jens Axboe
Signed-off-by: Martin K. Petersen
01 Feb, 2019
2 commits
-
Currently, we check whether the hctx type is supported every time
in the hot path. Actually, this is not necessary: we can save the
default hctx into ctx->hctxs if the type is not supported when
mapping swqueues, and then use ctx->hctxs[type] directly.

We also needn't check whether poll is enabled, because
the caller would clear REQ_HIPRI in that case.

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe
-
Currently, the queue mapping result is saved in a two-dimensional
array. In the hot path, to get a hctx, we need to do the following:

q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]]

This isn't very efficient. We can save the queue mapping result into
the ctx directly for each hctx type, like:

ctx->hctxs[type]

A sketch of how the mapping is saved follows this entry.
Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe
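A sketch of the map-time save, loosely following blk_mq_map_swqueue()
(the fallback to the default hctx for unsupported types is the point of
the first patch above; details are illustrative):

    /* resolve every (ctx, type) pair once, when (re)mapping queues */
    for (j = 0; j < set->nr_maps; j++) {
        if (!set->map[j].nr_queues) {
            ctx->hctxs[j] = blk_mq_map_queue_type(q,
                                    HCTX_TYPE_DEFAULT, i);
            continue;
        }
        ctx->hctxs[j] = blk_mq_map_queue_type(q, j, i);
    }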
16 Jan, 2019
1 commit
-
We need to pass bio->bi_opf after bio integrity preparation; otherwise
the REQ_INTEGRITY flag may not be set on the allocated request, which
breaks block integrity. A sketch of the ordering follows this entry.

Fixes: f9afca4d367b ("blk-mq: pass in request/bio flags to queue mapping")
Cc: Hannes Reinecke
Cc: Keith Busch
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
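The ordering that matters, as a sketch of blk_mq_make_request() (names
follow the v5.0-era code; details are illustrative):

    blk_queue_bounce(q, &bio);
    blk_queue_split(q, &bio);

    if (!bio_integrity_prep(bio))
        return BLK_QC_T_NONE;

    /* read bi_opf only now: bio_integrity_prep() may have just set
     * REQ_INTEGRITY, and the allocated request must inherit it */
    data.cmd_flags = bio->bi_opf;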
19 Dec, 2018
1 commit
-
Block consumers will need it for polling requests that
are sent with blk_execute_rq_nowait. Also, get rid of
blk_tag_to_qc_t and open-code it instead.

Reviewed-by: Jens Axboe
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
18 Dec, 2018
4 commits
-
The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues
is bigger than zero, so enhance the constraint by checking .nr_queues of the poll
type before enabling IO poll; see the sketch after this entry.

Otherwise an IO race & timeout can be observed when running block/007.
Cc: Jeff Moyer
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
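A sketch of the strengthened check at queue init time (simplified;
blk_queue_flag_set() and QUEUE_FLAG_POLL are the existing block-layer
names):

    /* only advertise poll support if a poll map was actually set up */
    if (set->nr_maps > HCTX_TYPE_POLL &&
        set->map[HCTX_TYPE_POLL].nr_queues)
        blk_queue_flag_set(QUEUE_FLAG_POLL, q);
-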
There's a single user of this function, dm, and dm just wants
to check if IO is inflight, not that it's just allocated.

This fixes a hang with srp/002 in blktests with dm, where it tries
to suspend but waits for inflight IO to finish first. As it checks
for just allocated requests, this fails.

Tested-by: Mike Snitzer
Signed-off-by: Jens Axboe
-
Since 7e849dd9cf37 ("nvme-pci: don't share queue maps"), the mapping
table won't actually be initialized if map->nr_queues is zero, so
we can't use blk_mq_map_queue_type() to retrieve a hctx any more.

This may still cause a broken mapping; fix it by skipping zero-queue
maps in blk_mq_map_swqueue().

Cc: Jeff Moyer
Cc: Mike Snitzer
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
When a request is added to the rq list of a sw queue (ctx), the rq may be
from a different type of hctx, especially after multi queue mapping is
introduced.

So when dispatching requests from the sw queue via blk_mq_flush_busy_ctxs()
or blk_mq_dequeue_from_ctx(), a request belonging to another queue type of
hctx can be dispatched to the current hctx when the read queue or poll
queue is enabled.

This patch fixes the issue by introducing per-queue-type lists; see the
sketch after this entry.

Cc: Christoph Hellwig
Signed-off-by: Ming Lei

Changed by me to not use separately cacheline-aligned lists, just
place them all in the same cacheline where we had just the one list
and lock before.

Signed-off-by: Jens Axboe
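The resulting ctx layout, as a sketch (matching the "same cacheline" note
above; HCTX_MAX_TYPES is the number of hctx types):

    struct blk_mq_ctx {
        struct {
            spinlock_t          lock;
            struct list_head    rq_lists[HCTX_MAX_TYPES];
        } ____cacheline_aligned_in_smp;
        ...
    };

    /* dispatch then only walks the list matching the hctx's type,
     * e.g. in blk_mq_flush_busy_ctxs(): */
    spin_lock(&ctx->lock);
    list_splice_tail_init(&ctx->rq_lists[hctx->type], list);
    spin_unlock(&ctx->lock);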
17 Dec, 2018
1 commit
-
The type of each element in the queue mapping table is 'unsigned int',
instead of 'struct blk_mq_queue_map', so fix it.

Cc: Jeff Moyer
Cc: Mike Snitzer
Cc: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
16 Dec, 2018
3 commits
-
Replace blk_mq_request_issue_directly with blk_mq_try_issue_directly
in blk_insert_cloned_request and kill it as nobody uses it any more.

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe
-
It is not necessary to issue request directly with bypass 'true'
in blk_mq_sched_insert_requests and handle the non-issued requests
itself. Just set bypass to 'false' and let blk_mq_try_issue_directly
handle them totally. Remove the blk_rq_can_direct_dispatch check,
because blk_mq_try_issue_directly can handle it well. If a request is
direct-issued unsuccessfully, insert the rest.

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe
-
Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly
into one interface to unify the interfaces to issue requests
directly. The merged interface takes over the requests totally: it can
insert, end, or do nothing based on the return value of
.queue_rq and the 'bypass' parameter. Then the caller needn't do any
other handling, and the code can be cleaned up.

Also, commit c616cbee ("blk-mq: punt failed direct issue to dispatch
list") always inserts requests into the hctx dispatch list whenever it
gets a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE; this is overkill and
will harm merging. We only need to do that for requests that have been
through .queue_rq. This patch also fixes that.

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe
10 Dec, 2018
2 commits
-
The previous patches deleted all the code that needed the second value
returned from part_in_flight - now the kernel only uses the first value.

Consequently, part_in_flight (and blk_mq_in_flight) may be changed so that
it only returns one value.

This patch just refactors the code; there's no functional change.
Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
-
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the
two corruption fixes. We're going to be overhauling the direct dispatch
path, and we need to do that on top of the changes we made for that
in mainline.

Signed-off-by: Jens Axboe
08 Dec, 2018
1 commit
-
Now almost all .map_queues() implementations based on managed irq
affinity don't update the queue mapping and just retrieve the
old built mapping, so if nr_hw_queues is changed, the mapping table
includes stale mappings. And only blk_mq_map_queues() may rebuild
the mapping table.

One case is that we limit .nr_hw_queues as 1 in case of a kdump kernel.
However, drivers often build the queue mapping before allocating the
tagset via pci_alloc_irq_vectors_affinity(), but set->nr_hw_queues can
be set as 1 in case of a kdump kernel, so a wrong queue mapping is used,
and a kernel panic[1] is observed during booting.

This patch fixes the kernel panic triggered on nvme by rebuilding the
mapping table via blk_mq_map_queues(); a sketch follows the panic log.

[1] kernel panic log
[ 4.438371] nvme nvme0: 16/0/0 default/read/poll queues
[ 4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 4.444681] PGD 0 P4D 0
[ 4.445367] Oops: 0000 [#1] SMP NOPTI
[ 4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459
[ 4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[ 4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222
[ 4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6
[ 4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286
[ 4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001
[ 4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001
[ 4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002
[ 4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538
[ 4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001
[ 4.469220] FS: 0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000
[ 4.471554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0
[ 4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4.477061] PKRU: 55555554
[ 4.477464] Call Trace:
[ 4.478731] blk_mq_init_allocated_queue+0x36a/0x3ad
[ 4.479595] blk_mq_init_queue+0x32/0x4e
[ 4.480178] nvme_validate_ns+0x98/0x623 [nvme_core]
[ 4.480963] ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core]
[ 4.481685] ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core]
[ 4.482601] nvme_scan_work+0x23a/0x29b [nvme_core]
[ 4.483269] ? _raw_spin_unlock_irqrestore+0x25/0x38
[ 4.483930] ? try_to_wake_up+0x38d/0x3b3
[ 4.484478] ? process_one_work+0x179/0x2fc
[ 4.485118] process_one_work+0x1d3/0x2fc
[ 4.485655] ? rescuer_thread+0x2ae/0x2ae
[ 4.486196] worker_thread+0x1e9/0x2be
[ 4.486841] kthread+0x115/0x11d
[ 4.487294] ? kthread_park+0x76/0x76
[ 4.487784] ret_from_fork+0x3a/0x50
[ 4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables
[ 4.489428] Dumping ftrace buffer:
[ 4.489939] (ftrace buffer empty)
[ 4.490492] CR2: 0000000000000098
[ 4.491052] ---[ end trace 03cd268ad5a86ff7 ]---

Cc: Christoph Hellwig
Cc: linux-nvme@lists.infradead.org
Cc: David Milburn
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
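A sketch of the rebuild, assuming it lives in blk_mq_update_queue_map()
(the is_kdump_kernel() special-casing shown here is illustrative of the
description above):

    /* a kdump kernel runs with nr_hw_queues == 1, so a driver-built
     * mapping may be stale; fall back to the generic rebuild */
    if (set->ops->map_queues && !is_kdump_kernel())
        return set->ops->map_queues(set);

    return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);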