05 Aug, 2019

5 commits

  • Consider the following example:
    * The logical block size is 4 KB.
    * The physical block size is 8 KB.
    * max_sectors equals (16 KB >> 9) sectors.
    * A non-aligned 4 KB and an aligned 64 KB bio are merged into a single
    non-aligned 68 KB bio.

    The current behavior is to split such a bio into (16 KB + 16 KB + 16 KB
    + 16 KB + 4 KB). None of these five bios starts on a physical block
    boundary.

    This patch ensures that such a bio is split into four aligned bios and
    one non-aligned bio instead of five non-aligned bios (the arithmetic is
    sketched after this entry). This improves performance because most
    block devices handle aligned requests faster than non-aligned requests.

    Since the physical block size is larger than or equal to the logical
    block size, this patch preserves the guarantee that the returned
    value is a multiple of the logical block size.

    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
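
    As a rough illustration, here is a minimal userspace C model of the
    split-size computation described above (the function name, constants
    and loop are illustrative, not the kernel source):

        #include <stdio.h>

        #define SECTOR_SHIFT 9

        /* Reduce the split size so that the *next* split starts on a
         * physical block boundary, falling back to a logical-block
         * multiple when that is impossible. */
        static unsigned get_max_io_size(unsigned long long start_sector,
                                        unsigned max_sectors,
                                        unsigned pbs, unsigned lbs)
        {
            unsigned start_offset = start_sector & (pbs - 1);
            unsigned aligned = (max_sectors + start_offset) & ~(pbs - 1);

            if (aligned > start_offset)
                return aligned - start_offset;
            return max_sectors & ~(lbs - 1);
        }

        int main(void)
        {
            unsigned long long sector = 4096 >> SECTOR_SHIFT; /* 4 KB into an 8 KB block */
            unsigned remaining = (68 * 1024) >> SECTOR_SHIFT; /* the 68 KB bio */
            unsigned max_sectors = (16 * 1024) >> SECTOR_SHIFT;
            unsigned pbs = 8192 >> SECTOR_SHIFT, lbs = 4096 >> SECTOR_SHIFT;

            while (remaining) {
                unsigned chunk = get_max_io_size(sector, max_sectors, pbs, lbs);

                if (chunk > remaining)
                    chunk = remaining;
                printf("split: %2u KB, %s\n", (chunk << SECTOR_SHIFT) / 1024,
                       (sector & (pbs - 1)) ? "unaligned" : "aligned");
                sector += chunk;
                remaining -= chunk;
            }
            return 0; /* prints 12 KB unaligned, then 16+16+16+8 KB aligned */
        }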
     
  • Move the max_sectors check into bvec_split_segs() such that a single
    call to that function can perform all the necessary checks. This
    further optimizes the fast path, namely the case where a bvec fits in
    a single page.

    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Simplify this function by removing two if-tests. Other than requiring
    that the @sectors pointer is not NULL, this patch does not change the
    behavior of bvec_split_segs().

    Reviewed-by: Johannes Thumshirn
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Since what the bio splitting functions do is nontrivial, document these
    functions.

    Reviewed-by: Johannes Thumshirn
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Make it clear to the compiler, and also to human readers, that the
    functions that query request queue properties do not modify any member
    of the request_queue data structure (a const-qualified sketch follows
    this entry).

    Reviewed-by: Johannes Thumshirn
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
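
    Not the kernel's definitions, just a minimal compilable sketch of the
    pattern: a pointer-to-const parameter makes the read-only contract
    visible to both the compiler and the reader.

        struct toy_queue {
            unsigned int max_sectors;
        };

        /* Query helper over a pointer-to-const: any write through 'q'
         * would be a compile-time error. */
        static unsigned int toy_queue_max_sectors(const struct toy_queue *q)
        {
            return q->max_sectors;
        }

        int main(void)
        {
            struct toy_queue q = { .max_sectors = 2560 };

            return toy_queue_max_sectors(&q) == 2560 ? 0 : 1;
        }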
     

21 Jun, 2019

3 commits

  • Now that we don't need to assign the front/back segment sizes, we can
    duplicate the segs assignment for the split vs. no-split case and
    remove a whole chunk of boilerplate code.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Return the segment count and let the callers assign it, which makes
    the code a little more obvious. Also pass the request instead of q
    plus the bio chain, allowing for the use of rq_for_each_bvec.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio and return the count from a variant
    of blk_queue_split instead, so that it can be passed as an argument to
    the functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove the
    pointless struct request_queue argument from every function that had
    one and grew a nr_segs argument (the resulting API shape is sketched
    after this entry).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
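
    A toy model of the resulting API shape (all names and the counting
    logic here are stand-ins, not the kernel's):

        #include <stdio.h>

        struct toy_bio { unsigned len; };

        /* The split helper returns the segment count through an out
         * parameter instead of caching it in the bio. */
        static void toy_queue_split(struct toy_bio *bio, unsigned *nr_segs)
        {
            *nr_segs = bio->len / 4096 + 1; /* stand-in for real counting */
        }

        /* Callers thread the count through as an argument. */
        static void toy_submit(struct toy_bio *bio, unsigned nr_segs)
        {
            printf("submit bio of %u bytes, %u segments\n", bio->len, nr_segs);
        }

        int main(void)
        {
            struct toy_bio bio = { .len = 68 * 1024 };
            unsigned nr_segs;

            toy_queue_split(&bio, &nr_segs);
            toy_submit(&bio, nr_segs);
            return 0;
        }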
     

24 May, 2019

3 commits

  • At this point these fields aren't used for anything, so we can remove
    them.

    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We fundamentally do not have a maximum segment size for devices with a
    virt boundary. So don't bother checking it, especially given that the
    existing checks didn't work properly to start with: we never fully
    update the front/back segment size and miss the bi_seg_front_size
    update that would have been required for some cases.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently ll_merge_requests_fn, unlike all other merge functions,
    reduces nr_phys_segments by one if the last segment of the previous
    request and the first segment of the next request are contiguous (the
    contiguity test is sketched after this entry). While this seems like a
    nice solution for avoiding smaller-than-possible requests, it causes a
    mismatch between the segments actually present in the request and
    those iterated over by the bvec iterators, including __rq_for_each_bio.
    This can, for example, mis-trigger the single segment optimization in
    the nvme-pci driver, and might lead to a mismatching nr_phys_segments
    count when the segments are recalculated while inserting a cloned
    request.

    We could possibly work around this by making the bvec iterators take
    the front and back segment sizes into account, but that would require
    moving them from the bio to the bio_iter and spreading this mess over
    all users of bvecs. Or we could simply remove this optimization under
    the assumption that most users already build good enough bvecs, and
    that the bio merge path never cared about this optimization either.
    The latter is what this patch does.

    Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
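
    A hedged sketch of the contiguity test the removed optimization keyed
    on, reduced to plain physical address ranges (the kernel works on
    bio_vecs):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        struct seg {
            uint64_t phys; /* physical address of the segment start */
            uint32_t len;  /* length in bytes */
        };

        /* True when the previous request's last segment ends exactly
         * where the next request's first segment begins. */
        static bool segs_physically_contiguous(const struct seg *prev,
                                               const struct seg *next)
        {
            return prev->phys + prev->len == next->phys;
        }

        int main(void)
        {
            struct seg a = { .phys = 0x1000, .len = 0x1000 };
            struct seg b = { .phys = 0x2000, .len = 0x1000 };

            printf("%d\n", segs_physically_contiguous(&a, &b)); /* 1 */
            return 0;
        }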
     

22 Apr, 2019

1 commit

  • While we generally allow scatterlists to have offsets larger than page
    size for an entry, and other subsystems like the crypto code make use of
    that, the block layer isn't quite ready for that. Flip the switch back
    to avoid them for now, and revisit that decision early in a merge window
    once the known offenders are fixed.

    Fixes: 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a nth_page call")
    Reviewed-by: Ming Lei
    Tested-by: Guenter Roeck
    Reported-by: Guenter Roeck
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

09 Apr, 2019

1 commit

  • Commit f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can
    be mergeable") changes bvec merge by only considering two bvecs from
    different bios. However, if the former bio doesn't include any I/O
    bvec, then the following warning may be triggered:

    warning: ‘bvec.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]

    In practice, it shouldn't be triggered.

    Fix it by adding a check on the former bio (sketched after this
    entry); the check shouldn't add any cost given that 'bio->bi_iter' is
    likely to be cache-hot.

    Reported-by: Jens Axboe
    Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
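
    A toy model of the added guard (illustrative names; the real code
    inspects the bvecs of two struct bio):

        #include <stdbool.h>
        #include <stdio.h>

        struct toy_bio { unsigned size; };

        /* Guard before reading the former bio's last bvec: an empty bio
         * has no bvec to read, which is what the -Wmaybe-uninitialized
         * warning was pointing at. */
        static bool bios_will_gap(const struct toy_bio *prev,
                                  const struct toy_bio *next)
        {
            if (!prev->size)
                return false; /* nothing to gap against */
            /* ... would compare prev's last and next's first bvec ... */
            return true;      /* stand-in result */
        }

        int main(void)
        {
            struct toy_bio empty = { 0 }, full = { 4096 };

            printf("%d\n", bios_will_gap(&empty, &full)); /* 0 */
            return 0;
        }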
     

07 Mar, 2019

1 commit

  • blk_recount_segments() can be called from bio_add_pc_page() to
    calculate how many segments this bio will have after a page is added
    to it. If the resulting segment count exceeds the queue limit, the
    added page is removed again.

    This try-and-fix policy requires blk_recount_segments()
    (__blk_recalc_rq_segments) to not consider the segment count limit.
    Unfortunately bvec_split_segs() does check this limit, which causes a
    too-small segment count to be returned to bio_add_pc_page(); the page
    may then still be added to the bio even though the segment count limit
    is already broken.

    Fix this issue by not considering the segment count limit when
    calculating the bio's segment count (a toy model of the try-and-fix
    policy follows this entry).

    Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Cc: Christoph Hellwig
    Cc: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
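
    A toy model of the try-and-fix policy (a bio reduced to a bare segment
    count; names are hypothetical):

        #include <stdbool.h>
        #include <stdio.h>

        struct toy_bio { unsigned nr_segs; };

        /* The recount must report the true count, *not* one clamped to
         * the queue limit, or the fix-up below can never trigger. */
        static unsigned recount_segments(const struct toy_bio *bio)
        {
            return bio->nr_segs;
        }

        static bool try_add_page(struct toy_bio *bio, unsigned max_segments)
        {
            bio->nr_segs++;                           /* "try": add the page */
            if (recount_segments(bio) > max_segments) {
                bio->nr_segs--;                       /* "fix": remove it */
                return false;
            }
            return true;
        }

        int main(void)
        {
            struct toy_bio bio = { .nr_segs = 127 };

            printf("%d\n", try_add_page(&bio, 128)); /* 1: fits */
            printf("%d\n", try_add_page(&bio, 128)); /* 0: over the limit */
            return 0;
        }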
     

03 Mar, 2019

1 commit

  • When the current bvec can be merged into the first segment, the bio's
    front segment size has to be updated.

    However, dcebd755926b doesn't consider that case, so the bio's front
    segment size may not be correct.

    This patch fixes this issue.

    Cc: Christoph Hellwig
    Cc: Omar Sandoval
    Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

20 Feb, 2019

1 commit

  • rq->bio can be NULL in some cases, such as for a flush request, so
    don't read bio->bi_seg_front_size until the bio has been checked for
    validity (a guarded access is sketched after this entry).

    Cc: Bart Van Assche
    Reported-by: Bart Van Assche
    Fixes: dcebd755926b0f39dd1e ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
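
    A minimal sketch of the guarded access (toy types; the real code
    checks rq->bio before touching bi_seg_front_size):

        #include <stddef.h>
        #include <stdio.h>

        struct toy_bio { unsigned seg_front_size; };
        struct toy_rq  { struct toy_bio *bio; };

        /* rq->bio may be NULL, e.g. for a flush request. */
        static unsigned front_size_or_zero(const struct toy_rq *rq)
        {
            return rq->bio ? rq->bio->seg_front_size : 0;
        }

        int main(void)
        {
            struct toy_rq flush = { .bio = NULL };

            printf("%u\n", front_size_or_zero(&flush)); /* 0, no crash */
            return 0;
        }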
     

15 Feb, 2019

4 commits

  • Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after
    splitting"), the physical segment count is mainly figured out in
    blk_queue_split() for the fast path, and the BIO_SEG_VALID flag is set
    there too.

    Now only blk_recount_segments() and blk_recalc_rq_segments() use this
    flag.

    Basically blk_recount_segments() is bypassed in the fast path given
    that BIO_SEG_VALID is set in blk_queue_split().

    As for the other user, blk_recalc_rq_segments():

    - it runs in the partial completion branch of blk_update_request,
    which is an unusual case

    - it runs in blk_cloned_rq_check_limits(), which is still not a big
    problem if the flag is killed, since dm-rq is the only user.

    Multi-page bvecs are enabled now; not doing S/G merging is rather
    pointless with the current setup of the I/O path, as it isn't going to
    save a significant number of cycles.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • It is more efficient to use bio_for_each_bvec() to map the sg list;
    meanwhile we have to consider splitting multi-page bvecs as done in
    blk_bio_segment_split().

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • First, it is more efficient to use bio_for_each_bvec() in both
    blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how
    many multi-page bvecs there are in the bio.

    Secondly, once bio_for_each_bvec() is used, a bvec may need to be
    split because its length can be much longer than the max segment size,
    so we have to split the big bvec into several segments (the arithmetic
    is sketched after this entry).

    Thirdly, when splitting a multi-page bvec into segments, the max
    segment limit may be reached, so bio splitting needs to be considered
    in this situation too.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
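
    A sketch of the second point: how many device segments one big bvec
    contributes when each segment is capped at the max segment size (the
    real bvec_split_segs() also honors the per-bio segment count limit):

        #include <stdio.h>

        /* DIV_ROUND_UP of the bvec length over the segment size cap. */
        static unsigned bvec_nr_segs(unsigned bvec_len, unsigned max_seg_size)
        {
            return (bvec_len + max_seg_size - 1) / max_seg_size;
        }

        int main(void)
        {
            /* a 256 KB multi-page bvec vs a 64 KB max segment size */
            printf("%u segments\n", bvec_nr_segs(256 * 1024, 64 * 1024)); /* 4 */
            return 0;
        }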
     
  • It is wrong to use bio->bi_vcnt to figure out how many segments there
    are in the bio, even when the CLONED flag isn't set on the bio,
    because the bio may have been split or advanced.

    So always use bio_segments() in blk_recount_segments(); this shouldn't
    cause any performance loss now because the physical segment count is
    figured out in blk_queue_split(), and BIO_SEG_VALID is set at the same
    time, since bdced438acd83ad83a6c ("block: setup bi_phys_segments after
    splitting").

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max segments")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

23 Jan, 2019

1 commit

  • Besides blk_queue_split(), bio_split() is used for splitting bios too,
    and the remainder bio is often resubmitted to the queue via
    generic_make_request(). So the same queue-enter recursion exists in
    this case too. Unfortunately commit cd4a4ae4683dc2 doesn't help this
    case.

    This patch covers the above case by setting BIO_QUEUE_ENTERED before
    calling q->make_request_fn (the flag's lifetime is sketched after this
    entry).

    In theory the per-bio flag simulates a stack variable, so it is fine
    to clear it after q->make_request_fn returns; in particular, the same
    bio can't be submitted from another context.

    Fixes: cd4a4ae4683dc2 ("block: don't use blocking queue entered for recursive bio submits")
    Cc: Tetsuo Handa
    Cc: NeilBrown
    Reviewed-by: Mike Snitzer
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
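
    A toy userspace model of the flag's lifetime as described above (the
    flag name follows the commit; everything else is illustrative):

        #include <stdbool.h>
        #include <stdio.h>

        struct toy_bio { bool queue_entered; /* BIO_QUEUE_ENTERED */ };

        /* Queue entry blocks unless the bio is flagged as already inside. */
        static void queue_enter(struct toy_bio *bio)
        {
            printf(bio->queue_entered ? "non-blocking enter\n"
                                      : "blocking enter\n");
        }

        /* The driver may split the bio and resubmit the remainder, which
         * re-enters the queue recursively. */
        static void make_request(struct toy_bio *bio)
        {
            queue_enter(bio);
        }

        int main(void)
        {
            struct toy_bio bio = { false };

            queue_enter(&bio);         /* first entry: may block normally */
            bio.queue_entered = true;  /* set before q->make_request_fn   */
            make_request(&bio);        /* recursion cannot deadlock now   */
            bio.queue_entered = false; /* clear after the call returns    */
            return 0;
        }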
     

29 Dec, 2018

1 commit

  • Pull SCSI updates from James Bottomley:
    "This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
    megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.

    Additionally, we have a pile of annotation, unused variable and minor
    updates.

    The big API change is the updates for Christoph's DMA rework which
    include removing the DISABLE_CLUSTERING flag.

    And finally there are a couple of target tree updates"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
    scsi: isci: request: mark expected switch fall-through
    scsi: isci: remote_node_context: mark expected switch fall-throughs
    scsi: isci: remote_device: Mark expected switch fall-throughs
    scsi: isci: phy: Mark expected switch fall-through
    scsi: iscsi: Capture iscsi debug messages using tracepoints
    scsi: myrb: Mark expected switch fall-throughs
    scsi: megaraid: fix out-of-bound array accesses
    scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
    scsi: fcoe: remove set but not used variable 'port'
    scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
    scsi: smartpqi: fix build warnings
    scsi: smartpqi: update driver version
    scsi: smartpqi: add ofa support
    scsi: smartpqi: increase fw status register read timeout
    scsi: smartpqi: bump driver version
    scsi: smartpqi: add smp_utils support
    scsi: smartpqi: correct lun reset issues
    scsi: smartpqi: correct volume status
    scsi: smartpqi: do not offline disks for transient did no connect conditions
    scsi: smartpqi: allow for larger raid maps
    ...

    Linus Torvalds
     

19 Dec, 2018

1 commit

  • Now that the SCSI layer has replaced the use of the cluster flag with
    segment size limits and the DMA boundary, we can remove the cluster
    flag from the block layer.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jens Axboe
    Signed-off-by: Martin K. Petersen

    Christoph Hellwig
     

10 Dec, 2018

2 commits

  • We want to convert to per-cpu in_flight counters.

    The function part_round_stats needs the in_flight counter every jiffy;
    it would be too costly to sum all the per-cpu variables every jiffy,
    so it must be deleted. part_round_stats is used to calculate two
    counters: time_in_queue and io_ticks.

    time_in_queue can be calculated without part_round_stats, by adding the
    duration of the I/O when the I/O ends (the value is almost as exact as the
    previously calculated value, except that time for in-progress I/Os is not
    counted).

    io_ticks can be approximated by increasing the value whenever an I/O
    is started or ended and the jiffies value has changed (sketched after
    this entry). If the I/Os take less than a jiffy, the value is as exact
    as the previously calculated value. If the I/Os take more than a
    jiffy, io_ticks can drift behind the previously calculated value.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mikulas Patocka
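
    A toy model of the io_ticks approximation described above (jiffies
    passed in by hand; names are illustrative):

        #include <stdio.h>

        static unsigned long io_ticks;
        static unsigned long last_counted = (unsigned long)-1;

        /* Bump io_ticks whenever an I/O starts or ends in a jiffy that
         * has not been counted yet: exact for sub-jiffy I/Os, may drift
         * behind for I/Os spanning many jiffies. */
        static void update_io_ticks(unsigned long now)
        {
            if (now != last_counted) {
                io_ticks++;
                last_counted = now;
            }
        }

        int main(void)
        {
            update_io_ticks(100); /* I/O A starts */
            update_io_ticks(100); /* I/O B starts: same jiffy, no bump */
            update_io_ticks(101); /* I/O A ends in the next jiffy */
            printf("io_ticks = %lu\n", io_ticks); /* 2 */
            return 0;
        }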
     
  • All of part_stat_* and related methods are used with preemption
    disabled, so there is no need to pass the cpu around to all of them.
    Just call smp_processor_id() as needed.

    Suggested-by: Jens Axboe
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

05 Dec, 2018

1 commit

  • Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and
    also picking up the merge fix that went into mainline, which users are
    hitting when testing for-4.21/block and/or for-next.

    * tag 'v4.20-rc5': (664 commits)
    Linux 4.20-rc5
    PCI: Fix incorrect value returned from pcie_get_speed_cap()
    MAINTAINERS: Update linux-mips mailing list address
    ocfs2: fix potential use after free
    mm/khugepaged: fix the xas_create_range() error path
    mm/khugepaged: collapse_shmem() do not crash on Compound
    mm/khugepaged: collapse_shmem() without freezing new_page
    mm/khugepaged: minor reorderings in collapse_shmem()
    mm/khugepaged: collapse_shmem() remember to clear holes
    mm/khugepaged: fix crashes due to misaccounted holes
    mm/khugepaged: collapse_shmem() stop if punched or truncated
    mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
    mm/huge_memory: splitting set mapping+index before unfreeze
    mm/huge_memory: rename freeze_page() to unmap_page()
    initramfs: clean old path before creating a hardlink
    kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
    psi: make disabling/enabling easier for vendor kernels
    proc: fixup map_files test on arm
    debugobjects: avoid recursive calls with kmemleak
    userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
    ...

    Jens Axboe
     

01 Dec, 2018

1 commit

  • There are actually two kinds of discard merge:

    - one is the normal discard merge, just like a normal read/write
    request; call it single-range discard

    - the other is the multi-range discard, where
    queue_max_discard_segments(rq->q) > 1

    For the former case, queue_max_discard_segments(rq->q) is 1, and we
    should handle this kind of discard merge like a normal read/write
    request (the case split is sketched after this entry).

    This patch fixes the following kernel panic issue [1], which is caused
    by not removing the single-range discard request from the elevator
    queue.

    Guangwu has one raid discard test case, in which this issue is a bit
    easier to trigger, and I verified that this patch can fix the kernel
    panic issue in Guangwu's test case.

    [1] kernel panic log from Jens's report

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    PGD 0 P4D 0.
    Oops: 0000 [#1] SMP PTI
    CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \
    4.20.0-rc3-00649-ge64d9a554a91-dirty #14 Hardware name: Wiwynn \
    Leopard-Orv2/Leopard-DDR BW, BIOS LBM08 03/03/2017 Workqueue: kblockd \
    blk_mq_run_work_fn RIP: \
    0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 \
    10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \
    0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \
    f6 87 b0 00 00 00 02 RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246 \
    RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
    RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
    RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
    R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
    R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
    FS: 0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    blk_mq_dispatch_rq_list+0xec/0x480
    ? elv_rb_del+0x11/0x30
    blk_mq_do_dispatch_sched+0x6e/0xf0
    blk_mq_sched_dispatch_requests+0xfa/0x170
    __blk_mq_run_hw_queue+0x5f/0xe0
    process_one_work+0x154/0x350
    worker_thread+0x46/0x3c0
    kthread+0xf5/0x130
    ? process_one_work+0x350/0x350
    ? kthread_destroy_worker+0x50/0x50
    ret_from_fork+0x1f/0x30
    Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \
    kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \
    cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \
    button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \
    nvme_core fuse sg loop efivarfs autofs4 CR2: 0000000000000148 \

    ---[ end trace 340a1fb996df1b9b ]---
    RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
    Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \
    00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 8b 87 48 01 00 00 8b 40 04 39 43 \
    20 72 37 f6 87 b0 00 00 00 02

    Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached")
    Reported-by: Jens Axboe
    Cc: Guangwu Zhang
    Cc: Christoph Hellwig
    Cc: Jianchao Wang
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
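
    The case split from the top of this entry, as a one-line predicate
    (illustrative; the real check is queue_max_discard_segments(rq->q) > 1):

        #include <stdbool.h>
        #include <stdio.h>

        /* A queue that supports only one discard segment must treat a
         * discard merge like a normal read/write merge, including taking
         * the merged request off the elevator queue. */
        static bool is_multi_range_discard(unsigned max_discard_segments)
        {
            return max_discard_segments > 1;
        }

        int main(void)
        {
            printf("single-range: %d\n", !is_multi_range_discard(1));
            printf("multi-range:  %d\n",  is_multi_range_discard(8));
            return 0;
        }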
     

20 Nov, 2018

1 commit

  • Growing a high-priority request by merging it with a lower-priority
    BIO or request increases the request execution time. This is the
    opposite of the desired effect of high I/O priorities, namely low I/O
    latencies. To fix this, prevent merging of requests and BIOs that have
    different I/O priorities (the added condition is sketched after this
    entry).

    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
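
    A minimal sketch of the added condition (illustrative; the real check
    compares the ioprio of the request and the bio being merged):

        #include <stdbool.h>
        #include <stdio.h>

        /* Refuse to merge two I/O units whose priorities differ, so a
         * high-priority request is never grown with low-priority data. */
        static bool ioprio_allows_merge(unsigned short a, unsigned short b)
        {
            return a == b;
        }

        int main(void)
        {
            printf("%d %d\n", ioprio_allows_merge(1, 1),
                              ioprio_allows_merge(1, 3)); /* 1 0 */
            return 0;
        }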
     

19 Nov, 2018

1 commit

  • Merge in -rc3 to resolve a few conflicts, but also to get a few
    important fixes that have gone into mainline since the block 4.21
    branch was forked off (most notably the SCSI queue issue, which is
    both a conflict AND a needed fix).

    Signed-off-by: Jens Axboe

    Jens Axboe