04 Mar, 2016
1 commit
-
This patch applies the two newly introduced helpers to
figure out the first and last bvec.

Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
23 Jan, 2016
1 commit
-
After commit e36f62042880 (block: split bios to max possible length),
a bio can be split in the middle of a vector entry, so it is easy to
split out a bio whose size isn't aligned with the block size,
especially when the block size is bigger than 512.

This patch fixes the issue by making the max io size aligned
to the logical block size.

Fixes: e36f62042880 (block: split bios to max possible length)
Reported-by: Stefan Haberland
Cc: Keith Busch
Suggested-by: Linus Torvalds
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
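The alignment described in this commit can be illustrated with a minimal user-space sketch. It is not the kernel code: the helper name align_max_sectors() and its parameters are hypothetical, and it assumes 512-byte sectors and a power-of-two logical block size.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* assumption: sectors are 512 bytes */

/*
 * Hypothetical helper: round a split limit (in 512-byte sectors) down
 * to a multiple of the logical block size (in bytes), so that a bio
 * split at this limit never ends in the middle of a logical block.
 */
static inline uint32_t align_max_sectors(uint32_t max_sectors,
                                         uint32_t lbs_bytes)
{
    /* The logical block size must be a power of two and >= 512. */
    assert(lbs_bytes >= 512 && (lbs_bytes & (lbs_bytes - 1)) == 0);

    uint32_t lbs_sectors = lbs_bytes >> SECTOR_SHIFT;

    /* Clear the low bits: round down to a whole number of blocks. */
    return max_sectors & ~(lbs_sectors - 1);
}
```

With a 4096-byte logical block (8 sectors), a limit of 255 sectors is rounded down to 248, while a 512-byte block leaves it unchanged.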
20 Jan, 2016
1 commit
-
Pull core block updates from Jens Axboe:
"We don't have a lot of core changes this time around, it's mostly in
drivers, which will come in a subsequent pull.

The core changes include:

- blk-mq
  - Prep patch from Christoph, changing blk_mq_alloc_request() to
    take flags instead of just using gfp_t for sleep/nosleep.
  - Doc patch from me, clarifying the difference between legacy
    and blk-mq for timer usage.
  - Fixes from Raghavendra for memory-less numa nodes, and a reuse
    of CPU masks.

- Cleanup from Geliang Tang, using offset_in_page() instead of open
  coding it.

- From Ilya, rename request_queue slab so it reflects what it holds,
  and a fix for proper use of bdgrab/put.

- A real fix for the split across stripe boundaries from Keith. We
  yanked a broken version of this from 4.4-rc final, this one works.

- From Mike Krinkin, emit a trace message when we split.

- From Wei Tang, two small cleanups, not explicitly clearing memory
  that is already cleared"

* 'for-4.5/core' of git://git.kernel.dk/linux-block:
block: use bd{grab,put}() instead of open-coding
block: split bios to max possible length
block: add call to split trace point
blk-mq: Avoid memoryless numa node encoded in hctx numa_node
blk-mq: Reuse hardware context cpumask for tags
blk-mq: add a flags parameter to blk_mq_alloc_request
Revert "blk-flush: Queue through IO scheduler when flush not required"
block: clarify blk_add_timer() use case for blk-mq
bio: use offset_in_page macro
block: do not initialise statics to 0 or NULL
block: do not initialise globals to 0 or NULL
block: rename request_queue slab cache
13 Jan, 2016
1 commit
-
This splits bio in the middle of a vector to form the largest possible
bio at the h/w's desired alignment, and guarantees the bio being split
will have some data.

The criterion for splitting is changed from the max sectors to the h/w's
optimal sector alignment if it is provided. For h/w that advertise their
block storage's underlying chunk size, it's a big performance win to not
submit commands that cross them. If sector alignment is not provided,
this patch uses the max sectors as before.

This addresses the performance issue commit d380561113 attempted to
fix, but was reverted due to splitting logic error.

Signed-off-by: Keith Busch
Cc: Jens Axboe
Cc: Ming Lei
Cc: Kent Overstreet
Cc: # 4.4.x-
Signed-off-by: Jens Axboe
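As a rough illustration of splitting at a chunk boundary, the sketch below computes how many sectors a bio starting at a given sector may cover before it would cross the next boundary. The function name max_sectors_at() and its parameters are hypothetical, the chunk size is assumed to be a power of two, and capping the result at the max sectors limit is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of sectors that can be issued starting
 * at 'sector' without crossing a chunk boundary. A chunk_sectors of 0
 * means the hardware advertises no chunk size, so the plain max
 * sectors limit is used, as before.
 */
static inline uint32_t max_sectors_at(uint64_t sector,
                                      uint32_t max_sectors,
                                      uint32_t chunk_sectors)
{
    if (chunk_sectors == 0)
        return max_sectors;

    /* Assumption: the chunk size is a power of two. */
    assert((chunk_sectors & (chunk_sectors - 1)) == 0);

    /* Distance from 'sector' to the next chunk boundary. */
    uint32_t to_boundary =
        chunk_sectors - (uint32_t)(sector & (chunk_sectors - 1));

    return to_boundary < max_sectors ? to_boundary : max_sectors;
}
```

Because the distance to the next boundary is always at least one sector, a split made at this limit always leaves some data in the first half, which is the guarantee the commit message mentions.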
09 Jan, 2016
1 commit
-
This reverts commit d3805611130af9b911e908af9f67a3f64f4f0914.

If we end up splitting on the first segment, we don't adjust
the sector count. That results in hitting a BUG() when attempting
to split 0 sectors.

As this is just a performance issue and not a regression since
the 4.3 release, let's just revert this change. That gives us more
time to test a real fix for 4.5, which would be marked for
stable anyway.
23 Dec, 2015
1 commit
-
For h/w that advertise their block storage's underlying chunk size, it's
a big performance win to not submit commands that cross them. This patch
uses that criteria if it is provided. If it is not provided, this patch
uses the max sectors as before.

Signed-off-by: Keith Busch
Signed-off-by: Jens Axboe
04 Dec, 2015
1 commit
-
There is a split tracepoint that is supposed to be called when
a bio is split, and it was called in the bio_split function until
commit 4b1faf931650d4a35b2a ("block: Kill bio_pair_split()").
But now no one reports splits, so this patch adds the call to
trace_block_split back in blk_queue_split, right after the split.

Signed-off-by: Mike Krinkin
Signed-off-by: Jens Axboe
01 Dec, 2015
1 commit
-
When a bio has only one physical segment, we should set the bio's
bi_seg_front_size to the real (final) size of the single segment.

Fixes: 02e707424c2ea(blk-merge: fix blk_bio_segment_split)
Reported-by: Markus Trippelsdorf
Tested-by: Markus Trippelsdorf
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
24 Nov, 2015
3 commits
-
We have seen lots of reports of this kind of issue, so add a
warning in blk-merge so that it can be triggered easily, rather
than depending on a warning/bug from drivers.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
Commit bdced438acd83a(block: setup bi_phys_segments after
splitting) introduced the computation of bio->bi_phys_segments
during bio splitting.

Unfortunately, neither bio->bi_seg_front_size nor bio->bi_seg_back_size
is computed, so too many physical segments may be obtained
for one request, since both are used to check whether one segment
can span two bios.

This patch fixes the issue by computing the two variables in
blk_bio_segment_split().

Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
Reported-by: Michael Ellerman
Reported-by: Mark Salter
Tested-by: Laurent Dufour
Tested-by: Mark Salter
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
Inside blk_bio_segment_split(), the previous bvec pointer (bvprvp)
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable 'bvprv'.

Fixes: 5014c311baa2b(block: fix bogus compiler warnings in blk-merge.c)
Cc: stable@kernel.org #4.3
Reported-by: Michael Ellerman
Reported-by: Mark Salter
Tested-by: Laurent Dufour
Tested-by: Mark Salter
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
22 Oct, 2015
2 commits
-
The split bio is already too fat to merge, so mark it
as NOMERGE.

Reviewed-by: Jeff Moyer
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
-
The number of bio->bi_phys_segments is always obtained
during bio splitting, so it is natural to set it up
just after bio splitting; then we can avoid computing
nr_segment again during merge.

Reviewed-by: Jeff Moyer
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
17 Sep, 2015
1 commit
-
biovecs have been immutable since v3.13, so it isn't necessary
to allocate biovecs for the new cloned bios. This saves
one extra biovecs allocation/copy, and the allocation is often
not fixed-length and a bit more expensive.

For example, if the 'max_sectors_kb' of null_blk's queue is set
to 16 (32 sectors) via sysfs just to force more splits, this patch
can increase throughput by about 70% in the sequential read test over
null_blk (direct io, bs: 1M).

Cc: Christoph Hellwig
Cc: Kent Overstreet
Cc: Ming Lin
Cc: Dongsu Park
Signed-off-by: Ming Lei

This fixes a performance regression introduced by commit 54efd50bfd,
and allows us to take full advantage of the fact that we have immutable
bio_vecs. Hand applied, as it rejected violently with commit
5014c311baa2.

Signed-off-by: Jens Axboe
11 Sep, 2015
1 commit
-
If a driver sets the block queue virtual boundary mask, it means that
it cannot handle gaps so we must not allow those in the integrity
payload as well.

Signed-off-by: Sagi Grimberg

Fixed up by me to have duplicate integrity merge functions, depending
on whether block integrity is enabled or not. Fixes a compilation
issue with CONFIG_BLK_DEV_INTEGRITY unset.

Signed-off-by: Jens Axboe
04 Sep, 2015
1 commit
-
We are checking for gaps to the previous bio_vec, which can
only detect back merge gaps. Moreover, at the point where
we check for a gap, we don't know if we will attempt a back
or a front merge. Thus, check for a gap to prev in a back merge
attempt and check for a gap to next in a front merge attempt.

Signed-off-by: Jens Axboe
[sagig: Minor rename change]
Signed-off-by: Sagi Grimberg
03 Sep, 2015
2 commits
-
The compiler can't figure out that bvprv is initialized whenever 'prev'
is set to 1 as well. Use a pointer to bvprv instead, setting it to NULL
initially, and get rid of the 'prev' tracking. This dumbs it down
enough that gcc is happy.

Signed-off-by: Jens Axboe
-
Pull SG updates from Jens Axboe:
"This contains a set of scatter-gather related changes/fixes for 4.3:

- Add support for limited chaining of sg tables even for
  architectures that do not set ARCH_HAS_SG_CHAIN. From Christoph.

- Add sg chain support to target_rd. From Christoph.

- Fixup open coded sg->page_link in crypto/omap-sham. From
  Christoph.

- Fixup open coded crypto ->page_link manipulation. From Dan.

- Also from Dan, automated fixup of manual sg_unmark_end()
  manipulations.

- Also from Dan, automated fixup of open coded sg_phys()
  implementations.

- From Robert Jarzmik, addition of an sg table splitting helper that
  drivers can use"

* 'for-4.3/sg' of git://git.kernel.dk/linux-block:
lib: scatterlist: add sg splitting function
scatterlist: use sg_phys()
crypto/omap-sham: remove an open coded access to ->page_link
scatterlist: remove open coded sg_unmark_end instances
crypto: replace scatterwalk_sg_chain with sg_chain
target/rd: always chain S/G list
scatterlist: allow limited chaining without ARCH_HAS_SG_CHAIN
02 Sep, 2015
1 commit
-
Corrects a coding error from earlier patch.
Reported-by: Sagi Grimberg
Signed-off-by: Keith Busch
Fixes: 03100aada96f ("block: Replace SG_GAPS with new queue limits mask")
Signed-off-by: Jens Axboe
20 Aug, 2015
1 commit
-
The SG_GAPS queue flag caused checks for bio vector alignment against
PAGE_SIZE, but the device may have different constraints. This patch
adds a queue limit that a driver with such constraints can set, allowing
requests that would otherwise have been unnecessarily split. The new gaps
check takes the request_queue as a parameter to simplify the logic around
invoking this function.

This new limit makes the queue flag redundant, so remove it and
all of its users. Device-mappers will inherit the correct settings through
blk_stack_limits().

Signed-off-by: Keith Busch
Reviewed-by: Martin K. Petersen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
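A mask-based gap check of the kind this commit describes might look like the following user-space sketch. The struct demo_seg type and the gap_to_prev() function are hypothetical stand-ins, not the kernel's definitions; the mask is assumed to be the device boundary minus one (for example 4095 for a 4 KiB boundary), with 0 meaning the device has no such constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio vector: offset and length in bytes. */
struct demo_seg {
    uint32_t offset;
    uint32_t len;
};

/*
 * Returns true if appending a segment that starts at 'next_offset'
 * after 'prev' would leave a gap with respect to the boundary mask:
 * either the previous segment does not end on a boundary, or the
 * next segment does not start on one.
 */
static inline bool gap_to_prev(const struct demo_seg *prev,
                               uint32_t next_offset,
                               uint32_t boundary_mask)
{
    assert(prev != 0);

    if (boundary_mask == 0) /* no constraint: never a gap */
        return false;

    return ((prev->offset + prev->len) & boundary_mask) != 0 ||
           (next_offset & boundary_mask) != 0;
}
```

A mask of PAGE_SIZE - 1 would give a page-based check like the one the old SG_GAPS flag performed, while a driver with a different constraint would supply its own mask.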
17 Aug, 2015
1 commit
-
Signed-off-by: Dan Williams
[hch: split from a larger patch by Dan]
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
14 Aug, 2015
2 commits
-
As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe
Cc: Lars Ellenberg
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-devel@redhat.com
Cc: Neil Brown
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig
Cc: "Martin K. Petersen"
Acked-by: NeilBrown (for the 'md' bits)
Acked-by: Mike Snitzer
Signed-off-by: Kent Overstreet
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
Signed-off-by: Jens Axboe
-
The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them. In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Ming Lei
Cc: Neil Brown
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-devel@redhat.com
Cc: Lars Ellenberg
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina
Cc: Geoff Levand
Cc: Jim Paris
Cc: Philip Kelleher
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Oleg Drokin
Cc: Andreas Dilger
Acked-by: NeilBrown (for the 'md/md.c' bits)
Acked-by: Mike Snitzer
Reviewed-by: Martin K. Petersen
Signed-off-by: Kent Overstreet
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park
Signed-off-by: Ming Lin
Signed-off-by: Jens Axboe
29 Jul, 2015
1 commit
-
Some places use helpers now, others don't. We only have the 'is set'
helper, add helpers for setting and clearing flags too.

It was a bit of a mess of atomic vs non-atomic access. With
BIO_UPTODATE gone, we don't have any risk of concurrent access to the
flags. So relax the restriction and don't make any of them atomic. The
flags that do have serialization issues (reffed and chained), we
already handle those separately.

Signed-off-by: Jens Axboe
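To illustrate what non-atomic test/set/clear helpers of this kind look like, here is a minimal user-space sketch. The struct demo_bio type and the demo_bio_* function names are hypothetical and only mirror the idea; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flags word of a bio. */
struct demo_bio {
    unsigned int flags;
};

/* 'Is set' helper: test one flag bit. */
static inline bool demo_bio_flagged(const struct demo_bio *bio,
                                    unsigned int bit)
{
    assert(bit < 8 * sizeof(bio->flags));
    return (bio->flags & (1U << bit)) != 0;
}

/* Plain read-modify-write: fine when nothing else touches the flags. */
static inline void demo_bio_set_flag(struct demo_bio *bio,
                                     unsigned int bit)
{
    assert(bit < 8 * sizeof(bio->flags));
    bio->flags |= 1U << bit;
}

static inline void demo_bio_clear_flag(struct demo_bio *bio,
                                       unsigned int bit)
{
    assert(bit < 8 * sizeof(bio->flags));
    bio->flags &= ~(1U << bit);
}
```

The plain `|=` and `&=` operations are only safe because, as the commit message argues, there is no concurrent access to these flags; state that does need serialization has to be handled separately.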
30 May, 2015
1 commit
-
We can safely merge anything that won't generate an SG list entry,
so if the bio is data-less (discard), don't look at potential
SG gaps.

Signed-off-by: Jens Axboe
20 Mar, 2015
1 commit
-
Use the right array index to reference the last
element of rq->biotail->bi_io_vec[]

Signed-off-by: Wenbo Wang
Reviewed-by: Chong Yuan
Fixes: 66cb45aa41315 ("block: add support for limiting gaps in SG lists")
Cc: stable@kernel.org
Signed-off-by: Jens Axboe
12 Feb, 2015
2 commits
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
If the queue has SG_GAPS set, we must not merge across an sg gap.
This is caught for the bio case, but currently not for the
more rare case of merging two requests directly.

Signed-off-by: Keith Busch

Cut the dm bits, those will go through the dm tree, and fixed
the test_bit() test.

Signed-off-by: Jens Axboe
12 Nov, 2014
1 commit
-
For a cloned bio, bio->bi_vcnt can't be used at all, and we
have to resort to bio_segments() to figure out how many
segments there are in the bio.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
22 Oct, 2014
1 commit
-
The problem was introduced by commit 764f612c6c3c231b(blk-merge:
don't compute bi_phys_segments from bi_vcnt for cloned bio);
a merge is needed if the current number of segments isn't less than
max segments.

Strictly speaking, bio->bi_vcnt shouldn't be used here since
it may not be accurate for either a cloned bio or a bio that was
cloned from, but bio_segments() is a bit expensive, and bi_vcnt is still
the biggest number, so the approach should work.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
10 Oct, 2014
1 commit
-
It isn't correct to figure out req->bi_phys_segments from bio->bi_vcnt
if the bio is cloned.

Signed-off-by: Ming Lei
Tested-by: Jeff Mahoney
Signed-off-by: Jens Axboe
27 Sep, 2014
1 commit
-
We'd occasionally merge requests with conflicting integrity flags.
Introduce a merge helper which checks that the requests have compatible
integrity payloads.

Signed-off-by: Martin K. Petersen
Reviewed-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
Signed-off-by: Jens Axboe
03 Sep, 2014
1 commit
-
QUEUE_FLAG_NO_SG_MERGE is set by default for blk-mq devices,
so the computed bio->bi_phys_segment may be bigger than
queue_max_segments(q) for blk-mq devices, and drivers will
fail to handle that case. For example, BUG_ON() in
virtio_queue_rq() can be triggered for virtio-blk:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1359146

This patch fixes the issue by ignoring the QUEUE_FLAG_NO_SG_MERGE
flag if the computed bio->bi_phys_segment is bigger than
queue_max_segments(q). The regression was caused by commit
05f1dd53152173(block: add queue flag for disabling SG merging).

Reported-by: Kick In
Tested-by: Chris J Arges
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
25 Jun, 2014
1 commit
-
Another restriction inherited for NVMe - those devices don't support
SG lists that have "gaps" in them. Gaps refers to cases where the
previous SG entry doesn't end on a page boundary. For NVMe, all SG
entries must start at offset 0 (except the first) and end on a page
boundary (except the last).

Signed-off-by: Jens Axboe
29 May, 2014
1 commit
-
If devices are not SG starved, we waste a lot of time potentially
collapsing SG segments. Enough that 1.5% of the CPU time goes
to this, at only 400K IOPS. Add a queue flag, QUEUE_FLAG_NO_SG_MERGE,
which just returns the number of vectors in a bio instead of looping
over all segments and checking for collapsible ones.

Add a BLK_MQ_F_SG_MERGE flag so that drivers can opt-in on the sg
merging, if they so desire.

Signed-off-by: Jens Axboe
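The trade-off can be sketched in user space as follows. The struct demo_vec type, the count_segments() function and its parameters are hypothetical; the sketch treats two vectors as collapsible when they are physically contiguous and the merged segment stays within a maximum segment size, which is a simplification of the real merging rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio vector: physical address and length. */
struct demo_vec {
    uint64_t addr;
    uint32_t len;
};

/*
 * Count the segments a device would see for 'n' vectors. With
 * 'no_sg_merge' set, skip the walk and report the vector count;
 * otherwise collapse physically contiguous neighbours as long as the
 * merged segment does not exceed 'max_seg_size' bytes.
 */
static unsigned int count_segments(const struct demo_vec *v, unsigned int n,
                                   bool no_sg_merge, uint32_t max_seg_size)
{
    assert(v != 0 || n == 0);

    if (no_sg_merge || n == 0)
        return n; /* cheap path: no per-vector work */

    unsigned int segs = 1;
    uint64_t cur = v[0].len; /* size of the segment being built */

    for (unsigned int i = 1; i < n; i++) {
        bool contiguous = v[i - 1].addr + v[i - 1].len == v[i].addr;

        if (contiguous && cur + v[i].len <= max_seg_size) {
            cur += v[i].len; /* collapse into the current segment */
        } else {
            segs++; /* start a new segment */
            cur = v[i].len;
        }
    }
    return segs;
}
```

The cheap path may report more segments than the merged count, which is why a later fix in this history ignores the flag when the unmerged count exceeds queue_max_segments(q).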
08 Feb, 2014
1 commit
-
Immutable biovecs changed the way biovecs are interpreted - drivers no
longer use bi_vcnt, they have to go by bi_iter.bi_size (to allow for
using part of an existing segment without modifying it).

This breaks with discards and write_same bios, since for those bi_size
has nothing to do with segments in the biovec. So for now, we need a
fairly gross hack - we fortunately know that there will never be more
than one segment for the entire request, so we can special case
discard/write_same.

Signed-off-by: Kent Overstreet
Tested-by: Hugh Dickins
Signed-off-by: Jens Axboe
04 Dec, 2013
1 commit
-
The uninitialized_var() macro appears to not work on structs...
Get rid of it, and manually initialize instead.

Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe
27 Nov, 2013
1 commit
-
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe
24 Nov, 2013
2 commits
-
bio_iovec_idx() and __bio_iovec() don't have any valid uses anymore -
previous users have been converted to bio_iovec_iter() or other methods.

__BVEC_END() has to go too - the bvec array can't be used directly for
the last biovec because we might only be using the first portion of it,
we have to iterate over the bvec array with bio_for_each_segment() which
checks against the current value of bi_iter.bi_size.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
-
More prep work for immutable biovecs - with immutable bvecs drivers
won't be able to use the biovec directly, they'll need to use helpers
that take into account bio->bi_iter.bi_bvec_done.

This updates callers for the new usage without changing the
implementation yet.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Geert Uytterhoeven
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Ed L. Cashin"
Cc: Nick Piggin
Cc: Lars Ellenberg
Cc: Jiri Kosina
Cc: Paul Clements
Cc: Jim Paris
Cc: Geoff Levand
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris
Cc: Philip Kelleher
Cc: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge
Cc: Neil Brown
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: linux390@de.ibm.com
Cc: Nagalakshmi Nandigama
Cc: Sreekanth Reddy
Cc: support@lsi.com
Cc: "James E.J. Bottomley"
Cc: Greg Kroah-Hartman
Cc: Alexander Viro
Cc: Steven Whitehouse
Cc: Herton Ronaldo Krzesinski
Cc: Tejun Heo
Cc: Andrew Morton
Cc: Guo Chao
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Matthew Wilcox
Cc: Keith Busch
Cc: Stephen Hemminger
Cc: Quoc-Son Anh
Cc: Sebastian Ott
Cc: Nitin Gupta
Cc: Minchan Kim
Cc: Jerome Marchand
Cc: Seth Jennings
Cc: "Martin K. Petersen"
Cc: Mike Snitzer
Cc: Vivek Goyal
Cc: "Darrick J. Wong"
Cc: Chris Metcalf
Cc: Jan Kara
Cc: linux-m68k@lists.linux-m68k.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: drbd-user@lists.linbit.com
Cc: nbd-general@lists.sourceforge.net
Cc: cbe-oss-dev@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-raid@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: DL-MPTFusionLinux@lsi.com
Cc: linux-scsi@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: linux-fsdevel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: linux-mm@kvack.org
Acked-by: Geoff Levand