28 Oct, 2020

1 commit

  • When the bio's size reaches max_append_sectors, bio_add_hw_page
    returns 0 and __bio_iov_append_get_pages returns -EINVAL. This is an
    expected result of building a bio small enough not to be split in the
    IO path. However, iov_iter is not advanced in this case, causing the
    same pages to be filled for the bio again and again.

    Fix the case by properly advancing the iov_iter for already processed
    pages.
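
    A minimal sketch of the fixed loop in __bio_iov_append_get_pages()
    (surrounding declarations and error handling assumed):

    for (left = size, i = 0; left > 0; left -= len, i++) {
            struct page *page = pages[i];
            bool same_page = false;

            len = min_t(size_t, PAGE_SIZE - offset, left);
            if (bio_add_hw_page(q, bio, page, len, offset,
                                max_append_sectors, &same_page) != len) {
                    ret = -EINVAL;  /* bio reached max_append_sectors */
                    break;
            }
            if (same_page)
                    put_page(page);
            offset = 0;
    }

    /* Advance past the bytes actually added, even on early exit,
     * so the same pages are not filled into the bio again. */
    iov_iter_advance(iter, size - left);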

    Fixes: 0512a75b98f8 ("block: Introduce REQ_OP_ZONE_APPEND")
    Cc: stable@vger.kernel.org # 5.8+
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Naohiro Aota
    Signed-off-by: Jens Axboe

    Naohiro Aota
     

15 Oct, 2020

2 commits

  • Fix this warning:

    ./block/bio.c:1098: WARNING: Inline emphasis start-string without end-string.

    The problem is that *iter is not valid markup.

    That seems to be a typo:
    *iter -> @iter
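
    For illustration, a kernel-doc comment refers to parameters with
    @name, while *text* is Sphinx emphasis and needs a closing '*'
    (hypothetical comment):

    /**
     * bio_iov_iter_get_pages - pin pages from an iterator into a bio
     * @bio:  bio to add pages to
     * @iter: iov iterator describing the pages (@iter, not *iter)
     */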

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     
  • Using "@bio's parent" causes the following warning:
    ./block/bio.c:10: WARNING: Inline emphasis start-string without end-string.

    The main problem here is that kernel-doc would convert this into:

    **bio**'s parent

    which is not valid notation. It would be possible to use this
    kernel-doc markup instead:

    ``bio's`` parent

    Yet here it is probably simpler to just use alternative wording:

    the parent of @bio

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab
     

14 Oct, 2020

1 commit

  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

06 Oct, 2020

1 commit

  • bio_crypt_clone() assumes its gfp_mask argument always includes
    __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.

    However, bio_crypt_clone() might be called with GFP_ATOMIC via
    setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via
    kcryptd_io_read() in drivers/md/dm-crypt.c.

    Neither case is currently reachable with a bio that actually has an
    encryption context. However, it's fragile to rely on this. Just make
    bio_crypt_clone() able to fail, analogous to bio_integrity_clone().
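
    A sketch of the resulting fallible clone (pool and field names
    assumed from the description):

    int bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask)
    {
            if (!src->bi_crypt_context)
                    return 0;

            /* May fail when gfp_mask lacks __GFP_DIRECT_RECLAIM,
             * e.g. for GFP_ATOMIC or GFP_NOWAIT callers. */
            dst->bi_crypt_context = mempool_alloc(bio_crypt_ctx_pool,
                                                  gfp_mask);
            if (!dst->bi_crypt_context)
                    return -ENOMEM;

            *dst->bi_crypt_context = *src->bi_crypt_context;
            return 0;
    }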

    Reported-by: Miaohe Lin
    Signed-off-by: Eric Biggers
    Reviewed-by: Mike Snitzer
    Reviewed-by: Satya Tangirala
    Cc: Satya Tangirala
    Signed-off-by: Jens Axboe

    Eric Biggers
     

09 Sep, 2020

1 commit

  • If we hit the UINT_MAX limit of bio->bi_iter.bi_size, and are
    therefore not going to merge this page into this bio anyway, then it
    makes sense to also set same_page to false before returning.

    Without this patch, we hit the WARNING below in iomap. This mostly
    happens on systems with very large memory and/or after tweaking the
    vm dirty threshold params to delay writeback of dirty data.

    WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150
    CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G W 5.8.0-rc3 #6
    Call Trace:
    __remove_mapping+0x154/0x320 (unreliable)
    iomap_releasepage+0x80/0x180
    try_to_release_page+0x94/0xe0
    invalidate_inode_page+0xc8/0x110
    invalidate_mapping_pages+0x1dc/0x540
    generic_fadvise+0x3c8/0x450
    xfs_file_fadvise+0x2c/0xe0 [xfs]
    vfs_fadvise+0x3c/0x60
    ksys_fadvise64_64+0x68/0xe0
    sys_fadvise64+0x28/0x40
    system_call_exception+0xf8/0x1c0
    system_call_common+0xf0/0x278
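
    A sketch of the fix in __bio_try_merge_page() (surrounding merge
    bookkeeping assumed):

    if (page_is_mergeable(bv, page, len, off, same_page)) {
            if (bio->bi_iter.bi_size > UINT_MAX - len) {
                    /* Not merging anyway, so don't report same_page. */
                    *same_page = false;
                    return false;
            }
            bv->bv_len += len;
            bio->bi_iter.bi_size += len;
            return true;
    }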

    Fixes: cc90bc68422 ("block: fix "check bi_size overflow before merge"")
    Reported-by: Shivaprasad G Bhat
    Suggested-by: Christoph Hellwig
    Signed-off-by: Anju T Sudhakar
    Signed-off-by: Ritesh Harjani
    Reviewed-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ritesh Harjani
     

18 Aug, 2020

1 commit

  • If we pass in an offset which is larger than PAGE_SIZE, then
    page_is_mergeable() thinks it's not mergeable with the previous bio_vec,
    leading to a large number of bio_vecs being used. Use a slightly more
    obvious test that the two pages are compatible with each other.
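
    A sketch of the new test, where bv_end stands for bv->bv_offset +
    bv->bv_len (exact context assumed):

    /* Mergeable only if both ranges resolve to the same page;
     * this also works for compound pages and offsets larger
     * than PAGE_SIZE. */
    return (bv->bv_page + bv_end / PAGE_SIZE) ==
           (page + off / PAGE_SIZE);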

    Fixes: 52d52d1c98a9 ("block: only allow contiguous page structs in a bio_vec")
    Signed-off-by: Matthew Wilcox (Oracle)
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Matthew Wilcox (Oracle)
     

01 Aug, 2020

1 commit


01 Jul, 2020

1 commit


29 Jun, 2020

5 commits


24 Jun, 2020

1 commit

  • Make use of the struct_size() helper instead of an open-coded version
    in order to avoid any potential type mistakes.

    This code was detected with the help of Coccinelle, and audited and
    fixed manually.
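
    For illustration, assuming an allocation of a structure with a
    flexible array member (struct shape hypothetical):

    struct bio_map_data {
            struct iov_iter iter;
            struct iovec iov[];     /* flexible array member */
    };

    /* Open-coded size computation, prone to type mistakes: */
    bmd = kmalloc(sizeof(struct bio_map_data) +
                  sizeof(struct iovec) * nr_segs, gfp_mask);

    /* With the helper from <linux/overflow.h>, which also
     * guards the multiplication against overflow: */
    bmd = kmalloc(struct_size(bmd, iov, nr_segs), gfp_mask);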

    Signed-off-by: Gustavo A. R. Silva
    Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
    Signed-off-by: Jens Axboe

    Gustavo A. R. Silva
     

05 Jun, 2020

1 commit

  • The status can be trivially derived from the bio itself. That also
    avoids callers like NVMe incorrectly passing a blk_status_t instead
    of the errno, and avoids the overhead of translating the blk_status_t
    to the errno in the I/O completion fast path when no tracing is
    enabled.
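
    Sketch of the idea: the tracepoint derives the errno from the bio
    itself, and only when tracing actually fires (tracepoint plumbing
    assumed):

    TP_fast_assign(
            __entry->dev   = bio_dev(bio);
            __entry->error = blk_status_to_errno(bio->bi_status);
    )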

    Fixes: 35fe0d12c8a3 ("nvme: trace bio completion")
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 May, 2020

2 commits


19 May, 2020

1 commit


14 May, 2020

1 commit

  • We must have some way of letting a storage device driver know what
    encryption context it should use for en/decrypting a request. However,
    it's the upper layers (like the filesystem/fscrypt) that know about
    and manage encryption contexts. As such, when the upper layer submits
    a bio
    to the block layer, and this bio eventually reaches a device driver with
    support for inline encryption, the device driver will need to have been
    told the encryption context for that bio.

    We want to communicate the encryption context from the upper layer to the
    storage device along with the bio, when the bio is submitted to the block
    layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
    represent an encryption context (note that we can't use the bi_private
    field in struct bio to do this because that field does not function to pass
    information across layers in the storage stack). We also introduce various
    functions to manipulate the bio_crypt_ctx and make the bio/request merging
    logic aware of the bio_crypt_ctx.

    We also make changes to blk-mq to make it handle bios with encryption
    contexts. blk-mq can merge many bios into the same request. These bios need
    to have contiguous data unit numbers (the necessary changes to blk-merge
    are also made to ensure this) - as such, it suffices to keep the data unit
    number of just the first bio, since that's all a storage driver needs to
    infer the data unit number to use for each data block in each bio in a
    request. blk-mq keeps track of the encryption context to be used for all
    the bios in a request with the request's rq_crypt_ctx. When the first bio
    is added to an empty request, blk-mq will program the encryption context
    of that bio into the request_queue's keyslot manager, and store the
    returned keyslot in the request's rq_crypt_ctx. All the functions to
    operate on encryption contexts are in blk-crypto.c.

    Upper layers only need to call bio_crypt_set_ctx with the encryption key,
    algorithm and data_unit_num; they don't have to worry about getting a
    keyslot for each encryption context, as blk-mq/blk-crypto handles that.
    Blk-crypto also makes it possible for request-based layered devices like
    dm-rq to make use of inline encryption hardware by cloning the
    rq_crypt_ctx and programming a keyslot in the new request_queue when
    necessary.

    Note that any user of the block layer can submit bios with an
    encryption context, such as filesystems, device-mapper targets, etc.
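
    A sketch of how an upper layer might attach a context (key setup
    omitted; first_dun is a placeholder):

    u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { first_dun };

    /* Attach the encryption context; keyslot programming is
     * handled by blk-mq/blk-crypto when the bio reaches hardware
     * with inline encryption support. */
    bio_crypt_set_ctx(bio, key, dun, GFP_NOIO);
    submit_bio(bio);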

    Signed-off-by: Satya Tangirala
    Reviewed-by: Eric Biggers
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Satya Tangirala
     

13 May, 2020

3 commits

  • Export bio_release_pages and bio_iov_iter_get_pages, so they can be used
    from modular code.

    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
     
  • Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
    block device. This is a no-merge write operation.

    A zone append write BIO must:
    * Target a zoned block device
    * Have a sector position indicating the start sector of the target zone
    * The target zone must be a sequential write zone
    * The BIO must not cross a zone boundary
    * The BIO must not be split, to ensure that a single range of LBAs
    is written with a single command.

    Implement these checks in generic_make_request_checks() using the
    helper function blk_check_zone_append(). To avoid write append BIO
    splitting, introduce the new max_zone_append_sectors queue limit
    attribute and ensure that a BIO size is always lower than this limit.
    Export this new limit through sysfs and check these limits in bio_full().

    Also, when a LLDD can't dispatch a request to a specific zone, it
    will return BLK_STS_ZONE_RESOURCE, indicating that this request needs
    to be delayed, e.g. because the zone it will be dispatched to is
    still write-locked. If this happens, set the request aside on a local
    list and continue trying to dispatch other requests, such as READ
    requests or WRITE/ZONE_APPEND requests targeting other zones. This
    way we can
    still keep a high queue depth without starving other requests even if
    one request can't be served due to zone write-locking.

    Finally, make sure that the bio sector position indicates the actual
    write position as indicated by the device on completion.
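
    A sketch of issuing a zone append write (allocation and error
    handling elided):

    bio = bio_alloc(GFP_KERNEL, nr_pages);
    bio_set_dev(bio, bdev);
    /* The sector position indicates the start of the target zone. */
    bio->bi_iter.bi_sector = zone_start_sector;
    bio->bi_opf = REQ_OP_ZONE_APPEND;

    ret = submit_bio_wait(bio);

    /* On completion, bi_sector is updated to the position the
     * device actually wrote the data to. */
    written_sector = bio->bi_iter.bi_sector;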

    Signed-off-by: Keith Busch
    [ jth: added zone-append specific add_page and merge_page helpers ]
    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Keith Busch
     
  • Rename __bio_add_pc_page() to bio_add_hw_page() and explicitly pass in a
    max_sectors argument.

    This max_sectors argument can be used to specify constraints from the
    hardware.
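
    For reference, the renamed helper's signature would look roughly
    like:

    int bio_add_hw_page(struct request_queue *q, struct bio *bio,
                        struct page *page, unsigned int len,
                        unsigned int offset, unsigned int max_sectors,
                        bool *same_page);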

    Signed-off-by: Christoph Hellwig
    [ jth: rebased and made public for blk-map.c ]
    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Daniel Wagner
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

28 Mar, 2020

1 commit

  • The bio_map_* helpers are just the low-level helpers for the
    blk_rq_map_* APIs. Move them together for better logical grouping,
    as there isn't much overlap with other code in bio.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Mar, 2020

3 commits

  • This is bio layer functionality and not related to buffer heads.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Column "time_in_queue" in diskstats is supposed to show the total
    waiting time of all requests, i.e. the value should be equal to the
    sum of times from the other columns. But this is not true, because
    "time_in_queue" is counted separately in jiffies rather than in
    nanoseconds like the other times.

    This patch removes the redundant counter for "time_in_queue" and
    shows the total time of read, write, discard and flush requests.
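
    Roughly, the column can then be derived from the existing
    nanosecond counters (field names assumed):

    /* time_in_queue, in milliseconds, as the sum of all types: */
    div_u64(stat.nsecs[STAT_READ] + stat.nsecs[STAT_WRITE] +
            stat.nsecs[STAT_DISCARD] + stat.nsecs[STAT_FLUSH],
            NSEC_PER_MSEC)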

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     
  • Currently io_ticks is approximated by adding one at each start and
    end of a request if the jiffies counter has changed. This works
    perfectly for requests shorter than a jiffy, or if one request
    starts/ends in each jiffy.

    If the disk executes just one request at a time and the requests are
    longer than two jiffies, then only the first and last jiffies will be
    accounted.

    The fix is simple: at the end of a request, add to io_ticks the
    jiffies passed since the last update, rather than just one jiffy.
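
    A sketch of the fixed accounting (function shape assumed):

    static void update_io_ticks(struct hd_struct *part,
                                unsigned long now, bool end)
    {
            unsigned long stamp = READ_ONCE(part->stamp);

            if (unlikely(stamp != now)) {
                    if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
                            /* On request end, add all jiffies passed
                             * since the last update, not just one. */
                            __part_stat_add(part, io_ticks,
                                            end ? now - stamp : 1);
            }
    }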

    Example: a common HDD executes random 4k read requests in around 12ms.

    fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
    iostat -x 10 sdb

    Note the change in iostat's "%util" from 8,43% to 99,99% before/after
    the patch:

    Before:

    Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43

    After:

    Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
    sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99

    Now io_ticks does not lose time between the start and end of
    requests, but for queue-depth > 1 some I/O time between adjacent
    starts might be lost.

    For load estimation, "%util" is not as useful as the average queue
    length, but it clearly shows how often the disk queue is completely
    empty.

    Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
    Signed-off-by: Konstantin Khlebnikov
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Konstantin Khlebnikov
     

24 Mar, 2020

1 commit


18 Mar, 2020

1 commit

  • submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which may
    take a long time to complete; as Salman mentioned, a 4K BLKSECDISCARD
    takes up to 100 seconds on some devices. Also, any block I/O
    operation that occurs after the BLKSECDISCARD is submitted will
    potentially be affected by the hung task timeouts.

    Another report is that a task hang can be observed when running mkfs
    over raid10, which has a small max discard sectors limit because of
    its chunk size.

    So prevent hung_check from firing by taking the same approach used in
    blk_execute_rq(): the wake-up interval is set to half the hung_check
    timer period, which keeps the overhead low enough.
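
    A sketch of the wait loop in submit_bio_wait() (completion setup
    assumed):

    unsigned long hang_check = sysctl_hung_task_timeout_secs;

    /* Prevent hung task checks from firing during very long I/O. */
    if (hang_check)
            while (!wait_for_completion_io_timeout(&done,
                            hang_check * (HZ / 2)))
                    ;
    else
            wait_for_completion_io(&done);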

    Cc: Salman Qazi
    Cc: Jesse Barnes
    Cc: Bart Van Assche
    Link: https://lkml.org/lkml/2020/2/12/1193
    Reported-by: Salman Qazi
    Reviewed-by: Jesse Barnes
    Reviewed-by: Salman Qazi
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

09 Jan, 2020

1 commit

  • Commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    adds bio_truncate() for handling bio EOD. However, bio_truncate()
    doesn't use the 'op' parameter passed in by guard_bio_eod's callers.

    So bio_truncate() may retrieve the wrong 'op', and the zeroing of
    pages may not be done for READ bios.

    Fix this issue by moving guard_bio_eod() after bio_set_op_attrs() in
    submit_bh_wbc() so that bio_truncate() can always retrieve the
    correct op info.

    Meanwhile, remove the 'op' parameter from guard_bio_eod() because it
    isn't used any more.
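
    A sketch of the reordering in submit_bh_wbc() (context assumed):

    /* Set the op first so bio_truncate() can see it via bio_op()... */
    bio_set_op_attrs(bio, op, op_flags);

    /* ...then take care of bh's that straddle the end of the device. */
    guard_bio_eod(bio);

    submit_bio(bio);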

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixes: 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    Signed-off-by: Ming Lei

    Fold in kerneldoc and bio_op() change.

    Signed-off-by: Jens Axboe

    Ming Lei
     

29 Dec, 2019

1 commit

  • Some filesystems, such as vfat, may send a bio which crosses the
    device boundary, and worse, an IO request starting within device
    boundaries can contain more than one segment past EOD.

    Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD
    errors") tries to fix this issue by returning -EIO for this
    situation. However, that makes fs code lose the chance to handle -EIO
    itself, and then sync_inodes_sb() may hang forever.

    Also, the current truncation of the last segment is dangerous because
    it updates the last bvec: the bvec table is then no longer immutable,
    and fs bio users may not retrieve the truncated pages via
    bio_for_each_segment_all() in their .end_io callback.

    Fix this issue by supporting multi-segment truncation. The approach
    is simpler, as shown in the sketch after this list:

    - just update the bio size, since the block layer can build correct
    bvecs from the updated bio size. The bvec table then becomes truly
    immutable.

    - zero all truncated segments for read bios
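
    A sketch of this multi-segment truncation (close to the approach
    described above):

    void bio_truncate(struct bio *bio, unsigned new_size)
    {
            struct bio_vec bv;
            struct bvec_iter iter;
            unsigned int done = 0;

            if (new_size >= bio->bi_iter.bi_size)
                    return;

            if (bio_op(bio) == REQ_OP_READ) {
                    /* Zero every fully or partially truncated segment. */
                    bio_for_each_segment(bv, bio, iter) {
                            if (done + bv.bv_len > new_size) {
                                    unsigned offset = done < new_size ?
                                                      new_size - done : 0;

                                    zero_user(bv.bv_page,
                                              bv.bv_offset + offset,
                                              bv.bv_len - offset);
                            }
                            done += bv.bv_len;
                    }
            }

            /* Just shrink the size; the bvec table stays untouched. */
            bio->bi_iter.bi_size = new_size;
    }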

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixes: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

10 Dec, 2019

1 commit

  • This partially reverts commit e3a5d8e386c3fb973fa75f2403622a8f3640ec06.

    Commit e3a5d8e386c3 ("check bi_size overflow before merge") adds a bio_full
    check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
    when the last bi_io_vec has been reached. Instead, what we want here is only
    the bi_size overflow check.
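
    The intended check, sketched (inside __bio_try_merge_page(), after
    page_is_mergeable() succeeds):

    /* Only guard bi_size against overflow; do not fail merges
     * just because the bvec table is full. */
    if (bio->bi_iter.bi_size > UINT_MAX - len)
            return false;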

    Fixes: e3a5d8e386c3 ("block: check bi_size overflow before merge")
    Cc: stable@vger.kernel.org # v5.4+
    Reviewed-by: Ming Lei
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Jens Axboe

    Andreas Gruenbacher
     

06 Dec, 2019

1 commit

  • 7c20f11680a4 ("bio-integrity: stop abusing bi_end_io") moves
    bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
    and bio_endio(). This looks wrong because a bio may be freed without
    bio_endio() being called; for example, blk_rq_unprep_clone() is
    called from dm_mq_queue_rq() when the underlying queue of dm-mpath
    is busy.

    So commit 7c20f11680a4 causes a memory leak of bio integrity data.

    Fix this issue by re-adding bio_integrity_free() to bio_uninit().
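
    A sketch of the re-added call (function context assumed):

    void bio_uninit(struct bio *bio)
    {
            bio_disassociate_blkg(bio);

            /* Free integrity data even when the bio is freed without
             * going through bio_endio(), e.g. via blk_rq_unprep_clone(). */
            if (bio_integrity(bio))
                    bio_integrity_free(bio);
    }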

    Fixes: 7c20f11680a4 ("bio-integrity: stop abusing bi_end_io")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Justin Tee

    Add commit log, and simplify/fix the original patch written by Justin.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Justin Tee
     

12 Nov, 2019

1 commit

  • __bio_try_merge_page() may merge a page into a bio without a
    bio_full() check, causing bi_size to overflow.

    The overflow typically ends up with sd_init_command() warning on a
    zero-segment request, with a call trace like this:

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180
    CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1
    Workqueue: kblockd blk_mq_run_work_fn
    RIP: 0010:scsi_init_io+0x156/0x180
    RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246
    RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000
    RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118
    RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000
    R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000
    FS: 0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0
    Call Trace:
    sd_init_command+0x326/0xb40 [sd_mod]
    scsi_queue_rq+0x502/0xaa0
    ? blk_mq_get_driver_tag+0xe7/0x120
    blk_mq_dispatch_rq_list+0x256/0x5a0
    ? elv_rb_del+0x24/0x30
    ? deadline_remove_request+0x7b/0xc0
    blk_mq_do_dispatch_sched+0xa3/0x140
    blk_mq_sched_dispatch_requests+0xfb/0x170
    __blk_mq_run_hw_queue+0x81/0x130
    blk_mq_run_work_fn+0x1b/0x20
    process_one_work+0x179/0x390
    worker_thread+0x4f/0x3e0
    kthread+0x105/0x140
    ? max_active_store+0x80/0x80
    ? kthread_bind+0x20/0x20
    ret_from_fork+0x35/0x40
    ---[ end trace f9036abf5af4a4d3 ]---
    blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0
    XFS (sdd1): writeback error on sector 2875552

    __bio_try_merge_page() should check for the overflow before actually
    doing the merge.

    Fixes: 07173c3ec276c ("block: enable multipage bvecs")
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Junichi Nomura
     

22 Aug, 2019

3 commits


14 Aug, 2019

1 commit

  • psi tracks the time tasks wait for refaulting pages to become
    uptodate, but it does not track the time spent submitting the IO. The
    submission part can be significant if backing storage is contended or
    when cgroup throttling (io.latency) is in effect - a lot of time is
    spent in submit_bio(). In that case, we underreport memory pressure.

    Annotate submit_bio() to account submission time as memory stall when
    the bio is reading userspace workingset pages.
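
    A sketch of the annotation (assuming a BIO_WORKINGSET flag set when
    workingset pages are added to the bio):

    if (unlikely(bio_op(bio) == REQ_OP_READ &&
                 bio_flagged(bio, BIO_WORKINGSET))) {
            unsigned long pflags;
            blk_qc_t ret;

            /* Count submission time as memory stall. */
            psi_memstall_enter(&pflags);
            ret = generic_make_request(bio);
            psi_memstall_leave(&pflags);

            return ret;
    }

    return generic_make_request(bio);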

    Tested-by: Suren Baghdasaryan
    Signed-off-by: Johannes Weiner
    Signed-off-by: Jens Axboe

    Johannes Weiner
     

06 Aug, 2019

1 commit


05 Aug, 2019

1 commit