Eric Lee / smarc-fsl-linux-kernel

31 May, 2018

25 commits

d450542e3 block, bfq: increase weight-raising duration for interactive apps

The maximum possible duration of the weight-raising period for
interactive applications is limited to 13 seconds, as this is the time
needed to load the largest application that we considered when tuning
weight raising. Unfortunately, in such an evaluation, we did not
consider the case of very slow virtual machines.

For example, on a QEMU/KVM virtual machine
- running in a slow PC;
- with a virtual disk stacked on a slow low-end 5400rpm HDD;
- serving a heavy I/O workload, such as the sequential reading of
several files;
mplayer takes 23 seconds to start, if constantly weight-raised.

To address this issue, this commit conservatively sets the upper limit
for weight-raising duration to 25 seconds.

Signed-off-by: Davide Sapienza
Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Davide Sapienza
2018-05-31 22:54:40 +0800
e24f1c245 block, bfq: remove slow-system class

BFQ computes the duration of weight raising for interactive
applications automatically, using some reference parameters. In
particular, BFQ uses the best durations (see comments in the code for
how these durations have been assessed) for two classes of systems:
slow and fast ones. Examples of slow systems are old phones or systems
using micro HDDs. Fast systems are all the remaining ones. Using these
parameters, BFQ computes the actual duration of the weight raising,
for the system at hand, as a function of the relative speed of the
system w.r.t. the speed of a reference system, belonging to the same
class of systems as the system at hand.
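The scaling described above can be pictured with a minimal sketch. It assumes the duration is inversely proportional to the measured peak rate relative to a reference device and is then clamped to a range; the structure, function name, and all numeric values are hypothetical illustrations, not BFQ's actual code or tuning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference parameters for one class of systems. */
struct wr_ref {
    uint64_t ref_rate;   /* peak rate of the reference device, sectors/s */
    uint64_t ref_dur_ms; /* best weight-raising duration on that device */
};

/*
 * Scale the reference duration by the relative speed of the system at
 * hand: a device half as fast as the reference gets twice the duration.
 * The result is clamped to [min_ms, max_ms].
 */
uint64_t wr_duration_ms(const struct wr_ref *ref, uint64_t peak_rate,
                        uint64_t min_ms, uint64_t max_ms)
{
    assert(peak_rate > 0 && min_ms <= max_ms);

    uint64_t dur = ref->ref_dur_ms * ref->ref_rate / peak_rate;

    if (dur < min_ms)
        dur = min_ms;
    if (dur > max_ms)
        dur = max_ms;
    return dur;
}
```

With a fluctuating peak_rate, as in a virtual machine, the computed duration swings with it, which is why a wrongly detected class could yield implausible values.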

This slow vs fast differentiation proved to be useful in the past, but
happens to have little meaning with current hardware. Even worse, it
does cause problems in virtual systems, where the speed of the system
can vary frequently, and so widely as to confuse the class-detection
mechanism and, as we have verified experimentally, to cause BFQ to
compute nonsensical weight-raising durations.

This commit addresses this issue by removing the slow class and the
class-detection mechanism.

Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Paolo Valente
2018-05-31 22:54:38 +0800
4029eef1b block, bfq: add description of weight-raising heuristics

A description of how weight raising works is missing in BFQ
sources. In addition, the code for handling weight raising is
scattered across a few functions. This makes it rather hard to
understand the mechanism and its rationale. This commit adds such a
description at the beginning of the main source file.

Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Paolo Valente
2018-05-31 22:54:36 +0800
ac857e0d5 block, bfq: remove the removal of 'next' rq in bfq_requests_merged

Since bfq_finish_request() is always called on the request 'next',
after bfq_requests_merged() is finished, and bfq_finish_request()
removes 'next' from its bfq_queue if needed, it isn't necessary to do
such a removal in advance in bfq_requests_merged().

This commit removes such a useless 'next' removal.

Signed-off-by: Filippo Muzzini
Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Filippo Muzzini
2018-05-31 22:48:32 +0800
8abfa4d6f block, bfq: remove wrong check in bfq_requests_merged

The request rq passed to the function bfq_requests_merged is always in
a bfq_queue, so the check !RB_EMPTY_NODE(&rq->rb_node) at the
beginning of bfq_requests_merged always succeeds, and the control
flow systematically skips to the end of the function. This implies
that the body of the function is never executed, i.e., the
repositioning of rq is never performed.

Conversely, a check is missing in the body of the function:
'next' must be removed only if it is inside a bfq_queue.

This commit removes the wrong check on rq, and adds the missing check
on 'next'. In addition, this commit adds comments on
bfq_requests_merged.

Signed-off-by: Filippo Muzzini
Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Paolo Valente
2018-05-31 22:48:05 +0800
a12bffebc block, bfq: remove wrong lock in bfq_requests_merged

In bfq_requests_merged(), there is a deadlock because the lock on
bfqq->bfqd->lock is held by the calling function, but the code of
this function tries to grab the lock again.

This deadlock is currently hidden by another bug (fixed by the next
commit for this source file), which causes the body of
bfq_requests_merged() never to be executed.

This commit removes the deadlock by removing the lock/unlock pair.
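The self-deadlock pattern can be shown with a toy model. Everything below is hypothetical: the lock is a plain flag that reports a second acquisition instead of spinning forever, so the sketch terminates, and the function names only mimic the caller/callee relationship described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock. A real spinlock would hang on re-acquisition;
 * this one returns false so the problem is observable. */
struct toy_lock {
    bool held;
};

static bool toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return false; /* would deadlock with a real lock */
    l->held = true;
    return true;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

/* Buggy callee: tries to take a lock its caller already holds. */
bool merged_buggy(struct toy_lock *l)
{
    if (!toy_lock_acquire(l))
        return false;
    /* ... body would run here ... */
    toy_lock_release(l);
    return true;
}

/* Fixed callee: relies on the caller holding the lock. */
bool merged_fixed(struct toy_lock *l)
{
    assert(l->held);
    /* ... body runs under the caller's lock ... */
    return true;
}

/* Caller: takes the lock, then invokes the callee under it. */
bool caller(bool (*callee)(struct toy_lock *))
{
    struct toy_lock l = { false };
    bool ok;

    toy_lock_acquire(&l);
    ok = callee(&l);
    toy_lock_release(&l);
    return ok;
}
```

Calling caller(merged_buggy) reports the failed re-acquisition, while caller(merged_fixed) succeeds, which mirrors removing the lock/unlock pair from the callee.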

Signed-off-by: Filippo Muzzini
Signed-off-by: Paolo Valente
Signed-off-by: Jens Axboe

Filippo Muzzini
2018-05-31 22:42:27 +0800
04c4950d5 block: fixup bioset_integrity_create() call

Missed converting the bioset_integrity_create() bounce bio set
call.

Fixes: 338aa96d5661 ("block: convert bounce, q->bio_split to bioset_init()/mempool_init()")
Signed-off-by: Jens Axboe

Jens Axboe
2018-05-31 08:51:21 +0800
dad085275 block: Drop bioset_create()

All users have been converted to bioset_init(), kill off the
old API.
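As a rough illustration of what "embedded" means in this series, the sketch below contrasts nothing with the kernel's real signatures: toy_pool, toy_pool_init(), and toy_pool_exit() are hypothetical stand-ins. The point is only that the owner embeds the structure and initializes it in place, rather than receiving a pointer from a create-style allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool type standing in for a bio set or mempool. */
struct toy_pool {
    void **slots;
    size_t nr;
};

/* Init-style API: the caller provides the storage for the pool itself.
 * Returns 0 on success, -1 if the slot array cannot be allocated. */
int toy_pool_init(struct toy_pool *p, size_t nr)
{
    p->slots = calloc(nr, sizeof(*p->slots));
    if (!p->slots)
        return -1;
    p->nr = nr;
    return 0;
}

/* Counterpart of init: releases what init allocated. */
void toy_pool_exit(struct toy_pool *p)
{
    free(p->slots);
    p->slots = NULL;
    p->nr = 0;
}

/* The owner embeds the pool instead of holding a pointer to it. */
struct toy_fs {
    struct toy_pool pool;
};
```

Embedding removes one allocation and one pointer dereference per access, at the cost of making the owner structure larger.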

Reviewed-by: Christoph Hellwig
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
e292d7bc6 xfs: convert to bioset_init()/mempool_init()

Convert XFS to embedded bio sets.

Acked-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
8ac9f7c1f btrfs: convert to bioset_init()/mempool_init()

Convert btrfs to embedded bio sets.

Acked-by: Chris Mason
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
52190f8ab fs: convert block_dev.c to bioset_init()

Convert block DIO code to embedded bio sets.

Reviewed-by: Christoph Hellwig
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
a47a28b74 target: convert to bioset_init()/mempool_init()

Convert the target code to embedded bio sets.

Reviewed-by: Christoph Hellwig
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
6f1c819c2 dm: convert to bioset_init()/mempool_init()

Convert dm to embedded bio sets.

Acked-by: Mike Snitzer
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
afeee514c md: convert to bioset_init()/mempool_init()

Convert md to embedded bio sets.

Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
d19936a26 bcache: convert to bioset_init()/mempool_init()

Convert bcache to embedded bio sets.

Reviewed-by: Coly Li
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
b906bbb69 lightnvm: convert to bioset_init()/mempool_init()

Convert lightnvm to embedded bio sets.

Reviewed-by: Javier González
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
64c4bc4de pktcdvd: convert to bioset_init()/mempool_init()

Convert pktcdvd to embedded bio sets.

Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
0892fac87 drbd: convert to bioset_init()/mempool_init()

Convert drbd to embedded bio sets and mempools.

Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
338aa96d5 block: convert bounce, q->bio_split to bioset_init()/mempool_init()

Convert the core block functionality to embedded bio sets.

Reviewed-by: Christoph Hellwig
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe

Kent Overstreet
2018-05-31 05:33:32 +0800
0b6bad7d6 blk-throttle: return proper bool type to caller instead of 0/1

Change to return true/false only for bool type return code.

Signed-off-by: Chengguang Xu
Signed-off-by: Jens Axboe

Chengguang Xu
2018-05-31 02:48:22 +0800
d250bf4e7 blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter

We already check for started commands in all callbacks, but we should
also protect against already completed commands. Do this by taking
the checks to common code.

Acked-by: Josef Bacik
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-31 01:31:34 +0800
5e3c3a7ec nbd: clear DISCONNECT_REQUESTED flag once disconnection occurs.

When a userspace client requests a NBD device be disconnected, the
DISCONNECT_REQUESTED flag is set. While this flag is set, the driver
will not inform userspace when a connection is closed.

Unfortunately the flag was never cleared, so once a disconnect was
requested the driver would thereafter never tell userspace about a
closed connection. Thus when connections failed due to timeout, no
attempt to reconnect was made and eventually the device would fail.

Fix by clearing the DISCONNECT_REQUESTED flag (and setting the
DISCONNECTED flag) once all connections are closed.
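A toy model of the flag handling may help. The flag names come from the message above; the structure, the connection counter, and the function are hypothetical and do not reflect the nbd driver's real data structures.

```c
#include <stdbool.h>

enum {
    DISCONNECT_REQUESTED = 1u << 0,
    DISCONNECTED         = 1u << 1,
};

/* Hypothetical device state: flag bits plus a count of open connections. */
struct toy_nbd {
    unsigned int flags;
    int live_conns;
};

/*
 * Called when one connection closes. Returns true if userspace should be
 * told about the close, i.e. no disconnect was pending at that moment.
 * Once the last connection is gone, the pending request is cleared and
 * the device is marked disconnected, so later closes are reported again.
 */
bool toy_conn_closed(struct toy_nbd *d)
{
    bool notify = !(d->flags & DISCONNECT_REQUESTED);

    if (d->live_conns > 0)
        d->live_conns--;

    if (d->live_conns == 0 && (d->flags & DISCONNECT_REQUESTED)) {
        d->flags &= ~DISCONNECT_REQUESTED;
        d->flags |= DISCONNECTED;
    }
    return notify;
}
```

Without the clearing step, notify would stay false for every later close, which is the silent-failure behaviour the fix addresses.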

Reviewed-by: Josef Bacik
Signed-off-by: Kevin Vigor
Signed-off-by: Jens Axboe

Kevin Vigor
2018-05-31 01:30:42 +0800
2ab74cd29 blk-throttle: fix potential NULL pointer dereference in throtl_select_dispatch

tg in throtl_select_dispatch is used before it is checked. Since tg
may be NULL, this risks a NULL pointer dereference. Fix it by checking
tg before use.
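The ordering issue is the classic use-before-check pattern. The following is a generic sketch with hypothetical names (toy_tg, toy_dispatch_count), not the blk-throttle code: the pointer is tested first and only then dereferenced.

```c
#include <stddef.h>

/* Hypothetical stand-in for a throttle group. */
struct toy_tg {
    unsigned int nr_queued;
};

/*
 * Correct ordering: test the pointer, then dereference it.
 * The buggy ordering would read tg->nr_queued before the NULL test.
 */
unsigned int toy_dispatch_count(const struct toy_tg *tg)
{
    if (!tg)
        return 0;
    return tg->nr_queued;
}
```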

Signed-off-by: Joseph Qi
Signed-off-by: Liu Bo
Signed-off-by: Jens Axboe

Liu Bo
2018-05-31 00:54:33 +0800
a6088845c block: kyber: make kyber more friendly with merging

Currently, kyber is very unfriendly to merging. kyber depends
on the ctx rq_list to do merging; however, most of the time it will not
leave any requests in the ctx rq_list. This is because even if the tokens
of one domain are used up, kyber will try to dispatch requests
from another domain and flush the rq_list there.

To improve this, we set up kyber_ctx_queue (kcq), which is similar
to ctx but has rq_lists for the different domains, and build the same
mapping between kcq and khd as between ctx and hctx. Then we can merge,
insert and dispatch for different domains separately. At the same
time, we only flush the rq_list of a kcq when a domain token is obtained
successfully. Then, if the tokens of one domain are used up, the requests
can be left in the rq_list of that domain and may be merged with
following I/O.
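The per-domain idea can be sketched as a toy model. The names kcq and khd are borrowed from the message, but the structures, the domain count, and the counters below are hypothetical simplifications: lists are reduced to integer counts, and a flush happens only when a token for that domain is available.

```c
#include <stdbool.h>

enum { TOY_NUM_DOMAINS = 3 }; /* hypothetical number of domains */

/* Per-context queue: one pending-request count per domain. */
struct toy_kcq {
    int queued[TOY_NUM_DOMAINS];
};

/* Per-hardware-queue data: tokens left and requests dispatched. */
struct toy_khd {
    int tokens[TOY_NUM_DOMAINS];
    int dispatched[TOY_NUM_DOMAINS];
};

/* Queue one request in the given domain. */
void toy_insert(struct toy_kcq *kcq, int domain)
{
    kcq->queued[domain]++;
}

/*
 * Flush one domain's pending requests only if a token is obtained.
 * Without a token the requests stay queued, where a later request
 * could still be merged with them.
 */
bool toy_dispatch(struct toy_khd *khd, struct toy_kcq *kcq, int domain)
{
    if (khd->tokens[domain] == 0)
        return false;

    khd->tokens[domain]--;
    khd->dispatched[domain] += kcq->queued[domain];
    kcq->queued[domain] = 0;
    return true;
}
```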

The following are my test results on a machine with 8 cores and an NVMe
card, INTEL SSDPEKKR128G7.

fio size=256m ioengine=libaio iodepth=64 direct=1 numjobs=8
(values are seq/random)
+--------+----------+-----------+------------+-----------------+---------+
| patch? | bw(MB/s) | iops      | slat(usec) | clat(usec)      | merge   |
+--------+----------+-----------+------------+-----------------+---------+
| w/o    | 606/612  | 151k/153k | 6.89/7.03  | 3349.21/3305.40 | 0/0     |
+--------+----------+-----------+------------+-----------------+---------+
| w/     | 1083/616 | 277k/154k | 4.93/6.95  | 1830.62/3279.95 | 223k/3k |
+--------+----------+-----------+------------+-----------------+---------+
When numjobs is set to 16, the bw and iops can reach 1662MB/s and 425k
on my platform.

Signed-off-by: Jianchao Wang
Tested-by: Holger Hoffstätte
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jianchao Wang
2018-05-31 00:47:40 +0800
9c5587346 blk-mq: abstract out blk-mq-sched rq list iteration bio merge helper

No functional changes in this patch, just a prep patch for utilizing
this in an IO scheduler.

Signed-off-by: Jens Axboe
Reviewed-by: Omar Sandoval

Jens Axboe
2018-05-31 00:43:58 +0800

30 May, 2018

2 commits

5de815a7e block: remove parent device reference from struct bsg_class_device

Bsg holding a reference to the parent device may result in a crash if a
bsg file handle is closed after the parent device driver has unloaded.

Holding a reference is not really needed: the parent device must exist
between bsg_register_queue and bsg_unregister_queue. Before the device
goes away the caller does blk_cleanup_queue so that all in-flight
requests to the device are gone and all new requests cannot pass beyond
the queue. The queue itself is a refcounted object and it will stay
alive with a bsg file.

Based on analysis, previous patch and changelog from Anatoliy Glagolev.

Reported-by: Anatoliy Glagolev
Reviewed-by: James E.J. Bottomley
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-30 03:00:25 +0800
b7405176b Merge branch 'nvme-4.18-2' of git://git.infradead.org/nvme into for-4.18/block

Pull NVMe changes from Christoph:

"Here is the current batch of nvme updates for 4.18, we have a few more
patches in the queue, but I'd like to get this pile into your tree
and linux-next ASAP.

The biggest item is support for file-backed namespaces in the NVMe
target from Chaitanya, in addition to that we mostly have small fixes from
all the usual suspects."

* 'nvme-4.18-2' of git://git.infradead.org/nvme:
nvme: fixup memory leak in nvme_init_identify()
nvme: fix KASAN warning when parsing host nqn
nvmet-loop: use nr_phys_segments when map rq to sgl
nvmet-fc: increase LS buffer count per fc port
nvmet: add simple file backed ns support
nvmet: remove duplicate NULL initialization for req->ns
nvmet: make a few error messages more generic
nvme-fabrics: allow duplicate connections to the discovery controller
nvme-fabrics: centralize discovery controller defaults
nvme-fabrics: remove unnecessary controller subnqn validation
nvme-fc: remove setting DNR on exception conditions
nvme-rdma: stop admin queue before freeing it
nvme-pci: Fix AER reset handling
nvme-pci: set nvmeq->cq_vector after alloc cq/sq
nvme: host: core: fix precedence of ternary operator
nvme: fix lockdep warning in nvme_mpath_clear_current_path

Jens Axboe
2018-05-30 02:56:20 +0800

29 May, 2018

13 commits

5afb78356 block: don't print a message when the device went away

The information about a size change in this case just creates confusion.

Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
4163a0398 block: unexport check_disk_size_change

Only used in block_dev.c and the partitions code, and it should remain
that way.

Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
0b7576d8e block: move ->timeout request member

After the recent timeout handling changes, we have two holes in
the struct. Move the timeout near the deadline, killing both,
and moving related members closer together. On my config on
x86-64, this shrinks struct request from 312 to 304 bytes.
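How moving one member can remove two padding holes is easy to show with a toy layout. The structures and member names below are hypothetical and far smaller than struct request; the actual sizes depend on the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: each 32-bit member is followed by a 64-bit one, so an ABI
 * that aligns uint64_t to 8 bytes inserts a 4-byte hole after each. */
struct toy_req_before {
    uint32_t tag;      /* typically followed by a 4-byte hole */
    uint64_t deadline;
    uint32_t timeout;  /* typically followed by a 4-byte hole */
    uint64_t data;
};

/* After: timeout moves next to deadline and fills the first hole,
 * which also removes the second one. */
struct toy_req_after {
    uint32_t tag;
    uint32_t timeout;
    uint64_t deadline;
    uint64_t data;
};

/* Bytes saved by the reordering on the current ABI. */
size_t toy_saved_bytes(void)
{
    return sizeof(struct toy_req_before) - sizeof(struct toy_req_after);
}
```

On a typical x86-64 ABI this toy pair should measure 32 and 24 bytes, the same 8-byte saving as the one reported above, though the real structure's layout is of course different.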

Signed-off-by: Jens Axboe

Jens Axboe
2018-05-29 22:59:21 +0800
d1210d5af blk-mq: simplify blk_mq_rq_timed_out

Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
88b0cfad2 block: document the blk_eh_timer_return values

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
f6e7d48a7 block: remove BLK_EH_HANDLED

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
adb2b769d libiscsi: don't try to bypass SCSI EH

libiscsi is the only SCSI code that returns BLK_EH_HANDLED, thus trying to
bypass the normal SCSI EH code. We are going to remove this return value
at the block layer, and at least from a quick look it doesn't look too
harmful to try to send an abort for these cases, especially as the first
one should not actually be possible. If this doesn't work out iscsi
will probably need its own eh_strategy_handler instead to just do the
right thing.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
ad73d6fea mmc: complete requests from ->timeout

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

[While this keeps existing behavior it seems to mismatch the comment,
maintainers please chime in!]

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
1fc2b62ed scsi_transport_fc: complete requests from ->timeout

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
0df0bb080 null_blk: complete requests from ->timeout

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
c5fb85b7f mtip32xx: complete requests from ->timeout

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
e5eab0170 nbd: complete requests from ->timeout

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800
db8c48e4b nvme: return BLK_EH_DONE from ->timeout

NVMe always completes the request before returning from ->timeout, either
by polling for it, or by disabling the controller. Return BLK_EH_DONE so
that the block layer doesn't even try to complete it again.
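The division of responsibility can be modelled in a few lines. This is a toy sketch with hypothetical names (toy_request, toy_driver_timeout, toy_handle_timeout, and the TOY_EH_* values); it is not the block layer's interface, only an illustration of a driver that completes the request itself and a caller that never completes it a second time.

```c
#include <stdbool.h>

/* Hypothetical return values for a timeout handler. */
enum toy_eh_return {
    TOY_EH_DONE,        /* the driver has dealt with the request */
    TOY_EH_RESET_TIMER, /* re-arm the timer and wait again */
};

/* Hypothetical request that counts how often it was completed. */
struct toy_request {
    int completions;
};

static void toy_complete(struct toy_request *rq)
{
    rq->completions++;
}

/* Driver-side handler: completes the request entirely on its own. */
enum toy_eh_return toy_driver_timeout(struct toy_request *rq)
{
    toy_complete(rq);
    return TOY_EH_DONE;
}

/*
 * Caller side: never completes the request itself, so there is no
 * split responsibility. Returns true if the timer must be re-armed.
 */
bool toy_handle_timeout(struct toy_request *rq,
                        enum toy_eh_return (*timeout_fn)(struct toy_request *))
{
    return timeout_fn(rq) == TOY_EH_RESET_TIMER;
}
```

Because only the driver calls toy_complete(), a request can be completed at most once per timeout, which is the property the series is after.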

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-05-29 22:59:21 +0800