26 Sep, 2018

1 commit

  • [ Upstream commit b04f50ab8a74129b3041a2836c33c916be3c6667 ]

    Only attempt to merge a bio if the ctx->rq_list isn't empty, because:

    1) for high-performance SSDs, dispatch succeeds most of the time, so
    there is often nothing left in ctx->rq_list; by not trying to merge
    over an empty sw queue we save one acquisition of ctx->lock

    2) we can't expect good merge performance on the per-cpu sw queue
    anyway, and missing the occasional merge there is no big deal, since
    tasks can be scheduled from one CPU to another.
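
    A minimal sketch of the idea, assuming the blk-mq helper and field
    names of that era (blk_mq_attempt_merge() is the existing per-sw-queue
    merge helper); treat it as illustrative rather than the verbatim patch:

        static bool attempt_sw_queue_merge(struct request_queue *q,
                                           struct blk_mq_ctx *ctx,
                                           struct bio *bio)
        {
                bool merged;

                /*
                 * Lockless peek first: on fast devices the sw queue is
                 * usually empty after dispatch, so don't pay for
                 * ctx->lock in that case.
                 */
                if (list_empty_careful(&ctx->rq_list))
                        return false;

                spin_lock(&ctx->lock);
                merged = blk_mq_attempt_merge(q, ctx, bio);
                spin_unlock(&ctx->lock);
                return merged;
        }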

    Cc: Laurence Oberman
    Cc: Omar Sandoval
    Cc: Bart Van Assche
    Tested-by: Kashyap Desai
    Reported-by: Kashyap Desai
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ming Lei
     

20 Dec, 2017

1 commit

  • [ Upstream commit 5e3d02bbafad38975099b5848f5ebadedcf7bb7e ]

    When the hw queue is busy, we shouldn't take any more requests from
    the scheduler queue; otherwise it is difficult to do IO merging.

    This patch fixes the awful IO performance seen on some SCSI devices
    (lpfc, qla2xxx, ...) when mq-deadline/kyber is used, by not taking
    requests while the hw queue is busy.
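
    A simplified sketch of the resulting dispatch order (helper names are
    illustrative, and some arguments of the real helpers are omitted):

        static void dispatch_sketch(struct blk_mq_hw_ctx *hctx)
        {
                struct request_queue *q = hctx->queue;
                LIST_HEAD(rq_list);
                bool can_go = true;

                /* requests left over from a previous busy dispatch go first */
                if (!list_empty_careful(&hctx->dispatch)) {
                        spin_lock(&hctx->lock);
                        list_splice_init(&hctx->dispatch, &rq_list);
                        spin_unlock(&hctx->lock);
                        can_go = blk_mq_dispatch_rq_list(q, &rq_list);
                }

                /*
                 * Only drain the scheduler if the device accepted them all;
                 * requests left in the scheduler remain mergeable.
                 */
                if (can_go)
                        blk_mq_do_dispatch_sched(hctx);
        }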

    Reviewed-by: Omar Sandoval
    Reviewed-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Ming Lei
     

04 Jul, 2017

1 commit

  • When mq-deadline is used, sequential-read and sequential-write IOPS
    are observed to drop by more than 20% on SATA (scsi-mq) devices,
    compared with the 'none' scheduler.

    The reason is that the default nr_requests for the scheduler is
    too big for small-queue-depth devices, which increases latency
    considerably.

    Since the rationale for giving the mq scheduler 256 requests is
    based on a queue depth of 128, this patch changes the default to
    twice min(hw queue_depth, 128).
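
    In code form, this is roughly the following (BLKDEV_MAX_RQ is 128):

        /* scheduler default: twice the smaller of hw depth and 128 */
        q->nr_requests = 2 * min_t(unsigned int,
                                   q->tag_set->queue_depth,
                                   BLKDEV_MAX_RQ);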

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

23 Jun, 2017

1 commit


22 Jun, 2017

1 commit

  • If we have shared tags enabled, then every IO completion will trigger
    a full loop of every queue belonging to a tag set, and every hardware
    queue for each of those queues, even if nothing needs to be done.
    This causes a massive performance regression if you have a lot of
    shared devices.

    Instead of doing this huge full scan on every IO, add an atomic
    counter to the main queue that tracks how many hardware queues have
    been marked as needing a restart. With that, we can avoid looking for
    restartable queues when we don't have to.

    Max reports that this restores performance. Before this patch, 4K
    IOPS were limited to 22-23K; with the patch, we are running at
    950-970K IOPS.
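
    A sketch of the mechanism, using the flag and field names from the
    fix (the scan itself is omitted):

        /* marking a hctx also bumps the queue-wide counter */
        static void mark_restart(struct blk_mq_hw_ctx *hctx)
        {
                if (!test_and_set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state) &&
                    (hctx->flags & BLK_MQ_F_TAG_SHARED))
                        atomic_inc(&hctx->queue->shared_hctx_restart);
        }

        void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
        {
                /*
                 * Fast path on IO completion: if no hctx is marked, skip
                 * the full scan over every queue sharing this tag set.
                 */
                if (!atomic_read(&hctx->queue->shared_hctx_restart))
                        return;

                /* slow path: scan the tag set and kick marked queues */
        }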

    Fixes: 6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")
    Reported-by: Max Gurtovoy
    Tested-by: Max Gurtovoy
    Reviewed-by: Bart Van Assche
    Tested-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Jens Axboe
     

21 Jun, 2017

1 commit

  • Document the locking assumptions in functions that modify
    blk_mq_ctx.rq_list to make it easier for humans to verify
    this code.
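
    For example (the helper shown here is hypothetical; the actual change
    adds the assertion to the existing rq_list manipulation functions):

        static void ctx_rq_list_add(struct blk_mq_ctx *ctx, struct request *rq)
        {
                lockdep_assert_held(&ctx->lock);  /* caller holds ctx->lock */
                list_add_tail(&rq->queuelist, &ctx->rq_list);
        }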

    Signed-off-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Omar Sandoval
    Cc: Ming Lei
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

19 Jun, 2017

4 commits

  • It is required that no dispatch can happen any more once
    blk_mq_quiesce_queue() returns, and we have no such requirement
    on the APIs for stopping a queue.

    But blk_mq_quiesce_queue() may still fail to block/drain dispatch in
    the case of BLK_MQ_S_START_ON_RUN, so fix this issue by using the
    newly introduced QUEUE_FLAG_QUIESCED flag and evaluating it inside
    RCU read-side critical sections.

    Also, blk_mq_quiesce_queue() is implemented by stopping the queue,
    which limits its uses and easily causes races, because any queue
    restart in other paths may break blk_mq_quiesce_queue(). With the
    introduced QUEUE_FLAG_QUIESCED flag, we no longer need to depend on
    stopping the queue for quiescing.
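
    A sketch of the dispatch-side check (the function name is
    illustrative; BLK_MQ_F_BLOCKING drivers use SRCU instead of plain
    RCU):

        static void run_hw_queue_sketch(struct blk_mq_hw_ctx *hctx)
        {
                if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
                        rcu_read_lock();
                        if (!blk_queue_quiesced(hctx->queue))
                                blk_mq_sched_dispatch_requests(hctx);
                        rcu_read_unlock();
                }
        }

    blk_mq_quiesce_queue() can then set the flag and wait for an RCU
    grace period, after which no dispatch can still be in flight.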

    Signed-off-by: Ming Lei
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • blk_mq_sched_assign_ioc now only handles the assignment of the ioc if
    the scheduler needs it (bfq only at the moment). The call to the
    per-request initializer is moved out so that it can be merged with
    a similar call for the kyber I/O scheduler.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Having these as separate helpers in a header really does not help
    readability, or my chances of refactoring this code sanely.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Having them out of line in blk-mq-sched.c just makes the code flow
    unnecessarily complicated.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 May, 2017

1 commit


04 May, 2017

1 commit

  • This provides the infrastructure for schedulers to expose their internal
    state through debugfs. We add a list of queue attributes and a list of
    hctx attributes to struct elevator_type and wire them up when switching
    schedulers.
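
    The hook points, as they appear on struct elevator_type (a sketch;
    other members omitted):

        struct elevator_type {
                /* existing members omitted */
                const struct blk_mq_debugfs_attr *queue_debugfs_attrs;
                const struct blk_mq_debugfs_attr *hctx_debugfs_attrs;
        };

    Each blk_mq_debugfs_attr names a debugfs file and supplies its
    seq_file callbacks; the lists are registered when a scheduler is
    switched in and torn down when it is switched out.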

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke

    Add missing seq_file.h header in blk-mq-debugfs.h

    Signed-off-by: Jens Axboe

    Omar Sandoval
     

02 May, 2017

1 commit


27 Apr, 2017

1 commit

  • At least one driver, mtip32xx, has a hard-coded dependency on
    the value of the reserved tag used for internal commands. While
    that should really be fixed up, for now let's ensure that we just
    bypass the scheduler tags for an allocation marked as reserved. Such
    allocations are used for housekeeping or error handling, so we can
    safely ignore them in the scheduler.
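
    A sketch of the tag-space selection, assuming the hctx fields of that
    era (the helper name is illustrative):

        static struct blk_mq_tags *tags_for_alloc(struct blk_mq_hw_ctx *hctx,
                                                  unsigned int flags)
        {
                /*
                 * Reserved allocations bypass the scheduler tag space, so
                 * drivers that depend on exact reserved tag values keep
                 * seeing the hardware tags.
                 */
                if (hctx->sched_tags && !(flags & BLK_MQ_REQ_RESERVED))
                        return hctx->sched_tags;
                return hctx->tags;
        }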

    Tested-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe
     

21 Apr, 2017

1 commit


08 Apr, 2017

2 commits

  • Schedulers need to be informed when a hardware queue is added or removed
    at runtime so they can allocate/free per-hardware queue data. So,
    replace the blk_mq_sched_init_hctx_data() helper, which only makes sense
    at init time, with .init_hctx() and .exit_hctx() hooks.
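
    A sketch of what a scheduler's hooks might look like (the per-hctx
    data here is hypothetical; the hook signatures follow
    elevator_mq_ops):

        struct sched_hctx_data {                /* hypothetical state */
                unsigned long dispatched;
        };

        static int sched_init_hctx(struct blk_mq_hw_ctx *hctx,
                                   unsigned int idx)
        {
                struct sched_hctx_data *d;

                d = kzalloc_node(sizeof(*d), GFP_KERNEL, hctx->numa_node);
                if (!d)
                        return -ENOMEM;
                hctx->sched_data = d;
                return 0;
        }

        static void sched_exit_hctx(struct blk_mq_hw_ctx *hctx,
                                    unsigned int idx)
        {
                kfree(hctx->sched_data);
                hctx->sched_data = NULL;
        }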

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • To improve scalability, if hardware queues are shared, restart
    a single hardware queue in round-robin fashion. Rename
    blk_mq_sched_restart_queues() to reflect the new semantics.
    Remove blk_mq_sched_mark_restart_queue() because this function
    has no callers. Remove flag QUEUE_FLAG_RESTART because this
    patch removes the code that uses this flag.
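
    A sketch of the per-hctx restart primitive the round-robin walk is
    built on (the walk over the tag set's queues is omitted):

        static bool restart_hctx(struct blk_mq_hw_ctx *hctx)
        {
                if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART,
                                       &hctx->state)) {
                        blk_mq_run_hw_queue(hctx, true);
                        return true;
                }
                return false;
        }

    The caller remembers where the previous scan stopped and resumes from
    the next queue, so a single completion no longer kicks every queue
    sharing the tag set.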

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

07 Apr, 2017

4 commits

  • In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
    back to the original scheduler. However, at this point, we've already
    torn down the original scheduler's tags, so this causes a crash. Doing
    the fallback like the legacy elevator path is much harder for mq, so fix
    it by just falling back to none, instead.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • If a new hardware queue is added at runtime, we don't allocate scheduler
    tags for it, leading to a crash. This hooks up the scheduler framework
    to blk_mq_{init,exit}_hctx() to make sure everything gets properly
    initialized/freed.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • As preparatory cleanup for the next couple of fixes, push
    blk_mq_sched_setup() and e->ops.mq.init_sched() into a helper.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • While dispatching requests, if we fail to get a driver tag, we mark the
    hardware queue as waiting for a tag and put the requests on a
    hctx->dispatch list to be run later when a driver tag is freed. However,
    blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware
    queues if using a single-queue scheduler with a multiqueue device. If
    blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we
    are processing. This means we end up using the hardware queue of the
    previous request, which may or may not be the same as that of the
    current request. If it isn't, the wrong hardware queue will end up
    waiting for a tag, and the requests will be on the wrong dispatch list,
    leading to a hang.

    The fix is twofold:

    1. Make sure we save which hardware queue we were trying to get a
    request for in blk_mq_get_driver_tag() regardless of whether it
    succeeds or not.
    2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a
    blk_mq_hw_queue to make it clear that it must handle multiple
    hardware queues, since I've already messed this up on a couple of
    occasions.

    This didn't appear in testing with nvme and mq-deadline because nvme has
    more driver tags than the default number of scheduler tags. However,
    with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.
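
    A simplified sketch of fix 1 (the shared-tag accounting of the real
    function is omitted):

        bool blk_mq_get_driver_tag(struct request *rq,
                                   struct blk_mq_hw_ctx **hctx, bool wait)
        {
                struct blk_mq_alloc_data data = {
                        .q = rq->q,
                        .hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
                        .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
                };

                if (rq->tag < 0)
                        rq->tag = blk_mq_get_tag(&data);

                /* report the mapped hctx even if no tag was obtained */
                if (hctx)
                        *hctx = data.hctx;
                return rq->tag != -1;
        }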

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

02 Mar, 2017

3 commits


24 Feb, 2017

1 commit

  • In blk_mq_sched_dispatch_requests(), we call blk_mq_sched_mark_restart()
    after we dispatch requests left over on our hardware queue dispatch
    list. This is so we'll go back and dispatch requests from the scheduler.
    In this case, it's only necessary to restart the hardware queue that we
    are running; there's no reason to run other hardware queues just because
    we are using shared tags.

    So, split out blk_mq_sched_mark_restart() into two operations, one for
    just the hardware queue and one for the whole request queue. The core
    code only needs the hctx variant, but I/O schedulers will want to use
    both.

    This also requires adjusting blk_mq_sched_restart_queues() to always
    check the queue restart flag, not just when using shared tags.
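
    The two variants after the split, roughly (flag names as in blk-mq at
    the time):

        static inline void
        blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
        {
                if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
                        set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
        }

        static inline void
        blk_mq_sched_mark_restart_queue(struct blk_mq_hw_ctx *hctx)
        {
                struct request_queue *q = hctx->queue;

                blk_mq_sched_mark_restart_hctx(hctx);
                if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
                        set_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
        }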

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

23 Feb, 2017

1 commit


18 Feb, 2017

2 commits


11 Feb, 2017

1 commit

  • The bio is used in bfq-mq's get_rq_priv to get the request group. We
    could pass the group directly here, but I thought that passing the
    bio was more general, giving the possibility of getting other pieces
    of information if needed.
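
    The hook signature after the change, as a sketch:

        struct elevator_mq_ops {
                /* other hooks omitted */
                int (*get_rq_priv)(struct request_queue *q,
                                   struct request *rq,
                                   struct bio *bio);   /* bio now passed */
        };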

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     

09 Feb, 2017

1 commit


04 Feb, 2017

1 commit

  • If we end up doing a request-to-request merge when we have completed
    a bio-to-request merge, we free the request from deep down in that
    path. For blk-mq-sched, the merge path has to hold the appropriate
    lock, but we don't need it for freeing the request. And in fact
    holding the lock is problematic, since we are now calling the
    mq sched put_rq_private() hook with the lock held. Other call paths
    do not hold this lock.

    Fix this inconsistency by ensuring that the caller frees a merged
    request. Then we can do it outside of the lock, making it both more
    efficient and fixing the blk-mq-sched problem of invoking parts of
    the scheduler with an unknown lock state.
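
    The caller-frees pattern, sketched: blk_mq_sched_try_merge() hands any
    merged-away request back through its last argument, and the caller
    frees it after dropping the lock:

        struct request *free = NULL;
        bool merged;

        spin_lock(&ctx->lock);
        merged = blk_mq_sched_try_merge(q, bio, &free);
        spin_unlock(&ctx->lock);

        if (free)
                blk_mq_free_request(free);      /* freed outside the lock */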

    Reported-by: Paolo Valente
    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe
     

03 Feb, 2017

1 commit


28 Jan, 2017

3 commits


27 Jan, 2017

4 commits

  • When we invoke dispatch_requests(), the scheduler empties everything
    into the passed-in list. This isn't always a good thing, since it
    means that we remove items that we could have potentially merged
    with.

    Change the function to dispatch single requests at a time. If
    we do that, we can back off exactly at the point where the device
    can't consume more IO, and leave the rest with the scheduler for
    better merging and future dispatch decision making.
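
    The resulting dispatch loop, roughly, with the new one-at-a-time
    ->dispatch_request() hook:

        do {
                struct request *rq;

                rq = e->type->ops.mq.dispatch_request(hctx);
                if (!rq)
                        break;          /* scheduler has nothing more */
                list_add(&rq->queuelist, &rq_list);
                /* stop pulling as soon as the device stops consuming */
        } while (blk_mq_dispatch_rq_list(hctx, &rq_list));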

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval
    Tested-by: Hannes Reinecke

    Jens Axboe
     
  • If we have both multiple hardware queues and shared tag map between
    devices, we need to ensure that we propagate the hardware queue
    restart bit higher up. This is because we can get into a situation
    where we don't have any IO pending on a hardware queue, yet we fail
    getting a tag to start new IO. If that happens, it's not enough to
    mark the hardware queue as needing a restart, we need to bubble
    that up to the higher level queue as well.
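
    A sketch of the propagation (flag names as in blk-mq at the time):

        static inline void blk_mq_sched_mark_restart(struct blk_mq_hw_ctx *hctx)
        {
                if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
                        set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
                        if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
                                struct request_queue *q = hctx->queue;

                                /* bubble the restart up to the queue */
                                if (!test_bit(QUEUE_FLAG_RESTART,
                                              &q->queue_flags))
                                        set_bit(QUEUE_FLAG_RESTART,
                                                &q->queue_flags);
                        }
                }
        }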

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval
    Tested-by: Hannes Reinecke

    Jens Axboe
     
  • We don't trigger this from the normal IO path, since we always use
    blocking allocations from there. But Bart saw it testing multipath
    dm, since that is a heavy user of atomic request allocations in
    the map and clone path.

    Reported-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • If we come in from blk_mq_alloc_request() with NOWAIT set in flags,
    we must ensure that we don't later overwrite that in
    blk_mq_sched_get_request(). Initialize alloc_data->flags before
    passing it in.
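
    In sketch form, at the caller (the surrounding names follow blk-mq of
    that era; the exact hunk may differ):

        /* carry caller flags such as BLK_MQ_REQ_NOWAIT into the
         * scheduler path instead of starting from zeroed flags */
        struct blk_mq_alloc_data alloc_data = { .flags = flags };

        rq = blk_mq_sched_get_request(q, NULL, op, &alloc_data);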

    Reported-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Jan, 2017

1 commit

  • Add Kconfig entries to manage which devices get assigned an MQ
    scheduler, and add a blk-mq flag for drivers to opt out of scheduling.
    The latter is useful for admin-type queues that still allocate a
    blk-mq queue and tag set, but aren't used for normal IO.
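
    The opt-out flag in use, sketched for a hypothetical driver's
    internal/admin tag set (other tag-set setup is elided):

        static int mydrv_init_admin_tagset(struct blk_mq_tag_set *set)
        {
                set->nr_hw_queues = 1;
                set->queue_depth = 32;
                set->flags = BLK_MQ_F_NO_SCHED; /* no IO scheduler here */
                return blk_mq_alloc_tag_set(set);
        }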

    Signed-off-by: Jens Axboe
    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval

    Jens Axboe