04 Sep, 2020

1 commit

  • High CPU utilization on "native_queued_spin_lock_slowpath" due to lock
    contention is possible for the mq-deadline and bfq IO schedulers
    when nr_hw_queues is more than one.

    This is because the kblockd workqueue can submit IO from all online
    CPUs (through blk_mq_run_hw_queues()) even though only one hctx has
    pending commands.

    The .has_work elevator callback for the mq-deadline and bfq schedulers
    reports pending work whenever there are any IOs on the request queue,
    but it does not take the hctx context into account.

    Add a per-hctx 'elevator_queued' count to avoid triggering the
    elevator when no requests are queued on that hctx, as sketched below.
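
    A minimal sketch of the resulting .has_work, based on the description
    above (illustrative, not the literal diff; the counter is incremented
    on insert and decremented on dispatch):

      static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
      {
          struct deadline_data *dd = hctx->queue->elevator->elevator_data;

          /* Bail out early if this hctx never queued a request, so an
           * idle hctx no longer contends on dd->lock. */
          if (!atomic_read(&hctx->elevator_queued))
              return false;

          return !list_empty_careful(&dd->dispatch) ||
                 !list_empty_careful(&dd->fifo_list[READ]) ||
                 !list_empty_careful(&dd->fifo_list[WRITE]);
      }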

    [jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap]

    Signed-off-by: Kashyap Desai
    Signed-off-by: Hannes Reinecke
    Signed-off-by: John Garry
    Tested-by: Douglas Gilbert
    Signed-off-by: Jens Axboe

    Kashyap Desai

30 May, 2020

1 commit


06 Sep, 2019

1 commit

  • Introduce the definition of elevator features through the
    elevator_features flags in the elevator_type structure. Each flag can
    represent a feature supported by an elevator. The first feature defined
    by this patch is support for zoned block device sequential write
    constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
    by the mq-deadline elevator using zone write locking.

    Other possible features are IO priorities, write hints, latency targets
    or single-LUN dual-actuator disks (for which the elevator could maintain
    one LBA ordered list per actuator).

    The required_elevator_features field is also added to the request_queue
    structure to allow a device driver to specify elevator feature flags
    that an elevator must support for the correct operation of the device
    (e.g. device drivers for zoned block devices can have the
    ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
    The helper function blk_queue_required_elevator_features() is
    defined for setting this new field.

    With these two new fields in place, the elevator functions
    elevator_match() and elevator_find() are modified so that a user can
    set only an elevator whose feature set satisfies the device's required
    features. Elevators that do not match the device requirements are not
    shown in the device sysfs queue/scheduler file, to prevent their use.

    The "none" elevator can always be selected as before.
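
    As a hedged illustration, a zoned device driver would then declare the
    requirement roughly like this (the driver context is hypothetical; the
    flag and helper are the ones introduced here):

      /* Hypothetical driver init: only elevators supporting zone write
       * locking may be selected for this zoned queue. */
      if (blk_queue_is_zoned(q))
          blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);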

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal

03 Sep, 2019

1 commit

  • Commit 7211aef86f79 ("block: mq-deadline: Fix write completion
    handling") added a call to blk_mq_sched_mark_restart_hctx() in
    dd_dispatch_request() to make sure that write request dispatching does
    not stall when all target zones are locked. This fix left a subtle race
    when a write completion happens during a dispatch execution on another
    CPU:

    CPU 0: Dispatch                     CPU 1: write completion

    dd_dispatch_request()
      lock(&dd->lock);
      ...
      lock(&dd->zone_lock);             dd_finish_request()
      rq = find request                   lock(&dd->zone_lock);
      unlock(&dd->zone_lock);
                                          zone write unlock
                                          unlock(&dd->zone_lock);
                                          ...
                                          __blk_mq_free_request
                                            check restart flag (not set)
                                            -> queue not run
      ...
      if (!rq && have writes)
        blk_mq_sched_mark_restart_hctx()
      unlock(&dd->lock)

    Since the dispatch context finishes after the write request completion
    handling, the restart flag set by blk_mq_sched_mark_restart_hctx() is
    not seen from __blk_mq_free_request(), blk_mq_sched_restart() is not
    executed, and dispatch stalls under 100% write workloads.

    Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
    dd_dispatch_request() into dd_finish_request() under the zone lock to
    ensure full mutual exclusion between write request dispatch selection
    and zone unlock on write request completion.
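
    A sketch of the resulting completion path (illustrative, based on the
    description above):

      static void dd_finish_request(struct request *rq)
      {
          struct deadline_data *dd = rq->q->elevator->elevator_data;

          if (blk_queue_is_zoned(rq->q)) {
              unsigned long flags;

              spin_lock_irqsave(&dd->zone_lock, flags);
              blk_req_zone_write_unlock(rq);
              /* Marking the restart under zone_lock closes the race with
               * a dispatch running concurrently on another CPU. */
              if (!list_empty(&dd->fifo_list[WRITE]))
                  blk_mq_sched_mark_restart_hctx(rq->mq_hctx);
              spin_unlock_irqrestore(&dd->zone_lock, flags);
          }
      }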

    Fixes: 7211aef86f79 ("block: mq-deadline: Fix write completion handling")
    Cc: stable@vger.kernel.org
    Reported-by: Hans Holmberg
    Reviewed-by: Hans Holmberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal

15 Jul, 2019

1 commit

  • Rename the block documentation files to ReST, add an index for
    them, and adjust them in order to produce a nice html output via
    the Sphinx build system.

    At its new index.rst, let's add a :orphan: while this is not linked to
    the main index.rst file, in order to avoid build warnings.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab

21 Jun, 2019

1 commit

  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio, and return it from a variant of
    blk_queue_split instead, so that it can be passed as an argument to
    those functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove
    pointless struct request_queue arguments from any of the functions
    that had one and grew a nr_segs argument.
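
    Schematically, the submission path then looks like this (fragment;
    signatures per the description above):

      unsigned int nr_segs;

      /* The split variant reports the segment count to the caller
       * instead of caching it in the bio. */
      __blk_queue_split(q, &bio, &nr_segs);

      /* The count is then passed explicitly to whoever needs it. */
      blk_mq_sched_bio_merge(q, bio, nr_segs);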

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

01 May, 2019

1 commit


18 Dec, 2018

1 commit

  • For a zoned block device using mq-deadline, if a write request for a
    zone is received while another write was already dispatched for the same
    zone, dd_dispatch_request() will return NULL and the newly inserted
    write request is kept in the scheduler queue waiting for the ongoing
    zone write to complete. With this behavior, when no other request has
    been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
    and blk_mq_sched_mark_restart_hctx() is not called. This in turn means
    that the blk_mq_sched_restart() call from __blk_mq_free_request() does
    not run the queue when the already dispatched write request completes.
    The newly inserted request stays stuck in the scheduler queue until
    another request is eventually submitted.

    This problem does not affect SCSI disks, as the SCSI stack handles
    queue restart on request completion. However, it can be triggered with
    the null_blk driver with zoned mode enabled.

    Fix this by always requesting a queue restart in dd_dispatch_request()
    if no request was dispatched while WRITE requests are queued.
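
    A sketch of the fix (illustrative):

      static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
      {
          struct deadline_data *dd = hctx->queue->elevator->elevator_data;
          struct request *rq;

          spin_lock(&dd->lock);
          rq = __dd_dispatch_request(dd);
          /* Nothing dispatched while writes are queued (all target zones
           * locked): request a restart so the queue is rerun once the
           * blocking write completes. */
          if (!rq && blk_queue_is_zoned(hctx->queue) &&
              !list_empty(&dd->fifo_list[WRITE]))
              blk_mq_sched_mark_restart_hctx(hctx);
          spin_unlock(&dd->lock);

          return rq;
      }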

    Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
    Cc:
    Signed-off-by: Damien Le Moal

    Add missing export of blk_mq_sched_restart()

    Signed-off-by: Jens Axboe

    Damien Le Moal

08 Nov, 2018

2 commits

  • This is a remnant of when we had ops for both SQ and MQ
    schedulers. Now it's just MQ, so get rid of the union.
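
    Schematically (illustrative sketch of the structure change):

      /* before: a union that once carried both SQ and MQ ops */
      struct elevator_type {
          union {
              struct elevator_mq_ops mq;
          } ops;
          /* ... */
      };

      /* after: MQ only */
      struct elevator_type {
          struct elevator_mq_ops ops;
          /* ... */
      };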

    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
  • This removes a bunch of core and elevator related code. On the core
    front, we remove anything related to queue running, draining,
    initialization, plugging, and congestion. We also kill anything
    related to request allocation, merging, retrieval, and completion.

    Remove any checking for single queue IO schedulers, as they no
    longer exist. This means we can also delete a bunch of code related
    to request issue, adding, completion, etc - and all the SQ related
    ops and helpers.

    Also kill load_default_modules(), as all it did was provide a way
    to load the default single queue elevator.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe

25 May, 2018

1 commit

  • Convert the S_<FOO> symbolic permissions to their octal equivalents,
    as octal rather than symbolic permissions are preferred by many as
    more readable.

    see: https://lkml.org/lkml/2016/8/2/1945

    Done with automated conversion via:
    $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace

    Miscellanea:

    o Wrapped modified multi-line calls to a single line where appropriate
    o Realign modified multi-line calls to open parenthesis
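
    For illustration, a conversion of this kind looks like the following
    (hypothetical attribute name):

      /* before: symbolic permissions */
      static DEVICE_ATTR(example, S_IRUGO | S_IWUSR, example_show, example_store);

      /* after: the octal equivalent (0444 | 0200 == 0644) */
      static DEVICE_ATTR(example, 0644, example_show, example_store);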

    Signed-off-by: Joe Perches
    Signed-off-by: Jens Axboe

    Joe Perches

01 Mar, 2018

1 commit

  • In case of a failed write request (all retries failed) and when using
    libata, the SCSI error handler calls scsi_finish_command(). In the
    case of blk-mq this means that scsi_mq_done() does not get called,
    that blk_mq_complete_request() does not get called and also that the
    mq-deadline .completed_request() method is not called. This results in
    the target zone of the failed write request being left in a locked
    state, preventing any new write requests from being issued to the
    same zone.

    Fix this by replacing the .completed_request() method with the
    .finish_request() method as this method is always called whether or
    not a request completes successfully. Since the .finish_request()
    method is only called by the blk-mq core if a .prepare_request()
    method exists, add a dummy .prepare_request() method.
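
    A sketch of the change as described (illustrative; hook signatures per
    the blk-mq elevator interface of that time):

      /* Dummy hook: its mere presence makes blk-mq invoke
       * .finish_request(), which now does the zone unlock on every
       * completion path, successful or not. */
      static void dd_prepare_request(struct request *rq, struct bio *bio)
      {
      }

      static struct elevator_type mq_deadline = {
          .ops.mq = {
              /* ... */
              .prepare_request = dd_prepare_request,
              .finish_request  = dd_finish_request, /* was .completed_request */
              /* ... */
          },
      };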

    Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
    Cc: Hannes Reinecke
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    [ bvanassche: edited patch description ]
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Damien Le Moal

07 Jan, 2018

1 commit


06 Jan, 2018

2 commits

  • Introduce zone write locking to avoid write request reordering with
    zoned block devices. This is achieved using a finer selection of the
    next request to dispatch (see the sketch after this list):
    1) Any non-write request is always allowed to proceed.
    2) Any write to a conventional zone is always allowed to proceed.
    3) For a write to a sequential zone, the zone lock is first checked:
       a) If the zone is not locked, the write is allowed to proceed
          after its target zone is locked.
       b) If the zone is locked, the write request is skipped and the
          next request in the dispatch queue is tested (back to step 1).
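
    A sketch of that per-request test, using the zoned block device helper
    that encapsulates checks 1-3 (illustrative fragment, not the literal
    patch):

      /* blk_req_can_dispatch_to_zone() returns true for non-writes,
       * writes to conventional zones, and writes to unlocked
       * sequential zones. */
      list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
          if (blk_req_can_dispatch_to_zone(rq))
              return rq;  /* the target zone is locked on dispatch */
          /* zone locked: skip this write, test the next request */
      }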

    For a write request that has locked its target zone, the zone is
    unlocked either when the request completes with a call to the method
    deadline_request_completed() or when the request is requeued using
    dd_insert_request().

    Requests targeting a locked zone are always left in the scheduler queue
    to preserve the LBA ordering of write requests. If no write request
    can be dispatched, allow reads to be dispatched even if the write batch
    is not done.

    If the device used is not a zoned block device, or if zoned block device
    support is disabled, this patch does not modify mq-deadline behavior.

    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Damien Le Moal
  • Avoid directly referencing the next_rq and fifo_list arrays using the
    helper functions deadline_next_request() and deadline_fifo_request() to
    facilitate changes in the dispatch request selection in
    __dd_dispatch_request() for zoned block devices.
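
    For instance, the next-request accessor becomes a helper of roughly
    this shape (sketch):

      /* Wrap direct next_rq[] access so zoned-device logic can later be
       * slotted in here without touching every call site. */
      static struct request *
      deadline_next_request(struct deadline_data *dd, int data_dir)
      {
          if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
              return NULL;

          return dd->next_rq[data_dir];
      }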

    Signed-off-by: Damien Le Moal
    Reviewed-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Damien Le Moal

26 Oct, 2017

1 commit

  • The scheduler framework now supports looking up the appropriate
    scheduler with the {name,mq} tuple. We can register mq-deadline
    with the alias of 'deadline', so that switching to 'deadline'
    will do the right thing based on the type of driver attached to
    it.
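
    Schematically, the registration then carries the alias (sketch; field
    names per the elevator framework):

      static struct elevator_type mq_deadline = {
          /* ... */
          .elevator_name  = "mq-deadline",
          .elevator_alias = "deadline", /* legacy name resolves here on MQ */
          .elevator_owner = THIS_MODULE,
      };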

    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe

30 Aug, 2017

1 commit


29 Aug, 2017

1 commit


04 May, 2017

1 commit

  • Expose the fifo lists, cached next requests, batching state, and
    dispatch list. It'd also be possible to add the sorted lists, but there
    aren't already seq_file helpers for rbtrees.
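
    The attributes end up in a table of roughly this shape (sketch; exact
    entries and struct layout per blk-mq-debugfs):

      static const struct blk_mq_debugfs_attr deadline_queue_debugfs_attrs[] = {
          {"read_fifo_list", 0400, .seq_ops = &deadline_read_fifo_seq_ops},
          {"write_fifo_list", 0400, .seq_ops = &deadline_write_fifo_seq_ops},
          {"batching", 0400, deadline_batching_show},
          {"dispatch", 0400, .seq_ops = &deadline_dispatch_seq_ops},
          {},
      };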

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval

09 Feb, 2017

1 commit


04 Feb, 2017

1 commit

  • If we end up doing a request-to-request merge when we have completed
    a bio-to-request merge, we free the request from deep down in that
    path. For blk-mq-sched, the merge path has to hold the appropriate
    lock, but we don't need it for freeing the request. And in fact
    holding the lock is problematic, since we are now calling the
    mq sched put_rq_private() hook with the lock held. Other call paths
    do not hold this lock.

    Fix this inconsistency by ensuring that the caller frees a merged
    request. Then we can do it outside of the lock, making it both more
    efficient and fixing the blk-mq-sched problem of invoking parts of
    the scheduler with an unknown lock state.
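
    The resulting pattern in a scheduler's bio-merge path looks roughly
    like this (sketch):

      static bool dd_bio_merge(struct request_queue *q, struct bio *bio)
      {
          struct deadline_data *dd = q->elevator->elevator_data;
          struct request *free = NULL;
          bool ret;

          spin_lock(&dd->lock);
          ret = blk_mq_sched_try_merge(q, bio, &free);
          spin_unlock(&dd->lock);

          /* Free the merged-away request outside of dd->lock. */
          if (free)
              blk_mq_free_request(free);

          return ret;
      }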

    Reported-by: Paolo Valente
    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe

03 Feb, 2017

1 commit


01 Feb, 2017

1 commit

  • This can be used to check for fs vs non-fs requests and basically
    removes all BLOCK_PC-specific knowledge from the block layer, as well
    as preparing for the removal of the cmd_type field in struct request.
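
    The helper in question is blk_rq_is_passthrough() (per the kernel
    history); at the time it reduced to a check of the two passthrough
    request types (sketch):

      static inline bool blk_rq_is_passthrough(struct request *rq)
      {
          return blk_rq_is_scsi(rq) || blk_rq_is_private(rq);
      }

      /* e.g. in a scheduler's insert path: send passthrough requests
       * straight to the dispatch list, bypassing sort/merge logic. */
      if (blk_rq_is_passthrough(rq))
          list_add(&rq->queuelist, &dd->dispatch);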

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

27 Jan, 2017

1 commit

  • When we invoke dispatch_requests(), the scheduler empties everything
    into the passed in list. This isn't always a good thing, since it
    means that we remove items that we could have potentially merged
    with.

    Change the function to dispatch a single request at a time. If
    we do that, we can back off exactly at the point where the device
    can't consume more IO, and leave the rest with the scheduler for
    better merging and future dispatch decision making.
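
    The caller side then becomes a pull loop of roughly this shape
    (sketch):

      /* Pull one request at a time; stop as soon as the driver can't
       * consume more, leaving the rest with the scheduler. */
      do {
          struct request *rq = e->type->ops.mq.dispatch_request(hctx);

          if (!rq)
              break;
          list_add(&rq->queuelist, &rq_list);
      } while (blk_mq_dispatch_rq_list(hctx, &rq_list));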

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval
    Tested-by: Hannes Reinecke

    Jens Axboe

18 Jan, 2017

1 commit