Eric Lee / smarc-fsl-linux-kernel

01 Feb, 2019

1 commit

8ccdf4a37 blk-mq: save queue mapping result into ctx directly ... Browse Code »

Currently, the queue mapping result is saved in a two-dimensional
array. In the hot path, to get a hctx, we need do following:

q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]]

This isn't very efficient. We could save the queue mapping result into
ctx directly with different hctx type, like,

ctx->hctxs[type]

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe

Jianchao Wang
2019-02-01 23:33:04 +0800

27 Nov, 2018

1 commit

5f0ed774e block: sum requests in the plug structure ... Browse Code »

This isn't exactly the same as the previous count, as it includes
requests for all devices. But that really doesn't matter, if we have
more than the threshold (16) queued up, flush it. It's not worth it
to have an expensive list loop for this.

Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-27 01:35:22 +0800

20 Nov, 2018

1 commit

e2b3fa5af block: Remove bio->bi_ioc ... Browse Code »

bio->bi_ioc is never set so always NULL. Remove references to it in
bio_disassociate_task() and in rq_ioc() and delete this field from
struct bio. With this change, rq_ioc() always returns
current->io_context without the need for a bio argument. Further
simplify the code and make it more readable by also removing this
helper, which also allows to simplify blk_mq_sched_assign_ioc() by
removing its bio argument.

Reviewed-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Adam Manzanares
Signed-off-by: Damien Le Moal
Signed-off-by: Jens Axboe

Damien Le Moal
2018-11-20 10:03:44 +0800

19 Nov, 2018

1 commit

a78b03bc7 Merge tag 'v4.20-rc3' into for-4.21/block ... Browse Code »

Merge in -rc3 to resolve a few conflicts, but also to get a few
important fixes that have gone into mainline since the block
4.21 branch was forked off (most notably the SCSI queue issue,
which is both a conflict AND needed fix).

Signed-off-by: Jens Axboe

Jens Axboe
2018-11-19 06:46:03 +0800

16 Nov, 2018

3 commits

373e4af34 block: remove queue_lockdep_assert_held ... Browse Code »

The only remaining user unconditionally drops and reacquires the lock,
which means we really don't need any additional (conditional) annotation.

Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-11-16 03:13:21 +0800
57d74df90 block: use atomic bitops for ->queue_flags ... Browse Code »

->queue_flags is generally not set or cleared in the fast path, and also
generally set or cleared one flag at a time. Make use of the normal
atomic bitops for it so that we don't need to take the queue_lock,
which is otherwise mostly unused in the core block layer now.

Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-11-16 03:13:19 +0800
079076b34 block: remove deadline __deadline manipulation helpers ... Browse Code »

No users left since the removal of the legacy request interface, we can
remove all the magic bit stealing now and make it a normal field.

But use WRITE_ONCE/READ_ONCE on the new deadline field, given that we
don't seem to have any mechanism to guarantee a new value actually
gets seen by other threads.

Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-11-16 03:13:16 +0800

10 Nov, 2018

1 commit

9d037ad70 block: remove req->timeout_list ... Browse Code »

Unused now that the legacy request path is gone.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-11-10 02:44:10 +0800

09 Nov, 2018

1 commit

1adfc5e41 block: make sure discard bio is aligned with logical block size ... Browse Code »

Obviously the created discard bio has to be aligned with logical block size.

This patch introduces the helper of bio_allowed_max_sectors() for
this purpose.

Cc: stable@vger.kernel.org
Cc: Mike Snitzer
Cc: Christoph Hellwig
Cc: Xiao Ni
Cc: Mariusz Dabrowski
Fixes: 744889b7cbb56a6 ("block: don't deal with discard limit in blkdev_issue_discard()")
Fixes: a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks")
Reported-by: Rui Salvaterra
Tested-by: Rui Salvaterra
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2018-11-09 21:23:14 +0800

08 Nov, 2018

7 commits

f9afca4d3 blk-mq: pass in request/bio flags to queue mapping ... Browse Code »

Prep patch for being able to place request based not just on
CPU location, but also on the type of request.

Reviewed-by: Hannes Reinecke
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:44:59 +0800
820efc62f block: kill request slab cache ... Browse Code »

Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:42:33 +0800
db6d99523 block: remove request_list code ... Browse Code »

It's now dead code, nobody uses it.

Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:42:33 +0800
4316b79e4 block: kill legacy parts of timeout handling ... Browse Code »

The only user of legacy timing now is BSG, which is invoked
from the mq timeout handler. Kill the legacy code, and rename
the q->rq_timed_out_fn to q->bsg_job_timeout_fn.

Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:42:33 +0800
a1ce35fa4 block: remove dead elevator code ... Browse Code »

This removes a bunch of core and elevator related code. On the core
front, we remove anything related to queue running, draining,
initialization, plugging, and congestions. We also kill anything
related to request allocation, merging, retrieval, and completion.

Remove any checking for single queue IO schedulers, as they no
longer exist. This means we can also delete a bunch of code related
to request issue, adding, completion, etc - and all the SQ related
ops and helpers.

Also kill the load_default_modules(), as all that did was provide
for a way to load the default single queue elevator.

Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:42:32 +0800
7e992f847 block: remove non mq parts from the flush code ... Browse Code »

Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-11-08 04:42:32 +0800
df376b2ed block: respect virtual boundary mask in bvecs ... Browse Code »

With drivers that are settting a virtual boundary constrain, we are
seeing a lot of bio splitting and smaller I/Os being submitted to the
driver.

This happens because the bio gap detection code does not account cases
where PAGE_SIZE - 1 is bigger than queue_virt_boundary() and thus will
split the bio unnecessarily.

Cc: Jan Kara
Cc: Bart Van Assche
Cc: Ming Lei
Reviewed-by: Sagi Grimberg
Signed-off-by: Johannes Thumshirn
Acked-by: Keith Busch
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe

Johannes Thumshirn
2018-11-08 04:04:22 +0800

26 Oct, 2018

2 commits

bf5054569 block: Introduce blk_revalidate_disk_zones() ... Browse Code »

Drivers exposing zoned block devices have to initialize and maintain
correctness (i.e. revalidate) of the device zone bitmaps attached to
the device request queue (seq_zones_bitmap and seq_zones_wlock).

To simplify coding this, introduce a generic helper function
blk_revalidate_disk_zones() suitable for most (and likely all) cases.
This new function always update the seq_zones_bitmap and seq_zones_wlock
bitmaps as well as the queue nr_zones field when called for a disk
using a request based queue. For a disk using a BIO based queue, only
the number of zones is updated since these queues do not have
schedulers and so do not need the zone bitmaps.

With this change, the zone bitmap initialization code in sd_zbc.c can be
replaced with a call to this function in sd_zbc_read_zones(), which is
called from the disk revalidate block operation method.

A call to blk_revalidate_disk_zones() is also added to the null_blk
driver for devices created with the zoned mode enabled.

Finally, to ensure that zoned devices created with dm-linear or
dm-flakey expose the correct number of zones through sysfs, a call to
blk_revalidate_disk_zones() is added to dm_table_set_restrictions().

The zone bitmaps allocated and initialized with
blk_revalidate_disk_zones() are freed automatically from
__blk_release_queue() using the block internal function
blk_queue_free_zone_bitmaps().

Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Reviewed-by: Martin K. Petersen
Reviewed-by: Mike Snitzer
Signed-off-by: Damien Le Moal
Signed-off-by: Jens Axboe

Damien Le Moal
2018-10-26 01:17:40 +0800
a2d6b3a2d block: Improve zone reset execution ... Browse Code »

There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs
necessary to reset a range of zones. Similarly to what is done for
discard BIOs in blk-lib.c, all zone reset BIOs can be chained and
executed asynchronously and a synchronous call done only for the last
BIO of the chain.

Modify blkdev_reset_zones() to operate similarly to
blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To
avoid code duplication of that function in blk_zoned.c, rename
next_bio() into blk_next_bio() and declare it as a block internal
function in blk.h.

Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Damien Le Moal
Signed-off-by: Jens Axboe

Damien Le Moal
2018-10-26 01:17:40 +0800

14 Oct, 2018

1 commit

5b202853f blk-mq: change gfp flags to GFP_NOIO in blk_mq_realloc_hw_ctxs ... Browse Code »

blk_mq_realloc_hw_ctxs could be invoked during update hw queues.
At the momemt, IO is blocked. Change the gfp flags from GFP_KERNEL
to GFP_NOIO to avoid forever hang during memory allocation in
blk_mq_realloc_hw_ctxs.

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe

Jianchao Wang
2018-10-14 05:42:01 +0800

26 Sep, 2018

1 commit

c39ae60df block: remove ARCH_BIOVEC_PHYS_MERGEABLE ... Browse Code »

Take the Xen check into the core code instead of delegating it to
the architectures.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-09-26 22:45:11 +0800

25 Sep, 2018

5 commits

6e768461c block: remove bvec_to_phys ... Browse Code »

We only use it in biovec_phys_mergeable and a m68k paravirt driver,
so just opencode it there. Also remove the pointless unsigned long cast
for the offset in the opencoded instances.

Signed-off-by: Christoph Hellwig
Reviewed-by: Geert Uytterhoeven
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-09-25 02:33:59 +0800
3dccdae54 block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeable ... Browse Code »

These two checks should always be performed together, so merge them into
a single helper.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-09-25 02:33:57 +0800
6a9f5f240 block: simplify BIOVEC_PHYS_MERGEABLE ... Browse Code »

Turn the macro into an inline, move it to blk.h and simplify the
arch hooks a bit.

Also rename the function to biovec_phys_mergeable as there is no need
to shout.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-09-25 02:33:54 +0800
27ca1d4ed block: move req_gap_back_merge to blk.h ... Browse Code »

No need to expose these helpers outside the block layer.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-09-25 02:33:53 +0800
43b729bfe block: move integrity_req_gap_{back,front}_merge to blk.h ... Browse Code »

No need to expose these to drivers.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-09-25 02:33:50 +0800

21 Aug, 2018

1 commit

d48ece209 blk-mq: init hctx sched after update ctx and hctx mapping ... Browse Code »

Currently, when update nr_hw_queues, IO scheduler's init_hctx will
be invoked before the mapping between ctx and hctx is adapted
correctly by blk_mq_map_swqueue. The IO scheduler init_hctx (kyber)
may depend on this mapping and get wrong result and panic finally.
A simply way to fix this is that switch the IO scheduler to 'none'
before update the nr_hw_queues, and then switch it back after
update nr_hw_queues. blk_mq_sched_init_/exit_hctx are removed due
to nobody use them any more.

Signed-off-by: Jianchao Wang
Signed-off-by: Jens Axboe

Jianchao Wang
2018-08-21 23:02:55 +0800

17 Aug, 2018

1 commit

599d067dd block: change return type to bool ... Browse Code »

Because blk_do_io_stat() only does a judgement about the request
contributes to IO statistics, it better changes return type to bool.

Signed-off-by: Chengguang Xu
Signed-off-by: Jens Axboe

Chengguang Xu
2018-08-17 03:44:17 +0800

09 Aug, 2018

1 commit

4cf6324b1 block: Introduce blk_exit_queue() ... Browse Code »

This patch does not change any functionality.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Omar Sandoval
Cc: Alexandru Moise
Cc: Joseph Qi
Cc:
Signed-off-by: Jens Axboe

Bart Van Assche
2018-08-09 23:12:59 +0800

09 Jul, 2018

1 commit

d70675121 block: introduce blk-iolatency io controller ... Browse Code »

Current IO controllers for the block layer are less than ideal for our
use case. The io.max controller is great at hard limiting, but it is
not work conserving. This patch introduces io.latency. You provide a
latency target for your group and we monitor the io in short windows to
make sure we are not exceeding those latency targets. This makes use of
the rq-qos infrastructure and works much like the wbt stuff. There are
a few differences from wbt

- It's bio based, so the latency covers the whole block layer in addition to
the actual io.
- We will throttle all IO types that comes in here if we need to.
- We use the mean latency over the 100ms window. This is because writes can
be particularly fast, which could give us a false sense of the impact of
other workloads on our protected workload.
- By default there's no throttling, we set the queue_depth to INT_MAX so that
we can have as many outstanding bio's as we're allowed to. Only at
throttle time do we pay attention to the actual queue depth.
- We backcharge cgroups for root cg issued IO and induce artificial
delays in order to deal with cases like metadata only or swap heavy
workloads.

In testing this has worked out relatively well. Protected workloads
will throttle noisy workloads down to 1 io at time if they are doing
normal IO on their own, or induce up to a 1 second delay per syscall if
they are doing a lot of root issued IO (metadata/swap IO).

Our testing has revolved mostly around our production web servers where
we have hhvm (the web server application) in a protected group and
everything else in another group. We see slightly higher requests per
second (RPS) on the test tier vs the control tier, and much more stable
RPS across all machines in the test tier vs the control tier.

Another test we run is a slow memory allocator in the unprotected group.
Before this would eventually push us into swap and cause the whole box
to die and not recover at all. With these patches we see slight RPS
drops (usually 10-15%) before the memory consumer is properly killed and
things recover within seconds.

Signed-off-by: Josef Bacik
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Josef Bacik
2018-07-09 23:07:54 +0800

01 Jun, 2018

3 commits

131d08e12 block: split the blk-mq case from elevator_init ... Browse Code »

There is almost no shared logic, which leads to a very confusing code
flow.

Signed-off-by: Christoph Hellwig
Reviewed-by: Damien Le Moal
Tested-by: Damien Le Moal
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-06-01 21:38:21 +0800
ddb725325 block: remove the always unused name argument to elevator_init ... Browse Code »

Reported-by: Damien Le Moal
Signed-off-by: Christoph Hellwig
Reviewed-by: Damien Le Moal
Tested-by: Damien Le Moal
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-06-01 21:38:17 +0800
a8a275c9c block: unexport elevator_init/exit ... Browse Code »

These are only used by the block core. Also move the declarations to
block/blk.h.

Reported-by: Damien Le Moal
Signed-off-by: Christoph Hellwig
Reviewed-by: Damien Le Moal
Tested-by: Damien Le Moal
Signed-off-by: Jens Axboe

Christoph Hellwig
2018-06-01 21:38:16 +0800

09 May, 2018

1 commit

522a77756 block: consolidate struct request timestamp fields ... Browse Code »

Currently, struct request has four timestamp fields:

- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq

These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.

Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe

Omar Sandoval
2018-05-09 22:33:09 +0800

09 Mar, 2018

1 commit

8a0ac14b8 block: Move the queue_flag_*() functions from a public into a private header file ... Browse Code »

This patch helps to avoid that new code gets introduced in block drivers
that manipulates queue flags without holding the queue lock when that
lock should be held.

Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Ming Lei
Reviewed-by: Johannes Thumshirn
Reviewed-by: Martin K. Petersen
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2018-03-09 05:13:48 +0800

30 Jan, 2018

1 commit

0a4b6e2f8 Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block updates from Jens Axboe:
"This is the main pull request for block IO related changes for the
4.16 kernel. Nothing major in this pull request, but a good amount of
improvements and fixes all over the map. This contains:

- BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
Paolo.

- Support for SMR zones for deadline and mq-deadline from Damien and
Christoph.

- Set of fixes for bcache by way of Michael Lyle, including fixes
from himself, Kent, Rui, Tang, and Coly.

- Series from Matias for lightnvm with fixes from Hans Holmberg,
Javier, and Matias. Mostly centered around pblk, and the removing
rrpc 1.2 in preparation for supporting 2.0.

- A couple of NVMe pull requests from Christoph. Nothing major in
here, just fixes and cleanups, and support for command tracing from
Johannes.

- Support for blk-throttle for tracking reads and writes separately.
From Joseph Qi. A few cleanups/fixes also for blk-throttle from
Weiping.

- Series from Mike Snitzer that enables dm to register its queue more
logically, something that's alwways been problematic on dm since
it's a stacked device.

- Series from Ming cleaning up some of the bio accessor use, in
preparation for supporting multipage bvecs.

- Various fixes from Ming closing up holes around queue mapping and
quiescing.

- BSD partition fix from Richard Narron, fixing a problem where we
can't mount newer (10/11) FreeBSD partitions.

- Series from Tejun reworking blk-mq timeout handling. The previous
scheme relied on atomic bits, but it had races where we would think
a request had timed out if it to reused at the wrong time.

- null_blk now supports faking timeouts, to enable us to better
exercise and test that functionality separately. From me.

- Kill the separate atomic poll bit in the request struct. After
this, we don't use the atomic bits on blk-mq anymore at all. From
me.

- sgl_alloc/free helpers from Bart.

- Heavily contended tag case scalability improvement from me.

- Various little fixes and cleanups from Arnd, Bart, Corentin,
Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
block: remove smart1,2.h
nvme: add tracepoint for nvme_complete_rq
nvme: add tracepoint for nvme_setup_cmd
nvme-pci: introduce RECONNECTING state to mark initializing procedure
nvme-rdma: remove redundant boolean for inline_data
nvme: don't free uuid pointer before printing it
nvme-pci: Suspend queues after deleting them
bsg: use pr_debug instead of hand crafted macros
blk-mq-debugfs: don't allow write on attributes with seq_operations set
nvme-pci: Fix queue double allocations
block: Set BIO_TRACE_COMPLETION on new bio during split
blk-throttle: use queue_is_rq_based
block: Remove kblockd_schedule_delayed_work{,_on}()
blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
lib/scatterlist: Fix chaining support in sgl_alloc_order()
blk-throttle: track read and write request individually
block: add bdev_read_only() checks to common helpers
block: fail op_is_write() requests to read-only partitions
blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
...

Linus Torvalds
2018-01-30 03:51:49 +0800

19 Jan, 2018

1 commit

83d016ac8 block: Unexport elv_register_queue() and elv_unregister_queue() ... Browse Code »

These two functions are only called from inside the block layer so
unexport them.

Reviewed-by: Christoph Hellwig
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe

Bart Van Assche
2018-01-19 03:54:41 +0800

11 Jan, 2018

3 commits

e14575b3d block: convert REQ_ATOM_COMPLETE to stealing rq->__deadline bit ... Browse Code »

We only have one atomic flag left. Instead of using an entire
unsigned long for that, steal the bottom bit of the deadline
field that we already reserved.

Remove ->atomic_flags, since it's now unused.

Reviewed-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-01-11 02:47:53 +0800
0a72e7f44 block: add accessors for setting/querying request deadline ... Browse Code »

We reduce the resolution of request expiry, but since we're already
using jiffies for this where resolution depends on the kernel
configuration and since the timeout resolution is coarse anyway,
that should be fine.

Reviewed-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-01-11 02:47:47 +0800
76a86f9d0 block: remove REQ_ATOM_POLL_SLEPT ... Browse Code »

We don't need this to be an atomic flag, it can be a regular
flag. We either end up on the same CPU for the polling, in which
case the state is sane, or we did the sleep which would imply
the needed barrier to ensure we see the right state.

Reviewed-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe

Jens Axboe
2018-01-11 02:47:43 +0800

10 Jan, 2018

1 commit

5a61c3639 blk-mq: remove REQ_ATOM_STARTED ... Browse Code »

After the recent updates to use generation number and state based
synchronization, we can easily replace REQ_ATOM_STARTED usages by
adding an extra state to distinguish completed but not yet freed
state.

Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with
blk_mq_rq_state() tests. REQ_ATOM_STARTED no longer has any users
left and is removed.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2018-01-10 00:31:15 +0800