08 Sep, 2017
1 commit
-
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
25 Aug, 2017
1 commit
-
The symbolic constants QUEUE_FLAG_SCSI_PASSTHROUGH, QUEUE_FLAG_QUIESCED
and REQ_NOWAIT are missing from blk-mq-debugfs.c. Add these to
blk-mq-debugfs.c such that these appear as names in debugfs instead of
as numbers.Reviewed-by: Omar Sandoval
Signed-off-by: Bart Van Assche
Cc: Hannes Reinecke
Signed-off-by: Jens Axboe
18 Aug, 2017
1 commit
-
This was detected by sparse.
Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
10 Aug, 2017
1 commit
-
We haven't used these in years, but somehow the definitions still
remained. Kill them, and renumber the QUEUE_FLAG_ space. We had
a hole in the beginning of the space, too.Signed-off-by: Jens Axboe
28 Jun, 2017
1 commit
-
Useful to verify that things are working the way they should.
Reading the file will return number of kb written with each
write hint. Writing the file will reset the statistics. No care
is taken to ensure that we don't race on updates.Drivers will write to q->write_hints[] if they handle a given
write hint.Reviewed-by: Andreas Dilger
Reviewed-by: Martin K. Petersen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
02 Jun, 2017
4 commits
-
Running a queue causes the block layer to examine the per-CPU and
hw queues but not the requeue list. Hence add a 'kick' operation
that also examines the requeue list.Signed-off-by: Bart Van Assche
Reviewed-by: Ming Lei
Reviewed-by: Eduardo Valentin
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Omar Sandoval
Signed-off-by: Jens Axboe -
Requests that got stuck in a block driver are neither on
blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
visible in debugfs through the "busy" attribute.Signed-off-by: Bart Van Assche
Reviewed-by: Eduardo Valentin
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Omar Sandoval
Cc: Ming Lei
Signed-off-by: Jens Axboe -
When verifying whether or not a blk-mq driver forgot to kick the
requeue list after having requeued a request it is important to
be able to verify the contents of the requeue list. Hence export
that list through debugfs.Signed-off-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Reviewed-by: Eduardo Valentin
Cc: Christoph Hellwig
Cc: Omar Sandoval
Signed-off-by: Jens Axboe -
When analyzing e.g. queue lockups it is important to know whether
or not a request has already been started. Hence also show the
atomic request flags.Signed-off-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
Reviewed-by: Ming Lei
Reviewed-by: Eduardo Valentin
Cc: Omar Sandoval
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe
04 May, 2017
12 commits
-
Expose the fifo lists, cached next requests, batching state, and
dispatch list. It'd also be possible to add the sorted lists, but there
aren't already seq_file helpers for rbtrees.Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Expose the domain token pools, asynchronous sbitmap depth, domain
request lists, and batching state.Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
This provides the infrastructure for schedulers to expose their internal
state through debugfs. We add a list of queue attributes and a list of
hctx attributes to struct elevator_type and wire them up when switching
schedulers.Signed-off-by: Omar Sandoval
Reviewed-by: Hannes ReineckeAdd missing seq_file.h header in blk-mq-debugfs.h
Signed-off-by: Jens Axboe
-
Originally, I tied debugfs registration/unregistration together with
sysfs. There's no reason to do this, and it's getting in the way of
letting schedulers define their own debugfs attributes. Instead, tie the
debugfs registration to the lifetime of the structures themselves.The saner lifetimes mean we can also get rid of the extra mq directory
and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
just nvme0n1/hctx0/tags.Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe -
Preparation for adding more declarations.
Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
In commit e869b5462f83 ("blk-mq: Unregister debugfs attributes
earlier"), we shuffled the debugfs cleanup around so that the "state"
attribute was removed before we freed the blk-mq data structures.
However, later changes are going to undo that, so we need to explicitly
disallow running a dead queue.[Omar: rebased and updated commit message]
Signed-off-by: Omar Sandoval
Signed-off-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
A large part of blk-mq-debugfs.c is file_operations and seq_file
boilerplate. This sucks as is but will suck even more when schedulers
can define their own debugfs entries. Factor it all out into a single
blk_mq_debugfs_fops which multiplexes as needed. We store the
request_queue, blk_mq_hw_ctx, or blk_mq_ctx in the parent directory
dentry, which is kind of hacky, but it works.Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
It's not clear what these numbered directories represent unless you
consult the code. We're about to get rid of the intermediate "mq"
directory, so these would be even more confusing without that context.Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe -
Slightly more readable, plus we also strip leading spaces.
Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
blk_queue_flags_store() currently truncates and returns a short write if
the operation being written is too long. This can give us weird results,
like here:$ echo "run bar"
echo: write error: invalid argument
$ dmesg
[ 1103.075435] blk_queue_flags_store: unsupported operation bar. Use either 'run' or 'start'Instead, return an error if the user does this. While we're here, make
the argument names consistent with everywhere else in this file.Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Make sure the spelled out flag names match the definition. This also
adds a missing hctx state, BLK_MQ_S_START_ON_RUN, and a missing
cmd_flag, __REQ_NOUNMAP.Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
This reads more naturally than spaces.
Signed-off-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
27 Apr, 2017
6 commits
-
This new callback function will be used in the next patch to show
more information about SCSI requests.Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Cc: Hannes Reinecke
Signed-off-by: Jens Axboe -
Show the operation name, .cmd_flags and .rq_flags as names instead
of numbers.Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
This patch does not change any functionality but makes it possible
to produce a single line of output with multiple flag-to-name
translations.Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Move the "state" attribute from the top level to the "mq" directory
as requested by Omar.Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Since the blk_mq_debugfs_*register_hctxs() functions register and
unregister all attributes under the "mq" directory, rename these
into blk_mq_debugfs_*register_mq().Signed-off-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe -
A later patch will move the call of blk_mq_debugfs_register() to
a function to which the queue name is not passed as an argument.
To avoid having to add a 'name' argument to multiple callers, let
blk_mq_debugfs_register() look up the queue name.Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
21 Apr, 2017
1 commit
-
Fixes an issue where the size of the poll_stat array in request_queue
does not match the size expected by the new size based bucketing for
IO completion polling.Fixes: 720b8ccc4500 ("blk-mq: Add a polling specific stats function")
Signed-off-by: Stephen Bates
Signed-off-by: Jens Axboe
11 Apr, 2017
2 commits
-
Instead of showing the hctx state and flags as numbers, show the
names of the flags.Signed-off-by: Bart Van Assche
Cc: Omar Sandoval
Cc: Hannes Reinecke
Signed-off-by: Jens Axboe -
Make it possible to check whether or not a block layer queue has
been stopped. Make it possible to start and to run a blk-mq queue
from user space.Signed-off-by: Bart Van Assche
Cc: Omar Sandoval
Cc: Hannes Reinecke
Signed-off-by: Jens Axboe
22 Mar, 2017
2 commits
-
Currently, statistics are gathered in ~0.13s windows, and users grab the
statistics whenever they need them. This is not ideal for both in-tree
users:1. Writeback throttling wants its own dynamically sized window of
statistics. Since the blk-stats statistics are reset after every
window and the wbt windows don't line up with the blk-stats windows,
wbt doesn't see every I/O.
2. Polling currently grabs the statistics on every I/O. Again, depending
on how the window lines up, we may miss some I/Os. It's also
unnecessary overhead to get the statistics on every I/O; the hybrid
polling heuristic would be just as happy with the statistics from the
previous full window.This reworks the blk-stats infrastructure to be callback-based: users
register a callback that they want called at a given time with all of
the statistics from the window during which the callback was active.
Users can dynamically bucketize the statistics. wbt and polling both
currently use read vs. write, but polling can be extended to further
subdivide based on request size.The callbacks are kept on an RCU list, and each callback has percpu
stats buffers. There will only be a few users, so the overhead on the
I/O completion side is low. The stats flushing is also simplified
considerably: since the timer function is responsible for clearing the
statistics, we don't have to worry about stale statistics.wbt is a trivial conversion. After the conversion, the windowing problem
mentioned above is fixed.For polling, we register an extra callback that caches the previous
window's statistics in the struct request_queue for the hybrid polling
heuristic to use.Since we no longer have a single stats buffer for the request queue,
this also removes the sysfs and debugfs stats entries. To replace those,
we add a debugfs entry for the poll statistics.Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe -
The stats buckets will become generic soon, so make the existing users
use the common READ and WRITE definitions instead of one internal to
blk-stat.Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe
03 Feb, 2017
1 commit
-
When I added the blk-mq debugging information to debugfs, I didn't
notice that blktrace also creates a "block" directory in debugfs. Make
them use the same dentry, now created in the core block code. Based on a
patch from Jens.Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe
02 Feb, 2017
4 commits
-
Replace the two debugfs_create_file() loops by a call to the new
debugfs_create_files() function. Add an empty element at the end
of the two attribute arrays such that the array size does not have
to be passed to debugfs_create_files().Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe -
Allow users to interrupt show operations instead of making a user
space process unkillable if ownership of q->sysfs_lock cannot be
obtained.Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe -
Avoid that sparse reports the following complaints:
block/elevator.c:541:29: warning: incorrect type in assignment (different base types)
block/elevator.c:541:29: expected bool [unsigned] [usertype] next_sorted
block/elevator.c:541:29: got restricted req_flags_tblock/blk-mq-debugfs.c:92:54: warning: cast from restricted req_flags_t
Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe -
This patch avoids that sparse complains about lock imbalances.
Signed-off-by: Bart Van Assche
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe
01 Feb, 2017
1 commit
-
Instead of keeping two levels of indirection for requests types, fold it
all into the operations. The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
private requests, althought it has to add two for each so that we
can communicate the data in/out nature of the request.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
27 Jan, 2017
2 commits
-
These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.Reviewed-by: Hannes Reinecke
Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe -
These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.Reviewed-by: Hannes Reinecke
Signed-off-by: Omar Sandoval
Signed-off-by: Jens Axboe