26 Apr, 2018

1 commit

  • [ Upstream commit 6b136a24b05c81a24e0b648a4bd938bcd0c4f69e ]

    Attributes that only implement .seq_ops are read-only, so any write to
    them should be rejected. Currently, however, the kernel crashes when
    such debugfs entries are written to, e.g.

    chmod +w /sys/kernel/debug/block//requeue_list
    echo 0 > /sys/kernel/debug/block//requeue_list
    chmod -w /sys/kernel/debug/block//requeue_list

    Fix it by returning -EPERM in blk_mq_debugfs_write() when writing to
    such attributes.

    Cc: Ming Lei
    Signed-off-by: Eryu Guan
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Eryu Guan
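
The check described in this commit can be sketched in userspace C. This is a minimal model, not the kernel source: `struct attr_model`, `accept_all()` and `model_debugfs_write()` are hypothetical names introduced for illustration, and the only behaviour taken from the commit message is that a write to an attribute without a write handler returns -EPERM.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for a debugfs attribute. A read-only attribute
 * leaves .write NULL, mirroring one that only implements .seq_ops.
 */
struct attr_model {
    const char *name;
    ssize_t (*write)(void *data, const char *buf, size_t count);
};

/* Example writable handler: accepts and "consumes" every byte. */
static ssize_t accept_all(void *data, const char *buf, size_t count)
{
    (void)data;
    (void)buf;
    return (ssize_t)count;
}

/* Reject the write up front instead of calling through a NULL pointer. */
static ssize_t model_debugfs_write(const struct attr_model *attr, void *data,
                                   const char *buf, size_t count)
{
    if (!attr->write)
        return -EPERM;
    return attr->write(data, buf, count);
}
```

In this model, calling `model_debugfs_write()` on an attribute whose `.write` is NULL yields -EPERM, while an attribute that uses `accept_all()` reports the full byte count as written.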
     

04 Oct, 2017

1 commit

  • In blk_mq_debugfs_register(), I remembered to set up the per-hctx sched
    directories if a default scheduler was already configured by
    blk_mq_sched_init() from blk_mq_init_allocated_queue(), but I didn't do
    the same for the device-wide sched directory. Fix it.

    Fixes: d332ce091813 ("blk-mq-debugfs: allow schedulers to register debugfs attributes")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

08 Sep, 2017

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     

25 Aug, 2017

1 commit

  • The symbolic constants QUEUE_FLAG_SCSI_PASSTHROUGH, QUEUE_FLAG_QUIESCED
    and REQ_NOWAIT are missing from blk-mq-debugfs.c. Add these to
    blk-mq-debugfs.c such that these appear as names in debugfs instead of
    as numbers.

    Reviewed-by: Omar Sandoval
    Signed-off-by: Bart Van Assche
    Cc: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Bart Van Assche
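
Why a missing table entry shows up as a number can be illustrated with a small userspace sketch. The flag names, bit positions and the helper `demo_flags_format()` below are assumptions made for this example and do not match the kernel's definitions; the '|' separator follows the separator change mentioned elsewhere in this log.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
enum { DEMO_FLAG_STOPPED, DEMO_FLAG_QUIESCED, DEMO_FLAG_UNNAMED };

static const char *const demo_flag_name[] = {
    [DEMO_FLAG_STOPPED]  = "STOPPED",
    [DEMO_FLAG_QUIESCED] = "QUIESCED",
    /* DEMO_FLAG_UNNAMED deliberately has no entry. */
};

/*
 * Format every set bit of 'flags' into 'out', separated by '|'.
 * A bit with a table entry is printed by name; a bit without one
 * falls back to its bit number. Returns 'out'.
 */
static char *demo_flags_format(unsigned long flags, char *out, size_t len)
{
    size_t nnames = sizeof(demo_flag_name) / sizeof(demo_flag_name[0]);
    unsigned int bit;

    if (len == 0)
        return out;
    out[0] = '\0';
    for (bit = 0; bit < 8 * sizeof(flags); bit++) {
        size_t used = strlen(out);
        const char *sep = used ? "|" : "";

        if (!(flags & (1UL << bit)))
            continue;
        if (bit < nnames && demo_flag_name[bit])
            snprintf(out + used, len - used, "%s%s", sep, demo_flag_name[bit]);
        else
            snprintf(out + used, len - used, "%s%u", sep, bit);
    }
    return out;
}
```

With both named bits set the buffer holds the two names joined by '|'; setting only the unnamed bit yields its bit number, which is the symptom this commit removes for the three listed constants.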
     

18 Aug, 2017

1 commit


10 Aug, 2017

1 commit


28 Jun, 2017

1 commit

  • Useful to verify that things are working the way they should.
    Reading the file returns the number of KB written with each
    write hint. Writing to the file resets the statistics. No care
    is taken to ensure that we don't race on updates.

    Drivers will write to q->write_hints[] if they handle a given
    write hint.

    Reviewed-by: Andreas Dilger
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

02 Jun, 2017

4 commits

  • Running a queue causes the block layer to examine the per-CPU and
    hw queues but not the requeue list. Hence add a 'kick' operation
    that also examines the requeue list.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Ming Lei
    Reviewed-by: Eduardo Valentin
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Omar Sandoval
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Requests that got stuck in a block driver are neither on
    blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these
    visible in debugfs through the "busy" attribute.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Eduardo Valentin
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Omar Sandoval
    Cc: Ming Lei
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • When verifying whether or not a blk-mq driver forgot to kick the
    requeue list after having requeued a request it is important to
    be able to verify the contents of the requeue list. Hence export
    that list through debugfs.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Ming Lei
    Reviewed-by: Eduardo Valentin
    Cc: Christoph Hellwig
    Cc: Omar Sandoval
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • When analyzing e.g. queue lockups it is important to know whether
    or not a request has already been started. Hence also show the
    atomic request flags.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Ming Lei
    Reviewed-by: Eduardo Valentin
    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

04 May, 2017

12 commits

  • Expose the fifo lists, cached next requests, batching state, and
    dispatch list. It'd also be possible to add the sorted lists, but there
    aren't already seq_file helpers for rbtrees.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Expose the domain token pools, asynchronous sbitmap depth, domain
    request lists, and batching state.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • This provides the infrastructure for schedulers to expose their internal
    state through debugfs. We add a list of queue attributes and a list of
    hctx attributes to struct elevator_type and wire them up when switching
    schedulers.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke

    Add missing seq_file.h header in blk-mq-debugfs.h

    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Originally, I tied debugfs registration/unregistration together with
    sysfs. There's no reason to do this, and it's getting in the way of
    letting schedulers define their own debugfs attributes. Instead, tie the
    debugfs registration to the lifetime of the structures themselves.

    The saner lifetimes mean we can also get rid of the extra mq directory
    and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
    just nvme0n1/hctx0/tags.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Preparation for adding more declarations.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • In commit e869b5462f83 ("blk-mq: Unregister debugfs attributes
    earlier"), we shuffled the debugfs cleanup around so that the "state"
    attribute was removed before we freed the blk-mq data structures.
    However, later changes are going to undo that, so we need to explicitly
    disallow running a dead queue.

    [Omar: rebased and updated commit message]
    Signed-off-by: Omar Sandoval
    Signed-off-by: Bart Van Assche
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • A large part of blk-mq-debugfs.c is file_operations and seq_file
    boilerplate. This sucks as is but will suck even more when schedulers
    can define their own debugfs entries. Factor it all out into a single
    blk_mq_debugfs_fops which multiplexes as needed. We store the
    request_queue, blk_mq_hw_ctx, or blk_mq_ctx in the parent directory
    dentry, which is kind of hacky, but it works.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • It's not clear what these numbered directories represent unless you
    consult the code. We're about to get rid of the intermediate "mq"
    directory, so these would be even more confusing without that context.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Slightly more readable, plus we also strip leading spaces.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • blk_queue_flags_store() currently truncates and returns a short write if
    the operation being written is too long. This can give us weird results,
    like here:

    $ echo "run bar"
    echo: write error: invalid argument
    $ dmesg
    [ 1103.075435] blk_queue_flags_store: unsupported operation bar. Use either 'run' or 'start'

    Instead, return an error if the user does this. While we're here, make
    the argument names consistent with everywhere else in this file.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
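
The difference between truncating and rejecting overlong input can be sketched as follows. This is a userspace model under stated assumptions: the buffer size `DEMO_OP_MAX`, the function name `demo_state_write()` and the use of -EINVAL for the overlong case are illustrative choices, and only the operation names 'run' and 'start' come from the log output quoted above.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_OP_MAX 16 /* hypothetical buffer size */

/*
 * Parse one operation from a write of 'count' bytes. Returns 'count'
 * when the operation is recognised and a negative errno otherwise.
 */
static ssize_t demo_state_write(const char *buf, size_t count)
{
    char opbuf[DEMO_OP_MAX];
    char *op = opbuf;
    size_t n;

    /* Refuse input that does not fit instead of parsing a truncated prefix. */
    if (count >= sizeof(opbuf))
        return -EINVAL;

    memcpy(opbuf, buf, count);
    opbuf[count] = '\0';

    /* Strip leading spaces and trailing spaces or newlines. */
    while (*op == ' ')
        op++;
    n = strlen(op);
    while (n > 0 && (op[n - 1] == '\n' || op[n - 1] == ' '))
        op[--n] = '\0';

    if (strcmp(op, "run") == 0 || strcmp(op, "start") == 0)
        return (ssize_t)count;
    return -EINVAL;
}
```

Because the whole write is either consumed or rejected, the caller never sees a short write and therefore never retries with the unparsed tail of its input.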
     
  • Make sure the spelled out flag names match the definition. This also
    adds a missing hctx state, BLK_MQ_S_START_ON_RUN, and a missing
    cmd_flag, __REQ_NOUNMAP.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • This reads more naturally than spaces.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

27 Apr, 2017

6 commits


21 Apr, 2017

1 commit


11 Apr, 2017

2 commits


22 Mar, 2017

2 commits

  • Currently, statistics are gathered in ~0.13s windows, and users grab the
    statistics whenever they need them. This is not ideal for either of the
    in-tree users:

    1. Writeback throttling wants its own dynamically sized window of
    statistics. Since the blk-stats statistics are reset after every
    window and the wbt windows don't line up with the blk-stats windows,
    wbt doesn't see every I/O.
    2. Polling currently grabs the statistics on every I/O. Again, depending
    on how the window lines up, we may miss some I/Os. It's also
    unnecessary overhead to get the statistics on every I/O; the hybrid
    polling heuristic would be just as happy with the statistics from the
    previous full window.

    This reworks the blk-stats infrastructure to be callback-based: users
    register a callback that they want called at a given time with all of
    the statistics from the window during which the callback was active.
    Users can dynamically bucketize the statistics. wbt and polling both
    currently use read vs. write, but polling can be extended to further
    subdivide based on request size.

    The callbacks are kept on an RCU list, and each callback has percpu
    stats buffers. There will only be a few users, so the overhead on the
    I/O completion side is low. The stats flushing is also simplified
    considerably: since the timer function is responsible for clearing the
    statistics, we don't have to worry about stale statistics.

    wbt is a trivial conversion. After the conversion, the windowing problem
    mentioned above is fixed.

    For polling, we register an extra callback that caches the previous
    window's statistics in the struct request_queue for the hybrid polling
    heuristic to use.

    Since we no longer have a single stats buffer for the request queue,
    this also removes the sysfs and debugfs stats entries. To replace those,
    we add a debugfs entry for the poll statistics.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
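
The callback-based design described above can be modelled in a few lines of single-threaded C. This is a simplified sketch, not the blk-stat code: every `demo_*` name is hypothetical, and the RCU list, the percpu buffers and the real timer are all left out. It shows only the flow from the commit message: a user-supplied function picks a bucket for each completed I/O, and the end-of-window function hands the window to the user and then clears it.

```c
#include <stddef.h>

#define DEMO_MAX_BUCKETS 4

struct demo_stat {
    unsigned long long nr;  /* number of I/Os in the window */
    unsigned long long sum; /* summed latency, in nanoseconds */
};

struct demo_stat_callback {
    /* Chooses a bucket for an I/O, or returns -1 to skip it. */
    int (*bucket_fn)(int is_write, unsigned int bytes);
    /* Called once at the end of each window with the window's stats. */
    void (*timer_fn)(struct demo_stat_callback *cb);
    unsigned int nr_buckets;
    struct demo_stat cur[DEMO_MAX_BUCKETS];
};

/* Account one completed I/O in the current window. */
static void demo_stat_add(struct demo_stat_callback *cb, int is_write,
                          unsigned int bytes, unsigned long long latency_ns)
{
    int b = cb->bucket_fn(is_write, bytes);

    if (b < 0 || (unsigned int)b >= cb->nr_buckets)
        return;
    cb->cur[b].nr++;
    cb->cur[b].sum += latency_ns;
}

/* End of window: deliver the statistics, then clear them. */
static void demo_stat_window_end(struct demo_stat_callback *cb)
{
    unsigned int i;

    cb->timer_fn(cb);
    for (i = 0; i < cb->nr_buckets; i++)
        cb->cur[i] = (struct demo_stat){ 0, 0 };
}

/* Example user: bucket by read (0) vs. write (1). */
static int demo_rw_bucket(int is_write, unsigned int bytes)
{
    (void)bytes;
    return is_write ? 1 : 0;
}

/* Example user: cache the previous full window for later lookups. */
static struct demo_stat demo_prev_window[2];

static void demo_cache_window(struct demo_stat_callback *cb)
{
    demo_prev_window[0] = cb->cur[0];
    demo_prev_window[1] = cb->cur[1];
}
```

Since the window-end function is the only place that clears the buckets, a reader of `demo_prev_window` always sees a complete window, which is the property the polling heuristic is described as relying on.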
     
  • The stats buckets will become generic soon, so make the existing users
    use the common READ and WRITE definitions instead of one internal to
    blk-stat.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

03 Feb, 2017

1 commit


02 Feb, 2017

4 commits


01 Feb, 2017

1 commit

  • Instead of keeping two levels of indirection for request types, fold it
    all into the operations. The little caveat here is that previously
    cmd_type only applied to struct request, while the request and bio op
    fields were set to plain REQ_OP_READ/WRITE even for passthrough
    operations.

    Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
    private requests, although it has to add two for each so that we
    can communicate the data in/out nature of the request.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig