04 May, 2017

1 commit

  • This provides the infrastructure for schedulers to expose their internal
    state through debugfs. We add a list of queue attributes and a list of
    hctx attributes to struct elevator_type and wire them up when switching
    schedulers.

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke

    Add missing seq_file.h header in blk-mq-debugfs.h

    Signed-off-by: Jens Axboe

    Omar Sandoval
     

02 May, 2017

1 commit


27 Apr, 2017

1 commit

  • At least one driver, mtip32xx, has a hard coded dependency on
    the value of the reserved tag used for internal commands. While
    that should really be fixed up, for now let's ensure that we just
    bypass the scheduler tags an allocation marked as reserved. They
    are used for house keeping or error handling, so we can safely
    ignore them in the scheduler.

    Tested-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe
     

21 Apr, 2017

1 commit


08 Apr, 2017

2 commits

  • Schedulers need to be informed when a hardware queue is added or removed
    at runtime so they can allocate/free per-hardware queue data. So,
    replace the blk_mq_sched_init_hctx_data() helper, which only makes sense
    at init time, with .init_hctx() and .exit_hctx() hooks.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • To improve scalability, if hardware queues are shared, restart
    a single hardware queue in round-robin fashion. Rename
    blk_mq_sched_restart_queues() to reflect the new semantics.
    Remove blk_mq_sched_mark_restart_queue() because this function
    has no callers. Remove flag QUEUE_FLAG_RESTART because this
    patch removes the code that uses this flag.

    Signed-off-by: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

07 Apr, 2017

4 commits

  • In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
    back to the original scheduler. However, at this point, we've already
    torn down the original scheduler's tags, so this causes a crash. Doing
    the fallback like the legacy elevator path is much harder for mq, so fix
    it by just falling back to none, instead.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • If a new hardware queue is added at runtime, we don't allocate scheduler
    tags for it, leading to a crash. This hooks up the scheduler framework
    to blk_mq_{init,exit}_hctx() to make sure everything gets properly
    initialized/freed.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • Preparation cleanup for the next couple of fixes, push
    blk_mq_sched_setup() and e->ops.mq.init_sched() into a helper.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • While dispatching requests, if we fail to get a driver tag, we mark the
    hardware queue as waiting for a tag and put the requests on a
    hctx->dispatch list to be run later when a driver tag is freed. However,
    blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware
    queues if using a single-queue scheduler with a multiqueue device. If
    blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we
    are processing. This means we end up using the hardware queue of the
    previous request, which may or may not be the same as that of the
    current request. If it isn't, the wrong hardware queue will end up
    waiting for a tag, and the requests will be on the wrong dispatch list,
    leading to a hang.

    The fix is twofold:

    1. Make sure we save which hardware queue we were trying to get a
    request for in blk_mq_get_driver_tag() regardless of whether it
    succeeds or not.
    2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a
    blk_mq_hw_queue to make it clear that it must handle multiple
    hardware queues, since I've already messed this up on a couple of
    occasions.

    This didn't appear in testing with nvme and mq-deadline because nvme has
    more driver tags than the default number of scheduler tags. However,
    with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

02 Mar, 2017

3 commits


24 Feb, 2017

1 commit

  • In blk_mq_sched_dispatch_requests(), we call blk_mq_sched_mark_restart()
    after we dispatch requests left over on our hardware queue dispatch
    list. This is so we'll go back and dispatch requests from the scheduler.
    In this case, it's only necessary to restart the hardware queue that we
    are running; there's no reason to run other hardware queues just because
    we are using shared tags.

    So, split out blk_mq_sched_mark_restart() into two operations, one for
    just the hardware queue and one for the whole request queue. The core
    code only needs the hctx variant, but I/O schedulers will want to use
    both.

    This also requires adjusting blk_mq_sched_restart_queues() to always
    check the queue restart flag, not just when using shared tags.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

23 Feb, 2017

1 commit


18 Feb, 2017

2 commits


11 Feb, 2017

1 commit

  • bio is used in bfq-mq's get_rq_priv, to get the request group. We could
    pass directly the group here, but I thought that passing the bio was
    more general, giving the possibility to get other pieces of information
    if needed.

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     

09 Feb, 2017

1 commit


04 Feb, 2017

1 commit

  • If we end up doing a request-to-request merge when we have completed
    a bio-to-request merge, we free the request from deep down in that
    path. For blk-mq-sched, the merge path has to hold the appropriate
    lock, but we don't need it for freeing the request. And in fact
    holding the lock is problematic, since we are now calling the
    mq sched put_rq_private() hook with the lock held. Other call paths
    do not hold this lock.

    Fix this inconsistency by ensuring that the caller frees a merged
    request. Then we can do it outside of the lock, making it both more
    efficient and fixing the blk-mq-sched problem of invoking parts of
    the scheduler with an unknown lock state.

    Reported-by: Paolo Valente
    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe
     

03 Feb, 2017

1 commit


28 Jan, 2017

3 commits


27 Jan, 2017

4 commits

  • When we invoke dispatch_requests(), the scheduler empties everything
    into the passed in list. This isn't always a good thing, since it
    means that we remove items that we could have potentially merged
    with.

    Change the function to dispatch single requests at the time. If
    we do that, we can backoff exactly at the point where the device
    can't consume more IO, and leave the rest with the scheduler for
    better merging and future dispatch decision making.

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval
    Tested-by: Hannes Reinecke

    Jens Axboe
     
  • If we have both multiple hardware queues and shared tag map between
    devices, we need to ensure that we propagate the hardware queue
    restart bit higher up. This is because we can get into a situation
    where we don't have any IO pending on a hardware queue, yet we fail
    getting a tag to start new IO. If that happens, it's not enough to
    mark the hardware queue as needing a restart, we need to bubble
    that up to the higher level queue as well.

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval
    Tested-by: Hannes Reinecke

    Jens Axboe
     
  • We don't trigger this from the normal IO path, since we always use
    blocking allocations from there. But Bart saw it testing multipath
    dm, since that is a heavy user of atomic request allocations in
    the map and clone path.

    Reported-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • If we come in from blk_mq_alloc_requst() with NOWAIT set in flags,
    we must ensure that we don't later overwrite that in
    blk_mq_sched_get_request(). Initialize alloc_data->flags before
    passing it in.

    Reported-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Jan, 2017

2 commits

  • Add Kconfig entries to manage what devices get assigned an MQ
    scheduler, and add a blk-mq flag for drivers to opt out of scheduling.
    The latter is useful for admin type queues that still allocate a blk-mq
    queue and tag set, but aren't use for normal IO.

    Signed-off-by: Jens Axboe
    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval

    Jens Axboe
     
  • This adds a set of hooks that intercepts the blk-mq path of
    allocating/inserting/issuing/completing requests, allowing
    us to develop a scheduler within that framework.

    We reuse the existing elevator scheduler API on the registration
    side, but augment that with the scheduler flagging support for
    the blk-mq interfce, and with a separate set of ops hooks for MQ
    devices.

    We split driver and scheduler tags, so we can run the scheduling
    independently of device queue depth.

    Signed-off-by: Jens Axboe
    Reviewed-by: Bart Van Assche
    Reviewed-by: Omar Sandoval

    Jens Axboe