04 Sep, 2020

1 commit

  • High CPU utilization on "native_queued_spin_lock_slowpath" due to lock
    contention is possible for the mq-deadline and bfq IO schedulers
    when nr_hw_queues is more than one.

    This is because the kblockd workqueue can submit IO from all online
    CPUs (through blk_mq_run_hw_queues()) even though only one hctx has
    pending commands.

    The .has_work elevator callback for the mq-deadline and bfq schedulers
    reports pending work whenever there are any IOs on the request queue,
    but it does not take the hctx context into account.

    Add a per-hctx 'elevator_queued' count to avoid triggering the
    elevator when no requests are queued on that hctx, as sketched below.
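
    A minimal sketch of the resulting .has_work, based on the description
    above (illustrative, not the literal diff; the counter is incremented
    on insert and decremented on dispatch):

      static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
      {
          struct deadline_data *dd = hctx->queue->elevator->elevator_data;

          /* Bail out early if this hctx never queued a request, so an
           * idle hctx no longer contends on dd->lock. */
          if (!atomic_read(&hctx->elevator_queued))
              return false;

          return !list_empty_careful(&dd->dispatch) ||
                 !list_empty_careful(&dd->fifo_list[READ]) ||
                 !list_empty_careful(&dd->fifo_list[WRITE]);
      }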

    [jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap]

    Signed-off-by: Kashyap Desai
    Signed-off-by: Hannes Reinecke
    Signed-off-by: John Garry
    Tested-by: Douglas Gilbert
    Signed-off-by: Jens Axboe

    Kashyap Desai

30 May, 2020

1 commit


06 Sep, 2019

1 commit

  • Introduce the definition of elevator features through the
    elevator_features flags in the elevator_type structure. Each flag can
    represent a feature supported by an elevator. The first feature defined
    by this patch is support for zoned block device sequential write
    constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
    by the mq-deadline elevator using zone write locking.

    Other possible features are IO priorities, write hints, latency targets
    or single-LUN dual-actuator disks (for which the elevator could maintain
    one LBA ordered list per actuator).

    The required_elevator_features field is also added to the request_queue
    structure to allow a device driver to specify elevator feature flags
    that an elevator must support for the correct operation of the device
    (e.g. device drivers for zoned block devices can have the
    ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
    The helper function blk_queue_required_elevator_features() is
    defined for setting this new field.

    With these two new fields in place, the elevator functions
    elevator_match() and elevator_find() are modified so that a user can
    set only an elevator whose feature set satisfies the device's required
    features. Elevators that do not match the device requirements are not
    shown in the device sysfs queue/scheduler file, to prevent their use.

    The "none" elevator can always be selected as before.
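
    As a hedged illustration, a zoned device driver would then declare the
    requirement roughly like this (the driver context is hypothetical; the
    flag and helper are the ones introduced here):

      /* Hypothetical driver init: only elevators supporting zone write
       * locking may be selected for this zoned queue. */
      if (blk_queue_is_zoned(q))
          blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);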

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal

03 Sep, 2019

1 commit

  • Commit 7211aef86f79 ("block: mq-deadline: Fix write completion
    handling") added a call to blk_mq_sched_mark_restart_hctx() in
    dd_dispatch_request() to make sure that write request dispatching does
    not stall when all target zones are locked. This fix left a subtle race
    when a write completion happens during a dispatch execution on another
    CPU:

    CPU 0: Dispatch                     CPU 1: write completion

    dd_dispatch_request()
      lock(&dd->lock);
      ...
      lock(&dd->zone_lock);             dd_finish_request()
      rq = find request                   lock(&dd->zone_lock);
      unlock(&dd->zone_lock);
                                          zone write unlock
                                          unlock(&dd->zone_lock);
                                          ...
                                          __blk_mq_free_request
                                            check restart flag (not set)
                                            -> queue not run
      ...
      if (!rq && have writes)
        blk_mq_sched_mark_restart_hctx()
      unlock(&dd->lock)

    Since the dispatch context finishes after the write request completion
    handling, the restart flag set by blk_mq_sched_mark_restart_hctx() is
    not seen from __blk_mq_free_request(), blk_mq_sched_restart() is not
    executed, and dispatch stalls under 100% write workloads.

    Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
    dd_dispatch_request() into dd_finish_request() under the zone lock to
    ensure full mutual exclusion between write request dispatch selection
    and zone unlock on write request completion.
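
    A sketch of the resulting completion path (illustrative, based on the
    description above):

      static void dd_finish_request(struct request *rq)
      {
          struct deadline_data *dd = rq->q->elevator->elevator_data;

          if (blk_queue_is_zoned(rq->q)) {
              unsigned long flags;

              spin_lock_irqsave(&dd->zone_lock, flags);
              blk_req_zone_write_unlock(rq);
              /* Marking the restart under zone_lock closes the race with
               * a dispatch running concurrently on another CPU. */
              if (!list_empty(&dd->fifo_list[WRITE]))
                  blk_mq_sched_mark_restart_hctx(rq->mq_hctx);
              spin_unlock_irqrestore(&dd->zone_lock, flags);
          }
      }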

    Fixes: 7211aef86f79 ("block: mq-deadline: Fix write completion handling")
    Cc: stable@vger.kernel.org
    Reported-by: Hans Holmberg
    Reviewed-by: Hans Holmberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal

15 Jul, 2019

1 commit

  • Rename the block documentation files to ReST, add an index for
    them, and adjust them in order to produce a nice html output via
    the Sphinx build system.

    At its new index.rst, let's add a :orphan: while this is not linked to
    the main index.rst file, in order to avoid build warnings.

    Signed-off-by: Mauro Carvalho Chehab

    Mauro Carvalho Chehab

21 Jun, 2019

1 commit

  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio, and return it from a variant of
    blk_queue_split instead, so that it can be passed as an argument to
    those functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove
    pointless struct request_queue arguments from any of the functions
    that had one and grew a nr_segs argument.
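
    Schematically, the submission path then looks like this (fragment;
    signatures per the description above):

      unsigned int nr_segs;

      /* The split variant reports the segment count to the caller
       * instead of caching it in the bio. */
      __blk_queue_split(q, &bio, &nr_segs);

      /* The count is then passed explicitly to whoever needs it. */
      blk_mq_sched_bio_merge(q, bio, nr_segs);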

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

01 May, 2019

1 commit


18 Dec, 2018

1 commit

  • For a zoned block device using mq-deadline, if a write request for a
    zone is received while another write was already dispatched for the same
    zone, dd_dispatch_request() will return NULL and the newly inserted
    write request is kept in the scheduler queue waiting for the ongoing
    zone write to complete. With this behavior, when no other request has
    been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
    and blk_mq_sched_mark_restart_hctx() is not called. This in turn means
    that the blk_mq_sched_restart() call from __blk_mq_free_request() does
    not run the queue when the already dispatched write request completes.
    The newly inserted request stays stuck in the scheduler queue until
    another request is eventually submitted.

    This problem does not affect SCSI disks, as the SCSI stack handles
    queue restart on request completion. However, it can be triggered with
    the null_blk driver with zoned mode enabled.

    Fix this by always requesting a queue restart in dd_dispatch_request()
    if no request was dispatched while WRITE requests are queued.
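
    A sketch of the fix (illustrative):

      static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
      {
          struct deadline_data *dd = hctx->queue->elevator->elevator_data;
          struct request *rq;

          spin_lock(&dd->lock);
          rq = __dd_dispatch_request(dd);
          /* Nothing dispatched while writes are queued (all target zones
           * locked): request a restart so the queue is rerun once the
           * blocking write completes. */
          if (!rq && blk_queue_is_zoned(hctx->queue) &&
              !list_empty(&dd->fifo_list[WRITE]))
              blk_mq_sched_mark_restart_hctx(hctx);
          spin_unlock(&dd->lock);

          return rq;
      }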

    Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
    Cc:
    Signed-off-by: Damien Le Moal

    Add missing export of blk_mq_sched_restart()

    Signed-off-by: Jens Axboe

    Damien Le Moal

08 Nov, 2018

2 commits

  • This is a remnant of when we had ops for both SQ and MQ
    schedulers. Now it's just MQ, so get rid of the union.
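
    Schematically (illustrative sketch of the structure change):

      /* before: a union that once carried both SQ and MQ ops */
      struct elevator_type {
          union {
              struct elevator_mq_ops mq;
          } ops;
          /* ... */
      };

      /* after: MQ only */
      struct elevator_type {
          struct elevator_mq_ops ops;
          /* ... */
      };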

    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
  • This removes a bunch of core and elevator related code. On the core
    front, we remove anything related to queue running, draining,
    initialization, plugging, and congestion. We also kill anything
    related to request allocation, merging, retrieval, and completion.

    Remove any checking for single queue IO schedulers, as they no
    longer exist. This means we can also delete a bunch of code related
    to request issue, adding, completion, etc - and all the SQ related
    ops and helpers.

    Also kill load_default_modules(), as all it did was provide a way
    to load the default single queue elevator.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe

25 May, 2018

1 commit

  • Convert the S_<FOO> symbolic permissions to their octal equivalents,
    as octal rather than symbolic permissions are preferred by many as
    more readable.

    see: https://lkml.org/lkml/2016/8/2/1945

    Done with automated conversion via:
    $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace

    Miscellanea:

    o Wrapped modified multi-line calls to a single line where appropriate
    o Realign modified multi-line calls to open parenthesis
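
    For illustration, a conversion of this kind looks like the following
    (hypothetical attribute name):

      /* before: symbolic permissions */
      static DEVICE_ATTR(example, S_IRUGO | S_IWUSR, example_show, example_store);

      /* after: the octal equivalent (0444 | 0200 == 0644) */
      static DEVICE_ATTR(example, 0644, example_show, example_store);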

    Signed-off-by: Joe Perches
    Signed-off-by: Jens Axboe

    Joe Perches

01 Mar, 2018

1 commit

  • In case of a failed write request (all retries failed) and when using
    libata, the SCSI error handler calls scsi_finish_command(). In the
    case of blk-mq this means that scsi_mq_done() does not get called,
    that blk_mq_complete_request() does not get called and also that the
    mq-deadline .completed_request() method is not called. This results in
    the target zone of the failed write request being left in a locked
    state, preventing any new write requests from being issued to the
    same zone.

    Fix this by replacing the .completed_request() method with the
    .finish_request() method as this method is always called whether or
    not a request completes successfully. Since the .finish_request()
    method is only called by the blk-mq core if a .prepare_request()
    method exists, add a dummy .prepare_request() method.
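
    A sketch of the change as described (illustrative; hook signatures per
    the blk-mq elevator interface of that time):

      /* Dummy hook: its mere presence makes blk-mq invoke
       * .finish_request(), which now does the zone unlock on every
       * completion path, successful or not. */
      static void dd_prepare_request(struct request *rq, struct bio *bio)
      {
      }

      static struct elevator_type mq_deadline = {
          .ops.mq = {
              /* ... */
              .prepare_request = dd_prepare_request,
              .finish_request  = dd_finish_request, /* was .completed_request */
              /* ... */
          },
      };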

    Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
    Cc: Hannes Reinecke
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    [ bvanassche: edited patch description ]
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Damien Le Moal

07 Jan, 2018

1 commit


06 Jan, 2018

2 commits

  • Introduce zone write locking to avoid write request reordering with
    zoned block devices. This is achieved using a finer selection of the
    next request to dispatch (see the sketch after this list):
    1) Any non-write request is always allowed to proceed.
    2) Any write to a conventional zone is always allowed to proceed.
    3) For a write to a sequential zone, the zone lock is first checked:
       a) If the zone is not locked, the write is allowed to proceed
          after its target zone is locked.
       b) If the zone is locked, the write request is skipped and the
          next request in the dispatch queue is tested (back to step 1).
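
    A sketch of that per-request test, using the zoned block device helper
    that encapsulates checks 1-3 (illustrative fragment, not the literal
    patch):

      /* blk_req_can_dispatch_to_zone() returns true for non-writes,
       * writes to conventional zones, and writes to unlocked
       * sequential zones. */
      list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
          if (blk_req_can_dispatch_to_zone(rq))
              return rq;  /* the target zone is locked on dispatch */
          /* zone locked: skip this write, test the next request */
      }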

    For a write request that has locked its target zone, the zone is
    unlocked either when the request completes with a call to the method
    deadline_request_completed() or when the request is requeued using
    dd_insert_request().

    Requests targeting a locked zone are always left in the scheduler queue
    to preserve the LBA ordering of write requests. If no write request
    can be dispatched, allow reads to be dispatched even if the write batch
    is not done.

    If the device used is not a zoned block device, or if zoned block device
    support is disabled, this patch does not modify mq-deadline behavior.

    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Damien Le Moal
  • Avoid directly referencing the next_rq and fifo_list arrays using the
    helper functions deadline_next_request() and deadline_fifo_request() to
    facilitate changes in the dispatch request selection in
    __dd_dispatch_request() for zoned block devices.
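
    For instance, the next-request accessor becomes a helper of roughly
    this shape (sketch):

      /* Wrap direct next_rq[] access so zoned-device logic can later be
       * slotted in here without touching every call site. */
      static struct request *
      deadline_next_request(struct deadline_data *dd, int data_dir)
      {
          if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
              return NULL;

          return dd->next_rq[data_dir];
      }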

    Signed-off-by: Damien Le Moal
    Reviewed-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Damien Le Moal

26 Oct, 2017

1 commit

  • The scheduler framework now supports looking up the appropriate
    scheduler with the {name,mq} tuple. We can register mq-deadline
    with the alias of 'deadline', so that switching to 'deadline'
    will do the right thing based on the type of driver attached to
    it.
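
    Schematically, the registration then carries the alias (sketch; field
    names per the elevator framework):

      static struct elevator_type mq_deadline = {
          /* ... */
          .elevator_name  = "mq-deadline",
          .elevator_alias = "deadline", /* legacy name resolves here on MQ */
          .elevator_owner = THIS_MODULE,
      };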

    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe

30 Aug, 2017

1 commit


29 Aug, 2017

1 commit


04 May, 2017

1 commit

  • Expose the fifo lists, cached next requests, batching state, and
    dispatch list. It'd also be possible to add the sorted lists, but there
    aren't already seq_file helpers for rbtrees.
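
    The attributes end up in a table of roughly this shape (sketch; exact
    entries and struct layout per blk-mq-debugfs):

      static const struct blk_mq_debugfs_attr deadline_queue_debugfs_attrs[] = {
          {"read_fifo_list", 0400, .seq_ops = &deadline_read_fifo_seq_ops},
          {"write_fifo_list", 0400, .seq_ops = &deadline_write_fifo_seq_ops},
          {"batching", 0400, deadline_batching_show},
          {"dispatch", 0400, .seq_ops = &deadline_dispatch_seq_ops},
          {},
      };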

    Signed-off-by: Omar Sandoval
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Omar Sandoval

09 Feb, 2017

1 commit


04 Feb, 2017

1 commit

  • If we end up doing a request-to-request merge when we have completed
    a bio-to-request merge, we free the request from deep down in that
    path. For blk-mq-sched, the merge path has to hold the appropriate
    lock, but we don't need it for freeing the request. And in fact
    holding the lock is problematic, since we are now calling the
    mq sched put_rq_private() hook with the lock held. Other call paths
    do not hold this lock.

    Fix this inconsistency by ensuring that the caller frees a merged
    request. Then we can do it outside of the lock, making it both more
    efficient and fixing the blk-mq-sched problem of invoking parts of
    the scheduler with an unknown lock state.
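
    The resulting pattern in a scheduler's bio-merge path looks roughly
    like this (sketch):

      static bool dd_bio_merge(struct request_queue *q, struct bio *bio)
      {
          struct deadline_data *dd = q->elevator->elevator_data;
          struct request *free = NULL;
          bool ret;

          spin_lock(&dd->lock);
          ret = blk_mq_sched_try_merge(q, bio, &free);
          spin_unlock(&dd->lock);

          /* Free the merged-away request outside of dd->lock. */
          if (free)
              blk_mq_free_request(free);

          return ret;
      }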

    Reported-by: Paolo Valente
    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe

03 Feb, 2017

1 commit


01 Feb, 2017

1 commit

  • This can be used to check for fs vs non-fs requests and basically
    removes all BLOCK_PC-specific knowledge from the block layer, as well
    as preparing for the removal of the cmd_type field in struct request.
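
    The helper in question is blk_rq_is_passthrough() (per the kernel
    history); at the time it reduced to a check of the two passthrough
    request types (sketch):

      static inline bool blk_rq_is_passthrough(struct request *rq)
      {
          return blk_rq_is_scsi(rq) || blk_rq_is_private(rq);
      }

      /* e.g. in a scheduler's insert path: send passthrough requests
       * straight to the dispatch list, bypassing sort/merge logic. */
      if (blk_rq_is_passthrough(rq))
          list_add(&rq->queuelist, &dd->dispatch);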

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

27 Jan, 2017

1 commit

  • When we invoke dispatch_requests(), the scheduler empties everything
    into the passed in list. This isn't always a good thing, since it
    means that we remove items that we could have potentially merged
    with.

    Change the function to dispatch a single request at a time. If
    we do that, we can back off exactly at the point where the device
    can't consume more IO, and leave the rest with the scheduler for
    better merging and future dispatch decision making.
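
    The caller side then becomes a pull loop of roughly this shape
    (sketch):

      /* Pull one request at a time; stop as soon as the driver can't
       * consume more, leaving the rest with the scheduler. */
      do {
          struct request *rq = e->type->ops.mq.dispatch_request(hctx);

          if (!rq)
              break;
          list_add(&rq->queuelist, &rq_list);
      } while (blk_mq_dispatch_rq_list(hctx, &rq_list));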

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval
    Tested-by: Hannes Reinecke

    Jens Axboe

18 Jan, 2017

1 commit