23 Aug, 2014

1 commit

  • This patch fixes code such as the following with scsi-mq enabled:

    rq = blk_get_request(...);
    blk_rq_set_block_pc(rq);

    rq->cmd = my_cmd_buffer; /* separate CDB buffer */

    blk_execute_rq_nowait(...);

    Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for
    large CDBs only). Without this patch, scsi_mq_prep_fn() will set
    rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device.

    Signed-off-by: Tony Battersby
    Signed-off-by: Jens Axboe

    Tony Battersby
     

22 Aug, 2014

3 commits

  • While converting to percpu_ref for freezing, commit add703fda981
    ("blk-mq: use percpu_ref for mq usage count") incorrectly made
    blk_mq_freeze_queue() misbehave when freezing is nested, because
    percpu_ref_kill() ends up being invoked on an already-killed ref.

    Fix it by making blk_mq_freeze_queue() kill and kick the queue only
    for the outermost freeze attempt. All the nested ones can simply wait
    for the ref to reach zero.
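
    A sketch of the outermost-only handling described above (close to,
    but not necessarily, the actual patch; locking simplified):

    void blk_mq_freeze_queue(struct request_queue *q)
    {
        bool freeze;

        spin_lock_irq(q->queue_lock);
        freeze = !q->mq_freeze_depth++;
        spin_unlock_irq(q->queue_lock);

        /* only the outermost freeze kills the ref and kicks the queue */
        if (freeze) {
            percpu_ref_kill(&q->mq_usage_counter);
            blk_mq_run_queues(q, false);
        }

        /* everyone, nested or not, waits for the ref to drain */
        wait_event(q->mq_freeze_wq,
                   percpu_ref_is_zero(&q->mq_usage_counter));
    }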

    While at it, remove unnecessary @wake initialization from
    blk_mq_unfreeze_queue().

    Signed-off-by: Tejun Heo
    Reported-by: Ming Lei
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Just grammar or spelling errors, nothing major.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • blk-mq uses BLK_MQ_F_SHOULD_MERGE, as set by the driver at init time,
    to determine whether it should merge IO or not. However, merging can
    also be disabled by the admin, by switching it off through sysfs. So
    check the general queue state as well before attempting to merge IO.
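
    A sketch of the resulting check (blk_queue_nomerges() is the existing
    test for the sysfs setting; simplified):

    if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) && !blk_queue_nomerges(q))
        /* attempt to merge the bio into an existing request */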

    Reported-by: Rob Elliott
    Tested-by: Rob Elliott
    Signed-off-by: Jens Axboe

    Jens Axboe
     

16 Aug, 2014

1 commit

  • Before the queue is released, it has already been frozen by
    blk_cleanup_queue(), so there is no need to freeze it again when
    deleting the tag set.

    This patch fixes the WARNING "percpu_ref_kill() called more than once!"
    which is triggered when unloading a block driver.
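
    A sketch of the release path described (call chain reconstructed and
    simplified):

    blk_cleanup_queue(q)        /* freezes and drains the queue */
      blk_release_queue(q)      /* final put of the queue */
        blk_mq_free_queue(q)    /* deletes q from the tag set; freezing
                                   here again would kill the ref twice */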

    Cc: Tejun Heo
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

02 Jul, 2014

5 commits

  • Currently, blk-mq uses a percpu_counter to keep track of how many
    usages are in flight. The percpu_counter is drained while freezing to
    ensure that no usage is left in-flight after freezing is complete.
    blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
    per-cpu gating mechanism.

    This type of code has a relatively high chance of subtle bugs which are
    extremely difficult to trigger, and it's way too hairy to be open coded
    in blk-mq. percpu_ref can serve the same purpose after the recent
    changes. This patch replaces the open-coded per-cpu usage counting
    and draining mechanism with percpu_ref.

    blk_mq_queue_enter() performs tryget_live on the ref and exit()
    performs put. blk_mq_freeze_queue() kills the ref and waits until the
    reference count reaches zero. blk_mq_unfreeze_queue() revives the ref
    and wakes up the waiters.
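
    A sketch of that mapping (simplified; the slow path is elided):

    static int blk_mq_queue_enter(struct request_queue *q)
    {
        if (percpu_ref_tryget_live(&q->mq_usage_counter))
            return 0;

        /* frozen or dying: wait for unfreeze, or fail if dying */
        ...
    }

    static void blk_mq_queue_exit(struct request_queue *q)
    {
        percpu_ref_put(&q->mq_usage_counter);
    }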

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Nicholas A. Bellinger
    Cc: Kent Overstreet
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Keeping __blk_mq_drain_queue() as a separate function doesn't buy us
    anything and it's gonna be further simplified. Let's flatten it into
    its caller.

    This patch doesn't make any functional change.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Nicholas A. Bellinger
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk_mq freezing is entangled with generic bypassing, which bypasses
    blkcg and the io scheduler and lets IO requests fall through the block
    layer to the drivers in FIFO order. This allows forward progress on
    IOs with the advanced features disabled so that those features can be
    configured or altered without worrying about stalling IO, which could
    lead to deadlock through memory allocation.

    However, generic bypassing doesn't quite fit blk-mq. blk-mq currently
    doesn't make use of blkcg or ioscheds and it maps bypassing to
    freezing, which blocks request processing and drains all the in-flight
    ones. This causes problems as bypassing assumes that request
    processing is online. blk-mq works around this by conditionally
    allowing request processing for the problem case - during queue
    initialization.

    Another oddity is that, except during queue cleanup, bypassing
    started on the generic side prevents blk-mq from processing new
    requests but doesn't drain the in-flight ones. This shouldn't break
    anything, but it again highlights that something isn't quite right here.

    The root cause is conflating blk-mq freezing and generic bypassing,
    which are two different mechanisms. The only purpose they share is
    queue cleanup. Let's properly separate blk-mq freezing from generic
    bypassing and use freezing only where necessary.

    * request_queue->mq_freeze_depth is added, and
      blk_mq_[un]freeze_queue() now operate on this counter instead of
      ->bypass_depth. A replacement for QUEUE_FLAG_BYPASS isn't added;
      the counter is tested directly. This will be further updated by
      later changes.

    * blk_mq_drain_queue() is dropped, and the "__" prefix is dropped
      from __blk_mq_freeze_queue(). The queue cleanup path now calls
      blk_mq_freeze_queue() directly.

    * blk_mq_queue_enter()'s fast-path condition is simplified to a
      plain check of @q->mq_freeze_depth. Previously, the condition was

        !blk_queue_dying(q) &&
        (!blk_queue_bypass(q) || !blk_queue_init_done(q))

      mq_freeze_depth is incremented right after dying is set, and the
      blk_queue_init_done() exception isn't necessary as blk-mq doesn't
      start frozen. That leaves only the blk_queue_bypass() test, which
      can be replaced by a @q->mq_freeze_depth test (see the sketch
      below).

    This change simplifies the code and reduces confusion in the area.
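
    A before/after sketch of that fast-path condition (simplified):

    /* before */
    if (!blk_queue_dying(q) &&
        (!blk_queue_bypass(q) || !blk_queue_init_done(q)))
        return 0;

    /* after */
    if (!q->mq_freeze_depth)
        return 0;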

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Nicholas A. Bellinger
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue()
    skip queue draining if bypass_depth was already above zero. The
    assumption is that the one which bumped the bypass_depth should have
    performed draining already; however, there's nothing which prevents a
    new instance of bypassing/freezing from starting before the previous
    one finishes draining. The current code may therefore allow later
    bypassing/freezing instances to complete while there are still
    in-flight requests which haven't finished draining.

    Fix it by draining regardless of bypass_depth. We still skip draining
    from blk_queue_bypass_start() while the queue is initializing to avoid
    introducing excessive delays during boot. INIT_DONE setting is moved
    above the initial blk_queue_bypass_end() so that bypassing attempts
    can't slip in between.
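
    A sketch of the resulting blk_queue_bypass_start() (simplified;
    __blk_drain_queue() is the existing internal drain helper):

    void blk_queue_bypass_start(struct request_queue *q)
    {
        spin_lock_irq(q->queue_lock);
        q->bypass_depth++;
        queue_flag_set(QUEUE_FLAG_BYPASS, q);
        spin_unlock_irq(q->queue_lock);

        /* drain regardless of bypass_depth; only skip while the
         * queue is still initializing, to avoid boot-time delays */
        if (blk_queue_init_done(q)) {
            spin_lock_irq(q->queue_lock);
            __blk_drain_queue(q, false);
            spin_unlock_irq(q->queue_lock);
        }
    }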

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Nicholas A. Bellinger
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • blk-mq uses a percpu_counter to keep track of how many usages are in
    flight. The percpu_counter is drained while freezing to ensure that
    no usage is left in-flight after freezing is complete.

    blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
    per-cpu gating mechanism; unfortunately, it contains a subtle bug -
    smp_wmb() in blk_mq_queue_enter() doesn't prevent the CPU from
    fetching @q->bypass_depth before incrementing @q->mq_usage_counter,
    and if freezing happens in between, the caller can slip through and
    freezing can complete while there are still active users.

    Use smp_mb() instead so that bypass_depth and mq_usage_counter
    modifications and tests are properly interlocked.
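
    A sketch of the fix in blk_mq_queue_enter() (simplified; the
    init-done exception is elided):

    __percpu_counter_add(&q->mq_usage_counter, 1, 1000000);
    smp_mb(); /* was smp_wmb(); order the add before reading bypass state */
    if (!blk_queue_dying(q) && !blk_queue_bypass(q))
        return 0; /* fast path */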

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Nicholas A. Bellinger
    Signed-off-by: Jens Axboe

    Tejun Heo
     

25 Jun, 2014

1 commit

  • Currently it calls __blk_mq_run_hw_queue(), which depends on the
    CPU placement being correct. This means it's not possible to call
    blk_mq_start_hw_queues(q) from a context that is correct for all
    queues, which can then trigger the

    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));

    in __blk_mq_run_hw_queue().
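
    A sketch of the direction of the fix (simplified): have the start
    path run the queue asynchronously, so the work executes on a CPU in
    hctx->cpumask instead of on whatever CPU the caller happens to be on:

    blk_mq_run_hw_queue(hctx, true); /* async, instead of a direct call */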

    Reported-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Jun, 2014

2 commits


10 Jun, 2014

1 commit

  • This makes the mq case consistent with the non-mq case, and also
    avoids updating rq->deadline twice for mq.

    The comment said: "We do this early, to ensure we are on
    the right CPU.", but no percpu stuff is used in blk_add_timer(),
    so it isn't necessary. Even when inserting from the plug list, there
    is no such guarantee at all.
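
    A sketch under the assumption, based on "consistent with the non-mq
    case", that the timer is now armed when the request is started rather
    than earlier:

    /* in blk_mq_start_request() (placement is an assumption): */
    blk_add_timer(rq); /* sets rq->deadline once, when the request starts */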

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

09 Jun, 2014

1 commit

  • The blk-mq core only initializes rq->start_time if io stats are
    enabled, since blk-mq only reads the field in that case. But drivers
    could potentially use it internally, so ensure that we always set it
    to the current time when the request is allocated.
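
    A sketch (start_time is the existing field in struct request;
    placement simplified):

    /* at request allocation, unconditionally: */
    rq->start_time = jiffies;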

    Reported-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe
     

07 Jun, 2014

2 commits


06 Jun, 2014

1 commit

  • For some scsi-mq cases, the tag map can be huge. So increase the
    max number of tags we support.

    Additionally, don't fail with EINVAL if a user requests too many
    tags. Warn that the tag depth has been adjusted down, and store
    the new value inside the tag_set passed in.
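
    A sketch of the clamping described (the exact limit and message text
    may differ from the actual patch):

    if (set->queue_depth > BLK_MQ_MAX_DEPTH) {
        pr_info("blk-mq: reduced tag depth to %u\n", BLK_MQ_MAX_DEPTH);
        set->queue_depth = BLK_MQ_MAX_DEPTH;
    }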

    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Jun, 2014

1 commit

  • We currently pass in the hardware queue and get the tags from there.
    But for scsi-mq, with a shared tag space, it's a lot more convenient
    to pass in the blk_mq_tags instead, as the hardware queue isn't always
    directly available. So instead of having to re-map to a given
    hardware queue from rq->mq_ctx, just pass in the tags structure.
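
    A sketch of the resulting signature change (blk_mq_tag_to_rq() as the
    affected function is an assumption based on the description):

    /* before */
    struct request *blk_mq_tag_to_rq(struct blk_mq_hw_ctx *hctx,
                                     unsigned int tag);

    /* after */
    struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags,
                                     unsigned int tag);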

    Signed-off-by: Jens Axboe

    Jens Axboe
     

04 Jun, 2014

5 commits


31 May, 2014

2 commits

  • We have callers outside of blk-mq proper (like timeouts) that
    want to call __blk_mq_complete_request(), so rename the function
    and put the decision code for whether to use ->softirq_done_fn
    or blk_mq_end_io() into __blk_mq_complete_request().

    This also makes the interface more logical again.
    blk_mq_complete_request() attempts to atomically mark the request
    completed, and calls __blk_mq_complete_request() if successful.
    __blk_mq_complete_request() then just ends the request.
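
    A sketch of the resulting split (blk_mark_rq_complete() is the
    existing atomic-mark helper; simplified):

    void blk_mq_complete_request(struct request *rq)
    {
        /* only one completer wins the atomic mark */
        if (!blk_mark_rq_complete(rq))
            __blk_mq_complete_request(rq);
    }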

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Commit 07068d5b8e added a direct-to-hw-queue mode, but this mode
    also needs to arm the request timeout handler. Without it, we don't
    track timeouts for these requests.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

30 May, 2014

3 commits


29 May, 2014

3 commits


28 May, 2014

8 commits