15 Sep, 2016

1 commit

  • All drivers use the default, so provide an inline version of it. If we
    ever need another queue mapping we can add an optional method back,
    although supporting it will also require major changes to the queue
    setup code.

    This provides better code generation, and better debuggability as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
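
    For reference, a sketch of what the inline default mapping looks like
    (reconstructed from the description, not a verbatim copy of the patch):

        static inline struct blk_mq_hw_ctx *blk_mq_map_queue(
                        struct request_queue *q, int cpu)
        {
                return q->queue_hw_ctx[q->mq_map[cpu]];
        }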
     

08 Jun, 2016

4 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting that the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
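
    A hedged sketch of how an upper layer asks for the renamed flag; the
    flag still travels in bio->bi_rw at this point in the series:

        /* request a cache preflush before, and FUA behavior for, this write */
        bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_PREFLUSH | REQ_FUA);
        submit_bio(bio);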
     
  • This adds a REQ_OP_FLUSH operation that is sent to request_fn
    based drivers by the block layer's flush code, instead of
    sending requests with the request->cmd_flags REQ_FLUSH bit set.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
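
    On the driver side, a request_fn driver now tests for a dedicated
    operation rather than a flag; a minimal sketch:

        /* inside a request_fn driver, illustrative: */
        if (req_op(rq) == REQ_OP_FLUSH) {
                /* issue a cache flush command to the device */
        }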
     
  • This patch converts the simple bi_rw use cases in the block,
    drivers, mm and fs code to set/get the bio operation using
    bio_set_op_attrs/bio_op.

    These should be simple one- or two-line cases, so I just did them
    in one patch. The next patches handle the more complicated
    cases in a module per patch.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
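
    The conversion is mechanical; a representative before/after sketch
    (handle_read() is a made-up stand-in):

        /* before: operation encoded directly in bi_rw */
        bio->bi_rw = READ;

        /* after: set and query through the accessors */
        bio_set_op_attrs(bio, REQ_OP_READ, 0);
        if (bio_op(bio) == REQ_OP_READ)
                handle_read(bio);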
     
  • This has callers of submit_bio/submit_bio_wait set bio->bi_rw
    instead of passing it in. This makes the usage match
    generic_make_request and how we set the other bio fields.

    Signed-off-by: Mike Christie

    Fixed up fs/ext4/crypto.c

    Signed-off-by: Jens Axboe

    Mike Christie
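
    A representative caller conversion (illustrative):

        /* before */
        submit_bio(WRITE, bio);

        /* after */
        bio->bi_rw = WRITE;
        submit_bio(bio);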
     

26 Nov, 2015

1 commit

  • This reverts commit 1b2ff19e6a957b1ef0f365ad331b608af80e932e.

    Jan writes:

    --

    Thanks for report! After some investigation I found out we allocate
    elevator specific data in __get_request() only for non-flush requests. And
    this is actually required since the flush machinery uses the space in
    struct request for something else. Doh. So my patch is just wrong and not
    easy to fix since at the time __get_request() is called we are not sure
    whether the flush machinery will be used in the end. Jens, please revert
    1b2ff19e6a957b1ef0f365ad331b608af80e932e. Thanks!

    I'm somewhat surprised that you can reliably hit the race where flushing
    gets disabled for the device just while the request is in flight. But I
    guess during boot it makes some sense.

    --

    So let's just revert it, we can fix the queue run manually after the
    fact. This race is rare enough that it didn't trigger in testing, it
    requires the specific disable-while-in-flight scenario to trigger.

    Jens Axboe
     

17 Nov, 2015

1 commit

  • Currently blk_insert_flush() just adds the flush request to
    q->queue_head when no flush is required. That completely bypasses the
    IO scheduler, so e.g. CFQ can be idling waiting for a new request to
    arrive and will idle through the whole window unnecessarily. Luckily
    this only happens in rare cases, as the checks in
    generic_make_request_checks() usually clear the FLUSH and FUA flags
    early if they are not needed.

    When no flushing is actually required, we can easily fix the problem
    by properly queueing the request through the IO scheduler. Ideally the
    IO scheduler should also be made aware of requests queued via
    blk_flush_queue_rq(). However, inserting a flush request through the
    IO scheduler can have unwanted side effects, since due to flush
    batching, delaying the flush request in the IO scheduler would delay
    all flush requests possibly coming from other processes. So we keep
    adding those requests directly to q->queue_head.

    Signed-off-by: Jan Kara
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jan Kara
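
    An illustrative shape of the resulting logic in blk_insert_flush();
    details here are reconstructed from the description, not the exact diff:

        if ((policy & REQ_FSEQ_DATA) &&
            !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
                if (q->mq_ops)
                        blk_mq_insert_request(rq, false, false, true);
                else
                        /* was: list_add_tail(&rq->queuelist, &q->queue_head); */
                        __elv_add_request(q, rq, ELEVATOR_INSERT_BACK);
                return;
        }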
     

15 Aug, 2015

1 commit

  • Inside the timeout handler, blk_mq_tag_to_rq() is called
    to retrieve the request for a tag. This is obviously
    wrong because the request can be freed at any time and some
    fields of the request can't be trusted, so a kernel oops
    might be triggered [1].

    Currently, wrt. blk_mq_tag_to_rq(), the only special case is
    that the flush request can share the same tag as the request it
    was cloned from, and the two requests can't be active at the same
    time. So this patch fixes the above issue by updating tags->rqs[tag]
    with the active request (either the flush rq or the request it was
    cloned from) for the tag.

    Also, blk_mq_tag_to_rq() gets much simplified with this patch; a
    sketch of the simplified helper follows this entry.

    Given blk_mq_tag_to_rq() is mainly for drivers, and the caller must
    make sure the request can't be freed, in bt_for_each() this
    helper is replaced with tags->rqs[tag].

    [1] kernel oops log
    [ 439.696220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
    [ 439.697162] IP: [] blk_mq_tag_to_rq+0x21/0x6e
    [ 439.700653] PGD 7ef765067 PUD 7ef764067 PMD 0
    [ 439.700653] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 439.700653] Dumping ftrace buffer:
    [ 439.700653] (ftrace buffer empty)
    [ 439.700653] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw
    [ 439.700653] CPU: 6 PID: 2779 Comm: stress-ng-sigfd Not tainted 4.2.0-rc5-next-20150805+ #265
    [ 439.730500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [ 439.730500] task: ffff880605308000 ti: ffff88060530c000 task.ti: ffff88060530c000
    [ 439.730500] RIP: 0010:[] [] blk_mq_tag_to_rq+0x21/0x6e
    [ 439.730500] RSP: 0018:ffff880819203da0 EFLAGS: 00010283
    [ 439.730500] RAX: ffff880811b0e000 RBX: ffff8800bb465f00 RCX: 0000000000000002
    [ 439.730500] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
    [ 439.730500] RBP: ffff880819203db0 R08: 0000000000000002 R09: 0000000000000000
    [ 439.730500] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000202
    [ 439.730500] R13: ffff880814104800 R14: 0000000000000002 R15: ffff880811a2ea00
    [ 439.730500] FS: 00007f165b3f5740(0000) GS:ffff880819200000(0000) knlGS:0000000000000000
    [ 439.730500] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 439.730500] CR2: 0000000000000158 CR3: 00000007ef766000 CR4: 00000000000006e0
    [ 439.730500] Stack:
    [ 439.730500] 0000000000000008 ffff8808114eed90 ffff880819203e00 ffffffff812dc104
    [ 439.755663] ffff880819203e40 ffffffff812d9f5e 0000020000000000 ffff8808114eed80
    [ 439.755663] Call Trace:
    [ 439.755663]
    [ 439.755663] [] bt_for_each+0x6e/0xc8
    [ 439.755663] [] ? blk_mq_rq_timed_out+0x6a/0x6a
    [ 439.755663] [] ? blk_mq_rq_timed_out+0x6a/0x6a
    [ 439.755663] [] blk_mq_tag_busy_iter+0x55/0x5e
    [ 439.755663] [] ? blk_mq_bio_to_request+0x38/0x38
    [ 439.755663] [] blk_mq_rq_timer+0x5d/0xd4
    [ 439.755663] [] call_timer_fn+0xf7/0x284
    [ 439.755663] [] ? call_timer_fn+0x5/0x284
    [ 439.755663] [] ? blk_mq_bio_to_request+0x38/0x38
    [ 439.755663] [] run_timer_softirq+0x1ce/0x1f8
    [ 439.755663] [] __do_softirq+0x181/0x3a4
    [ 439.755663] [] irq_exit+0x40/0x94
    [ 439.755663] [] smp_apic_timer_interrupt+0x33/0x3e
    [ 439.755663] [] apic_timer_interrupt+0x84/0x90
    [ 439.755663]
    [ 439.755663] [] ? _raw_spin_unlock_irq+0x32/0x4a
    [ 439.755663] [] finish_task_switch+0xe0/0x163
    [ 439.755663] [] ? finish_task_switch+0xa2/0x163
    [ 439.755663] [] __schedule+0x469/0x6cd
    [ 439.755663] [] schedule+0x82/0x9a
    [ 439.789267] [] signalfd_read+0x186/0x49a
    [ 439.790911] [] ? wake_up_q+0x47/0x47
    [ 439.790911] [] __vfs_read+0x28/0x9f
    [ 439.790911] [] ? __fget_light+0x4d/0x74
    [ 439.790911] [] vfs_read+0x7a/0xc6
    [ 439.790911] [] SyS_read+0x49/0x7f
    [ 439.790911] [] entry_SYSCALL_64_fastpath+0x12/0x6f
    [ 439.790911] Code: 48 89 e5 e8 a9 b8 e7 ff 5d c3 0f 1f 44 00 00 55 89
    f2 48 89 e5 41 54 41 89 f4 53 48 8b 47 60 48 8b 1c d0 48 8b 7b 30 48 8b
    53 38 8b 87 58 01 00 00 48 85 c0 75 09 48 8b 97 88 0c 00 00 eb 10
    [ 439.790911] RIP [] blk_mq_tag_to_rq+0x21/0x6e
    [ 439.790911] RSP
    [ 439.790911] CR2: 0000000000000158
    [ 439.790911] ---[ end trace d40af58949325661 ]---

    Cc:
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
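
    The simplified helper mentioned above ends up essentially as (sketch):

        struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags,
                        unsigned int tag)
        {
                return tags->rqs[tag];
        }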
     

26 Sep, 2014

9 commits

  • This patch supports running one single flush machinery for
    each blk-mq dispatch queue, so that:

    - the current init_request and exit_request callbacks can
    cover flush requests too, and the buggy copying way of
    initializing the flush request's pdu can be fixed

    - flushing performance gets improved in case of multiple hw queues

    In a fio sync write test over virtio-blk (4 hw queues, ioengine=sync,
    iodepth=64, numjobs=4, bs=4K), it is observed that throughput gets
    increased a lot in my test environment:
    - throughput: +70% in case of virtio-blk over null_blk
    - throughput: +30% in case of virtio-blk over SSD image

    The multi virtqueue feature isn't merged into QEMU yet, and patches for
    the feature can be found in the tree below:

    git://kernel.ubuntu.com/ming/qemu.git v2.1.0-mq.4

    Simply passing 'num_queues=4 vectors=5' should be enough to
    enable the multi queue (quad queue) feature for QEMU virtio-blk.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This patch adds a 'blk_mq_ctx' parameter to blk_get_flush_queue()
    so that the function can find the corresponding blk_flush_queue
    bound to the current mq context, since the flush queue will become
    per hw-queue.

    For a legacy queue, the parameter can simply be NULL.

    For the multiqueue case, the parameter should be set to the context
    from which the related request originated. With this context
    info, the hw queue and related flush queue can be found easily.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
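
    A sketch of the resulting lookup, assuming the per hw-queue flush
    queue introduced later in this series:

        static struct blk_flush_queue *blk_get_flush_queue(
                        struct request_queue *q, struct blk_mq_ctx *ctx)
        {
                if (q->mq_ops) {
                        struct blk_mq_hw_ctx *hctx =
                                q->mq_ops->map_queue(q, ctx->cpu);

                        return hctx->fq;
                }
                return q->fq;
        }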
     
  • Just figure out the flush queue once, at the entry points that kick
    off the flush machinery and in the request's completion handler, then
    pass it through.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Now the mission of the two helpers is over, so just call
    blk_alloc_flush_queue() and blk_free_flush_queue() directly.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This patch introduces 'struct blk_flush_queue' and puts all
    flush machinery related fields into this structure, so that

    - flush implementation details aren't exposed to drivers
    - it is easy to convert to per dispatch-queue flush machinery

    This patch is basically a mechanical replacement.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
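
    A sketch of the structure, with the field set reconstructed from the
    flush machinery it encapsulates (not a verbatim copy):

        struct blk_flush_queue {
                unsigned int            flush_queue_delayed:1;
                unsigned int            flush_pending_idx:1;
                unsigned int            flush_running_idx:1;
                unsigned long           flush_pending_since;
                struct list_head        flush_queue[2];
                struct list_head        flush_data_in_flight;
                struct request          *flush_rq;
                spinlock_t              mq_flush_lock;
        };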
     
  • This patch uses a local variable to access the flush request,
    so that we can convert to the per-queue flush machinery a bit more
    easily.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • These fields are always used with the flush request, so
    initialize them together.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • These two temporary functions are introduced to hold the flush
    initialization and de-initialization, so that we can
    introduce the 'flush queue' more easily in the following patch. Once
    the 'flush queue' and its allocation/free functions are ready,
    they will be removed for the sake of code readability.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • It is reasonable to allocate the flush req in blk_mq_init_flush().

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

23 Sep, 2014

2 commits

  • This patch removes two unnecessary blk_clear_rq_complete() calls;
    the REQ_ATOM_COMPLETE flag is cleared inside blk_mq_start_request(),
    so:

    - The blk_clear_rq_complete() in blk_flush_restore_request()
    isn't needed because the request will be freed later, and clearing
    it here may open a small race window with the timeout handler.

    - The blk_clear_rq_complete() in blk_mq_requeue_request() isn't
    necessary either; even though REQ_ATOM_STARTED is cleared in
    __blk_mq_requeue_request(), in theory it may still cause a small
    race window with the timeout handler since the two clear_bit() calls
    may be reordered.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Now that we've changed the driver API on the submission side, use the
    opportunity to fix up the name on the completion side to fit into the
    general scheme.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

28 May, 2014

1 commit

  • Both the cache flush state machine and the SCSI midlayer want to submit
    requests from irq context, and the current per-request requeue_work
    unfortunately causes corruption due to sharing with the csd field for
    flushes. Replace them with a per-request_queue list of requests to
    be requeued.

    Based on an earlier test by Ming Lei.

    Signed-off-by: Christoph Hellwig
    Reported-by: Ming Lei
    Tested-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
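
    A hedged sketch of the per-queue mechanism; the names follow blk-mq,
    but the body is reconstructed from the description:

        void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
        {
                struct request_queue *q = rq->q;
                unsigned long flags;

                spin_lock_irqsave(&q->requeue_lock, flags);
                if (at_head)
                        list_add(&rq->queuelist, &q->requeue_list);
                else
                        list_add_tail(&rq->queuelist, &q->requeue_list);
                spin_unlock_irqrestore(&q->requeue_lock, flags);
        }

    Callers can then safely add to the list from irq context, and the
    actual resubmission happens later from the queue's requeue work.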
     

16 Apr, 2014

2 commits

  • Drivers shouldn't have to care about the block layer setting aside a
    request to implement the flush state machine. We already override the
    mq context and tag to make it more transparent, but so far haven't
    dealt with the driver private data in the request. Make sure to
    override this as well, and while we're at it, add a proper helper
    sitting in blk-mq.c that implements the full impersonation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
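
    A plausible shape of the impersonation helper (the name and body are
    reconstructed from the description, so treat them as assumptions):

        static void blk_mq_clone_flush_request(struct request *flush_rq,
                        struct request *orig_rq)
        {
                struct blk_mq_hw_ctx *hctx = orig_rq->q->mq_ops->map_queue(
                                orig_rq->q, orig_rq->mq_ctx->cpu);

                flush_rq->mq_ctx = orig_rq->mq_ctx;
                flush_rq->tag = orig_rq->tag;
                memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
                       hctx->cmd_size);
        }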
     
  • Drivers can reach their private data easily using the blk_mq_rq_to_pdu
    helper and don't need req->special. By not initializing it, the code
    can be simplified nicely, and we also shave off a few more
    instructions from the I/O path.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
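
    Typical driver usage after this change (struct my_cmd is a made-up
    example of a driver's per-request payload):

        struct my_cmd *cmd = blk_mq_rq_to_pdu(rq);      /* was: rq->special */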
     

11 Feb, 2014

1 commit

  • Switch to using a preallocated flush_rq for blk-mq, similar to what's
    done with the old request path. This allows us to set up the request
    properly with a tag from the actually allowed range and ->rq_disk as
    needed by some drivers. To make life easier we also switch to dynamic
    allocation of ->flush_rq for the old path.

    This effectively reverts most of

    "blk-mq: fix for flush deadlock"

    and

    "blk-mq: Don't reserve a tag for flush request"

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
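
    For the old path, an illustrative fragment of the dynamic allocation
    at queue init time:

        /* in the legacy queue setup path, illustrative: */
        q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL);
        if (!q->flush_rq)
                return NULL;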
     

31 Jan, 2014

1 commit

  • Reserving a tag (request) for flushes to avoid deadlock is overkill.
    A tag is a valuable resource. We can track the number of flush
    requests and disallow having too many pending flush requests
    allocated. With this patch, blk_mq_alloc_request_pinned() could do a
    busy nop (but not loop forever) if too many pending flush requests
    are allocated and a new flush request is allocated. But this should
    not be a problem; too many pending flush requests are a very rare
    case.

    I verified this can fix the deadlock caused by too many pending flush
    requests.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

24 Nov, 2013

2 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a
    bi_bvec_done member to this struct; for now, this patch effectively
    just renames things (a before/after sketch follows this entry).

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
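
    The before/after sketch referenced above; users now reach the fields
    through the iterator:

        /* before */
        sector_t sector = bio->bi_sector;
        unsigned int bytes = bio->bi_size;

        /* after */
        sector_t sector = bio->bi_iter.bi_sector;
        unsigned int bytes = bio->bi_iter.bi_size;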
     
  • It was being open coded in a few places.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Neil Brown
    Cc: Chris Mason
    Acked-by: NeilBrown

    Kent Overstreet
     

29 Oct, 2013

1 commit

  • The flush state machine takes in a struct request, which is then
    submitted multiple times to the underlying driver. The old block code
    requeues the same request for each of those steps, so it does not
    have an issue with tapping into the request pool. The new code, on
    the other hand, allocates a new request for each of the actual steps
    of the flush sequence. If we have already allocated all of the tags
    for IO, we will fail to allocate the flush request.

    Set aside a reserved request just for flushes.

    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Oct, 2013

1 commit

  • Linux currently has two models for block devices:

    - The classic request_fn based approach, where drivers use struct
    request units for IO. The block layer provides various helper
    functionalities to let drivers share code, things like tag
    management, timeout handling, queueing, etc.

    - The "stacked" approach, where a driver squeezes in between the
    block layer and IO submitter. Since this bypasses the IO stack,
    driver generally have to manage everything themselves.

    With drivers being written for new high-IOPS devices, the classic
    request_fn based driver doesn't work well enough. The design dates
    back to when both SMP and high IOPS were rare. It has problems with
    scaling to bigger machines, and runs into scaling issues even on
    smaller machines when you have IOPS in the hundreds of thousands
    per device.

    The stacked approach is then most often selected as the model
    for the driver. But this means that everybody has to re-invent
    everything, and along with that we get all the problems again
    that the shared approach solved.

    This commit introduces blk-mq, block multi queue support. The
    design is centered around per-cpu queues for queueing IO, which
    then funnel down into some number of hardware submission queues.
    We might have a 1:1 mapping between the two, or it might be
    an N:M mapping. That all depends on what the hardware supports.

    blk-mq provides various helper functions, which include:

    - Scalable support for request tagging. Most devices need to
    be able to uniquely identify a request both in the driver and
    to the hardware. The tagging uses per-cpu caches for freed
    tags, to enable cache hot reuse.

    - Timeout handling without tracking requests on a per-device
    basis. Basically the driver should be able to get a notification
    if a request happens to fail.

    - Optional support for non 1:1 mappings between issue and
    submission queues. blk-mq can redirect IO completions to the
    desired location.

    - Support for per-request payloads. Drivers almost always need
    to associate a request structure with some driver private
    command structure. Drivers can tell blk-mq this at init time,
    and then any request handed to the driver will have the
    required size of memory associated with it.

    - Support for merging of IO, and plugging. The stacked model
    gets neither of these. Even for high IOPS devices, merging
    sequential IO reduces per-command overhead and thus
    increases bandwidth.

    For now, this is provided as a potential 3rd queueing model, with
    the hope being that, as it matures, it can replace both the classic
    and stacked model. That would get us back to having just 1 real
    model for block devices, leaving the stacked approach to dm/md
    devices (as it was originally intended).

    Contributions in this patch from the following people:

    Shaohua Li
    Alexander Gordeev
    Christoph Hellwig
    Mike Christie
    Matias Bjorling
    Jeff Moyer

    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
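
    A hedged sketch of what driver registration looked like in this
    initial version (struct blk_mq_reg was later reworked into
    blk_mq_tag_set; my_queue_rq and struct my_cmd are made-up stand-ins):

        static struct blk_mq_ops my_mq_ops = {
                .queue_rq       = my_queue_rq,
                .map_queue      = blk_mq_map_queue,
        };

        static struct blk_mq_reg my_mq_reg = {
                .ops            = &my_mq_ops,
                .nr_hw_queues   = 1,
                .queue_depth    = 64,
                .cmd_size       = sizeof(struct my_cmd), /* per-request payload */
        };

        q = blk_mq_init_queue(&my_mq_reg, my_driver_data);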
     

15 Feb, 2013

1 commit

  • Using wait_for_completion() to wait for an IO request to be executed
    results in wrong iowait time accounting. For example, a system whose
    only task is doing write() and fdatasync() on a block device can be
    reported as idle instead of iowaiting as it should, because
    blkdev_issue_flush() calls wait_for_completion(), which in turn calls
    schedule(), which does not increment the iowait proc counter and thus
    does not turn on iowait time accounting.

    The patch makes the block layer use wait_for_completion_io() instead
    of wait_for_completion() where appropriate, to account iowait time
    correctly.

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Jens Axboe

    Vladimir Davydov
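
    The conversion itself is a one-line change at each wait site, e.g. in
    blkdev_issue_flush() (illustrative):

        /* before */
        wait_for_completion(&wait);

        /* after: accounted as iowait */
        wait_for_completion_io(&wait);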
     

24 Oct, 2011

1 commit

  • A dm-multipath user reported [1] a problem when trying to boot
    a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
    (block: fix flush machinery for stacking drivers with differing
    flush flags) applied. It turns out that an empty flush request
    can be sent into blk_insert_flush. When the BUG_ON was fixed
    to allow for this, I/O on the underlying device would stall. The
    reason is that blk_insert_cloned_request does not kick the queue.
    In the aforementioned commit, I had added a special case to
    kick the queue if data was sent down but the queue flags did
    not require a flush. A better solution is to push the queue
    kick up into blk_insert_cloned_request.

    This patch, along with a follow-on which fixes the BUG_ON, fixes
    the issue reported.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer