09 Mar, 2014

1 commit


22 Feb, 2014

1 commit


11 Feb, 2014

1 commit

  • Witch to using a preallocated flush_rq for blk-mq similar to what's done
    with the old request path. This allows us to set up the request properly
    with a tag from the actually allowed range and ->rq_disk as needed by
    some drivers. To make life easier we also switch to dynamic allocation
    of ->flush_rq for the old path.

    This effectively reverts most of

    "blk-mq: fix for flush deadlock"

    and

    "blk-mq: Don't reserve a tag for flush request"

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

31 Jan, 2014

1 commit

  • Reserving a tag (request) for flush to avoid dead lock is a overkill. A
    tag is valuable resource. We can track the number of flush requests and
    disallow having too many pending flush requests allocated. With this
    patch, blk_mq_alloc_request_pinned() could do a busy nop (but not a dead
    loop) if too many pending requests are allocated and new flush request
    is allocated. But this should not be a problem, too many pending flush
    requests are very rare case.

    I verified this can fix the deadlock caused by too many pending flush
    requests.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

24 Nov, 2013

2 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman 6

    Kent Overstreet
     
  • It was being open coded in a few places.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Neil Brown
    Cc: Chris Mason
    Acked-by: NeilBrown

    Kent Overstreet
     

29 Oct, 2013

1 commit

  • The flush state machine takes in a struct request, which then is
    submitted multiple times to the underling driver. The old block code
    requeses the same request for each of those, so it does not have an
    issue with tapping into the request pool. The new one on the other hand
    allocates a new request for each of the actualy steps of the flush
    sequence. If have already allocated all of the tags for IO, we will
    fail allocating the flush request.

    Set aside a reserved request just for flushes.

    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Oct, 2013

1 commit

  • Linux currently has two models for block devices:

    - The classic request_fn based approach, where drivers use struct
    request units for IO. The block layer provides various helper
    functionalities to let drivers share code, things like tag
    management, timeout handling, queueing, etc.

    - The "stacked" approach, where a driver squeezes in between the
    block layer and IO submitter. Since this bypasses the IO stack,
    driver generally have to manage everything themselves.

    With drivers being written for new high IOPS devices, the classic
    request_fn based driver doesn't work well enough. The design dates
    back to when both SMP and high IOPS was rare. It has problems with
    scaling to bigger machines, and runs into scaling issues even on
    smaller machines when you have IOPS in the hundreds of thousands
    per device.

    The stacked approach is then most often selected as the model
    for the driver. But this means that everybody has to re-invent
    everything, and along with that we get all the problems again
    that the shared approach solved.

    This commit introduces blk-mq, block multi queue support. The
    design is centered around per-cpu queues for queueing IO, which
    then funnel down into x number of hardware submission queues.
    We might have a 1:1 mapping between the two, or it might be
    an N:M mapping. That all depends on what the hardware supports.

    blk-mq provides various helper functions, which include:

    - Scalable support for request tagging. Most devices need to
    be able to uniquely identify a request both in the driver and
    to the hardware. The tagging uses per-cpu caches for freed
    tags, to enable cache hot reuse.

    - Timeout handling without tracking request on a per-device
    basis. Basically the driver should be able to get a notification,
    if a request happens to fail.

    - Optional support for non 1:1 mappings between issue and
    submission queues. blk-mq can redirect IO completions to the
    desired location.

    - Support for per-request payloads. Drivers almost always need
    to associate a request structure with some driver private
    command structure. Drivers can tell blk-mq this at init time,
    and then any request handed to the driver will have the
    required size of memory associated with it.

    - Support for merging of IO, and plugging. The stacked model
    gets neither of these. Even for high IOPS devices, merging
    sequential IO reduces per-command overhead and thus
    increases bandwidth.

    For now, this is provided as a potential 3rd queueing model, with
    the hope being that, as it matures, it can replace both the classic
    and stacked model. That would get us back to having just 1 real
    model for block devices, leaving the stacked approach to dm/md
    devices (as it was originally intended).

    Contributions in this patch from the following people:

    Shaohua Li
    Alexander Gordeev
    Christoph Hellwig
    Mike Christie
    Matias Bjorling
    Jeff Moyer

    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Mar, 2013

1 commit


15 Feb, 2013

1 commit

  • Using wait_for_completion() for waiting for a IO request to be executed
    results in wrong iowait time accounting. For example, a system having
    the only task doing write() and fdatasync() on a block device can be
    reported being idle instead of iowaiting as it should because
    blkdev_issue_flush() calls wait_for_completion() which in turn calls
    schedule() that does not increment the iowait proc counter and thus does
    not turn on iowait time accounting.

    The patch makes block layer use wait_for_completion_io() instead of
    wait_for_completion() where appropriate to account iowait time
    correctly.

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Jens Axboe

    Vladimir Davydov
     

24 Oct, 2011

2 commits

  • A dm-multipath user reported[1] a problem when trying to boot
    a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
    (block: fix flush machinery for stacking drivers with differring
    flush flags) applied. It turns out that an empty flush request
    can be sent into blk_insert_flush. When the BUG_ON was fixed
    to allow for this, I/O on the underlying device would stall. The
    reason is that blk_insert_cloned_request does not kick the queue.
    In the aforementioned commit, I had added a special case to
    kick the queue if data was sent down but the queue flags did
    not require a flush. A better solution is to push the queue
    kick up into blk_insert_cloned_request.

    This patch, along with a follow-on which fixes the BUG_ON, fixes
    the issue reported.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • A user reported a regression due to commit
    4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
    machinery for stacking drivers with differring flush flags).
    Part of the problem is that blk_insert_flush required a
    single bio be attached to the request. In reality, having
    no attached bio is also a valid case, as can be observed with
    an empty flush.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

16 Aug, 2011

1 commit

  • Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement
    FLUSH/FUA to support merge, introduced a performance regression when
    running any sort of fsyncing workload using dm-multipath and certain
    storage (in our case, an HP EVA). The test I ran was fs_mark, and it
    dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out
    that dm-multipath always advertised flush+fua support, and passed
    commands on down the stack, where those flags used to get stripped off.
    The above commit changed that behavior:

    static inline struct request *__elv_next_request(struct request_queue *q)
    {
    struct request *rq;

    while (1) {
    - while (!list_empty(&q->queue_head)) {
    + if (!list_empty(&q->queue_head)) {
    rq = list_entry_rq(q->queue_head.next);
    - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
    - (rq->cmd_flags & REQ_FLUSH_SEQ))
    - return rq;
    - rq = blk_do_flush(q, rq);
    - if (rq)
    - return rq;
    + return rq;
    }

    Note that previously, a command would come in here, have
    REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

    struct request *blk_do_flush(struct request_queue *q, struct request *rq)
    {
    unsigned int fflags = q->flush_flags; /* may change, cache it */
    bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
    bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
    bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
    REQ_FUA);
    unsigned skip = 0;
    ...
    if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
    rq->cmd_flags &= ~REQ_FLUSH;
    if (!has_fua)
    rq->cmd_flags &= ~REQ_FUA;
    return rq;
    }

    So, the flush machinery was bypassed in such cases (q->flush_flags == 0
    && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

    Now, however, we don't get into the flush machinery at all. Instead,
    __elv_next_request just hands a request with flush and fua bits set to
    the scsi_request_fn, even if the underlying request_queue does not
    support flush or fua.

    The agreed upon approach is to fix the flush machinery to allow
    stacking. While this isn't used in practice (since there is only one
    request-based dm target, and that target will now reflect the flush
    flags of the underlying device), it does future-proof the solution, and
    make it function as designed.

    In order to make this work, I had to add a field to the struct request,
    inside the flush structure (to store the original req->end_io). Shaohua
    had suggested overloading the union with rb_node and completion_data,
    but the completion data is used by device mapper and can also be used by
    other drivers. So, I didn't see a way around the additional field.

    I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
    the lost performance. Comments and other testers, as always, are
    appreciated.

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

10 Aug, 2011

1 commit

  • blk_insert_flush has the following check:

    /*
    * If there's data but flush is not necessary, the request can be
    * processed directly without going through flush machinery. Queue
    * for normal execution.
    */
    if ((policy & REQ_FSEQ_DATA) &&
    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
    list_add_tail(&rq->queuelist, &q->queue_head);
    return;
    }

    However, blk_flush_policy will not return with policy set to only
    REQ_FSEQ_DATA:

    static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
    {
    unsigned int policy = 0;

    if (fflags & REQ_FLUSH) {
    if (rq->cmd_flags & REQ_FLUSH)
    policy |= REQ_FSEQ_PREFLUSH;
    if (blk_rq_sectors(rq))
    policy |= REQ_FSEQ_DATA;
    if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
    policy |= REQ_FSEQ_POSTFLUSH;
    }
    return policy;
    }

    Notice that REQ_FSEQ_DATA is only set if REQ_FLUSH is set. Fix this
    mismatch by moving the setting of REQ_FSEQ_DATA outside of the REQ_FLUSH
    check.

    Tejun notes:

    Hmmm... yes, this can become a correctness issue if (and only if)
    blk_queue_flush() is called to change q->flush_flags while requests
    are in-flight; otherwise, requests wouldn't reach the function at all.
    Also, I think it would be a generally good idea to always set
    FSEQ_DATA if the request has data.

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

07 May, 2011

1 commit

  • In some drives, flush requests are non-queueable. When flush request is
    running, normal read/write requests can't run. If block layer dispatches
    such request, driver can't handle it and requeue it. Tejun suggested we
    can hold the queue when flush is running. This can avoid unnecessary
    requeue. Also this can improve performance. For example, we have
    request flush1, write1, flush 2. flush1 is dispatched, then queue is
    hold, write1 isn't inserted to queue. After flush1 is finished, flush2
    will be dispatched. Since disk cache is already clean, flush2 will be
    finished very soon, so looks like flush2 is folded to flush1.

    In my test, the queue holding completely solves a regression introduced by
    commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794:

    block: make the flush insertion use the tail of the dispatch list

    It's not a preempt type request, in fact we have to insert it
    behind requests that do specify INSERT_FRONT.

    which causes about 20% regression running a sysbench fileio
    workload.

    Stable: 2.6.39 only

    Cc: stable@kernel.org
    Signed-off-by: Shaohua Li
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    shaohua.li@intel.com
     

18 Apr, 2011

1 commit

  • Instead of overloading __blk_run_queue to force an offload to kblockd
    add a new blk_run_queue_async helper to do it explicitly. I've kept
    the blk_queue_stopped check for now, but I suspect it's not needed
    as the check we do when the workqueue items runs should be enough.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 Apr, 2011

2 commits


10 Mar, 2011

3 commits

  • Conflicts:
    block/blk-core.c
    block/blk-flush.c
    drivers/md/raid1.c
    drivers/md/raid10.c
    drivers/md/raid5.c
    fs/nilfs2/btnode.c
    fs/nilfs2/mdt.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So lets kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch adds support for creating a queuing context outside
    of the queue itself. This enables us to batch up pieces of IO
    before grabbing the block device queue lock and submitting them to
    the IO scheduler.

    The context is created on the stack of the process and assigned in
    the task structure, so that we can auto-unplug it if we hit a schedule
    event.

    The current queue plugging happens implicitly if IO is submitted to
    an empty device, yet callers have to remember to unplug that IO when
    they are going to wait for it. This is an ugly API and has caused bugs
    in the past. Additionally, it requires hacks in the vm (->sync_page()
    callback) to handle that logic. By switching to an explicit plugging
    scheme we make the API a lot nicer and can get rid of the ->sync_page()
    hack in the vm.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

05 Mar, 2011

1 commit

  • This merge creates two set of conflicts. One is simple context
    conflicts caused by removal of throtl_scheduled_delayed_work() in
    for-linus and removal of throtl_shutdown_timer_wq() in
    for-2.6.39/core.

    The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
    call directly into q->request_fn() __blk_run_queue()) in for-linus
    crashing with FLUSH reimplementation in for-2.6.39/core. The conflict
    isn't trivial but the resolution is straight-forward.

    * __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
    should be called with @force_kblockd set to %true.

    * elv_insert() in blk_kick_flush() should use
    %ELEVATOR_INSERT_REQUEUE.

    Both changes are to avoid invoking ->request_fn() directly from
    request completion path and closely match the changes in the commit
    255bb490c8.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

02 Mar, 2011

2 commits

  • blk-flush decomposes a flush into sequence of multiple requests. On
    completion of a request, the next one is queued; however, block layer
    must not implicitly call into q->request_fn() directly from completion
    path. This makes the queue behave unexpectedly when seen from the
    drivers and violates the assumption that q->request_fn() is called
    with process context + queue_lock.

    This patch makes blk-flush the following two changes to make sure
    q->request_fn() is not called directly from request completion path.

    - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
    use kblockd instead of calling directly into q->request_fn().

    - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
    ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
    request queue directly.

    Reported by Jan in the following threads.

    http://thread.gmane.org/gmane.linux.ide/48778
    http://thread.gmane.org/gmane.linux.ide/48786

    stable: applicable to v2.6.37.

    Signed-off-by: Tejun Heo
    Reported-by: Jan Beulich
    Cc: "David S. Miller"
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • __blk_run_queue() automatically either calls q->request_fn() directly
    or schedules kblockd depending on whether the function is recursed.
    blk-flush implementation needs to be able to explicitly choose
    kblockd. Add @force_kblockd.

    All the current users are converted to specify %false for the
    parameter and this patch doesn't introduce any behavior change.

    stable: This is prerequisite for fixing ide oops caused by the new
    blk-flush implementation.

    Signed-off-by: Tejun Heo
    Cc: Jan Beulich
    Cc: James Bottomley
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
     

25 Jan, 2011

2 commits

  • The current FLUSH/FUA support has evolved from the implementation
    which had to perform queue draining. As such, sequencing is done
    queue-wide one flush request after another. However, with the
    draining requirement gone, there's no reason to keep the queue-wide
    sequential approach.

    This patch reimplements FLUSH/FUA support such that each FLUSH/FUA
    request is sequenced individually. The actual FLUSH execution is
    double buffered and whenever a request wants to execute one for either
    PRE or POSTFLUSH, it queues on the pending queue. Once certain
    conditions are met, a flush request is issued and on its completion
    all pending requests proceed to the next sequence.

    This allows arbitrary merging of different type of flushes. How they
    are merged can be primarily controlled and tuned by adjusting the
    above said 'conditions' used to determine when to issue the next
    flush.

    This is inspired by Darrick's patches to merge multiple zero-data
    flushes which helps workloads with highly concurrent fsync requests.

    * As flush requests are never put on the IO scheduler, request fields
    used for flush share space with rq->rb_node. rq->completion_data is
    moved out of the union. This increases the request size by one
    pointer.

    As rq->elevator_private* are used only by the iosched too, it is
    possible to reduce the request size further. However, to do that,
    we need to modify request allocation path such that iosched data is
    not allocated for flush requests.

    * FLUSH/FUA processing happens on insertion now instead of dispatch.

    - Comments updated as per Vivek and Mike.

    Signed-off-by: Tejun Heo
    Cc: "Darrick J. Wong"
    Cc: Shaohua Li
    Cc: Christoph Hellwig
    Cc: Vivek Goyal
    Cc: Mike Snitzer
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • rq == &q->flush_rq was used to determine whether a rq is part of a
    flush sequence, which worked because all requests in a flush sequence
    were sequenced using the single dedicated request. This is about to
    change, so introduce REQ_FLUSH_SEQ flag to distinguish flush sequence
    requests.

    This patch doesn't cause any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

17 Sep, 2010

1 commit

  • All the blkdev_issue_* helpers can only sanely be used for synchronous
    caller. To issue cache flushes or barriers asynchronously the caller needs
    to set up a bio by itself with a completion callback to move the asynchronous
    state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
    specified when calling blkdev_issue_* and also remove the now unused flags
    argument to blkdev_issue_flush and blkdev_issue_zeroout. For
    blkdev_issue_discard we need to keep it for the secure discard flag, which
    gains a more descriptive name and loses the bitops vs flag confusion.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

10 Sep, 2010

8 commits

  • Update blkdev_issue_flush() to use new REQ_FLUSH interface.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • rq->rq_disk and bio->bi_bdev->bd_disk may differ if a request has
    passed through remapping drivers. FSEQ_DATA request incorrectly
    followed bio->bi_bdev->bd_disk ending up being issued w/ mismatching
    rq_disk. Make it follow orig_rq->rq_disk.

    Signed-off-by: Tejun Heo
    Reported-by: Kiyoshi Ueda
    Tested-by: Kiyoshi Ueda
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • While completing a request from a REQ_FLUSH/FUA sequence, another
    request can be pushed to the request queue. If a driver tests
    elv_queue_empty() before completing a request and runs the queue again
    only if the queue wasn't empty, this may lead to hang. Please note
    that most drivers either kick the queue unconditionally or test queue
    emptiness after completing the current request and don't have this
    problem.

    This patch removes this possibility by making REQ_FLUSH/FUA sequence
    code kick the queue if the queue was empty before completing a request
    from REQ_FLUSH/FUA sequence.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • init_flush_request() only set REQ_FLUSH when initializing flush
    requests making them READ requests. Use WRITE_FLUSH instead.

    Signed-off-by: Tejun Heo
    Reported-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • We need to call blk_rq_init and elv_insert for all cases in queue_next_fseq,
    so take these calls into common code. Also move the end_io initialization
    from queue_flush into queue_next_fseq and rename queue_flush to
    init_flush_request now that it's old name doesn't apply anymore.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Now that the backend conversion is complete, export sequenced
    FLUSH/FUA capability through REQ_FLUSH/FUA flags. REQ_FLUSH means the
    device cache should be flushed before executing the request. REQ_FUA
    means that the data in the request should be on non-volatile media on
    completion.

    Block layer will choose the correct way of implementing the semantics
    and execute it. The request may be passed to the device directly if
    the device can handle it; otherwise, it will be sequenced using one or
    more proxy requests. Devices will never see REQ_FLUSH and/or FUA
    which it doesn't support.

    Also, unlike the original REQ_HARDBARRIER, REQ_FLUSH/FUA requests are
    never failed with -EOPNOTSUPP. If the underlying device doesn't
    support FLUSH/FUA, the block layer simply make those noop. IOW, it no
    longer distinguishes between writeback cache which doesn't support
    cache flush and writethrough/no cache. Devices which have WB cache
    w/o flush are very difficult to come by these days and there's nothing
    much we can do anyway, so it doesn't make sense to require everyone to
    implement -EOPNOTSUPP handling. This will simplify filesystems and
    block drivers as they can drop -EOPNOTSUPP retry logic for barriers.

    * QUEUE_ORDERED_* are removed and QUEUE_FSEQ_* are moved into
    blk-flush.c.

    * REQ_FLUSH w/o data can also be directly passed to drivers without
    sequencing but some drivers assume that zero length requests don't
    have rq->bio which isn't true for these requests requiring the use
    of proxy requests.

    * REQ_COMMON_MASK now includes REQ_FLUSH | REQ_FUA so that they are
    copied from bio to request.

    * WRITE_BARRIER is marked deprecated and WRITE_FLUSH, WRITE_FUA and
    WRITE_FLUSH_FUA are added.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • With ordering requirements dropped, barrier and ordered are misnomers.
    Now all block layer does is sequencing FLUSH and FUA. Rename them to
    flush.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Without ordering requirements, barrier and ordering are minomers.
    Rename block/blk-barrier.c to block/blk-flush.c. Rename of symbols
    will follow.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo