26 Sep, 2014

9 commits

  • This patch supports running a single flush machinery for
    each blk-mq dispatch queue, so that:

    - the current init_request and exit_request callbacks can
    cover the flush request too, so the buggy copying approach to
    initializing the flush request's pdu can be fixed

    - flush performance is improved in the multi hw-queue case

    In an fio sync write test over virtio-blk (4 hw queues,
    ioengine=sync, iodepth=64, numjobs=4, bs=4K), throughput increases
    significantly in my test environment:
    - throughput: +70% for virtio-blk over null_blk
    - throughput: +30% for virtio-blk over an SSD image

    The multi-virtqueue feature hasn't been merged into QEMU yet;
    patches for it can be found in the tree below:

    git://kernel.ubuntu.com/ming/qemu.git v2.1.0-mq.4

    Simply passing 'num_queues=4 vectors=5' should be enough to enable
    the multi-queue (quad queue) feature for QEMU virtio-blk.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This patch adds a 'blk_mq_ctx' parameter to blk_get_flush_queue()
    so that the function can find the blk_flush_queue bound to the
    current mq context, since the flush queue will become per hw-queue.

    For a legacy queue, the parameter can simply be 'NULL'.

    For the multiqueue case, the parameter should be the context from
    which the related request originated. With this context info, the
    hw queue and its flush queue can be found easily.
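
    As a rough sketch of what such a lookup could look like once the
    flush queue is per hw-queue (field names like q->fq and hctx->fq
    are illustrative, not necessarily the exact ones used here):

    static struct blk_flush_queue *
    blk_get_flush_queue(struct request_queue *q, struct blk_mq_ctx *ctx)
    {
            /* legacy path: no ctx, use the single queue-wide flush queue */
            if (!q->mq_ops || !ctx)
                    return q->fq;

            /* mq path: map the sw context to its hw queue, use its flush queue */
            return blk_mq_map_queue(q, ctx->cpu)->fq;
    }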

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Figure out the flush queue at the entry points that kick off the
    flush machinery and in the request's completion handler, then pass
    it through.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • The mission of the two helpers is now over, so just call
    blk_alloc_flush_queue() and blk_free_flush_queue() directly.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This patch introduces 'struct blk_flush_queue' and puts all
    flush-machinery-related fields into this structure, so that:

    - flush implementation details aren't exposed to drivers
    - it becomes easy to convert to per dispatch-queue flush machinery

    This patch is basically a mechanical replacement.
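
    Roughly, the structure collects the flush-related fields that
    previously lived in struct request_queue; the exact layout may
    differ slightly from this sketch:

    struct blk_flush_queue {
            unsigned int            flush_queue_delayed:1;
            unsigned int            flush_pending_idx:1;
            unsigned int            flush_running_idx:1;
            unsigned long           flush_pending_since;
            struct list_head        flush_queue[2];
            struct list_head        flush_data_in_flight;
            struct request          *flush_rq;
            spinlock_t              mq_flush_lock;
    };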

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This patch uses a local variable to access the flush request, so
    that the conversion to per-queue flush machinery is a bit easier.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • These fields are always used with the flush request, so
    initialize them together.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • These two temporary functions are introduced to hold flush
    initialization and de-initialization, so that a 'flush queue' can
    be introduced more easily in the following patch. Once 'flush queue'
    and its allocation/free functions are ready, they will be removed
    for the sake of code readability.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • It is reasonable to allocate the flush request in blk_mq_init_flush().

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

23 Sep, 2014

2 commits

  • This patch removes two unnecessary blk_clear_rq_complete() calls;
    the REQ_ATOM_COMPLETE flag is cleared inside blk_mq_start_request(),
    so:

    - The blk_clear_rq_complete() in blk_flush_restore_request() isn't
    needed because the request will be freed later, and clearing it
    here may open a small race window with the timeout handler.

    - The blk_clear_rq_complete() in blk_mq_requeue_request() isn't
    necessary either: even though REQ_ATOM_STARTED is cleared in
    __blk_mq_requeue_request(), in theory it may still cause a small
    race window with the timeout handler since the two clear_bit()
    calls may be reordered.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Now that we've changed the driver API on the submission side use the
    opportunity to fix up the name on the completion side to fit into the
    general scheme.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Jun, 2014

1 commit


05 Jun, 2014

1 commit


30 May, 2014

1 commit


28 May, 2014

1 commit

  • Both the cache flush state machine and the SCSI midlayer want to submit
    requests from irq context, and the current per-request requeue_work
    unfortunately causes corruption due to sharing with the csd field for
    flushes. Replace them with a per-request_queue list of requests to
    be requeued.
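
    The shape of the change is roughly the following (names are
    illustrative; the real code hangs these fields off struct
    request_queue and drains the list from the kblockd workqueue):

    struct requeue_state {
            struct list_head        requeue_list;   /* requests waiting to be requeued */
            spinlock_t              requeue_lock;
            struct work_struct      requeue_work;   /* drains the list from process context */
    };

    /* Safe from irq context: only grabs a spinlock and schedules work. */
    static void add_to_requeue_list(struct requeue_state *s, struct request *rq)
    {
            unsigned long flags;

            spin_lock_irqsave(&s->requeue_lock, flags);
            list_add_tail(&rq->queuelist, &s->requeue_list);
            spin_unlock_irqrestore(&s->requeue_lock, flags);

            schedule_work(&s->requeue_work);
    }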

    Based on an earlier test by Ming Lei.

    Signed-off-by: Christoph Hellwig
    Reported-by: Ming Lei
    Tested-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

17 Apr, 2014

1 commit


16 Apr, 2014

2 commits

  • Drivers shouldn't have to care about the block layer setting aside a
    request to implement the flush state machine. We already override the
    mq context and tag to make it more transparent, but so far haven't
    dealt with the driver private data in the request. Make sure to
    override this as well, and while we're at it, add a proper helper
    sitting in blk-mq.c
    that implements the full impersonation.
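
    A hypothetical sketch of what the "full impersonation" amounts to;
    the helper name and the cmd_size argument are illustrative:

    static void blk_mq_impersonate_request(struct request *flush_rq,
                                           struct request *orig_rq,
                                           unsigned int cmd_size)
    {
            /* pretend the internal flush request is the original one */
            flush_rq->mq_ctx = orig_rq->mq_ctx;
            flush_rq->tag    = orig_rq->tag;

            /* ...and now also carry over the driver's private data (pdu) */
            memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq), cmd_size);
    }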

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Drivers can reach their private data easily using the blk_mq_rq_to_pdu
    helper and don't need req->special. By not initializing it, the code can
    be simplified nicely, and we also shave off a few more instructions from
    the I/O path.
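
    For illustration, a driver's queue_rq handler can reach its command
    structure like this (struct my_cmd and my_queue_rq are made-up
    driver names; the queue_rq signature is shown as of this era):

    struct my_cmd {
            int     status;
    };

    static int my_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq)
    {
            struct my_cmd *cmd = blk_mq_rq_to_pdu(rq);      /* instead of rq->special */

            cmd->status = 0;
            /* ... hand the command to the hardware ... */
            return BLK_MQ_RQ_QUEUE_OK;
    }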

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

10 Apr, 2014

1 commit


09 Mar, 2014

1 commit


22 Feb, 2014

1 commit


11 Feb, 2014

1 commit

  • Switch to using a preallocated flush_rq for blk-mq, similar to what's
    done with the old request path. This allows us to set up the request
    properly with a tag from the actually allowed range and ->rq_disk as
    needed by some drivers. To make life easier, we also switch to
    dynamic allocation of ->flush_rq for the old path.

    This effectively reverts most of

    "blk-mq: fix for flush deadlock"

    and

    "blk-mq: Don't reserve a tag for flush request"

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

31 Jan, 2014

1 commit

  • Reserving a tag (request) for flush to avoid deadlock is overkill;
    a tag is a valuable resource. We can instead track the number of
    flush requests and disallow having too many pending flush requests
    allocated. With this patch, blk_mq_alloc_request_pinned() could do a
    busy nop (but not a dead loop) if too many pending requests are
    allocated while a new flush request is being allocated. But this
    should not be a problem, since having too many pending flush
    requests is a very rare case.
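
    A minimal sketch of the counting idea (the names and the limit are
    illustrative, not the actual values used):

    #define MAX_PENDING_FLUSHES     4

    static atomic_t nr_pending_flushes = ATOMIC_INIT(0);

    /* Called before allocating a flush request; returning false makes
     * the allocator back off and retry instead of using a reserved tag. */
    static bool may_alloc_flush(void)
    {
            if (atomic_inc_return(&nr_pending_flushes) <= MAX_PENDING_FLUSHES)
                    return true;

            atomic_dec(&nr_pending_flushes);
            return false;
    }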

    I verified this can fix the deadlock caused by too many pending flush
    requests.

    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

24 Nov, 2013

2 commits

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.
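
    The iterator this series is heading toward looks roughly like this,
    bi_bvec_done being the member the message refers to:

    struct bvec_iter {
            sector_t        bi_sector;      /* device address in 512 byte sectors */
            unsigned int    bi_size;        /* residual I/O count */
            unsigned int    bi_idx;         /* current index into bi_io_vec */
            unsigned int    bi_bvec_done;   /* bytes completed in the current bvec,
                                               added by the later patch */
    };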

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     
  • It was being open coded in a few places.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Neil Brown
    Cc: Chris Mason
    Acked-by: NeilBrown

    Kent Overstreet
     

29 Oct, 2013

1 commit

  • The flush state machine takes in a struct request, which is then
    submitted multiple times to the underlying driver. The old block code
    reuses the same request for each of those, so it does not have an
    issue with tapping into the request pool. The new one, on the other
    hand, allocates a new request for each of the actual steps of the
    flush sequence. If we have already allocated all of the tags for IO,
    we will fail to allocate the flush request.

    Set aside a reserved request just for flushes.

    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Oct, 2013

1 commit

  • Linux currently has two models for block devices:

    - The classic request_fn based approach, where drivers use struct
    request units for IO. The block layer provides various helper
    functionalities to let drivers share code, things like tag
    management, timeout handling, queueing, etc.

    - The "stacked" approach, where a driver squeezes in between the
    block layer and IO submitter. Since this bypasses the IO stack,
    drivers generally have to manage everything themselves.

    With drivers being written for new high IOPS devices, the classic
    request_fn based driver doesn't work well enough. The design dates
    back to when both SMP and high IOPS were rare. It has problems with
    scaling to bigger machines, and runs into scaling issues even on
    smaller machines when you have IOPS in the hundreds of thousands
    per device.

    The stacked approach is then most often selected as the model
    for the driver. But this means that everybody has to re-invent
    everything, and along with that we get all the problems again
    that the shared approach solved.

    This commit introduces blk-mq, block multi queue support. The
    design is centered around per-cpu queues for queueing IO, which
    then funnel down into x number of hardware submission queues.
    We might have a 1:1 mapping between the two, or it might be
    an N:M mapping. That all depends on what the hardware supports.

    blk-mq provides various helper functions, which include:

    - Scalable support for request tagging. Most devices need to
    be able to uniquely identify a request both in the driver and
    to the hardware. The tagging uses per-cpu caches for freed
    tags, to enable cache hot reuse.

    - Timeout handling without tracking requests on a per-device
    basis. Basically, the driver should be able to get a notification
    if a request happens to fail.

    - Optional support for non 1:1 mappings between issue and
    submission queues. blk-mq can redirect IO completions to the
    desired location.

    - Support for per-request payloads. Drivers almost always need
    to associate a request structure with some driver private
    command structure. Drivers can tell blk-mq this at init time,
    and then any request handed to the driver will have the
    required size of memory associated with it.

    - Support for merging of IO, and plugging. The stacked model
    gets neither of these. Even for high IOPS devices, merging
    sequential IO reduces per-command overhead and thus
    increases bandwidth.
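
    As an illustration of the per-request payload point above, a driver
    declares its private command size at registration time. This sketch
    uses the slightly later blk_mq_tag_set interface and made-up driver
    names; the registration structure in this initial commit differs:

    struct my_cmd {
            u16     tag;
            int     status;
    };

    static int my_init_queue(struct my_dev *dev)
    {
            dev->tag_set.ops          = &my_mq_ops;
            dev->tag_set.nr_hw_queues = 4;
            dev->tag_set.queue_depth  = 64;
            dev->tag_set.numa_node    = NUMA_NO_NODE;
            dev->tag_set.cmd_size     = sizeof(struct my_cmd); /* allocated with every request */

            if (blk_mq_alloc_tag_set(&dev->tag_set))
                    return -ENOMEM;

            dev->queue = blk_mq_init_queue(&dev->tag_set);
            if (IS_ERR(dev->queue)) {
                    blk_mq_free_tag_set(&dev->tag_set);
                    return PTR_ERR(dev->queue);
            }
            return 0;
    }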

    For now, this is provided as a potential 3rd queueing model, with
    the hope being that, as it matures, it can replace both the classic
    and stacked model. That would get us back to having just 1 real
    model for block devices, leaving the stacked approach to dm/md
    devices (as it was originally intended).

    Contributions in this patch from the following people:

    Shaohua Li
    Alexander Gordeev
    Christoph Hellwig
    Mike Christie
    Matias Bjorling
    Jeff Moyer

    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Mar, 2013

1 commit


15 Feb, 2013

1 commit

  • Using wait_for_completion() to wait for an IO request to be executed
    results in wrong iowait time accounting. For example, a system whose
    only task is doing write() and fdatasync() on a block device can be
    reported as idle instead of iowaiting, as it should be, because
    blkdev_issue_flush() calls wait_for_completion(), which in turn calls
    schedule(), which does not increment the iowait proc counter and thus
    does not turn on iowait time accounting.

    The patch makes block layer use wait_for_completion_io() instead of
    wait_for_completion() where appropriate to account iowait time
    correctly.
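
    A minimal sketch of the substitution in a flush-style path
    (my_flush_end_io and my_issue_flush_and_wait are made-up names; the
    bio API is shown as it looked at the time):

    static void my_flush_end_io(struct bio *bio, int err)
    {
            complete(bio->bi_private);              /* wake up the waiter */
    }

    static void my_issue_flush_and_wait(struct block_device *bdev)
    {
            DECLARE_COMPLETION_ONSTACK(wait);
            struct bio *bio = bio_alloc(GFP_KERNEL, 0);

            bio->bi_bdev    = bdev;
            bio->bi_end_io  = my_flush_end_io;
            bio->bi_private = &wait;
            submit_bio(WRITE_FLUSH, bio);

            /* was wait_for_completion(); the _io variant accounts the sleep as iowait */
            wait_for_completion_io(&wait);
            bio_put(bio);
    }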

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Jens Axboe

    Vladimir Davydov
     

24 Oct, 2011

2 commits

  • A dm-multipath user reported[1] a problem when trying to boot
    a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
    (block: fix flush machinery for stacking drivers with differring
    flush flags) applied. It turns out that an empty flush request
    can be sent into blk_insert_flush. When the BUG_ON was fixed
    to allow for this, I/O on the underlying device would stall. The
    reason is that blk_insert_cloned_request does not kick the queue.
    In the aforementioned commit, I had added a special case to
    kick the queue if data was sent down but the queue flags did
    not require a flush. A better solution is to push the queue
    kick up into blk_insert_cloned_request.

    This patch, along with a follow-on which fixes the BUG_ON, fixes
    the issue reported.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • A user reported a regression due to commit
    4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
    machinery for stacking drivers with differring flush flags).
    Part of the problem is that blk_insert_flush required that a
    single bio be attached to the request. In reality, having
    no attached bio is also a valid case, as can be observed with
    an empty flush.

    [1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

    Reported-by: Christophe Saout
    Signed-off-by: Jeff Moyer

    Stable note: 3.1
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

16 Aug, 2011

1 commit

  • Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement
    FLUSH/FUA to support merge, introduced a performance regression when
    running any sort of fsyncing workload using dm-multipath and certain
    storage (in our case, an HP EVA). The test I ran was fs_mark, and it
    dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out
    that dm-multipath always advertised flush+fua support, and passed
    commands on down the stack, where those flags used to get stripped off.
    The above commit changed that behavior:

    static inline struct request *__elv_next_request(struct request_queue *q)
    {
            struct request *rq;

            while (1) {
    -               while (!list_empty(&q->queue_head)) {
    +               if (!list_empty(&q->queue_head)) {
                            rq = list_entry_rq(q->queue_head.next);
    -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
    -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
    -                               return rq;
    -                       rq = blk_do_flush(q, rq);
    -                       if (rq)
    -                               return rq;
    +                       return rq;
                    }

    Note that previously, a command would come in here, have
    REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

    struct request *blk_do_flush(struct request_queue *q, struct request *rq)
    {
            unsigned int fflags = q->flush_flags; /* may change, cache it */
            bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
            bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
            bool do_postflush = has_flush && !has_fua &&
                                (rq->cmd_flags & REQ_FUA);
            unsigned skip = 0;
            ...
            if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                    rq->cmd_flags &= ~REQ_FLUSH;
                    if (!has_fua)
                            rq->cmd_flags &= ~REQ_FUA;
                    return rq;
            }

    So, the flush machinery was bypassed in such cases (q->flush_flags == 0
    && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

    Now, however, we don't get into the flush machinery at all. Instead,
    __elv_next_request just hands a request with flush and fua bits set to
    the scsi_request_fn, even if the underlying request_queue does not
    support flush or fua.

    The agreed upon approach is to fix the flush machinery to allow
    stacking. While this isn't used in practice (since there is only one
    request-based dm target, and that target will now reflect the flush
    flags of the underlying device), it does future-proof the solution, and
    make it function as designed.

    In order to make this work, I had to add a field to the struct request,
    inside the flush structure (to store the original req->end_io). Shaohua
    had suggested overloading the union with rb_node and completion_data,
    but the completion data is used by device mapper and can also be used by
    other drivers. So, I didn't see a way around the additional field.

    I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
    the lost performance. Comments and other testers, as always, are
    appreciated.

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

10 Aug, 2011

1 commit

  • blk_insert_flush has the following check:

    /*
     * If there's data but flush is not necessary, the request can be
     * processed directly without going through flush machinery. Queue
     * for normal execution.
     */
    if ((policy & REQ_FSEQ_DATA) &&
        !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
            list_add_tail(&rq->queuelist, &q->queue_head);
            return;
    }

    However, blk_flush_policy will not return with policy set to only
    REQ_FSEQ_DATA:

    static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
    {
            unsigned int policy = 0;

            if (fflags & REQ_FLUSH) {
                    if (rq->cmd_flags & REQ_FLUSH)
                            policy |= REQ_FSEQ_PREFLUSH;
                    if (blk_rq_sectors(rq))
                            policy |= REQ_FSEQ_DATA;
                    if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
                            policy |= REQ_FSEQ_POSTFLUSH;
            }
            return policy;
    }

    Notice that REQ_FSEQ_DATA is only set if REQ_FLUSH is set. Fix this
    mismatch by moving the setting of REQ_FSEQ_DATA outside of the REQ_FLUSH
    check.
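
    Reconstructed from that description, the fixed function presumably
    reads:

    static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
    {
            unsigned int policy = 0;

            if (blk_rq_sectors(rq))
                    policy |= REQ_FSEQ_DATA;        /* now set whether or not REQ_FLUSH is */

            if (fflags & REQ_FLUSH) {
                    if (rq->cmd_flags & REQ_FLUSH)
                            policy |= REQ_FSEQ_PREFLUSH;
                    if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
                            policy |= REQ_FSEQ_POSTFLUSH;
            }
            return policy;
    }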

    Tejun notes:

    Hmmm... yes, this can become a correctness issue if (and only if)
    blk_queue_flush() is called to change q->flush_flags while requests
    are in-flight; otherwise, requests wouldn't reach the function at all.
    Also, I think it would be a generally good idea to always set
    FSEQ_DATA if the request has data.

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

07 May, 2011

1 commit

  • In some drives, flush requests are non-queueable: when a flush
    request is running, normal read/write requests can't run. If the
    block layer dispatches such a request anyway, the driver can't
    handle it and has to requeue it. Tejun suggested we hold the queue
    while a flush is running. This avoids the unnecessary requeue and
    can also improve performance. For example, say we have requests
    flush1, write1, flush2. flush1 is dispatched, then the queue is
    held, so write1 isn't inserted into the queue. After flush1
    finishes, flush2 is dispatched. Since the disk cache is already
    clean, flush2 finishes very soon, so it looks as if flush2 is
    folded into flush1.

    In my test, the queue holding completely solves a regression introduced by
    commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794:

    block: make the flush insertion use the tail of the dispatch list

    It's not a preempt type request, in fact we have to insert it
    behind requests that do specify INSERT_FRONT.

    which causes about 20% regression running a sysbench fileio
    workload.

    Stable: 2.6.39 only

    Cc: stable@kernel.org
    Signed-off-by: Shaohua Li
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    shaohua.li@intel.com
     

18 Apr, 2011

1 commit

  • Instead of overloading __blk_run_queue to force an offload to
    kblockd, add a new blk_run_queue_async helper to do it explicitly.
    I've kept the blk_queue_stopped check for now, but I suspect it's
    not needed, as the check we do when the workqueue item runs should
    be enough.
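
    The helper amounts to roughly the following (a sketch; the exact
    workqueue bookkeeping in the real implementation differs):

    void blk_run_queue_async(struct request_queue *q)
    {
            /* keep the stopped check for now, as noted above */
            if (likely(!blk_queue_stopped(q)))
                    queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
    }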

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 Apr, 2011

2 commits


10 Mar, 2011

3 commits

  • Conflicts:
    block/blk-core.c
    block/blk-flush.c
    drivers/md/raid1.c
    drivers/md/raid10.c
    drivers/md/raid5.c
    fs/nilfs2/btnode.c
    fs/nilfs2/mdt.c

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch adds support for creating a queuing context outside
    of the queue itself. This enables us to batch up pieces of IO
    before grabbing the block device queue lock and submitting them to
    the IO scheduler.

    The context is created on the stack of the process and assigned in
    the task structure, so that we can auto-unplug it if we hit a schedule
    event.

    The current queue plugging happens implicitly if IO is submitted to
    an empty device, yet callers have to remember to unplug that IO when
    they are going to wait for it. This is an ugly API and has caused bugs
    in the past. Additionally, it requires hacks in the vm (->sync_page()
    callback) to handle that logic. By switching to an explicit plugging
    scheme we make the API a lot nicer and can get rid of the ->sync_page()
    hack in the vm.
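
    From a submitter's point of view the new API looks like this
    (illustrative caller; submit_bio is shown with its signature of the
    time):

    static void submit_batch(struct bio **bios, int nr)
    {
            struct blk_plug plug;
            int i;

            blk_start_plug(&plug);                  /* plug lives on this stack frame */
            for (i = 0; i < nr; i++)
                    submit_bio(WRITE, bios[i]);     /* batched under the plug */
            blk_finish_plug(&plug);                 /* or an earlier schedule() flushes it */
    }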

    Signed-off-by: Jens Axboe

    Jens Axboe