Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

09 Mar, 2014

1 commit

10beafc19 block: change flush sequence list addition back to front add ... Browse Code »

Commit 18741986 inadvertently changed the rq flush insertion
from a head to a tail insertion. Fix that back up.

Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe

Mike Snitzer
2014-03-09 11:31:31 +0800

22 Feb, 2014

1 commit

feb71dae1 blk-mq: merge blk_mq_insert_request and blk_mq_run_request ... Browse Code »

It's almost identical to blk_mq_insert_request, so fold the two into one
slightly more generic function by making the flush special case a bit
smarted.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-02-22 00:58:48 +0800

11 Feb, 2014

1 commit

18741986a blk-mq: rework flush sequencing logic ... Browse Code »

Witch to using a preallocated flush_rq for blk-mq similar to what's done
with the old request path. This allows us to set up the request properly
with a tag from the actually allowed range and ->rq_disk as needed by
some drivers. To make life easier we also switch to dynamic allocation
of ->flush_rq for the old path.

This effectively reverts most of

"blk-mq: fix for flush deadlock"

and

"blk-mq: Don't reserve a tag for flush request"

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-02-11 00:29:00 +0800

31 Jan, 2014

1 commit

f0276924f blk-mq: Don't reserve a tag for flush request ... Browse Code »

Reserving a tag (request) for flush to avoid dead lock is a overkill. A
tag is valuable resource. We can track the number of flush requests and
disallow having too many pending flush requests allocated. With this
patch, blk_mq_alloc_request_pinned() could do a busy nop (but not a dead
loop) if too many pending requests are allocated and new flush request
is allocated. But this should not be a problem, too many pending flush
requests are very rare case.

I verified this can fix the deadlock caused by too many pending flush
requests.

Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2014-01-31 03:57:25 +0800

24 Nov, 2013

2 commits

4f024f379 block: Abstract out bvec iterator ... Browse Code »
13

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Geert Uytterhoeven
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Ed L. Cashin"
Cc: Nick Piggin
Cc: Lars Ellenberg
Cc: Jiri Kosina
Cc: Matthew Wilcox
Cc: Geoff Levand
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris
Cc: Philip Kelleher
Cc: Rusty Russell
Cc: "Michael S. Tsirkin"
Cc: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge
Cc: Neil Brown
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh
Cc: Benny Halevy
Cc: "James E.J. Bottomley"
Cc: Greg Kroah-Hartman
Cc: "Nicholas A. Bellinger"
Cc: Alexander Viro
Cc: Chris Mason
Cc: "Theodore Ts'o"
Cc: Andreas Dilger
Cc: Jaegeuk Kim
Cc: Steven Whitehouse
Cc: Dave Kleikamp
Cc: Joern Engel
Cc: Prasad Joshi
Cc: Trond Myklebust
Cc: KONISHI Ryusuke
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Ben Myers
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Len Brown
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: Herton Ronaldo Krzesinski
Cc: Ben Hutchings
Cc: Andrew Morton
Cc: Guo Chao
Cc: Tejun Heo
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Wei Yongjun
Cc: "Roger Pau Monné"
Cc: Jan Beulich
Cc: Stefano Stabellini
Cc: Ian Campbell
Cc: Sebastian Ott
Cc: Christian Borntraeger
Cc: Minchan Kim
Cc: Jiang Liu
Cc: Nitin Gupta
Cc: Jerome Marchand
Cc: Joe Perches
Cc: Peng Tao
Cc: Andy Adamson
Cc: fanchaoting
Cc: Jie Liu
Cc: Sunil Mushran
Cc: "Martin K. Petersen"
Cc: Namjae Jeon
Cc: Pankaj Kumar
Cc: Dan Magenheimer
Cc: Mel Gorman 6

Kent Overstreet
2013-11-24 14:33:47 +0800
33879d451 block: submit_bio_wait() conversions ... Browse Code »

It was being open coded in a few places.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Joern Engel
Cc: Prasad Joshi
Cc: Neil Brown
Cc: Chris Mason
Acked-by: NeilBrown

Kent Overstreet
2013-11-24 14:33:38 +0800

29 Oct, 2013

1 commit

3228f48be blk-mq: fix for flush deadlock ... Browse Code »

The flush state machine takes in a struct request, which then is
submitted multiple times to the underling driver. The old block code
requeses the same request for each of those, so it does not have an
issue with tapping into the request pool. The new one on the other hand
allocates a new request for each of the actualy steps of the flush
sequence. If have already allocated all of the tags for IO, we will
fail allocating the flush request.

Set aside a reserved request just for flushes.

Signed-off-by: Jens Axboe

Christoph Hellwig
2013-10-29 03:33:58 +0800

25 Oct, 2013

1 commit

320ae51fe blk-mq: new multi-queue block IO queueing mechanism ... Browse Code »
2

Linux currently has two models for block devices:

- The classic request_fn based approach, where drivers use struct
request units for IO. The block layer provides various helper
functionalities to let drivers share code, things like tag
management, timeout handling, queueing, etc.

- The "stacked" approach, where a driver squeezes in between the
block layer and IO submitter. Since this bypasses the IO stack,
driver generally have to manage everything themselves.

With drivers being written for new high IOPS devices, the classic
request_fn based driver doesn't work well enough. The design dates
back to when both SMP and high IOPS was rare. It has problems with
scaling to bigger machines, and runs into scaling issues even on
smaller machines when you have IOPS in the hundreds of thousands
per device.

The stacked approach is then most often selected as the model
for the driver. But this means that everybody has to re-invent
everything, and along with that we get all the problems again
that the shared approach solved.

This commit introduces blk-mq, block multi queue support. The
design is centered around per-cpu queues for queueing IO, which
then funnel down into x number of hardware submission queues.
We might have a 1:1 mapping between the two, or it might be
an N:M mapping. That all depends on what the hardware supports.

blk-mq provides various helper functions, which include:

- Scalable support for request tagging. Most devices need to
be able to uniquely identify a request both in the driver and
to the hardware. The tagging uses per-cpu caches for freed
tags, to enable cache hot reuse.

- Timeout handling without tracking request on a per-device
basis. Basically the driver should be able to get a notification,
if a request happens to fail.

- Optional support for non 1:1 mappings between issue and
submission queues. blk-mq can redirect IO completions to the
desired location.

- Support for per-request payloads. Drivers almost always need
to associate a request structure with some driver private
command structure. Drivers can tell blk-mq this at init time,
and then any request handed to the driver will have the
required size of memory associated with it.

- Support for merging of IO, and plugging. The stacked model
gets neither of these. Even for high IOPS devices, merging
sequential IO reduces per-command overhead and thus
increases bandwidth.

For now, this is provided as a potential 3rd queueing model, with
the hope being that, as it matures, it can replace both the classic
and stacked model. That would get us back to having just 1 real
model for block devices, leaving the stacked approach to dm/md
devices (as it was originally intended).

Contributions in this patch from the following people:

Shaohua Li
Alexander Gordeev
Christoph Hellwig
Mike Christie
Matias Bjorling
Jeff Moyer

Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Jens Axboe
2013-10-25 18:56:00 +0800

23 Mar, 2013

1 commit

f2fc7d0ed Block: blk-flush: Fixed indent code style ... Browse Code »

Fixed code indent should use tabs where possible.

Signed-off-by: Alice Ferrazzi
Signed-off-by: Jens Axboe

Alice Ferrazzi
2013-03-23 02:22:51 +0800

15 Feb, 2013

1 commit

5577022f4 block: account iowait time when waiting for completion of IO request ... Browse Code »

Using wait_for_completion() for waiting for a IO request to be executed
results in wrong iowait time accounting. For example, a system having
the only task doing write() and fdatasync() on a block device can be
reported being idle instead of iowaiting as it should because
blkdev_issue_flush() calls wait_for_completion() which in turn calls
schedule() that does not increment the iowait proc counter and thus does
not turn on iowait time accounting.

The patch makes block layer use wait_for_completion_io() instead of
wait_for_completion() where appropriate to account iowait time
correctly.

Signed-off-by: Vladimir Davydov
Signed-off-by: Jens Axboe

Vladimir Davydov
2013-02-15 23:45:07 +0800

24 Oct, 2011

2 commits

e67b77c79 blk-flush: move the queue kick into ... Browse Code »

A dm-multipath user reported[1] a problem when trying to boot
a kernel with commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
(block: fix flush machinery for stacking drivers with differring
flush flags) applied. It turns out that an empty flush request
can be sent into blk_insert_flush. When the BUG_ON was fixed
to allow for this, I/O on the underlying device would stall. The
reason is that blk_insert_cloned_request does not kick the queue.
In the aforementioned commit, I had added a special case to
kick the queue if data was sent down but the queue flags did
not require a flush. A better solution is to push the queue
kick up into blk_insert_cloned_request.

This patch, along with a follow-on which fixes the BUG_ON, fixes
the issue reported.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout
Signed-off-by: Jeff Moyer
Acked-by: Tejun Heo

Stable note: 3.1
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe

Jeff Moyer
2011-10-24 22:24:31 +0800
834f9f61a blk-flush: fix invalid BUG_ON in blk_insert_flush ... Browse Code »

A user reported a regression due to commit
4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
machinery for stacking drivers with differring flush flags).
Part of the problem is that blk_insert_flush required a
single bio be attached to the request. In reality, having
no attached bio is also a valid case, as can be observed with
an empty flush.

[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html

Reported-by: Christophe Saout
Signed-off-by: Jeff Moyer

Stable note: 3.1
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe

Jeff Moyer
2011-10-24 22:24:30 +0800

16 Aug, 2011

1 commit

4853abaae block: fix flush machinery for stacking drivers with differring flush flags ... Browse Code »

Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement
FLUSH/FUA to support merge, introduced a performance regression when
running any sort of fsyncing workload using dm-multipath and certain
storage (in our case, an HP EVA). The test I ran was fs_mark, and it
dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out
that dm-multipath always advertised flush+fua support, and passed
commands on down the stack, where those flags used to get stripped off.
The above commit changed that behavior:

static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;

while (1) {
- while (!list_empty(&q->queue_head)) {
+ if (!list_empty(&q->queue_head)) {
rq = list_entry_rq(q->queue_head.next);
- if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
- (rq->cmd_flags & REQ_FLUSH_SEQ))
- return rq;
- rq = blk_do_flush(q, rq);
- if (rq)
- return rq;
+ return rq;
}

Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
unsigned int fflags = q->flush_flags; /* may change, cache it */
bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
REQ_FUA);
unsigned skip = 0;
...
if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
rq->cmd_flags &= ~REQ_FLUSH;
if (!has_fua)
rq->cmd_flags &= ~REQ_FUA;
return rq;
}

So, the flush machinery was bypassed in such cases (q->flush_flags == 0
&& rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

Now, however, we don't get into the flush machinery at all. Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even if the underlying request_queue does not
support flush or fua.

The agreed upon approach is to fix the flush machinery to allow
stacking. While this isn't used in practice (since there is only one
request-based dm target, and that target will now reflect the flush
flags of the underlying device), it does future-proof the solution, and
make it function as designed.

In order to make this work, I had to add a field to the struct request,
inside the flush structure (to store the original req->end_io). Shaohua
had suggested overloading the union with rb_node and completion_data,
but the completion data is used by device mapper and can also be used by
other drivers. So, I didn't see a way around the additional field.

I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
the lost performance. Comments and other testers, as always, are
appreciated.

Cheers,
Jeff

Signed-off-by: Jeff Moyer
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Jeff Moyer
2011-08-16 03:37:25 +0800

10 Aug, 2011

1 commit

fa1bf42ff allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH ... Browse Code »

blk_insert_flush has the following check:

/*
* If there's data but flush is not necessary, the request can be
* processed directly without going through flush machinery. Queue
* for normal execution.
*/
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
list_add_tail(&rq->queuelist, &q->queue_head);
return;
}

However, blk_flush_policy will not return with policy set to only
REQ_FSEQ_DATA:

static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
{
unsigned int policy = 0;

if (fflags & REQ_FLUSH) {
if (rq->cmd_flags & REQ_FLUSH)
policy |= REQ_FSEQ_PREFLUSH;
if (blk_rq_sectors(rq))
policy |= REQ_FSEQ_DATA;
if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
policy |= REQ_FSEQ_POSTFLUSH;
}
return policy;
}

Notice that REQ_FSEQ_DATA is only set if REQ_FLUSH is set. Fix this
mismatch by moving the setting of REQ_FSEQ_DATA outside of the REQ_FLUSH
check.

Tejun notes:

Hmmm... yes, this can become a correctness issue if (and only if)
blk_queue_flush() is called to change q->flush_flags while requests
are in-flight; otherwise, requests wouldn't reach the function at all.
Also, I think it would be a generally good idea to always set
FSEQ_DATA if the request has data.

Cheers,
Jeff

Signed-off-by: Jeff Moyer
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Jeff Moyer
2011-08-10 02:32:09 +0800

07 May, 2011

1 commit

3ac0cc450 block: hold queue if flush is running for non-queueable flush drive ... Browse Code »

In some drives, flush requests are non-queueable. When flush request is
running, normal read/write requests can't run. If block layer dispatches
such request, driver can't handle it and requeue it. Tejun suggested we
can hold the queue when flush is running. This can avoid unnecessary
requeue. Also this can improve performance. For example, we have
request flush1, write1, flush 2. flush1 is dispatched, then queue is
hold, write1 isn't inserted to queue. After flush1 is finished, flush2
will be dispatched. Since disk cache is already clean, flush2 will be
finished very soon, so looks like flush2 is folded to flush1.

In my test, the queue holding completely solves a regression introduced by
commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794:

block: make the flush insertion use the tail of the dispatch list

It's not a preempt type request, in fact we have to insert it
behind requests that do specify INSERT_FRONT.

which causes about 20% regression running a sysbench fileio
workload.

Stable: 2.6.39 only

Cc: stable@kernel.org
Signed-off-by: Shaohua Li
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

shaohua.li@intel.com
2011-05-07 01:36:25 +0800

18 Apr, 2011

1 commit

24ecfbe27 block: add blk_run_queue_async ... Browse Code »

Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly. I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2011-04-18 17:41:33 +0800

06 Apr, 2011

2 commits

53d63e6b0 block: make the flush insertion use the tail of the dispatch list ... Browse Code »

It's not a preempt type request, in fact we have to insert it
behind requests that do specify INSERT_FRONT.

Signed-off-by: Jens Axboe

Jens Axboe
2011-04-06 05:51:37 +0800
b710a4805 block: get rid of elv_insert() interface ... Browse Code »

Merge it with __elv_add_request(), it's pretty pointless to
have a function with only two callers. The main interface
is elv_add_request()/__elv_add_request().

Signed-off-by: Jens Axboe

Jens Axboe
2011-04-06 05:51:37 +0800

10 Mar, 2011

3 commits

4c63f5646 Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core ... Browse Code »

Conflicts:
block/blk-core.c
block/blk-flush.c
drivers/md/raid1.c
drivers/md/raid10.c
drivers/md/raid5.c
fs/nilfs2/btnode.c
fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:58:35 +0800
7eaceacca block: remove per-queue plugging ... Browse Code »
30

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:07 +0800
73c101011 block: initial patch for on-stack per-task plugging ... Browse Code »

This patch adds support for creating a queuing context outside
of the queue itself. This enables us to batch up pieces of IO
before grabbing the block device queue lock and submitting them to
the IO scheduler.

The context is created on the stack of the process and assigned in
the task structure, so that we can auto-unplug it if we hit a schedule
event.

The current queue plugging happens implicitly if IO is submitted to
an empty device, yet callers have to remember to unplug that IO when
they are going to wait for it. This is an ugly API and has caused bugs
in the past. Additionally, it requires hacks in the vm (->sync_page()
callback) to handle that logic. By switching to an explicit plugging
scheme we make the API a lot nicer and can get rid of the ->sync_page()
hack in the vm.

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:45:54 +0800

05 Mar, 2011

1 commit

e83a46bbb Merge branch 'for-linus' of ../linux-2.6-block into block-for-2.6.39/core ... Browse Code »

This merge creates two set of conflicts. One is simple context
conflicts caused by removal of throtl_scheduled_delayed_work() in
for-linus and removal of throtl_shutdown_timer_wq() in
for-2.6.39/core.

The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
call directly into q->request_fn() __blk_run_queue()) in for-linus
crashing with FLUSH reimplementation in for-2.6.39/core. The conflict
isn't trivial but the resolution is straight-forward.

* __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
should be called with @force_kblockd set to %true.

* elv_insert() in blk_kick_flush() should use
%ELEVATOR_INSERT_REQUEUE.

Both changes are to avoid invoking ->request_fn() directly from
request completion path and closely match the changes in the commit
255bb490c8.

Signed-off-by: Tejun Heo

Tejun Heo
2011-03-05 02:09:02 +0800

02 Mar, 2011

2 commits

255bb490c block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue() ... Browse Code »

blk-flush decomposes a flush into sequence of multiple requests. On
completion of a request, the next one is queued; however, block layer
must not implicitly call into q->request_fn() directly from completion
path. This makes the queue behave unexpectedly when seen from the
drivers and violates the assumption that q->request_fn() is called
with process context + queue_lock.

This patch makes blk-flush the following two changes to make sure
q->request_fn() is not called directly from request completion path.

- blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
use kblockd instead of calling directly into q->request_fn().

- queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
request queue directly.

Reported by Jan in the following threads.

http://thread.gmane.org/gmane.linux.ide/48778
http://thread.gmane.org/gmane.linux.ide/48786

stable: applicable to v2.6.37.

Signed-off-by: Tejun Heo
Reported-by: Jan Beulich
Cc: "David S. Miller"
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Tejun Heo
2011-03-02 21:48:06 +0800
1654e7411 block: add @force_kblockd to __blk_run_queue() ... Browse Code »

__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd. Add @force_kblockd.

All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.

stable: This is prerequisite for fixing ide oops caused by the new
blk-flush implementation.

Signed-off-by: Tejun Heo
Cc: Jan Beulich
Cc: James Bottomley
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Tejun Heo
2011-03-02 21:48:05 +0800

25 Jan, 2011

2 commits

ae1b15396 block: reimplement FLUSH/FUA to support merge ... Browse Code »

The current FLUSH/FUA support has evolved from the implementation
which had to perform queue draining. As such, sequencing is done
queue-wide one flush request after another. However, with the
draining requirement gone, there's no reason to keep the queue-wide
sequential approach.

This patch reimplements FLUSH/FUA support such that each FLUSH/FUA
request is sequenced individually. The actual FLUSH execution is
double buffered and whenever a request wants to execute one for either
PRE or POSTFLUSH, it queues on the pending queue. Once certain
conditions are met, a flush request is issued and on its completion
all pending requests proceed to the next sequence.

This allows arbitrary merging of different type of flushes. How they
are merged can be primarily controlled and tuned by adjusting the
above said 'conditions' used to determine when to issue the next
flush.

This is inspired by Darrick's patches to merge multiple zero-data
flushes which helps workloads with highly concurrent fsync requests.

* As flush requests are never put on the IO scheduler, request fields
used for flush share space with rq->rb_node. rq->completion_data is
moved out of the union. This increases the request size by one
pointer.

As rq->elevator_private* are used only by the iosched too, it is
possible to reduce the request size further. However, to do that,
we need to modify request allocation path such that iosched data is
not allocated for flush requests.

* FLUSH/FUA processing happens on insertion now instead of dispatch.

- Comments updated as per Vivek and Mike.

Signed-off-by: Tejun Heo
Cc: "Darrick J. Wong"
Cc: Shaohua Li
Cc: Christoph Hellwig
Cc: Vivek Goyal
Cc: Mike Snitzer
Signed-off-by: Jens Axboe

Tejun Heo
2011-01-25 19:43:54 +0800
414b4ff5e block: add REQ_FLUSH_SEQ ... Browse Code »

rq == &q->flush_rq was used to determine whether a rq is part of a
flush sequence, which worked because all requests in a flush sequence
were sequenced using the single dedicated request. This is about to
change, so introduce REQ_FLUSH_SEQ flag to distinguish flush sequence
requests.

This patch doesn't cause any behavior change.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2011-01-25 19:43:49 +0800

17 Sep, 2010

1 commit

dd3932edd block: remove BLKDEV_IFL_WAIT ... Browse Code »

All the blkdev_issue_* helpers can only sanely be used for synchronous
caller. To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout. For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-09-17 02:52:58 +0800

10 Sep, 2010

8 commits

d391a2dda block: use REQ_FLUSH in blkdev_issue_flush() ... Browse Code »

Update blkdev_issue_flush() to use new REQ_FLUSH interface.

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:37 +0800
09d60c701 block: make sure FSEQ_DATA request has the same rq_disk as the original ... Browse Code »

rq->rq_disk and bio->bi_bdev->bd_disk may differ if a request has
passed through remapping drivers. FSEQ_DATA request incorrectly
followed bio->bi_bdev->bd_disk ending up being issued w/ mismatching
rq_disk. Make it follow orig_rq->rq_disk.

Signed-off-by: Tejun Heo
Reported-by: Kiyoshi Ueda
Tested-by: Kiyoshi Ueda
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:37 +0800
47f70d5a6 block: kick queue after sequencing REQ_FLUSH/FUA ... Browse Code »

While completing a request from a REQ_FLUSH/FUA sequence, another
request can be pushed to the request queue. If a driver tests
elv_queue_empty() before completing a request and runs the queue again
only if the queue wasn't empty, this may lead to hang. Please note
that most drivers either kick the queue unconditionally or test queue
emptiness after completing the current request and don't have this
problem.

This patch removes this possibility by making REQ_FLUSH/FUA sequence
code kick the queue if the queue was empty before completing a request
from REQ_FLUSH/FUA sequence.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:37 +0800
337238be1 block: initialize flush request with WRITE_FLUSH instead of REQ_FLUSH ... Browse Code »

init_flush_request() only set REQ_FLUSH when initializing flush
requests making them READ requests. Use WRITE_FLUSH instead.

Signed-off-by: Tejun Heo
Reported-by: Mike Snitzer
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:37 +0800
cde4c406d block: simplify queue_next_fseq ... Browse Code »

We need to call blk_rq_init and elv_insert for all cases in queue_next_fseq,
so take these calls into common code. Also move the end_io initialization
from queue_flush into queue_next_fseq and rename queue_flush to
init_flush_request now that it's old name doesn't apply anymore.

Signed-off-by: Christoph Hellwig
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-09-10 18:35:37 +0800
4fed947cb block: implement REQ_FLUSH/FUA based interface for FLUSH/FUA requests ... Browse Code »

Now that the backend conversion is complete, export sequenced
FLUSH/FUA capability through REQ_FLUSH/FUA flags. REQ_FLUSH means the
device cache should be flushed before executing the request. REQ_FUA
means that the data in the request should be on non-volatile media on
completion.

Block layer will choose the correct way of implementing the semantics
and execute it. The request may be passed to the device directly if
the device can handle it; otherwise, it will be sequenced using one or
more proxy requests. Devices will never see REQ_FLUSH and/or FUA
which it doesn't support.

Also, unlike the original REQ_HARDBARRIER, REQ_FLUSH/FUA requests are
never failed with -EOPNOTSUPP. If the underlying device doesn't
support FLUSH/FUA, the block layer simply make those noop. IOW, it no
longer distinguishes between writeback cache which doesn't support
cache flush and writethrough/no cache. Devices which have WB cache
w/o flush are very difficult to come by these days and there's nothing
much we can do anyway, so it doesn't make sense to require everyone to
implement -EOPNOTSUPP handling. This will simplify filesystems and
block drivers as they can drop -EOPNOTSUPP retry logic for barriers.

* QUEUE_ORDERED_* are removed and QUEUE_FSEQ_* are moved into
blk-flush.c.

* REQ_FLUSH w/o data can also be directly passed to drivers without
sequencing but some drivers assume that zero length requests don't
have rq->bio which isn't true for these requests requiring the use
of proxy requests.

* REQ_COMMON_MASK now includes REQ_FLUSH | REQ_FUA so that they are
copied from bio to request.

* WRITE_BARRIER is marked deprecated and WRITE_FLUSH, WRITE_FUA and
WRITE_FLUSH_FUA are added.

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:37 +0800
dd4c133f3 block: rename barrier/ordered to flush ... Browse Code »

With ordering requirements dropped, barrier and ordered are misnomers.
Now all block layer does is sequencing FLUSH and FUA. Rename them to
flush.

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:36 +0800
8839a0e05 block: rename blk-barrier.c to blk-flush.c ... Browse Code »

Without ordering requirements, barrier and ordering are minomers.
Rename block/blk-barrier.c to block/blk-flush.c. Rename of symbols
will follow.

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:36 +0800