10 Nov, 2010
1 commit
-
REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
at this point is:
- various checks inside the block layer.
- sanity checks in bio based drivers.
- now unused bio_empty_barrier helper.
- Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it has been dead for a
while, but Xen really needs to sort out its barrier situation.
- setting of ordered tags in uas - dead code copied from old scsi
drivers.
- scsi different retry for barriers - it's dead and should have been
removed when flushes were converted to FS requests.
- blktrace handling of barriers - removed. Someone who knows blktrace
better should add support for REQ_FLUSH and REQ_FUA, though.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
19 Oct, 2010
1 commit
-
Conflicts:
block/blk-core.c
drivers/block/loop.c
mm/swapfile.c

Signed-off-by: Jens Axboe
07 Oct, 2010
1 commit
-
2.6.36 introduces an API for drivers to switch the IO scheduler
instead of manually calling the elevator exit and init functions.
This API was added since q->elevator must be cleared in between
those two calls. And since we already have this functionality
directly from use by the sysfs interface to switch schedulers
online, it was prudent to reuse it internally too.

But this API needs the queue to be in a fully initialized state
before it is called, or it will attempt to unregister elevator
kobjects before they have been added. This results in an oops
like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000051
IP: [] sysfs_create_dir+0x2e/0xc0
PGD 47ddfc067 PUD 47c6a1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq
CPU 2
Modules linked in: t(+) loop hid_apple usbhid ahci ehci_hcd uhci_hcd libahci usbcore nls_base igb
Pid: 7319, comm: modprobe Not tainted 2.6.36-rc6+ #132 QSSC-S4R/QSSC-S4R
RIP: 0010:[] [] sysfs_create_dir+0x2e/0xc0
RSP: 0018:ffff88027da25d08 EFLAGS: 00010246
RAX: ffff88047c68c528 RBX: 00000000fffffffe RCX: 0000000000000000
RDX: 000000000000002f RSI: 000000000000002f RDI: ffff88047e196c88
RBP: ffff88027da25d38 R08: 0000000000000000 R09: d84156c5635688c0
R10: d84156c5635688c0 R11: 0000000000000000 R12: ffff88047e196c88
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047c68c528
FS: 00007fcb0b26f6e0(0000) GS:ffff880287400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000051 CR3: 000000047e76e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 7319, threadinfo ffff88027da24000, task ffff88027d377090)
Stack:
ffff88027da25d58 ffff88047c68c528 00000000fffffffe ffff88047e196c88
ffff88047c68c528 ffff88047e05bd90 ffff88027da25d78 ffffffff8123fb77
ffff88047e05bd90 0000000000000000 ffff88047e196c88 ffff88047c68c528
Call Trace:
[] kobject_add_internal+0xe7/0x1f0
[] kobject_add_varg+0x38/0x60
[] kobject_add+0x69/0x90
[] ? sysfs_remove_dir+0x20/0xa0
[] ? sub_preempt_count+0x9d/0xe0
[] ? _raw_spin_unlock+0x30/0x50
[] ? sysfs_remove_dir+0x20/0xa0
[] ? sysfs_remove_dir+0x34/0xa0
[] elv_register_queue+0x34/0xa0
[] elevator_change+0xfd/0x250
[] ? t_init+0x0/0x361 [t]
[] ? t_init+0x0/0x361 [t]
[] t_init+0xa8/0x361 [t]
[] do_one_initcall+0x3e/0x170
[] sys_init_module+0xbd/0x220
[] system_call_fastpath+0x16/0x1b
Code: e5 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 85 ff 74 52 48 8b 47 18 49 c7 c5 00 46 61 81 48 85 c0 74 04 4c 8b 68 30 45 31 f6 80 7d 51 00 74 0e 49 8b 44 24 28 4c 89 e7 ff 50 20 49 89 c6
RIP [] sysfs_create_dir+0x2e/0xc0
RSP
CR2: 0000000000000051
---[ end trace a6541d3bf07945df ]---

Fix this by adding a registered bit to the elevator queue, which is
set when the sysfs kobjects have been registered.

Signed-off-by: Jens Axboe
10 Sep, 2010
1 commit
-
Filesystems will take all the responsibilities for ordering requests
around commit writes and will only indicate how the commit writes
themselves should be handled by block layers. This patch drops
barrier ordering by queue draining from block layer. Ordering by
draining implementation was somewhat invasive to request handling.
A list of notable changes follows.

* Each queue has a 1-bit color which is flipped on each barrier issue.
This is used to track whether a given request is issued before the
current barrier or not. REQ_ORDERED_COLOR flag and coloring
implementation in __elv_add_request() are removed.

* Requests which shouldn't be processed yet for draining were stalled
by returning -EAGAIN from blk_do_ordered() according to the test
result between blk_ordered_req_seq() and blk_ordered_cur_seq().
This logic is removed.

* Draining completion logic in elv_completed_request() removed.
* All barrier sequence requests were queued to request queue and then
trickled to lower layer according to progress and thus maintaining
request orders during requeue was necessary. This is replaced by
queueing the next request in the barrier sequence only after the
current one is complete from blk_ordered_complete_seq(), which
removes the need for multiple proxy requests in struct request_queue
and the request sorting logic in the ELEVATOR_INSERT_REQUEUE path of
elv_insert().

* As barriers no longer have ordering constraints, there's no need to
dump the whole elevator onto the dispatch queue on each barrier.
Insert barriers at the front instead.

* If other barrier requests come to the front of the dispatch queue
while one is already in progress, they are stored in
q->pending_barriers and restored to dispatch queue one-by-one after
each barrier completion from blk_ordered_complete_seq().

Signed-off-by: Tejun Heo
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe
23 Aug, 2010
1 commit
-
Currently drivers must do an elevator_exit() + elevator_init()
to switch IO schedulers. There are a few problems with this:

- Since commit 1abec4fdbb142e3ccb6ce99832fae42129134a96,
elevator_init() requires a zeroed out q->elevator
pointer. The two existing in-kernel users don't do that.

- It will only work at initialization time, since using the
above two-staged construct does not properly quiesce the queue.

So add elevator_change() which takes care of this, and convert
the elv_iosched_store() sysfs interface to use this helper as well.

Reported-by: Peter Oberparleiter
Reported-by: Kevin Vigor
Signed-off-by: Jens Axboe
12 Aug, 2010
1 commit
-
Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.

Signed-off-by: Adrian Hunter
Acked-by: Jens Axboe
Cc: Kyungmin Park
Cc: Madhusudhan Chikkature
Cc: Christoph Hellwig
Cc: Ben Gardiner
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Aug, 2010
2 commits
-
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superfluous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
04 Jun, 2010
1 commit
-
blk_init_allocated_queue_node may fail and the caller _could_ retry.
Accommodate the unlikely event that blk_init_allocated_queue_node is
called on an already initialized (possibly partially) request_queue.

Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
24 May, 2010
1 commit
-
Bio-based DM doesn't use an elevator (queue is !blk_queue_stackable()).
Longer-term DM will not allocate an elevator for bio-based DM. But even
then there will be small potential for an elevator to be allocated for
a request-based DM table only to have a bio-based table be loaded in the
end.

Displaying "none" for bio-based DM will help avoid user confusion.
Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
11 May, 2010
1 commit
-
blk_init_queue() allocates the request_queue structure and then
initializes it as needed (request_fn, elevator, etc).

Split initialization out to blk_init_allocated_queue_node.
Introduce blk_init_allocated_queue wrapper function to model existing
blk_init_queue and blk_init_queue_node interfaces.

Export elv_register_queue to allow a newly added elevator to be
registered with sysfs. Export elv_unregister_queue for symmetry.

These changes allow DM to initialize a device's request_queue with more
precision. In particular, DM no longer unconditionally initializes a
full request_queue (elevator et al). It only does so for a
request-based DM device.

Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe
09 Apr, 2010
1 commit
-
This includes both the number of bios merged into requests belonging to this
cgroup as well as the number of requests merged together.
In the past, we've observed different merging behavior across upstream kernels,
some by design, some actual bugs. This stat helps a lot in debugging such
problems when applications report decreased throughput with a new kernel
version.

This needed adding an extra elevator function to capture bios being merged as I
did not want to pollute elevator code with blkiocg knowledge and hence needed
the accounting invocation to come from CFQ.

Signed-off-by: Divyesh Shah
Signed-off-by: Jens Axboe
02 Apr, 2010
1 commit
-
elevator_get() does not check the name length. If the name is longer than
sizeof(elv), elv will lack the terminating '\0', and the elv buffer used to
build the "-iosched" module name will contain garbage, so the subsequent
request_module() call can load an untrusted module.

Signed-off-by: Zhitong Wang
Signed-off-by: Jens Axboe
08 Mar, 2010
1 commit
-
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime

* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime

* potentially better optimized code as the compiler
can assume that the const data cannot be changed

* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing

Signed-off-by: Emese Revfy
Acked-by: David Teigland
Acked-by: Matt Domsch
Acked-by: Maciej Sosnowski
Acked-by: Hans J. Koch
Acked-by: Pekka Enberg
Acked-by: Jens Axboe
Acked-by: Stephen Hemminger
Signed-off-by: Greg Kroah-Hartman
29 Jan, 2010
1 commit
-
Updated 'nomerges' tunable to accept a value of '2' - indicating that _no_
merges at all are to be attempted (not even the simple one-hit cache).

The following table illustrates the additional benefit - 5 minute runs of
a random I/O load were applied to a dozen devices on a 16-way x86_64 system.

nomerges Throughput %System Improvement (tput / %sys)
-------- ------------ ----------- -------------------------
0 12.45 MB/sec 0.669365609
1 12.50 MB/sec 0.641519199 0.40% / 2.71%
2 12.52 MB/sec 0.639849750 0.56% / 2.96%

Signed-off-by: Alan D. Brunelle
Signed-off-by: Jens Axboe
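Assuming a disk named sda, the tunable is driven through sysfs like this (a sketch; needs root):

```shell
# Merge policy per device (path assumes a disk named sda):
#   0 - all merge attempts enabled (default)
#   1 - only the simple one-hit cache merge is attempted
#   2 - no merges are attempted at all
cat /sys/block/sda/queue/nomerges
echo 2 > /sys/block/sda/queue/nomerges   # disable merging entirely
```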
13 Oct, 2009
1 commit
09 Oct, 2009
1 commit
-
elv_iosched_store() ignores the return value of strstrip(), which makes
the behavior slightly inconsistent.

This patch fixes it.
====================================
# cd /sys/block/{blockdev}/queue

case1:
# echo "anticipatory" > scheduler
# cat scheduler
noop [anticipatory] deadline cfq

case2:
# echo "anticipatory " > scheduler
# cat scheduler
noop [anticipatory] deadline cfq

case3:
# echo " anticipatory" > scheduler
bash: echo: write error: Invalid argument
====================================
# cd /sys/block/{blockdev}/queue

case1:
# echo "anticipatory" > scheduler
# cat scheduler
noop [anticipatory] deadline cfq

case2:
# echo "anticipatory " > scheduler
# cat scheduler
noop [anticipatory] deadline cfq

case3:
# echo " anticipatory" > scheduler
noop [anticipatory] deadline cfq

Cc: Li Zefan
Cc: Jens Axboe
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Jens Axboe
03 Oct, 2009
1 commit
-
AS is mostly a subset of CFQ, so there's little point in still
providing this separate IO scheduler. Hopefully at some point we
can get down to one single IO scheduler again, at least this brings
us closer by having only one intelligent IO scheduler.

Signed-off-by: Jens Axboe
11 Sep, 2009
2 commits
-
Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.

Signed-off-by: Jens Axboe
-
Update scsi_io_completion() such that it only fails requests up to the
next error boundary and retries the leftover. This enables the block
layer to merge requests with different failfast settings and still
behave correctly on errors. Allow merge of requests of different
failfast settings.

As SCSI is currently the only subsystem which follows failfast status,
there's no need to worry about other block drivers for now.

Signed-off-by: Tejun Heo
Cc: Niel Lambrechts
Cc: James Bottomley
Signed-off-by: Jens Axboe
17 Jul, 2009
1 commit
-
Commit ab0fd1debe730ec9998678a0c53caefbd121ed10 tries to prevent merge
of requests with different failfast settings. In elv_rq_merge_ok(),
it compares new bio's failfast flags against the merge target
request's. However, the flag testing accessors for bio and blk don't
return boolean but the tested bit value directly and FAILFAST on bio
and blk don't match, so directly comparing them with == results in a
false negative, unnecessarily preventing merges of readahead requests.

This patch converts the results to booleans by negating them before
comparison.Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Boaz Harrosh
Cc: FUJITA Tomonori
Cc: James Bottomley
Cc: Jeff Garzik
04 Jul, 2009
1 commit
-
Block layer used to merge requests and bios with different failfast
settings. This caused regular IOs to fail prematurely when they were
merged into failfast requests for readahead.

Niel Lambrechts could trigger the problem semi-reliably on ext4 when
resuming from STR. ext4 uses readahead when reading inodes and
combined with the deterministic extra SATA PHY exception cycle during
resume on the specific configuration, non-readahead inode read would
fail causing ext4 errors. Please read the following thread for
details.

http://lkml.org/lkml/2009/5/23/21

This patch makes block layer reject merging if the failfast settings
don't match. This is correct but likely to lower IO performance by
preventing regular IOs from mingling into surrounding readahead
requests. Changes to allow such mixed merges and handle errors
correctly will be added later.

Signed-off-by: Tejun Heo
Reported-by: Niel Lambrechts
Cc: Theodore Tso
Signed-off-by: Jens Axboe
12 Jun, 2009
1 commit
-
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
block: add request clone interface (v2)
floppy: fix hibernation
ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
fs/bio.c: add missing __user annotation
block: prevent possible io_context->refcount overflow
Add serial number support for virtio_blk, V4a
block: Add missing bounce_pfn stacking and fix comments
Revert "block: Fix bounce limit setting in DM"
cciss: decode unit attention in SCSI error handling code
cciss: Remove no longer needed sendcmd reject processing code
cciss: change SCSI error handling routines to work with interrupts enabled.
cciss: separate error processing and command retrying code in sendcmd_withirq_core()
cciss: factor out fix target status processing code from sendcmd functions
cciss: simplify interface of sendcmd() and sendcmd_withirq()
cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
block: needs to set the residual length of a bidi request
Revert "block: implement blkdev_readpages"
block: Fix bounce limit setting in DM
Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
...

Manually fix conflicts with tracing updates in:
block/blk-sysfs.c
drivers/ide/ide-atapi.c
drivers/ide/ide-cd.c
drivers/ide/ide-floppy.c
drivers/ide/ide-tape.c
include/trace/events/block.h
kernel/trace/blktrace.c
10 Jun, 2009
1 commit
-
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:

- zero-copy and per-cpu splice() tracing
- binary tracing without printf overhead
- structured logging records exposed under /debug/tracing/events
- trace events embedded in function tracer output and other plugins
- user-defined, per tracepoint filter expressions
...

Cons:
- no dev_t info for the output of plug, unplug_timer and unplug_io events.
no dev_t info for getrq and sleeprq events if bio == NULL.
no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

This is mainly because we can't get the device from a request queue.
But this may change in the future.

- A packet command is converted to a string in TP_assign, not TP_print,
while blktrace does the conversion just before output. Since pc requests
should be rather rare, this is not a big issue.
- In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
has a unique format, which means we have some unused data in a trace entry.
The overhead is minimized by using __dynamic_array() instead of __array().
I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
dd dd + ioctl blktrace dd + TRACE_EVENT (splice)
1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s
2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s
3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s

So the overhead of tracing is very small, and no regression when using
those trace events vs blktrace.

And the binary output of TRACE_EVENT is much smaller than blktrace:
# ls -l -h
-rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
-rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
-rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

Following are some comparisons between TRACE_EVENT and blktrace:
plug:
kjournald-480 [000] 303.084981: block_plug: [kjournald]
kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

unplug_io:
kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

remap:
kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8

v3:
- use the newly introduced __dynamic_array().
Changelog from v1 -> v2:
- use __string() instead of __array() to minimize the memory required
to store hex dump of rq->cmd().
- support large pc requests.
- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
- some cleanups.
Signed-off-by: Li Zefan
LKML-Reference:
Signed-off-by: Steven Rostedt
02 Jun, 2009
1 commit
-
I found one more mis-conversion to the 'request is always dequeued
when completing' model in elv_abort_queue() during code inspection.
Although I haven't hit any problem caused by this mis-conversion yet
and just done compile/boot test, please apply if you have no problem.

Request must be dequeued when it completes.
However, elv_abort_queue() completes requests without dequeueing.
This will cause oops in the __blk_end_request_all().
This patch fixes the oops.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Jens Axboe
23 May, 2009
1 commit
-
Currently stacking devices do not have a queue directory in sysfs.
However, many of the I/O characteristics like sector size, maximum
request size, etc. are queue properties.

This patch enables the queue directory for MD/DM devices. The elevator
code has been modified to deal with queues that do not have an I/O
scheduler.

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
20 May, 2009
1 commit
-
Make them fully share the tag space, but disallow async requests from
using the last two slots.

Signed-off-by: Jens Axboe
11 May, 2009
1 commit
-
With recent cleanups, there is no place where low level driver
directly manipulates request fields. This means that the 'hard'
request fields always equal the !hard fields. Convert all
rq->sectors, nr_sectors and current_nr_sectors references to
accessors.

While at it, drop the superfluous blk_rq_pos() < 0 test in swim.c.
[ Impact: use pos and nr_sectors accessors ]
Signed-off-by: Tejun Heo
Acked-by: Geert Uytterhoeven
Tested-by: Grant Likely
Acked-by: Grant Likely
Tested-by: Adrian McMenamin
Acked-by: Adrian McMenamin
Acked-by: Mike Miller
Cc: James Bottomley
Cc: Bartlomiej Zolnierkiewicz
Cc: Borislav Petkov
Cc: Sergei Shtylyov
Cc: Eric Moore
Cc: Alan Stern
Cc: FUJITA Tomonori
Cc: Pete Zaitcev
Cc: Stephen Rothwell
Cc: Paul Clements
Cc: Tim Waugh
Cc: Jeff Garzik
Cc: Jeremy Fitzhardinge
Cc: Alex Dubov
Cc: David Woodhouse
Cc: Martin Schwidefsky
Cc: Dario Ballabio
Cc: David S. Miller
Cc: Rusty Russell
Cc: unsik Kim
Cc: Laurent Vivier
Signed-off-by: Jens Axboe
28 Apr, 2009
3 commits
-
There are many [__]blk_end_request() call sites which call it with
full request length and expect full completion. Many of them ensure
that the request actually completes by doing BUG_ON() the return
value, which is awkward and error-prone.

This patch adds [__]blk_end_request_all() which takes @rq and @error
and fully completes the request. BUG_ON() is added to ensure that
this actually happens.

Most conversions are simple but there are a few noteworthy ones.

* cdrom/viocd: viocd_end_request() replaced with direct calls to
* cdrom/viocd: viocd_end_request() replaced with direct calls to
__blk_end_request_all().

* s390/block/dasd: dasd_end_request() replaced with direct calls to
__blk_end_request_all().

* s390/char/tape_block: tapeblock_end_request() replaced with direct
calls to blk_end_request_all().

[ Impact: cleanup ]
Signed-off-by: Tejun Heo
Cc: Russell King
Cc: Stephen Rothwell
Cc: Mike Miller
Cc: Martin Schwidefsky
Cc: Jeff Garzik
Cc: Rusty Russell
Cc: Jeremy Fitzhardinge
Cc: Alex Dubov
Cc: James Bottomley
-
Impact: code reorganization
elv_next_request() and elv_dequeue_request() are more a public block layer
interface than actual elevator implementation. They mostly deal with
how requests interact with the block layer and low level drivers at the
beginning of request processing, whereas __elv_next_request() is the
actual elevator request fetching interface.

Move the two functions to blk-core.c. This prepares for further
interface cleanup.

Signed-off-by: Tejun Heo
-
blk_start_queueing() is identical to __blk_run_queue() except that it
doesn't check for recursion. None of the current users depends on
blk_start_queueing() running request_fn directly. Replace usages of
blk_start_queueing() with [__]blk_run_queue() and kill it.

[ Impact: removal of mostly duplicate interface function ]
Signed-off-by: Tejun Heo
15 Apr, 2009
1 commit
-
Credit goes to Andrew Morton for spotting this one.
Signed-off-by: Jens Axboe
07 Apr, 2009
2 commits
-
This forces in_flight to be zero when turning off or on the I/O stat
accounting and stops updating I/O stats in attempt_merge() when
accounting is turned off.

Signed-off-by: Jerome Marchand
Signed-off-by: Jens Axboe
-
Simple helper functions to quiesce the request queue. These are
currently only used for switching IO schedulers on-the-fly, but
we can use them to properly switch IO accounting on and off as well.

Signed-off-by: Jerome Marchand
Signed-off-by: Jens Axboe
06 Apr, 2009
1 commit
-
This makes sure that we never wait on async IO for sync requests, instead
of doing the split on writes vs reads.

Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds
29 Dec, 2008
3 commits
-
Just use struct elevator_queue everywhere instead.
Signed-off-by: Jens Axboe
-
Empty barrier required special handling in __elv_next_request() to
complete it without letting the low level driver see it.

With previous changes, barrier code is now flexible enough to skip the
BAR step using the same barrier sequence selection mechanism. Drop
the special handling and mask off q->ordered from start_ordered().

Remove blk_empty_barrier() test which now has no user.
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
-
Barrier completion had the following assumptions.
* start_ordered() couldn't finish the whole sequence properly. If all
actions are to be skipped, q->ordseq is set correctly but the actual
completion was never triggered, thus hanging the barrier request.

* Drain completion in elv_complete_request() assumed that there's
always at least one request in the queue when drain completes.

Both assumptions are true but these assumptions need to be removed to
improve empty barrier implementation. This patch makes the following
changes.

* Make start_ordered() use blk_ordered_complete_seq() to mark skipped
steps complete and notify __elv_next_request() that it should fetch
the next request if the whole barrier has completed inside
start_ordered().

* Make drain completion path in elv_complete_request() check whether
the queue is empty. Empty queue also indicates drain completion.

* While at it, convert 0/1 return from blk_do_ordered() to false/true.
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
05 Dec, 2008
1 commit
-
…gent' into tracing/core
03 Dec, 2008
1 commit
-
blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
both start the timeout timer. Barrier code dequeues the original
barrier request but doesn't pass the request itself to the lower level
driver, only broken-down proxy requests; however, the original barrier
request goes through the same dequeue path, so the timeout timer is
started on it. If the barrier sequence takes long enough, this timer
expires but the low level driver has no idea about this request and an
oops follows.

The timeout timer shouldn't have been started on the original barrier
request as it never goes through actual IO. This patch unexports
elv_dequeue_request(), which has no external user anyway, and makes it
operate on the elevator proper w/o adding the timer, and makes
blkdev_dequeue_request() call elv_dequeue_request() and add the timer.
Internal users which don't pass the request to driver - barrier code
and end_that_request_last() - are converted to use
elv_dequeue_request().

Signed-off-by: Tejun Heo
Cc: Mike Anderson
Signed-off-by: Jens Axboe