10 Nov, 2010

1 commit

  • REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
    at this point is:

    - various checks inside the block layer.
    - sanity checks in bio based drivers.
    - now unused bio_empty_barrier helper.
    - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it has been dead for a while,
    but Xen really needs to sort out its barrier situation.
    - setting of ordered tags in uas - dead code copied from old scsi
    drivers.
    - different SCSI retry handling for barriers - it's dead and should have been
    removed when flushes were converted to FS requests.
    - blktrace handling of barriers - removed. Someone who knows blktrace
    better should add support for REQ_FLUSH and REQ_FUA, though.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

19 Oct, 2010

1 commit


07 Oct, 2010

1 commit

  • 2.6.36 introduces an API for drivers to switch the IO scheduler
    instead of manually calling the elevator exit and init functions.
    This API was added since q->elevator must be cleared in between
    those two calls. And since we already have this functionality
    for use by the sysfs interface that switches schedulers online,
    it was prudent to reuse it internally too.

    But this API needs the queue to be in a fully initialized state
    before it is called, or it will attempt to unregister elevator
    kobjects before they have been added. This results in an oops
    like this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000051
    IP: [] sysfs_create_dir+0x2e/0xc0
    PGD 47ddfc067 PUD 47c6a1067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq
    CPU 2
    Modules linked in: t(+) loop hid_apple usbhid ahci ehci_hcd uhci_hcd libahci usbcore nls_base igb

    Pid: 7319, comm: modprobe Not tainted 2.6.36-rc6+ #132 QSSC-S4R/QSSC-S4R
    RIP: 0010:[] [] sysfs_create_dir+0x2e/0xc0
    RSP: 0018:ffff88027da25d08 EFLAGS: 00010246
    RAX: ffff88047c68c528 RBX: 00000000fffffffe RCX: 0000000000000000
    RDX: 000000000000002f RSI: 000000000000002f RDI: ffff88047e196c88
    RBP: ffff88027da25d38 R08: 0000000000000000 R09: d84156c5635688c0
    R10: d84156c5635688c0 R11: 0000000000000000 R12: ffff88047e196c88
    R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047c68c528
    FS: 00007fcb0b26f6e0(0000) GS:ffff880287400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000051 CR3: 000000047e76e000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process modprobe (pid: 7319, threadinfo ffff88027da24000, task ffff88027d377090)
    Stack:
    ffff88027da25d58 ffff88047c68c528 00000000fffffffe ffff88047e196c88
    ffff88047c68c528 ffff88047e05bd90 ffff88027da25d78 ffffffff8123fb77
    ffff88047e05bd90 0000000000000000 ffff88047e196c88 ffff88047c68c528
    Call Trace:
    [] kobject_add_internal+0xe7/0x1f0
    [] kobject_add_varg+0x38/0x60
    [] kobject_add+0x69/0x90
    [] ? sysfs_remove_dir+0x20/0xa0
    [] ? sub_preempt_count+0x9d/0xe0
    [] ? _raw_spin_unlock+0x30/0x50
    [] ? sysfs_remove_dir+0x20/0xa0
    [] ? sysfs_remove_dir+0x34/0xa0
    [] elv_register_queue+0x34/0xa0
    [] elevator_change+0xfd/0x250
    [] ? t_init+0x0/0x361 [t]
    [] ? t_init+0x0/0x361 [t]
    [] t_init+0xa8/0x361 [t]
    [] do_one_initcall+0x3e/0x170
    [] sys_init_module+0xbd/0x220
    [] system_call_fastpath+0x16/0x1b
    Code: e5 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 85 ff 74 52 48 8b 47 18 49 c7 c5 00 46 61 81 48 85 c0 74 04 4c 8b 68 30 45 31 f6 80 7d 51 00 74 0e 49 8b 44 24 28 4c 89 e7 ff 50 20 49 89 c6
    RIP [] sysfs_create_dir+0x2e/0xc0
    RSP
    CR2: 0000000000000051
    ---[ end trace a6541d3bf07945df ]---

    Fix this by adding a registered bit to the elevator queue, which is
    set when the sysfs kobjects have been registered.
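    The idea behind the fix can be sketched as a small user-space model.
    This is not kernel code: struct model_elevator, model_register(),
    model_unregister() and model_switch() are hypothetical names. The model
    only shows that sysfs teardown and re-registration are skipped unless
    the registered bit says the kobjects were actually added, so an early
    switch during driver init no longer touches sysfs.

```c
/* Hypothetical, simplified stand-in for an elevator queue. The
 * 'registered' bit is set only once the sysfs objects really exist. */
struct model_elevator {
	unsigned int registered:1;
	int kobjects_added; /* number of live sysfs objects in this model */
};

/* Add the elevator's sysfs objects and remember that we did so. */
static void model_register(struct model_elevator *e)
{
	e->kobjects_added++;
	e->registered = 1;
}

/* Remove sysfs objects only if they were added; on a never-registered
 * elevator this is a harmless no-op instead of a bad teardown. */
static void model_unregister(struct model_elevator *e)
{
	if (!e->registered)
		return;
	e->kobjects_added--;
	e->registered = 0;
}

/* Switch schedulers: touch sysfs only if the old elevator was
 * registered, i.e. only once the queue is fully initialized. */
static void model_switch(struct model_elevator *old_e,
			 struct model_elevator *new_e)
{
	unsigned int was_registered = old_e->registered;

	model_unregister(old_e);
	if (was_registered)
		model_register(new_e);
}
```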

    Signed-off-by: Jens Axboe

    Jens Axboe
     

10 Sep, 2010

1 commit

  • Filesystems will take all the responsibility for ordering requests
    around commit writes and will only indicate how the commit writes
    themselves should be handled by the block layer. This patch drops
    barrier ordering by queue draining from the block layer. The
    ordering-by-draining implementation was somewhat invasive to request
    handling. A list of notable changes follows.

    * Each queue has 1 bit color which is flipped on each barrier issue.
    This is used to track whether a given request is issued before the
    current barrier or not. REQ_ORDERED_COLOR flag and coloring
    implementation in __elv_add_request() are removed.

    * Requests which shouldn't be processed yet for draining were stalled
    by returning -EAGAIN from blk_do_ordered() based on comparing
    blk_ordered_req_seq() with blk_ordered_cur_seq().
    This logic is removed.

    * Draining completion logic in elv_completed_request() removed.

    * All barrier sequence requests were queued to the request queue and then
    trickled to the lower layer according to progress, so request order had
    to be maintained during requeue. This is replaced by
    queueing the next request in the barrier sequence only after the
    current one is complete from blk_ordered_complete_seq(), which
    removes the need for multiple proxy requests in struct request_queue
    and the request sorting logic in the ELEVATOR_INSERT_REQUEUE path of
    elv_insert().

    * As barriers no longer have ordering constraints, there's no need to
    dump the whole elevator onto the dispatch queue on each barrier.
    Insert barriers at the front instead.

    * If other barrier requests come to the front of the dispatch queue
    while one is already in progress, they are stored in
    q->pending_barriers and restored to dispatch queue one-by-one after
    each barrier completion from blk_ordered_complete_seq().

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

23 Aug, 2010

1 commit

  • Currently drivers must do an elevator_exit() + elevator_init()
    to switch IO schedulers. There are a few problems with this:

    - Since commit 1abec4fdbb142e3ccb6ce99832fae42129134a96,
    elevator_init() requires a zeroed out q->elevator
    pointer. The two existing in-kernel users don't do that.

    - It will only work at initialization time, since using the
    above two-staged construct does not properly quiesce the queue.

    So add elevator_change() which takes care of this, and convert
    the elv_iosched_store() sysfs interface to use this helper as well.
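    The ordering that a single helper can enforce is sketched below as a
    user-space model. All names (struct model_queue, model_elevator_init(),
    model_elevator_change()) are hypothetical; "quiescing" is reduced to
    zeroing an in-flight counter. The model assumes, as described above,
    that init refuses to run unless q->elevator has been cleared, and it
    falls back to the old scheduler if the new one cannot be attached.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request queue: 'elevator' names the active scheduler,
 * NULL meaning none is attached. */
struct model_queue {
	const char *elevator;
	int in_flight; /* requests still owned by the old scheduler */
};

/* Init insists on a cleared pointer, mirroring the requirement that
 * elevator_init() sees a zeroed q->elevator. */
static int model_elevator_init(struct model_queue *q, const char *name)
{
	if (q->elevator != NULL)
		return -1; /* caller forgot to clear the old elevator */
	if (strcmp(name, "noop") && strcmp(name, "deadline") &&
	    strcmp(name, "cfq"))
		return -2; /* unknown scheduler name */
	q->elevator = name;
	return 0;
}

/* One helper doing the whole switch in order: drain, detach and clear
 * the old scheduler, attach the new one, restore the old on failure. */
static int model_elevator_change(struct model_queue *q, const char *name)
{
	const char *old = q->elevator;
	int ret;

	q->in_flight = 0;   /* model: pretend the queue has been drained */
	q->elevator = NULL; /* model: old scheduler exited, pointer cleared */
	ret = model_elevator_init(q, name);
	if (ret)
		q->elevator = old; /* keep the previous scheduler */
	return ret;
}
```

    Calling model_elevator_init() directly on a queue that still has a
    scheduler attached fails, which is the trap the two-step
    exit-then-init sequence fell into.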

    Reported-by: Peter Oberparleiter
    Reported-by: Kevin Vigor
    Signed-off-by: Jens Axboe

    Jens Axboe
     

12 Aug, 2010

1 commit

  • Secure discard is the same as discard except that all copies of the
    discarded sectors (perhaps created by garbage collection) must also be
    erased.

    Signed-off-by: Adrian Hunter
    Acked-by: Jens Axboe
    Cc: Kyungmin Park
    Cc: Madhusudhan Chikkature
    Cc: Christoph Hellwig
    Cc: Ben Gardiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Hunter
     

08 Aug, 2010

2 commits

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
    struct request. This allows much easier grepping for different request
    types instead of unwinding through macros.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 Jun, 2010

1 commit


24 May, 2010

1 commit

  • Bio-based DM doesn't use an elevator (queue is !blk_queue_stackable()).

    Longer-term DM will not allocate an elevator for bio-based DM. But even
    then there will be a small potential for an elevator to be allocated for
    a request-based DM table only to have a bio-based table be loaded in the
    end.

    Displaying "none" for bio-based DM will help avoid user confusion.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

11 May, 2010

1 commit

  • blk_init_queue() allocates the request_queue structure and then
    initializes it as needed (request_fn, elevator, etc).

    Split initialization out to blk_init_allocated_queue_node.
    Introduce blk_init_allocated_queue wrapper function to model existing
    blk_init_queue and blk_init_queue_node interfaces.

    Export elv_register_queue to allow a newly added elevator to be
    registered with sysfs. Export elv_unregister_queue for symmetry.

    These changes allow DM to initialize a device's request_queue with more
    precision. In particular, DM no longer unconditionally initializes a
    full request_queue (elevator et al). It only does so for a
    request-based DM device.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

09 Apr, 2010

1 commit

  • This includes both the number of bios merged into requests belonging to this
    cgroup and the number of requests merged together.
    In the past, we've observed different merging behavior across upstream kernels,
    some by design, some actual bugs. This stat helps a lot in debugging such
    problems when applications report decreased throughput with a new kernel
    version.

    This required adding an extra elevator function to capture bios being merged as I
    did not want to pollute elevator code with blkiocg knowledge and hence needed
    the accounting invocation to come from CFQ.

    Signed-off-by: Divyesh Shah
    Signed-off-by: Jens Axboe

    Divyesh Shah
     

02 Apr, 2010

1 commit


08 Mar, 2010

1 commit

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     

29 Jan, 2010

1 commit

  • Updated 'nomerges' tunable to accept a value of '2' - indicating that _no_
    merges at all are to be attempted (not even the simple one-hit cache).

    The following table illustrates the additional benefit - 5 minute runs of
    a random I/O load were applied to a dozen devices on a 16-way x86_64 system.

    nomerges   Throughput     %System       Improvement (tput / %sys)
    --------   ------------   -----------   -------------------------
    0          12.45 MB/sec   0.669365609
    1          12.50 MB/sec   0.641519199   0.40% / 2.71%
    2          12.52 MB/sec   0.639849750   0.56% / 2.96%
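    The three settings can be summarized as a small decision function. This
    is a sketch, not the block layer's code: enum merge_kind and
    merge_allowed() are hypothetical. It encodes what the text states: 0
    allows all merging, 2 allows none, and 1 still tries the simple one-hit
    cache while skipping the more expensive lookups.

```c
#include <stdbool.h>

/* Hypothetical classification of merge attempts, for illustration only. */
enum merge_kind {
	MERGE_ONE_HIT_CACHE, /* retry the request we last merged into */
	MERGE_FULL_LOOKUP    /* more expensive lookups through the scheduler */
};

/* How the 'nomerges' values gate merging:
 *   0 - all merge attempts allowed
 *   1 - only the one-hit cache is tried
 *   2 - no merges at all are attempted */
static bool merge_allowed(int nomerges, enum merge_kind kind)
{
	switch (nomerges) {
	case 0:
		return true;
	case 1:
		return kind == MERGE_ONE_HIT_CACHE;
	default:
		return false;
	}
}
```

    On a kernel with this change the value is written to the queue's
    nomerges file under /sys/block/<dev>/queue/.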

    Signed-off-by: Alan D. Brunelle
    Signed-off-by: Jens Axboe

    Alan D. Brunelle
     

13 Oct, 2009

1 commit


09 Oct, 2009

1 commit

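  • elv_iosched_store() ignores the return value of strstrip(), which
    causes slightly inconsistent behavior: trailing whitespace in the
    scheduler name is accepted, but leading whitespace is rejected.

    This patch fixes it.

    Before this patch:
    ====================================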
    # cd /sys/block/{blockdev}/queue

    case1:
    # echo "anticipatory" > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case2:
    # echo "anticipatory " > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case3:
    # echo " anticipatory" > scheduler
    bash: echo: write error: Invalid argument

    After this patch:
    ====================================
    # cd /sys/block/{blockdev}/queue

    case1:
    # echo "anticipatory" > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case2:
    # echo "anticipatory " > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case3:
    # echo " anticipatory" > scheduler
    noop [anticipatory] deadline cfq
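    The bug comes down to which pointer the caller keeps. Below is a
    user-space sketch: model_strstrip() imitates the documented behavior of
    the kernel's strstrip() (trailing whitespace is cut in place, and the
    return value points past any leading whitespace), while parse_buggy()
    and parse_fixed() are hypothetical stand-ins for the sysfs store
    handler. Note that echo also appends a newline, which is why trailing
    whitespace was always handled.

```c
#include <ctype.h>
#include <string.h>

/* Trim trailing whitespace in place and return a pointer to the first
 * non-whitespace character. Leading blanks are skipped, not removed. */
static char *model_strstrip(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/* Buggy: the returned pointer is dropped, so leading blanks remain. */
static const char *parse_buggy(char *buf)
{
	model_strstrip(buf);
	return buf;
}

/* Fixed: use the pointer returned by the strip helper. */
static const char *parse_fixed(char *buf)
{
	return model_strstrip(buf);
}
```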

    Cc: Li Zefan
    Cc: Jens Axboe
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Jens Axboe

    KOSAKI Motohiro
     

03 Oct, 2009

1 commit

  • AS is mostly a subset of CFQ, so there's little point in still
    providing this separate IO scheduler. Hopefully at some point we
    can get down to one single IO scheduler again; at least this brings
    us closer by having only one intelligent IO scheduler.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

11 Sep, 2009

2 commits

  • Get rid of any functions that test for these bits and make callers
    use bio_rw_flagged() directly. Then it is at least directly apparent
    what variable and flag they check.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Update scsi_io_completion() such that it only fails requests up to the
    next error boundary and retries the leftover. This enables the block layer
    to merge requests with different failfast settings and still behave
    correctly on errors. Allow merge of requests of different failfast
    settings.

    As SCSI is currently the only subsystem which follows failfast status,
    there's no need to worry about other block drivers for now.

    Signed-off-by: Tejun Heo
    Cc: Niel Lambrechts
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     

17 Jul, 2009

1 commit

  • Commit ab0fd1debe730ec9998678a0c53caefbd121ed10 tries to prevent merge
    of requests with different failfast settings. In elv_rq_merge_ok(),
    it compares new bio's failfast flags against the merge target
    request's. However, the flag testing accessors for bio and blk don't
    return boolean but the tested bit value directly and FAILFAST on bio
    and blk don't match, so directly comparing them with == results in
    false negatives, unnecessarily preventing merges of readahead requests.

    This patch converts the results to boolean by negating them before
    comparison.
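    Why == on raw bit values fails, and why negating both sides fixes it,
    can be shown in a few lines. The flag values below are hypothetical
    (the real bio and request bit positions differ); the only assumption
    that matters is that the "same" flag sits at different bit positions
    in a bio and in a request, and that the accessors return the masked
    bit rather than a boolean.

```c
#include <stdbool.h>

/* Hypothetical bit positions for the same logical failfast flag. */
#define MODEL_BIO_FAILFAST (1u << 6)
#define MODEL_REQ_FAILFAST (1u << 9)

/* Accessors returning the masked bit value, not 0/1. */
static unsigned int bio_failfast(unsigned int bio_flags)
{
	return bio_flags & MODEL_BIO_FAILFAST;
}

static unsigned int rq_failfast(unsigned int rq_flags)
{
	return rq_flags & MODEL_REQ_FAILFAST;
}

/* Buggy: compares 0x40 with 0x200, so two failfast I/Os never match. */
static bool merge_ok_buggy(unsigned int bio_flags, unsigned int rq_flags)
{
	return bio_failfast(bio_flags) == rq_failfast(rq_flags);
}

/* Fixed: '!' collapses each side to 0 or 1 before comparing. */
static bool merge_ok_fixed(unsigned int bio_flags, unsigned int rq_flags)
{
	return !bio_failfast(bio_flags) == !rq_failfast(rq_flags);
}
```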

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Boaz Harrosh
    Cc: FUJITA Tomonori
    Cc: James Bottomley
    Cc: Jeff Garzik

    Tejun Heo
     

04 Jul, 2009

1 commit

  • Block layer used to merge requests and bios with different failfast
    settings. This caused regular IOs to fail prematurely when they were
    merged into failfast requests for readahead.

    Niel Lambrechts could trigger the problem semi-reliably on ext4 when
    resuming from STR. ext4 uses readahead when reading inodes and
    combined with the deterministic extra SATA PHY exception cycle during
    resume on the specific configuration, non-readahead inode read would
    fail causing ext4 errors. Please read the following thread for
    details.

    http://lkml.org/lkml/2009/5/23/21

    This patch makes block layer reject merging if the failfast settings
    don't match. This is correct but likely to lower IO performance by
    preventing regular IOs from mingling into surrounding readahead
    requests. Changes to allow such mixed merges and handle errors
    correctly will be added later.

    Signed-off-by: Tejun Heo
    Reported-by: Niel Lambrechts
    Cc: Theodore Tso
    Signed-off-by: Jens Axboe

    Tejun Heo
     

12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

10 Jun, 2009

1 commit

  • TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
    these new capabilities to this tracepoint:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

    - A packet command is converted to a string in TP_assign, not TP_print.
    whereas blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

         dd                  dd + ioctl blktrace   dd + TRACE_EVENT (splice)
    1    7.36s, 42.7 MB/s    7.50s, 42.0 MB/s      7.41s, 42.5 MB/s
    2    7.43s, 42.3 MB/s    7.48s, 42.1 MB/s      7.43s, 42.4 MB/s
    3    7.38s, 42.6 MB/s    7.45s, 42.2 MB/s      7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and there is no regression when
    using those trace events instead of blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT and blktrace:

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8

    Changelog from v2 -> v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store hex dump of rq->cmd().

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

02 Jun, 2009

1 commit

  • I found one more mis-conversion to the 'request is always dequeued
    when completing' model in elv_abort_queue() during code inspection.
    Although I haven't hit any problem caused by this mis-conversion yet
    and have only done compile/boot testing, please apply if you have no objections.

    Request must be dequeued when it completes.
    However, elv_abort_queue() completes requests without dequeueing.
    This causes an oops in __blk_end_request_all().
    This patch fixes the oops.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
     

23 May, 2009

1 commit

  • Currently stacking devices do not have a queue directory in sysfs.
    However, many of the I/O characteristics like sector size, maximum
    request size, etc. are queue properties.

    This patch enables the queue directory for MD/DM devices. The elevator
    code has been modified to deal with queues that do not have an I/O
    scheduler.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

20 May, 2009

1 commit


11 May, 2009

1 commit

  • With recent cleanups, there is no place where low level driver
    directly manipulates request fields. This means that the 'hard'
    request fields always equal the !hard fields. Convert all
    rq->sectors, nr_sectors and current_nr_sectors references to
    accessors.

    While at it, drop superfluous blk_rq_pos() < 0 test in swim.c.

    [ Impact: use pos and nr_sectors accessors ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Tested-by: Adrian McMenamin
    Acked-by: Adrian McMenamin
    Acked-by: Mike Miller
    Cc: James Bottomley
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Eric Moore
    Cc: Alan Stern
    Cc: FUJITA Tomonori
    Cc: Pete Zaitcev
    Cc: Stephen Rothwell
    Cc: Paul Clements
    Cc: Tim Waugh
    Cc: Jeff Garzik
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: David Woodhouse
    Cc: Martin Schwidefsky
    Cc: Dario Ballabio
    Cc: David S. Miller
    Cc: Rusty Russell
    Cc: unsik Kim
    Cc: Laurent Vivier
    Signed-off-by: Jens Axboe

    Tejun Heo
     

28 Apr, 2009

3 commits

  • There are many [__]blk_end_request() call sites which call it with
    full request length and expect full completion. Many of them ensure
    that the request actually completes by doing BUG_ON() the return
    value, which is awkward and error-prone.

    This patch adds [__]blk_end_request_all() which takes @rq and @error
    and fully completes the request. BUG_ON() is added to ensure that
    this actually happens.
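    The calling convention can be modeled in user space. This is a sketch
    under stated assumptions: struct model_request and both functions are
    hypothetical, the partial-completion helper is assumed to return true
    while data is still pending and false once the request is done, and
    assert() stands in for the kernel's BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request: only tracks outstanding bytes and an error. */
struct model_request {
	unsigned int remaining;
	int error;
};

/* Complete nr_bytes; returns true while data is still pending. */
static bool model_end_request(struct model_request *rq, int error,
			      unsigned int nr_bytes)
{
	if (error)
		rq->error = error;
	if (nr_bytes >= rq->remaining)
		rq->remaining = 0;
	else
		rq->remaining -= nr_bytes;
	return rq->remaining != 0;
}

/* "_all" variant: complete everything and insist nothing is left, so
 * callers no longer have to check the return value themselves. */
static void model_end_request_all(struct model_request *rq, int error)
{
	bool pending = model_end_request(rq, error, rq->remaining);

	assert(!pending); /* stands in for BUG_ON(pending) */
	(void)pending;    /* silence unused warning when NDEBUG is set */
}
```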

    Most conversions are simple but there are a few noteworthy ones.

    * cdrom/viocd: viocd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/block/dasd: dasd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/char/tape_block: tapeblock_end_request() replaced with direct
    calls to blk_end_request_all().

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Cc: Russell King
    Cc: Stephen Rothwell
    Cc: Mike Miller
    Cc: Martin Schwidefsky
    Cc: Jeff Garzik
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: James Bottomley

    Tejun Heo
     
  • Impact: code reorganization

    elv_next_request() and elv_dequeue_request() are more public block
    layer interfaces than actual elevator implementation. They mostly deal
    with how requests interact with the block layer and low level drivers at
    the beginning of request processing, whereas __elv_next_request() is the
    actual elevator request fetching interface.

    Move the two functions to blk-core.c. This prepares for further
    interface cleanup.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_start_queueing() is identical to __blk_run_queue() except that it
    doesn't check for recursion. None of the current users depends on
    blk_start_queueing() running request_fn directly. Replace usages of
    blk_start_queueing() with [__]blk_run_queue() and kill it.
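    The one difference named above, a recursion check, can be sketched with
    a user-space model. Everything here is hypothetical (struct model_runq,
    model_run_queue(), model_start_queueing()); deferring is reduced to
    incrementing a counter, whereas the real code hands the run off for
    later. The guarded variant refuses to re-enter request_fn from inside
    itself; the unguarded one recurses.

```c
#include <stdbool.h>

/* Hypothetical queue model. 'reenter' acts as the recursion guard and
 * 'deferred' counts runs handed off instead of executed recursively. */
struct model_runq {
	bool reenter;
	int deferred;
	int fn_calls;
	void (*request_fn)(struct model_runq *q);
};

/* Guarded run: call request_fn directly unless already inside it. */
static void model_run_queue(struct model_runq *q)
{
	if (q->reenter) {
		q->deferred++; /* model: defer instead of recursing */
		return;
	}
	q->reenter = true;
	q->request_fn(q);
	q->reenter = false;
}

/* Unguarded run: always calls request_fn, even from within it. */
static void model_start_queueing(struct model_runq *q)
{
	q->request_fn(q);
}
```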

    [ Impact: removal of mostly duplicate interface function ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     

15 Apr, 2009

1 commit


07 Apr, 2009

2 commits


06 Apr, 2009

1 commit


29 Dec, 2008

3 commits

  • Just use struct elevator_queue everywhere instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Empty barrier required special handling in __elv_next_request() to
    complete it without letting the low level driver see it.

    With previous changes, barrier code is now flexible enough to skip the
    BAR step using the same barrier sequence selection mechanism. Drop
    the special handling and mask off q->ordered from start_ordered().

    Remove blk_empty_barrier() test which now has no user.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Barrier completion had the following assumptions.

    * start_ordered() couldn't finish the whole sequence properly. If all
    actions are to be skipped, q->ordseq is set correctly but the actual
    completion was never triggered thus hanging the barrier request.

    * Drain completion in elv_complete_request() assumed that there's
    always at least one request in the queue when drain completes.

    Both assumptions are true but these assumptions need to be removed to
    improve empty barrier implementation. This patch makes the following
    changes.

    * Make start_ordered() use blk_ordered_complete_seq() to mark skipped
    steps complete and notify __elv_next_request() that it should fetch
    the next request if the whole barrier has completed inside
    start_ordered().

    * Make drain completion path in elv_complete_request() check whether
    the queue is empty. Empty queue also indicates drain completion.

    * While at it, convert 0/1 return from blk_do_ordered() to false/true.
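    The first change, completing skipped steps through the same path as
    executed ones, can be modeled with a step bitmask. This is a sketch, not
    the kernel's code: the SEQ_* bits, struct model_ordered, complete_seq()
    and start_ordered() are hypothetical. A sequence whose steps are all
    skipped now finishes inside start_ordered(), and the true return value
    tells the caller to fetch the next request instead of waiting forever.

```c
#include <stdbool.h>

/* Hypothetical step bits for an ordered (barrier) sequence. */
enum {
	SEQ_PREFLUSH  = 1 << 0,
	SEQ_BAR       = 1 << 1,
	SEQ_POSTFLUSH = 1 << 2,
	SEQ_ALL       = SEQ_PREFLUSH | SEQ_BAR | SEQ_POSTFLUSH,
};

struct model_ordered {
	unsigned int done; /* steps completed (run or skipped) so far */
	bool finished;     /* whole barrier sequence completed */
};

/* Mark steps complete; the barrier finishes once all are accounted for. */
static void complete_seq(struct model_ordered *o, unsigned int steps)
{
	o->done |= steps;
	if (o->done == SEQ_ALL)
		o->finished = true;
}

/* Start a sequence needing only 'needed' steps. Skipped steps go through
 * complete_seq() immediately; returns true if nothing is left to do. */
static bool start_ordered(struct model_ordered *o, unsigned int needed)
{
	o->done = 0;
	o->finished = false;
	complete_seq(o, SEQ_ALL & ~needed);
	return o->finished;
}
```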

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

05 Dec, 2008

1 commit


03 Dec, 2008

1 commit

  • blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
    both start the timeout timer. Barrier code dequeues the original
    barrier request but doesn't pass the request itself to the lower level
    driver, only broken-down proxy requests; however, the original barrier
    request goes through the same dequeue path, so the timeout timer is
    started on it. If the barrier sequence takes long enough, this timer
    expires but the low level driver has no idea about this request and an
    oops follows.

    Timeout timer shouldn't have been started on the original barrier
    request as it never goes through actual IO. This patch unexports
    elv_dequeue_request(), which has no external user anyway, makes it
    operate on the elevator proper without adding the timer, and makes
    blkdev_dequeue_request() call elv_dequeue_request() and add the timer.
    Internal users which don't pass the request to driver - barrier code
    and end_that_request_last() - are converted to use
    elv_dequeue_request().
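    The resulting split can be pictured with a two-function model. Both
    functions and struct model_rq are hypothetical stand-ins: the
    elevator-level dequeue only removes the request from the queue, and the
    driver-facing dequeue layers the timeout timer on top, so a request that
    never reaches the driver (such as the original barrier) never has a
    timer armed.

```c
#include <stdbool.h>

/* Hypothetical request model: only the state relevant here. */
struct model_rq {
	bool queued;
	bool timer_armed;
};

/* Elevator-level dequeue: take the request off the queue, nothing else. */
static void model_elv_dequeue(struct model_rq *rq)
{
	rq->queued = false;
}

/* Driver-facing dequeue: same as above plus arming the timeout timer,
 * because the driver is about to own and issue this request. */
static void model_blkdev_dequeue(struct model_rq *rq)
{
	model_elv_dequeue(rq);
	rq->timer_armed = true;
}
```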

    Signed-off-by: Tejun Heo
    Cc: Mike Anderson
    Signed-off-by: Jens Axboe

    Tejun Heo