13 Oct, 2009

1 commit


09 Oct, 2009

1 commit

  • elv_iosched_store() ignores the return value of strstrip(), which leads
    to slightly inconsistent behavior.

    This patch fixes it.


    Before:
    ====================================
    # cd /sys/block/{blockdev}/queue

    case1:
    # echo "anticipatory" > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case2:
    # echo "anticipatory " > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case3:
    # echo " anticipatory" > scheduler
    bash: echo: write error: Invalid argument


    After:
    ====================================
    # cd /sys/block/{blockdev}/queue

    case1:
    # echo "anticipatory" > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case2:
    # echo "anticipatory " > scheduler
    # cat scheduler
    noop [anticipatory] deadline cfq

    case3:
    # echo " anticipatory" > scheduler
    noop [anticipatory] deadline cfq

    Cc: Li Zefan
    Cc: Jens Axboe
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Jens Axboe

    KOSAKI Motohiro
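
    The case3 difference above comes down to how a strstrip()-style helper
    reports its result. The following is a minimal userspace sketch, not the
    kernel source: strstrip_sketch(), name_matches() and the two store_*
    functions are hypothetical names used only for illustration. It assumes
    the helper trims trailing whitespace in place and skips leading
    whitespace only in the pointer it returns.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Userspace sketch of a strstrip()-style helper (illustration only):
 * trailing whitespace is cut in place, leading whitespace is skipped
 * only in the returned pointer.
 */
static char *strstrip_sketch(char *s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);

	/* Trim trailing whitespace by shortening the string in place. */
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';

	/* Skip leading whitespace; the buffer itself is not shifted. */
	while (*s && isspace((unsigned char)*s))
		s++;

	return s;
}

/* Hypothetical lookup standing in for the scheduler-name match. */
static int name_matches(const char *name)
{
	return strcmp(name, "anticipatory") == 0;
}

/* Pattern with the bug: the return value is dropped, so a leading
 * space is still present when the name is compared. */
static int store_ignoring_return(char *buf)
{
	strstrip_sketch(buf);
	return name_matches(buf);
}

/* Fixed pattern: compare against the pointer the helper returns. */
static int store_using_return(char *buf)
{
	return name_matches(strstrip_sketch(buf));
}
```

    With this sketch, a trailing space is accepted by both variants, while
    a leading space is accepted only by store_using_return().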
     

03 Oct, 2009

1 commit

  • AS is mostly a subset of CFQ, so there's little point in still
    providing this separate IO scheduler. Hopefully at some point we
    can get down to one single IO scheduler again, at least this brings
    us closer by having only one intelligent IO scheduler.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

11 Sep, 2009

2 commits

  • Get rid of any functions that test for these bits and make callers
    use bio_rw_flagged() directly. Then it is at least directly apparent
    what variable and flag they check.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Update scsi_io_completion() such that it only fails requests up to the
    next error boundary and retries the leftover. This enables the block
    layer to merge requests with different failfast settings and still
    behave correctly on errors. Allow merging of requests with different
    failfast settings.

    As SCSI is currently the only subsystem which follows failfast status,
    there's no need to worry about other block drivers for now.

    Signed-off-by: Tejun Heo
    Cc: Niel Lambrechts
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     

17 Jul, 2009

1 commit

  • Commit ab0fd1debe730ec9998678a0c53caefbd121ed10 tries to prevent merging
    of requests with different failfast settings. In elv_rq_merge_ok(),
    it compares the new bio's failfast flags against the merge target
    request's. However, the flag testing accessors for bio and blk don't
    return a boolean but the tested bit value directly, and the FAILFAST
    bits on bio and blk don't match, so directly comparing them with ==
    results in false negatives, unnecessarily preventing merging of
    readahead requests.

    This patch converts the results to boolean by negating them before
    comparison.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Boaz Harrosh
    Cc: FUJITA Tomonori
    Cc: James Bottomley
    Cc: Jeff Garzik

    Tejun Heo
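
    The comparison problem described above can be reproduced in a few lines
    of plain C. This is a sketch under stated assumptions: the bit positions
    and all function names below are hypothetical and chosen only so that
    the same logical flag sits at different bits in the two flag words.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions: one logical flag, two different bits. */
#define BIO_FAILFAST_BIT (1u << 3)
#define REQ_FAILFAST_BIT (1u << 6)

static_assert(BIO_FAILFAST_BIT != REQ_FAILFAST_BIT,
	      "the sketch relies on the two bits differing");

/* Accessors that return the masked bit value rather than 0 or 1. */
static unsigned int bio_failfast(unsigned int bio_flags)
{
	return bio_flags & BIO_FAILFAST_BIT;
}

static unsigned int rq_failfast(unsigned int rq_flags)
{
	return rq_flags & REQ_FAILFAST_BIT;
}

/* Raw comparison: 8 != 64, so two set flags look like a mismatch. */
static bool failfast_match_raw(unsigned int bio_flags, unsigned int rq_flags)
{
	return bio_failfast(bio_flags) == rq_failfast(rq_flags);
}

/* Negating each side first reduces both values to 0 or 1. */
static bool failfast_match_bool(unsigned int bio_flags, unsigned int rq_flags)
{
	return !bio_failfast(bio_flags) == !rq_failfast(rq_flags);
}
```

    When both flags are set, failfast_match_raw() reports a mismatch while
    failfast_match_bool() reports a match, which is the false negative the
    commit message refers to.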
     

04 Jul, 2009

1 commit

  • Block layer used to merge requests and bios with different failfast
    settings. This caused regular IOs to fail prematurely when they were
    merged into failfast requests for readahead.

    Niel Lambrechts could trigger the problem semi-reliably on ext4 when
    resuming from STR. ext4 uses readahead when reading inodes and
    combined with the deterministic extra SATA PHY exception cycle during
    resume on the specific configuration, non-readahead inode read would
    fail causing ext4 errors. Please read the following thread for
    details.

    http://lkml.org/lkml/2009/5/23/21

    This patch makes block layer reject merging if the failfast settings
    don't match. This is correct but likely to lower IO performance by
    preventing regular IOs from mingling into surrounding readahead
    requests. Changes to allow such mixed merges and handle errors
    correctly will be added later.

    Signed-off-by: Tejun Heo
    Reported-by: Niel Lambrechts
    Cc: Theodore Tso
    Signed-off-by: Jens Axboe

    Tejun Heo
     

12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

10 Jun, 2009

1 commit

  • TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
    these new capabilities to this tracepoint:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

    - A packet command is converted to a string in TP_assign, not TP_print,
    whereas blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

         dd                 dd + ioctl blktrace   dd + TRACE_EVENT (splice)
    1    7.36s, 42.7 MB/s   7.50s, 42.0 MB/s      7.41s, 42.5 MB/s
    2    7.43s, 42.3 MB/s   7.48s, 42.1 MB/s      7.43s, 42.4 MB/s
    3    7.38s, 42.6 MB/s   7.45s, 42.2 MB/s      7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and no regression when using
    those trace events vs blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT and blktrace:

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8

    Changelog from v2 -> v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store hex dump of rq->cmd().

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

02 Jun, 2009

1 commit

  • I found one more mis-conversion to the 'request is always dequeued
    when completing' model in elv_abort_queue() during code inspection.
    Although I haven't hit any problem caused by this mis-conversion yet
    and have only done a compile/boot test, please apply if you have no
    problem with it.

    A request must be dequeued when it completes.
    However, elv_abort_queue() completes requests without dequeueing them.
    This will cause an oops in __blk_end_request_all().
    This patch fixes the oops.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
     

23 May, 2009

1 commit

  • Currently stacking devices do not have a queue directory in sysfs.
    However, many of the I/O characteristics like sector size, maximum
    request size, etc. are queue properties.

    This patch enables the queue directory for MD/DM devices. The elevator
    code has been modified to deal with queues that do not have an I/O
    scheduler.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

20 May, 2009

1 commit


11 May, 2009

1 commit

  • With recent cleanups, there is no place where low level driver
    directly manipulates request fields. This means that the 'hard'
    request fields always equal the !hard fields. Convert all
    rq->sectors, nr_sectors and current_nr_sectors references to
    accessors.

    While at it, drop superfluous blk_rq_pos() < 0 test in swim.c.

    [ Impact: use pos and nr_sectors accessors ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Tested-by: Adrian McMenamin
    Acked-by: Adrian McMenamin
    Acked-by: Mike Miller
    Cc: James Bottomley
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Eric Moore
    Cc: Alan Stern
    Cc: FUJITA Tomonori
    Cc: Pete Zaitcev
    Cc: Stephen Rothwell
    Cc: Paul Clements
    Cc: Tim Waugh
    Cc: Jeff Garzik
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: David Woodhouse
    Cc: Martin Schwidefsky
    Cc: Dario Ballabio
    Cc: David S. Miller
    Cc: Rusty Russell
    Cc: unsik Kim
    Cc: Laurent Vivier
    Signed-off-by: Jens Axboe

    Tejun Heo
     

28 Apr, 2009

3 commits

  • There are many [__]blk_end_request() call sites which call it with
    full request length and expect full completion. Many of them ensure
    that the request actually completes by doing BUG_ON() the return
    value, which is awkward and error-prone.

    This patch adds [__]blk_end_request_all() which takes @rq and @error
    and fully completes the request. BUG_ON() is added to ensure that
    this actually happens.

    Most conversions are simple but there are a few noteworthy ones.

    * cdrom/viocd: viocd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/block/dasd: dasd_end_request() replaced with direct calls to
    __blk_end_request_all().

    * s390/char/tape_block: tapeblock_end_request() replaced with direct
    calls to blk_end_request_all().

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Cc: Russell King
    Cc: Stephen Rothwell
    Cc: Mike Miller
    Cc: Martin Schwidefsky
    Cc: Jeff Garzik
    Cc: Rusty Russell
    Cc: Jeremy Fitzhardinge
    Cc: Alex Dubov
    Cc: James Bottomley

    Tejun Heo
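
    The shape of the new helper can be sketched in userspace as follows.
    This is only an illustration: struct request_sketch and both functions
    are hypothetical stand-ins, the request is reduced to a byte counter,
    and assert() takes the place of BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request: only the bytes still outstanding. */
struct request_sketch {
	unsigned int bytes_left;
	int error;
};

/*
 * Partial completion: finish up to nr_bytes and return true while the
 * request still has bytes pending.
 */
static bool end_request_sketch(struct request_sketch *rq, int error,
			       unsigned int nr_bytes)
{
	if (nr_bytes > rq->bytes_left)
		nr_bytes = rq->bytes_left;
	rq->bytes_left -= nr_bytes;
	rq->error = error;
	return rq->bytes_left != 0;
}

/*
 * Full completion: pass the whole remaining length and check that
 * nothing is left, so callers no longer test the return value.
 */
static void end_request_all_sketch(struct request_sketch *rq, int error)
{
	bool pending = end_request_sketch(rq, error, rq->bytes_left);

	assert(!pending);	/* stands in for BUG_ON(pending) */
}
```

    Callers that always complete a whole request can then use the second
    function and drop their own length bookkeeping and return-value checks.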
     
  • Impact: code reorganization

    elv_next_request() and elv_dequeue_request() are more a public block
    layer interface than actual elevator implementation. They mostly deal
    with how requests interact with block layer and low level drivers at
    the beginning of request processing whereas __elv_next_request() is the
    actual elevator request fetching interface.

    Move the two functions to blk-core.c. This prepares for further
    interface cleanup.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • blk_start_queueing() is identical to __blk_run_queue() except that it
    doesn't check for recursion. None of the current users depends on
    blk_start_queueing() running request_fn directly. Replace usages of
    blk_start_queueing() with [__]blk_run_queue() and kill it.

    [ Impact: removal of mostly duplicate interface function ]

    Signed-off-by: Tejun Heo

    Tejun Heo
     

15 Apr, 2009

1 commit


07 Apr, 2009

2 commits


06 Apr, 2009

1 commit


29 Dec, 2008

3 commits

  • Just use struct elevator_queue everywhere instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Empty barrier required special handling in __elv_next_request() to
    complete it without letting the low level driver see it.

    With previous changes, barrier code is now flexible enough to skip the
    BAR step using the same barrier sequence selection mechanism. Drop
    the special handling and mask off q->ordered from start_ordered().

    Remove blk_empty_barrier() test which now has no user.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Barrier completion had the following assumptions.

    * start_ordered() couldn't finish the whole sequence properly. If all
    actions are to be skipped, q->ordseq is set correctly but the actual
    completion was never triggered thus hanging the barrier request.

    * Drain completion in elv_complete_request() assumed that there's
    always at least one request in the queue when drain completes.

    Both assumptions are true but these assumptions need to be removed to
    improve empty barrier implementation. This patch makes the following
    changes.

    * Make start_ordered() use blk_ordered_complete_seq() to mark skipped
    steps complete and notify __elv_next_request() that it should fetch
    the next request if the whole barrier has completed inside
    start_ordered().

    * Make drain completion path in elv_complete_request() check whether
    the queue is empty. Empty queue also indicates drain completion.

    * While at it, convert 0/1 return from blk_do_ordered() to false/true.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

05 Dec, 2008

1 commit


03 Dec, 2008

1 commit

  • blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
    both start the timeout timer. Barrier code dequeues the original
    barrier request but doesn't pass the request itself to the lower level
    driver, only broken down proxy requests; however, the original
    barrier request goes through the same dequeue path and the timeout
    timer is started on it. If the barrier sequence takes long enough, this
    timer expires but the low level driver has no idea about this request
    and an oops follows.

    Timeout timer shouldn't have been started on the original barrier
    request as it never goes through actual IO. This patch unexports
    elv_dequeue_request(), which has no external user anyway, and makes it
    operate on elevator proper w/o adding the timer and make
    blkdev_dequeue_request() call elv_dequeue_request() and add timer.
    Internal users which don't pass the request to driver - barrier code
    and end_that_request_last() - are converted to use
    elv_dequeue_request().

    Signed-off-by: Tejun Heo
    Cc: Mike Anderson
    Signed-off-by: Jens Axboe

    Tejun Heo
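
    The split between the two dequeue paths can be pictured with a small
    userspace sketch. All names here are hypothetical and the request is
    reduced to two flags; the point is only that the internal path removes
    the request from the queue, while the driver-facing path additionally
    arms the timeout timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request state: queue membership and timer status. */
struct request_sketch {
	bool queued;
	bool timer_armed;
};

/* Internal path: take the request off the queue, no timeout timer. */
static void elv_dequeue_sketch(struct request_sketch *rq)
{
	rq->queued = false;
}

/* Driver-facing path: dequeue, then arm the timer, since the request
 * is about to be handed to the low level driver. */
static void blkdev_dequeue_sketch(struct request_sketch *rq)
{
	assert(!rq->timer_armed);	/* the timer is armed only once */
	elv_dequeue_sketch(rq);
	rq->timer_armed = true;
}
```

    In this picture, a barrier request that never reaches the driver would
    go through elv_dequeue_sketch() and therefore never has a timer that
    could expire on it.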
     

26 Nov, 2008

2 commits

  • Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
    sites. Spread them out to the usage sites, as suggested by
    Mathieu Desnoyers.

    Signed-off-by: Ingo Molnar
    Acked-by: Mathieu Desnoyers

    Ingo Molnar
     
  • This was a forward port of work done by Mathieu Desnoyers, I changed it to
    encode the 'what' parameter on the tracepoint name, so that one can register
    interest in specific events and not on classes of events to then check the
    'what' parameter.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jens Axboe
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

06 Nov, 2008

1 commit

  • Block queue supports two usage models - one where the block driver peeks
    at the front of the queue using elv_next_request(), processes it and
    finishes it, and the other where the block driver peeks at the front of
    the queue, dequeues the request using blkdev_dequeue_request() and
    finishes it. The latter is more flexible as it allows the driver to
    process multiple commands concurrently.

    These two inconsistent usage models make the block layer
    implementation confusing. For some, elv_next_request() is considered
    the issue point while others consider blkdev_dequeue_request() the
    issue point.

    Till now the inconsistency mostly affected only accounting, so it didn't
    really break anything seriously; however, with block layer timeout,
    this inconsistency hits hard. Block layer considers
    elv_next_request() the issue point and adds the timer, but SCSI layer
    thinks it was just peeking and, when it can't process the command
    right away, the request is just left there without further processing.
    This leaves the request dangling on the timer list and, when the timer
    goes off, the request which the SCSI layer and below think is still on
    the block queue ends up in the EH queue, causing various problems - EH
    hang (failed count goes over busy count and EH never wakes up),
    WARN_ON() and oopses as the low level driver tries to handle the
    unknown command, etc. depending on the timing.

    As SCSI midlayer is the only user of the block layer timer at the
    moment, moving blk_add_timer() to elv_dequeue_request() fixes the
    problem; however, these two usage models definitely need to be cleaned
    up in the future.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

17 Oct, 2008

2 commits


09 Oct, 2008

6 commits


03 Jul, 2008

2 commits

  • Avoid bad things happening if the module has a printk control string in
    its name.

    Signed-off-by: maximilian attems
    Signed-off-by: Jens Axboe

    maximilian attems
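
    The hazard mentioned above is the usual one with printf-style
    functions: a name used as the format argument is interpreted rather
    than printed. The sketch below uses snprintf() in userspace and assumes
    the remedy is a fixed "%s" format with the name passed as data;
    build_module_name() and the "-iosched" suffix are illustrative choices,
    not taken from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Safe pattern: the name is only ever an argument to a fixed format
 * string, so any '%' sequences inside it are copied verbatim.
 *
 * The unsafe pattern would be snprintf(out, outlen, name), where a
 * name such as "%s%n" is parsed as conversion specifiers.
 */
static int build_module_name(char *out, size_t outlen, const char *name)
{
	assert(out != NULL && name != NULL);
	return snprintf(out, outlen, "%s-iosched", name);
}
```

    With this helper, a name containing "%s" or "%n" ends up unchanged in
    the output buffer instead of being treated as a conversion.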
     
  • Some block devices support verifying the integrity of requests by way
    of checksums or other protection information that is submitted along
    with the I/O.

    This patch implements support for generating and verifying integrity
    metadata, as well as correctly merging, splitting and cloning bios and
    requests that have this extra information attached.

    See Documentation/block/data-integrity.txt for more information.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

28 May, 2008

1 commit


01 May, 2008

1 commit