29 Dec, 2008

3 commits

  • Just use struct elevator_queue everywhere instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Empty barrier required special handling in __elv_next_request() to
    complete it without letting the low level driver see it.

    With previous changes, barrier code is now flexible enough to skip the
    BAR step using the same barrier sequence selection mechanism. Drop
    the special handling and mask off q->ordered from start_ordered().

    Remove blk_empty_barrier() test which now has no user.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Barrier completion had the following assumptions.

    * start_ordered() couldn't finish the whole sequence properly. If all
    actions are to be skipped, q->ordseq is set correctly but the actual
    completion was never triggered thus hanging the barrier request.

    * Drain completion in elv_complete_request() assumed that there's
    always at least one request in the queue when drain completes.

    Both assumptions are true but these assumptions need to be removed to
    improve empty barrier implementation. This patch makes the following
    changes.

    * Make start_ordered() use blk_ordered_complete_seq() to mark skipped
    steps complete and notify __elv_next_request() that it should fetch
    the next request if the whole barrier has completed inside
    start_ordered().

    * Make drain completion path in elv_complete_request() check whether
    the queue is empty. Empty queue also indicates drain completion.

    * While at it, convert 0/1 return from blk_do_ordered() to false/true.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
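
As a rough illustration of the start_ordered() change described above, here is a small user-space model. It is only a sketch: the step bits, `struct barrier_model`, `complete_seq()` and `start_ordered_model()` are hypothetical names invented for this example and are not the kernel's definitions. The point it shows is that skipped steps go through the same completion helper, so a barrier with nothing left to do finishes immediately instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step bits; the kernel's real constants are not reproduced here. */
enum {
    SEQ_DRAIN     = 1u << 0,
    SEQ_PREFLUSH  = 1u << 1,
    SEQ_BAR       = 1u << 2,
    SEQ_POSTFLUSH = 1u << 3,
    SEQ_ALL       = SEQ_DRAIN | SEQ_PREFLUSH | SEQ_BAR | SEQ_POSTFLUSH
};

struct barrier_model {
    unsigned int done;  /* steps completed or skipped so far */
    bool finished;      /* set once every step is accounted for */
};

/* Mark steps complete; returns true once the whole sequence has finished. */
bool complete_seq(struct barrier_model *b, unsigned int steps)
{
    assert((steps & ~(unsigned int)SEQ_ALL) == 0);
    b->done |= steps;
    if (b->done == SEQ_ALL)
        b->finished = true;
    return b->finished;
}

/*
 * Start a barrier that needs only the steps in `needed`.
 * Every other step is marked complete right away through complete_seq().
 * Returns true if the barrier finished here, meaning the caller should
 * go on and fetch the next request.
 */
bool start_ordered_model(struct barrier_model *b, unsigned int needed)
{
    b->done = 0;
    b->finished = false;
    return complete_seq(b, SEQ_ALL & ~needed);
}
```

Calling `start_ordered_model(&b, 0)` models the case where all actions are skipped: it returns true at once, which is the situation that previously left the barrier request hanging.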
     

05 Dec, 2008

1 commit


03 Dec, 2008

1 commit

  • blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
    both start the timeout timer. Barrier code dequeues the original
    barrier request but doesn't pass the request itself to the lower level
    driver, only broken down proxy requests; however, the original
    barrier request goes through the same dequeue path, so the timeout
    timer is started on it. If the barrier sequence takes long enough, this
    timer expires but the low level driver has no idea about this request
    and an oops follows.

    Timeout timer shouldn't have been started on the original barrier
    request as it never goes through actual IO. This patch unexports
    elv_dequeue_request(), which has no external user anyway, and makes it
    operate on the elevator proper w/o adding the timer, and makes
    blkdev_dequeue_request() call elv_dequeue_request() and add the timer.
    Internal users which don't pass the request to driver - barrier code
    and end_that_request_last() - are converted to use
    elv_dequeue_request().

    Signed-off-by: Tejun Heo
    Cc: Mike Anderson
    Signed-off-by: Jens Axboe

    Tejun Heo
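
The split between the two dequeue functions can be pictured with a user-space model. This is a sketch under stated assumptions: `struct request_model`, `elv_dequeue_model()` and `blkdev_dequeue_model()` are hypothetical stand-ins, not kernel code. It shows the layering the message describes: the elevator-level dequeue only unlinks the request, and the driver-facing dequeue adds the timer on top.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request_model {
    bool on_queue;     /* still linked on the elevator queue */
    bool timer_armed;  /* timeout timer running for this request */
};

/* Elevator-only dequeue: unlink the request and never touch the timer. */
void elv_dequeue_model(struct request_model *rq)
{
    assert(rq != NULL && rq->on_queue);
    rq->on_queue = false;
}

/* Driver-facing dequeue: unlink, then arm the timeout because real I/O follows. */
void blkdev_dequeue_model(struct request_model *rq)
{
    elv_dequeue_model(rq);
    rq->timer_armed = true;
}
```

In this model an internal user such as the barrier code would call `elv_dequeue_model()`, so a request that never reaches the driver never gets a timer that could later expire.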
     

26 Nov, 2008

2 commits

  • Port to the new tracepoints API: split DEFINE_TRACE() and DECLARE_TRACE()
    sites. Spread them out to the usage sites, as suggested by
    Mathieu Desnoyers.

    Signed-off-by: Ingo Molnar
    Acked-by: Mathieu Desnoyers

    Ingo Molnar
     
  • This is a forward port of work done by Mathieu Desnoyers. I changed it to
    encode the 'what' parameter in the tracepoint name, so that one can register
    interest in specific events rather than in classes of events and then having
    to check the 'what' parameter.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jens Axboe
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

06 Nov, 2008

1 commit

  • Block queue supports two usage models - one where the block driver peeks
    at the front of the queue using elv_next_request(), processes it and
    finishes it, and the other where the block driver peeks at the front of
    the queue, dequeues the request using blkdev_dequeue_request() and
    finishes it. The latter is more flexible as it allows the driver to
    process multiple commands concurrently.

    These two inconsistent usage models make the block layer
    implementation confusing. Some code considers elv_next_request()
    the issue point while other code considers blkdev_dequeue_request() the
    issue point.

    Till now the inconsistency mostly affected only accounting, so it didn't
    really break anything seriously; however, with block layer timeout,
    this inconsistency hits hard. Block layer considers
    elv_next_request() the issue point and adds the timer, but SCSI layer
    thinks it was just peeking and, when it can't process the
    command right away, the request is just left there without further
    processing. This leaves the request dangling on the timer list and,
    when the timer goes off, the request which the SCSI layer and below
    think is still on the block queue ends up in the EH queue, causing
    various problems - EH hang (failed count goes over busy count and EH
    never wakes up), WARN_ON() and oopses as the low level driver tries to
    handle the unknown command, etc. depending on the timing.

    As SCSI midlayer is the only user of block layer timer at the moment,
    moving blk_add_timer() to elv_dequeue_request() fixes the problem;
    however, these two usage models definitely need to be cleaned up in the
    future.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

17 Oct, 2008

2 commits


09 Oct, 2008

6 commits


03 Jul, 2008

2 commits

  • Avoid bad things happening if the module has a printk control string in
    its name.

    Signed-off-by: maximilian attems
    Signed-off-by: Jens Axboe

    maximilian attems
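
The class of problem this commit guards against can be shown with a user-space analogue. This is a sketch, not the kernel change itself: `format_module_name()` is a hypothetical helper, and standard `snprintf()` stands in for whichever printf-style kernel call was involved. The idea is that an externally supplied name is passed as an argument to a fixed "%s" format and is never used as the format string.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy an untrusted name into buf through a printf-style call.
 * The name is an argument to "%s", never the format itself, so any
 * '%' sequences inside it are copied literally instead of interpreted.
 * Returns what snprintf() returns: the length the full name needs.
 */
int format_module_name(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL && size > 0);
    return snprintf(buf, size, "%s", name);
}
```

Had the name been passed as the format argument instead, a name containing conversion specifiers would make the call read arguments that were never supplied.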
     
  • Some block devices support verifying the integrity of requests by way
    of checksums or other protection information that is submitted along
    with the I/O.

    This patch implements support for generating and verifying integrity
    metadata, as well as correctly merging, splitting and cloning bios and
    requests that have this extra information attached.

    See Documentation/block/data-integrity.txt for more information.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

28 May, 2008

1 commit


01 May, 2008

1 commit


29 Apr, 2008

3 commits

  • The block I/O + elevator + I/O scheduler code spends a lot of time trying
    to merge I/Os -- rightfully so under "normal" circumstances. However,
    if one were to know that the incoming I/O stream was /very/ random in
    nature, the cycles are wasted.

    This patch adds a per-request_queue tunable that (when set) disables
    merge attempts (beyond the simple one-hit cache check), thus freeing up
    a non-trivial amount of CPU cycles.

    Signed-off-by: Alan D. Brunelle
    Signed-off-by: Jens Axboe

    Alan D. Brunelle
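
A simplified user-space model of the tunable's effect follows. It is a sketch under assumptions: `struct queue_model`, its `nomerges` field and `try_merge()` are hypothetical names, and the linear scan merely stands in for whatever lookup the elevator really performs. It shows the ordering the message describes: the cheap one-hit cache check always runs, and the expensive search is skipped when the tunable is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum merge_result { NO_MERGE, CACHE_MERGE, SEARCH_MERGE };

struct queue_model {
    bool nomerges;          /* hypothetical name for the per-queue tunable */
    long last_merge_end;    /* end sector of the cached request, -1 if none */
    const long *ends;       /* end sectors of all queued requests */
    size_t nr;              /* number of entries in ends */
    unsigned long searches; /* how many full searches were performed */
};

/* Try to back-merge an I/O that starts at start_sector. */
enum merge_result try_merge(struct queue_model *q, long start_sector)
{
    assert(q != NULL);

    /* One-hit cache check: cheap, so it runs regardless of the tunable. */
    if (q->last_merge_end >= 0 && q->last_merge_end == start_sector)
        return CACHE_MERGE;

    /* Tunable set: give up here and save the cycles of a full search. */
    if (q->nomerges)
        return NO_MERGE;

    q->searches++;
    for (size_t i = 0; i < q->nr; i++) {
        if (q->ends[i] == start_sector)
            return SEARCH_MERGE;
    }
    return NO_MERGE;
}
```

The `searches` counter exists only so the saving is observable in the model; with `nomerges` set it stops increasing.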
     
  • This patch fixes the following build error with UML and gcc 4.3:

    ...
    CC block/elevator.o
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c: In function ‘elv_merge’:
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:73: sorry, unimplemented: inlining failed in call to ‘elv_rq_merge_ok’: function body not available
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:103: sorry, unimplemented: called from here
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:73: sorry, unimplemented: inlining failed in call to ‘elv_rq_merge_ok’: function body not available
    /home/bunk/linux/kernel-2.6/git/linux-2.6/block/elevator.c:495: sorry, unimplemented: called from here
    make[2]: *** [block/elevator.o] Error 1
    make[1]: *** [block] Error 2

    Signed-off-by: Adrian Bunk
    Signed-off-by: Jens Axboe

    Adrian Bunk
     
  • We can save some atomic ops in the IO path, if we clearly define
    the rules of how to modify the queue flags.

    Signed-off-by: Jens Axboe

    Nick Piggin
     

19 Feb, 2008

1 commit

  • Currently we fail if someone requests a valid io scheduler, but it's
    modular and not currently loaded. That can happen from a driver init
    asking for a different scheduler, or online switching through sysfs
    as requested by a user.

    This patch makes elevator_get() call request_module() to attempt to
    load the appropriate module, instead of requiring that to be done manually.

    Signed-off-by: Jens Axboe

    Jens Axboe
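
The lookup, load, retry pattern described above can be sketched in user space. Everything here is an assumption made for illustration: `struct registry_model`, `registry_find()`, `load_module_model()` and `scheduler_get()` are hypothetical, and the fake loader simply pretends that one scheduler named "cfq" is available as a module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCHED 4

struct registry_model {
    const char *names[MAX_SCHED]; /* schedulers currently registered */
    size_t nr;
    unsigned int load_attempts;   /* how often the loader was invoked */
};

/* Look a scheduler up by name; NULL if it is not registered. */
const char *registry_find(const struct registry_model *r, const char *name)
{
    for (size_t i = 0; i < r->nr; i++) {
        if (strcmp(r->names[i], name) == 0)
            return r->names[i];
    }
    return NULL;
}

/* Stand-in for a module loader: pretends only "cfq" exists as a module. */
void load_module_model(struct registry_model *r, const char *name)
{
    r->load_attempts++;
    if (strcmp(name, "cfq") == 0 && r->nr < MAX_SCHED)
        r->names[r->nr++] = "cfq";
}

/* Find a scheduler; on a miss, try loading it once and look again. */
const char *scheduler_get(struct registry_model *r, const char *name)
{
    assert(r != NULL && name != NULL);
    const char *found = registry_find(r, name);
    if (found == NULL) {
        load_module_model(r, name);
        found = registry_find(r, name);
    }
    return found;
}
```

A request for an already registered scheduler never touches the loader, and a request for an unknown name still fails after one load attempt.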
     

01 Feb, 2008

1 commit


28 Jan, 2008

1 commit

  • These DMA drain buffer implementations in drivers are pretty horrible
    to do in terms of manipulating the scatterlist. Plus they're being
    done at least in drivers/ide and drivers/ata, so we now have code
    duplication.

    The one use case for this, as I understand it, is AHCI controllers doing
    PIO mode to mmc devices but translating this to DMA at the controller
    level.

    So, what about adding a callback to the block layer that permits the
    adding of the drain buffer for the problem devices. The idea is that
    you'd do this in slave_configure after you find one of these devices.

    The beauty of doing it in the block layer is that it quietly adds the
    drain buffer to the end of the sg list, so it automatically gets mapped
    (and unmapped) without anything unusual having to be done to the
    scatterlist in drivers/scsi or drivers/ata and without any alteration to
    the transfer length.

    Signed-off-by: James Bottomley
    Signed-off-by: Jens Axboe

    James Bottomley
     

25 Jan, 2008

3 commits


18 Dec, 2007

1 commit


20 Oct, 2007

1 commit


16 Oct, 2007

2 commits


13 Oct, 2007

1 commit


24 Jul, 2007

1 commit

  • Some of the code has been gradually transitioned to using the proper
    struct request_queue, but there's lots left. So do a full sweep of
    the kernel and get rid of this typedef and replace its uses with
    the proper type.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

18 Jul, 2007

1 commit

  • kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
    variant in the past. But with __GFP_ZERO it is possible now to do zeroing
    while allocating.

    Use __GFP_ZERO to remove the explicit clearing of memory via memset wherever
    we can.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
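
A user-space analogue of the conversion is sketched below. It is not kernel code: `ALLOC_ZERO` is a hypothetical flag standing in for __GFP_ZERO, `alloc_model()` is a made-up allocator wrapper around malloc(), and `struct sched_data` is an invented example type. The point is that the caller asks for zeroed memory through a flag, so the separate memset at each call site disappears.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALLOC_ZERO 0x1u  /* hypothetical flag, standing in for __GFP_ZERO */

/* Allocate size bytes; zero them inside the allocator if ALLOC_ZERO is set. */
void *alloc_model(size_t size, unsigned int flags)
{
    assert(size > 0);
    void *p = malloc(size);
    if (p != NULL && (flags & ALLOC_ZERO))
        memset(p, 0, size);
    return p;
}

/* Invented example type, only here to have something to allocate. */
struct sched_data {
    int depth;
    void *priv;
};

/* The call site needs no memset of its own: it just passes the flag. */
struct sched_data *sched_data_new(void)
{
    return alloc_model(sizeof(struct sched_data), ALLOC_ZERO);
}
```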
     

10 Jul, 2007

1 commit


30 Apr, 2007

1 commit


27 Mar, 2007

1 commit

  • Booting 2.6.21-rc3-g45592145 I noticed the following on one of my
    machines in the bootlog:

    io scheduler noop registeredTime: jiffies clocksource has been installed.

    io scheduler deadline registered (default)

    Looking at block/elevator.c, it appears that elv_register() uses two
    consecutive printks in a non-atomic way, leading to the above glitch. The
    attached trivial patch fixes this issue, by using a single printk.

    Signed-off-by: Thibaut VARENE
    Signed-off-by: Jens Axboe

    Thibaut VARENE
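
The idea behind the fix can be illustrated in user space: compose the whole line with one format call, so that no other output can land between its pieces. This is a sketch, with `format_register_msg()` as a hypothetical helper and `snprintf()` standing in for printk; the exact format string used by the kernel patch is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the complete registration line in one go. The optional
 * " (default)" suffix is chosen up front and passed as an argument,
 * so the line never has to be emitted in two separate pieces.
 * Returns what snprintf() returns: the length of the full line.
 */
int format_register_msg(char *buf, size_t size, const char *name, int is_default)
{
    assert(buf != NULL && name != NULL && size > 0);
    return snprintf(buf, size, "io scheduler %s registered%s\n",
                    name, is_default ? " (default)" : "");
}
```

The resulting buffer can then be written out with a single call, for example `fputs(buf, stdout)`.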
     

12 Feb, 2007

1 commit