17 Oct, 2008

1 commit

  • This fixes the bug reported by Nikanth Karthikesan :

    http://lkml.org/lkml/2008/10/2/203

    The root cause of the bug is that blk_phys_contig_segment
    miscalculates the size of the segment a merge would produce when
    checking it against q->max_segment_size.

    blk_phys_contig_segment checks:

    req->biotail->bi_size + next_req->bio->bi_size > q->max_segment_size

    But blk_recalc_rq_segments might expect that req->biotail and the
    previous bio in the req are supposed to be merged into one
    segment. blk_recalc_rq_segments might also expect that next_req->bio
    and the next bio in the next_req are supposed to be merged into one
    segment. In such a case, we merge two requests here that can't be
    merged. Later, blk_rq_map_sg gives more segments than it should.

    We need to keep track of the segment sizes in blk_recalc_rq_segments
    and use them to see if two requests can be merged. This patch
    implements it in a similar way to what we used to do for hw merging
    (virtual merging); a rough sketch follows this entry.

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
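
    A minimal sketch of the idea, assuming illustrative per-bio fields
    bi_seg_front_size/bi_seg_back_size recorded by blk_recalc_rq_segments
    (the field names and helper below are illustrative, not the exact
    patch):

        /* Hedged sketch: compare the real boundary segment sizes, not
         * just the boundary bios' bi_size, against the queue limit. */
        static int req_tail_head_mergeable(struct request_queue *q,
                                           struct request *req,
                                           struct request *next)
        {
                unsigned int combined = req->biotail->bi_seg_back_size +
                                        next->bio->bi_seg_front_size;

                return combined <= q->max_segment_size;
        }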
     

09 Oct, 2008

8 commits

  • Somewhat incomplete, as we do allow merges of requests and bios
    that have been given different completion CPUs. This is done on the
    assumption that a larger IO is still more beneficial than CPU
    locality.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Move stats related fields - stamp, in_flight, dkstats - from disk to
    part0 and unify stat handling such that...

    * part_stat_*() now also updates part0 if the specified partition
    is not part0, i.e. part_stat_*() are now essentially all_stat_*().

    * {disk|all}_stat_*() are gone.

    * part_round_stats() is updated similarly. It handles part0 stats
    automatically and disk_round_stats() is killed.

    * part_{inc|dec}_in_flight() is implemented, which automatically updates
    part0 stats for parts other than part0.

    * disk_map_sector_rcu() is updated to return part0 if no part matches.
    Combined with the above changes, this makes NULL special case
    handling in callers unnecessary.

    * Separate stats show code paths for disk are collapsed into part
    stats show code paths.

    * Rename disk_stat_lock/unlock() to part_stat_lock/unlock()

    While at it, reposition stat handling macros a bit and add missing
    parentheses around macro parameters.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
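
    A hedged usage sketch of the unified helpers; the argument order is
    reconstructed from the description above, not verified against the
    header of that era:

        /* Account one I/O against a partition; part0 is updated
         * automatically when part is not part0. */
        static void account_io(struct hd_struct *part, int rw,
                               unsigned int nr_sectors)
        {
                int cpu = part_stat_lock();   /* RCU read lock + get_cpu() */

                part_stat_inc(cpu, part, ios[rw]);
                part_stat_add(cpu, part, sectors[rw], nr_sectors);
                part_round_stats(cpu, part);

                part_stat_unlock();
        }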
     
  • There are two variants of stat functions - ones prefixed with double
    underbars which don't care about preemption and ones without which
    disable preemption before manipulating per-cpu counters. It's unclear
    whether the underbarred ones assume that preemption is disabled on
    entry, as some callers don't do that.

    This patch unifies diskstats access by implementing disk_stat_lock()
    and disk_stat_unlock() which take care of both RCU (for partition
    access) and preemption (for per-cpu counter access). diskstats access
    should always be enclosed between the two functions. As such, there's
    no need for the versions which disable preemption. They're removed
    and the double-underbarred ones are renamed to drop the underbars. As an
    extra argument is added, there's no danger of using the old version
    unconverted.

    disk_stat_lock() uses get_cpu() and returns the cpu index, and all
    diskstat functions which access per-cpu counters now take a @cpu
    argument to help RT.

    This change adds RCU or preemption operations at some places but also
    collapses several preemption ops into one at others. Overall, the
    performance difference should be negligible as all involved ops are
    very lightweight per-cpu ones.

    Signed-off-by: Tejun Heo
    Cc: Peter Zijlstra
    Signed-off-by: Jens Axboe

    Tejun Heo
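
    A hedged sketch of the access pattern this introduces; the helper
    names follow the description and may differ in detail from the
    actual header:

        static void account_disk_io(struct gendisk *disk, int rw)
        {
                /* Handles both RCU (partition access) and preemption
                 * (per-cpu counters) and hands back the CPU index. */
                int cpu = disk_stat_lock();

                disk_stat_inc(cpu, disk, ios[rw]);

                disk_stat_unlock();
        }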
     
  • disk->part[] is protected by its matching bdev's lock. However,
    non-critical accesses like collecting stats and printing out sysfs and
    proc information used to be performed without any locking. As
    partitions can come and go dynamically, partitions can go away
    underneath those non-critical accesses. As some of those accesses are
    writes, this theoretically can lead to silent corruption.

    This patch fixes the race by using RCU for the partition array and dev
    reference counter to hold partitions.

    * Rename disk->part[] to disk->__part[] to make sure no one outside
    genhd layer proper accesses it directly.

    * Use RCU for disk->__part[] dereferencing.

    * Implement disk_{get|put}_part() which can be used to get and put
    partitions from gendisk respectively.

    * Iterators are implemented to help iterate through all partitions
    safely.

    * Functions which require the RCU read lock are marked with an _rcu suffix.

    * Use disk_put_part() in __blkdev_put() instead of directly putting
    the contained kobject.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
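
    A hedged sketch of a non-critical reader under the new scheme
    (signatures are assumed from the description above):

        static void inspect_partition(struct gendisk *disk, int partno)
        {
                /* Assumed to perform the RCU-protected __part[] lookup
                 * and take a reference, or return NULL if the slot is
                 * empty. */
                struct hd_struct *part = disk_get_part(disk, partno);

                if (!part)
                        return;

                /* ... read per-partition sizes, stats, etc. ... */

                disk_put_part(part);    /* drop the reference */
        }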
     
  • This patch makes the following misc updates in preparation for
    disk->part dereference fix and extended block devt support.

    * implement part_to_disk()

    * fix comment about gendisk->part indexing

    * rename get_part() to disk_map_sector()

    * don't use n which is always zero while printing disk information in
    diskstats_show()

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Remove hw_segments field from struct bio and struct request. Without virtual
    merge accounting they have no purpose.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     
  • Remove virtual merge accounting.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     
  • blkdev_issue_discard() still emits requests which are interpreted as
    soft barriers, because naïve callers might otherwise issue subsequent
    writes to those same sectors, which might cross it on the queue (if
    they're reallocated quickly enough).

    Callers still _can_ issue non-barrier discard requests, but they have to
    take care of queue ordering for themselves.

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

    David Woodhouse
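
    A hedged usage sketch; the four-argument form (bdev, start sector,
    sector count, gfp mask) reflects the interface of this era as I
    recall it, so check the tree you are building against:

        /* e.g. from a filesystem's extent-freeing path: */
        static int discard_extent(struct block_device *bdev,
                                  sector_t start, sector_t nr)
        {
                int err = blkdev_issue_discard(bdev, start, nr, GFP_KERNEL);

                /* device without discard support: not an error for us */
                if (err == -EOPNOTSUPP)
                        err = 0;
                return err;
        }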
     

03 Jul, 2008

1 commit

  • Some block devices support verifying the integrity of requests by way
    of checksums or other protection information that is submitted along
    with the I/O.

    This patch implements support for generating and verifying integrity
    metadata, as well as correctly merging, splitting and cloning bios and
    requests that have this extra information attached.

    See Documentation/block/data-integrity.txt for more information.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
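
    A hedged sketch of how an integrity-capable driver might register
    its protection profile; the struct fields and the
    blk_integrity_register() call are from memory, and the callbacks are
    hypothetical placeholders:

        static struct blk_integrity my_pi_profile = {
                .name        = "MY-DIF-TYPE1-CRC",
                .generate_fn = my_generate_pi,   /* attach PI on writes */
                .verify_fn   = my_verify_pi,     /* check PI on reads */
                .tuple_size  = 8,                /* PI bytes per sector */
        };

        /* In the driver's probe path, after the gendisk is set up: */
        blk_integrity_register(disk, &my_pi_profile);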
     

21 Apr, 2008

1 commit

  • blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule
    that req->data_len (the true data length) is equal to sum(bio). It
    broke the scsi command completion code.

    commit e97a294ef6938512b655b1abf17656cf2b26f709 was introduced to fix
    the above issue. However, the partial completion code doesn't work
    with it. The commit is also a layer violation (scsi mid-layer should
    not know about the block layer's padding).

    This patch moves the padding adjustment to blk_rq_map_sg (suggested by
    James). The padding works like the drain buffer. This patch breaks the
    rule that req->data_len is equal to sum(sg), however, the drain buffer
    already broke it. So this patch just restores the rule that
    req->data_len is equal to sum(bio) without breaking anything new.

    Now when a low level driver needs padding, blk_rq_map_user and
    blk_rq_map_user_iov guarantee there's enough room for padding.
    blk_rq_map_sg can safely extend the last entry of a scatter list.

    blk_rq_map_sg must extend the last entry of a scatter list only for a
    request that got through bio_copy_user_iov. This patch introduces a
    new REQ_COPY_USER flag.

    Signed-off-by: FUJITA Tomonori
    Cc: Tejun Heo
    Cc: Mike Christie
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
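
    A hedged sketch of the padding step at the tail of blk_rq_map_sg(),
    reconstructed from the description above; only requests marked
    REQ_COPY_USER have their last scatterlist entry extended:

        if ((rq->cmd_flags & REQ_COPY_USER) &&
            (rq->data_len & q->dma_pad_mask)) {
                unsigned int pad_len =
                        (q->dma_pad_mask & ~rq->data_len) + 1;

                sg->length += pad_len;    /* room reserved by
                                             blk_rq_map_user{,_iov} */
                rq->extra_len += pad_len; /* data_len stays == sum(bio) */
        }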
     

04 Mar, 2008

1 commit

  • The meaning of rq->data_len was changed to the length of an allocated
    buffer from the true data length. It breaks SG_IO friends and
    bsg. This patch restores the meaning of rq->data_len to the true data
    length and adds rq->extra_len to store an extended length (due to
    drain buffer and padding).

    This patch also removes the code to update bio in blk_rq_map_user
    introduced by the commit 40b01b9bbdf51ae543a04744283bf2d56c4a6afa.
    The commit adjusts bio according to memory alignment
    (queue_dma_alignment). However, memory alignment is NOT padding
    alignment. This adjustment also breaks SG_IO friends and bsg. Padding
    alignment needs to be fixed in a proper way (by a separate patch).

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
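
    A hedged illustration of the restored convention, as a hypothetical
    helper in a low level driver:

        static unsigned int rq_dma_area_len(struct request *rq)
        {
                /* rq->data_len is the true transfer length (what is
                 * reported back to userland); drain buffer and padding
                 * bytes live in rq->extra_len. */
                return rq->data_len + rq->extra_len;
        }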
     

19 Feb, 2008

3 commits

  • Clear drain buffer before chaining if the command in question is a
    write.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
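
    A hedged sketch of where the clearing happens when the drain buffer
    is chained in (block layer internals, reconstructed from the
    description):

        if (q->dma_drain_size && q->dma_drain_needed(rq)) {
                if (rq_data_dir(rq) == WRITE)
                        /* don't leak stale kernel memory to the device */
                        memset(q->dma_drain_buffer, 0, q->dma_drain_size);

                /* ... chain the drain page onto the scatterlist ... */
        }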
     
  • Draining shouldn't be done for commands where overflow may indicate
    data integrity issues. Add dma_drain_needed callback to
    request_queue. The drain buffer is appended iff this function returns
    non-zero.

    Signed-off-by: Tejun Heo
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
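
    A hedged driver-side sketch: the callback decides per request, and
    is registered together with a pre-allocated buffer.
    my_cmd_may_overflow() is hypothetical; blk_queue_dma_drain() is the
    registration helper as I recall it:

        static int my_drain_needed(struct request *rq)
        {
                /* only packet commands whose reply length the device
                 * controls may overflow into the drain buffer */
                return blk_pc_request(rq) && my_cmd_may_overflow(rq->cmd);
        }

        /* In the driver's queue setup: */
        blk_queue_dma_drain(q, my_drain_needed, drain_buf, DRAIN_BUF_SIZE);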
     
  • With padding and draining moved into it, block layer now may extend
    requests as directed by queue parameters, so now a request has two
    sizes - the original request size and the extended size which matches
    the size of area pointed to by bios and later by sgs. The latter size
    is what lower layers are primarily interested in when allocating,
    filling up DMA tables and setting up the controller.

    Both padding and draining extend the data area to accommodate
    controller characteristics. As any controller which speaks SCSI can
    handle underflows, feeding it a larger data area is safe.

    So, this patch makes the primary data length field, request->data_len,
    indicate the size of the full data area and adds a separate length
    field, request->raw_data_len, for the unmodified request size. The
    latter is used for reporting to the higher layer (userland) and in
    places where the original request size should be fed to the
    controller or device.

    Signed-off-by: Tejun Heo
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     

30 Jan, 2008

1 commit