29 Jan, 2008

2 commits

  • * 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block:
    block: implement drain buffers
    __bio_clone: don't calculate hw/phys segment counts
    block: allow queue dma_alignment of zero
    blktrace: Add blktrace ioctls to SCSI generic devices

    Linus Torvalds
     
  • * 'blk-end-request' of git://git.kernel.dk/linux-2.6-block: (30 commits)
    blk_end_request: changing xsysace (take 4)
    blk_end_request: changing ub (take 4)
    blk_end_request: cleanup of request completion (take 4)
    blk_end_request: cleanup 'uptodate' related code (take 4)
    blk_end_request: remove/unexport end_that_request_* (take 4)
    blk_end_request: changing scsi (take 4)
    blk_end_request: add bidi completion interface (take 4)
    blk_end_request: changing ide-cd (take 4)
    blk_end_request: add callback feature (take 4)
    blk_end_request: changing ide normal caller (take 4)
    blk_end_request: changing cpqarray (take 4)
    blk_end_request: changing cciss (take 4)
    blk_end_request: changing ide-scsi (take 4)
    blk_end_request: changing s390 (take 4)
    blk_end_request: changing mmc (take 4)
    blk_end_request: changing i2o_block (take 4)
    blk_end_request: changing viocd (take 4)
    blk_end_request: changing xen-blkfront (take 4)
    blk_end_request: changing viodasd (take 4)
    blk_end_request: changing sx8 (take 4)
    ...

    Linus Torvalds
     

28 Jan, 2008

17 commits

  • Use of inlines was a bit over the top, trim them down a bit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently you must be root to set idle io prio class on a process. This
    is due to the fact that the idle class is implemented as a true idle
    class, meaning that it will not make progress if someone else is
    requesting disk access. Unfortunately this means that it opens DoS
    opportunities by locking down file system resources, hence it is root
    only at the moment.

    This patch relaxes the idle class a little, by removing the truly idle
    part (which entails a grace period with associated timer). The
    modifications make the idle class as close to zero impact as can be done
    while still guaranteeing progress. This means we can relax the root only
    criteria as well.

    Signed-off-by: Jens Axboe

    Jens Axboe
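
    For illustration, a process could request the idle class described in
    the entry above roughly as follows. This is a minimal sketch, not code
    from the patch: it assumes a Linux system, defines the ioprio constants
    locally because glibc provides no wrapper for ioprio_set(), and uses the
    hypothetical helper names ioprio_value() and set_self_idle().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values mirror the kernel's ioprio encoding; they are defined locally
 * here because glibc ships no wrapper or header for them. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_CLASS_IDLE  3
#define IOPRIO_WHO_PROCESS 1

/* Pack an I/O class and a per-class level into one ioprio value. */
int ioprio_value(int ioclass, int level)
{
    return (ioclass << IOPRIO_CLASS_SHIFT) | level;
}

/* Ask the kernel to put the calling process (who == 0) into the idle
 * class. Returns 0 on success, -1 with errno set on failure. */
int set_self_idle(void)
{
    return (int)syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                        ioprio_value(IOPRIO_CLASS_IDLE, 0));
}
```

    Whether set_self_idle() succeeds for an unprivileged caller depends on
    the kernel in use; before this change it would be expected to fail for
    non-root users.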
     
  • These DMA drain buffer implementations in drivers are pretty horrible
    to do in terms of manipulating the scatterlist. Plus they're being
    done at least in drivers/ide and drivers/ata, so we now have code
    duplication.

    The one use case for this, as I understand it, is AHCI controllers doing
    PIO mode to mmc devices but translating this to DMA at the controller
    level.

    So, what about adding a callback to the block layer that permits the
    adding of the drain buffer for the problem devices? The idea is that
    you'd do this in slave_configure after you find one of these devices.

    The beauty of doing it in the block layer is that it quietly adds the
    drain buffer to the end of the sg list, so it automatically gets mapped
    (and unmapped) without anything unusual having to be done to the
    scatterlist in drivers/scsi or drivers/ata and without any alteration to
    the transfer length.

    Signed-off-by: James Bottomley
    Signed-off-by: Jens Axboe

    James Bottomley
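
    The idea in the entry above, appending a drain buffer as one extra
    trailing segment without touching the transfer length, can be modelled
    in user space as below. This is only a sketch under stated assumptions:
    struct seg, struct queue_model and map_segments() are hypothetical
    stand-ins, not the kernel's scatterlist or request-queue types.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGS 8

/* Hypothetical stand-ins for a scatterlist entry and a request queue. */
struct seg { void *addr; size_t len; };

struct queue_model {
    void  *drain_buf;   /* NULL when the device needs no drain buffer */
    size_t drain_size;
};

/* Copy the data segments into out[] and, if the queue has a drain buffer,
 * append it as one extra trailing segment. Returns the number of segments
 * to map; *xfer_len receives the transfer length, which excludes the drain. */
int map_segments(const struct queue_model *q,
                 const struct seg *data, int ndata,
                 struct seg *out, size_t *xfer_len)
{
    int n = 0;

    *xfer_len = 0;
    for (int i = 0; i < ndata && n < MAX_SEGS; i++) {
        out[n++] = data[i];
        *xfer_len += data[i].len;
    }
    if (q->drain_buf && n < MAX_SEGS) {
        out[n].addr = q->drain_buf;
        out[n].len = q->drain_size;
        n++;
    }
    return n;
}
```

    Because the drain segment is part of the mapped list, it is mapped and
    unmapped together with the data segments, while the reported transfer
    length stays that of the data alone.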
     
  • changes to anticipatory io scheduler for io_context sharing

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The io context sharing introduced a per-ioc spinlock, that would protect
    the cfq io context lookup. That is a regression from the original, since
    we never needed any locking there because the ioc/cic were process private.

    The cic lookup is changed from an rbtree construct to a radix tree, which
    we can then use RCU to make the reader side lockless. That is the performance
    critical path, modifying the radix tree is only done on process creation
    (when that process first does IO, actually) and on process exit (if that
    process has done IO).

    As it so happens, radix trees are also much faster for this type of
    lookup where the key is a pointer. It's a very sparse tree.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • changes in the cfq for io_context sharing

    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
     
  • Detach task state from ioc, instead keep track of how many processes
    are accessing the ioc.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This is where it belongs and then it doesn't take up space for a
    process that doesn't do IO.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch merges complete_request() into end_that_request_last()
    for cleanup.

    complete_request() was introduced by an earlier part of this patch set
    so as not to break the existing users of end_that_request_last().

    Since all users are converted to blk_end_request interfaces and
    end_that_request_last() is no longer exported, the code can be
    merged to end_that_request_last().

    Cc: Boaz Harrosh
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
     
  • This patch converts 'uptodate' arguments of no longer exported
    interfaces, end_that_request_first/last, to 'error', and removes
    internal conversions for it in blk_end_request interfaces.

    Also, this patch removes no longer needed end_io_error().

    Cc: Boaz Harrosh
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
     
  • This patch removes the following functions:
    o end_that_request_first()
    o end_that_request_chunk()
    and stops exporting the functions below:
    o end_that_request_last()

    Cc: Boaz Harrosh
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
     
  • This patch adds a variant of the interface, blk_end_bidi_request(),
    which completes a bidi request.

    A bidi request must be completed as a whole, both rq and rq->next_rq
    at once. So the interface has 2 arguments for completion size.

    As for ->end_io, only rq->end_io is called (rq->next_rq->end_io is not
    called). So if special completion handling is needed, the handler
    must be set to rq->end_io.
    The handler must also take care of freeing next_rq, since
    the interface doesn't take care of it if rq->end_io is not NULL.

    Cc: Boaz Harrosh
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
     
  • This patch adds a variant of the interface, blk_end_request_callback(),
    which has driver callback feature.

    Drivers may need to do special work between end_that_request_first()
    and end_that_request_last().
    For such drivers, blk_end_request_callback() allows them to pass
    a callback function which is called between end_that_request_first()
    and end_that_request_last().

    This interface is only a fallback for cases the other blk_end_request
    interfaces cannot handle. Drivers should avoid such tricky behavior and
    use the other interfaces as much as possible.

    Currently, only one driver, ide-cd, needs this interface.
    So this interface should/will be removed, after the driver removes
    such tricky behaviors.

    o ide-cd (cdrom_newpc_intr())
    In PIO mode, cdrom_newpc_intr() needs to defer end_that_request_last()
    until the device clears DRQ_STAT and raises an interrupt after
    end_that_request_first().
    So end_that_request_first() and end_that_request_last() are called
    separately in cdrom_newpc_intr().

    This means blk_end_request_callback() has to return without
    completing the request even if nothing is left over in the request.
    To satisfy this requirement, the callback function has a return value
    so that drivers can tell blk_end_request_callback() to return
    without completing the request.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
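
    The control flow described in the entry above can be sketched as a small
    user-space model. The names struct req_model, end_request_callback() and
    defer_cb() are hypothetical and only mimic the roles of the kernel
    functions; the real interface operates on struct request and also takes
    an error argument.

```c
#include <assert.h>
#include <stddef.h>

struct req_model {
    unsigned int remaining;  /* bytes not yet transferred */
    int completed;           /* set once the request is fully finished */
};

typedef int (*drv_cb)(struct req_model *);

/* Returns 0 when the request was completed, 1 when it is still pending:
 * either bytes are left over, or the callback asked to defer completion. */
int end_request_callback(struct req_model *rq, unsigned int nr_bytes,
                         drv_cb cb)
{
    /* Step 1: account for transferred bytes
     * (the role of end_that_request_first()). */
    rq->remaining -= (nr_bytes < rq->remaining) ? nr_bytes : rq->remaining;
    if (rq->remaining)
        return 1;

    /* Step 2: driver hook; a nonzero return means "do not complete yet". */
    if (cb && cb(rq))
        return 1;

    /* Step 3: final completion (the role of end_that_request_last()). */
    rq->completed = 1;
    return 0;
}

/* Example callback that always defers completion, as a driver waiting
 * for a further interrupt might do. */
int defer_cb(struct req_model *rq)
{
    (void)rq;
    return 1;
}
```

    In this model a driver in the ide-cd situation would pass a deferring
    callback on the data interrupt and complete the request on a later call.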
     
  • This patch converts core parts of block layer to use blk_end_request
    interfaces. Related 'uptodate' arguments are converted to 'error'.

    'dequeue' argument was originally introduced for end_dequeued_request(),
    where no attempt should be made to dequeue the request as it's already
    dequeued.
    However, it's not necessary as it can be checked with
    list_empty(&rq->queuelist).
    (Dequeued request has empty list and queued request doesn't.)
    And it has been done in blk_end_request interfaces.

    As a result of this patch, end_queued_request() and
    end_dequeued_request() become identical. A future patch will merge
    and rename them and change users of those functions.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
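
    The list_empty(&rq->queuelist) test mentioned in the entry above relies
    on an unlinked, re-initialized list entry pointing at itself. A minimal
    user-space sketch of that property follows; the list helpers are
    simplified re-implementations with hypothetical names, not the kernel's
    list.h.

```c
#include <assert.h>

/* Minimal user-space copy of a circular doubly linked list node. */
struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
int list_is_empty(const struct list_head *h) { return h->next == h; }

void list_add_tail_model(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-initialize, so the entry reads as "empty" afterwards. */
void list_del_init_model(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    init_list_head(n);
}

struct req_model { struct list_head queuelist; };

/* A request counts as queued exactly when its queuelist is non-empty. */
int req_is_queued(const struct req_model *rq)
{
    return !list_is_empty(&rq->queuelist);
}
```

    With this check available, a separate 'dequeue' flag carries no extra
    information, which is the point the commit message makes.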
     
  • This patch adds/exports functions to get the size of request in bytes.
    They are useful because blk_end_request interfaces take bytes
    as a completed I/O size instead of sectors.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
     
  • This patch adds 2 new interfaces for request completion:
    o blk_end_request() : called without queue lock
    o __blk_end_request() : called with queue lock held

    blk_end_request takes 'error' as an argument instead of 'uptodate',
    which current end_that_request_* take.
    The meanings of values are below and the value is used when bio is
    completed.
    0 : success
    < 0 : error

    Some device drivers call some generic functions below between
    end_that_request_{first/chunk} and end_that_request_last().
    o add_disk_randomness()
    o blk_queue_end_tag()
    o blkdev_dequeue_request()
    These are called in the blk_end_request interfaces as a part of
    generic request completion.
    So all device drivers end up calling the functions above.
    To decide whether to call blkdev_dequeue_request(), blk_end_request
    uses list_empty(&rq->queuelist) (blk_queued_rq() macro is added for it).
    So drivers must re-initialize it using INIT_LIST_HEAD() or so before
    calling blk_end_request if they use it for their own purposes.
    (Currently, there is no driver which completes a request without
    re-initializing the queuelist after using it. So rq->queuelist
    can be used for the purpose above.)

    "Normal" drivers can be converted to use blk_end_request()
    in a standard way shown below.

    a) end_that_request_{chunk/first}
       spin_lock_irqsave()
       (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
       end_that_request_last()
       spin_unlock_irqrestore()
       => blk_end_request()

    b) spin_lock_irqsave()
       end_that_request_{chunk/first}
       (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
       end_that_request_last()
       spin_unlock_irqrestore()
       => spin_lock_irqsave()
          __blk_end_request()
          spin_unlock_irqrestore()

    c) spin_lock_irqsave()
       (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
       end_that_request_last()
       spin_unlock_irqrestore()
       => blk_end_request() or spin_lock_irqsave()
                               __blk_end_request()
                               spin_unlock_irqrestore()

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
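
    The move from 'uptodate' to 'error' described in the entry above can be
    illustrated with a small mapping helper. This sketch assumes the older
    convention was: positive means success, zero means a generic I/O error,
    and a negative value is passed through as the error code. The helper
    name uptodate_to_error() is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Map an old-style 'uptodate' value to a new-style 'error' value
 * (0 for success, negative errno for failure). */
int uptodate_to_error(int uptodate)
{
    if (uptodate > 0)
        return 0;                          /* success */
    return uptodate == 0 ? -EIO : uptodate; /* generic or specific error */
}
```

    A driver being converted would compute the error value once and hand it,
    together with the completed byte count, to the new completion interface.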
     
  • Since the SCSI layer uses the request queues from the block layer, blktrace can
    also be used to trace the requests to all SCSI devices (like SCSI tape drives),
    not only disks. The only missing part is the ioctl interface to start and stop
    tracing.

    This patch adds the SETUP, START, STOP and TEARDOWN ioctls from blktrace to the
    sg device files. With this change, blktrace can be used for SCSI devices like
    for disks, e.g.: blktrace -d /dev/sg1 -o - | blkparse -i -

    Signed-off-by: Christof Schmitt
    Signed-off-by: Jens Axboe

    Christof Schmitt
     

26 Jan, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
    [SCSI] usbstorage: use last_sector_bug flag universally
    [SCSI] libsas: abstract STP task status into a function
    [SCSI] ultrastor: clean up inline asm warnings
    [SCSI] aic7xxx: fix firmware build
    [SCSI] aacraid: fib context lock for management ioctls
    [SCSI] ch: remove forward declarations
    [SCSI] ch: fix device minor number management bug
    [SCSI] ch: handle class_device_create failure properly
    [SCSI] NCR5380: fix section mismatch
    [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
    [SCSI] IB/iSER: add logical unit reset support
    [SCSI] don't use __GFP_DMA for sense buffers if not required
    [SCSI] use dynamically allocated sense buffer
    [SCSI] scsi.h: add macro for enclosure bit of inquiry data
    [SCSI] sd: add fix for devices with last sector access problems
    [SCSI] fix pcmcia compile problem
    [SCSI] aacraid: add Voodoo Lite class of cards.
    [SCSI] aacraid: add new driver features flags
    [SCSI] qla2xxx: Update version number to 8.02.00-k7.
    [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
    ...

    Linus Torvalds
     

25 Jan, 2008

7 commits

  • Now that the old kobject_init() function is gone, rename
    kobject_init_ng() to kobject_init() to clean up the namespace.

    Cc: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Now that the old kobject_add() function is gone, rename kobject_add_ng()
    to kobject_add() to clean up the namespace.

    Cc: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This converts the code to use the new kobject functions, cleaning up the
    logic in doing so.

    Cc: Jens Axboe
    Cc: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This converts the code to use the new kobject functions, cleaning up the
    logic in doing so.

    Cc: Jens Axboe
    Cc: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This moves the block devices to /sys/class/block. It will create a
    flat list of all block devices, with the disks and partitions in one
    directory. For compatibility /sys/block is created and contains symlinks
    to the disks.

    /sys/class/block
    |-- sda -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
    |-- sda1 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda1
    |-- sda10 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10
    |-- sda5 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda5
    |-- sda6 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda6
    |-- sda7 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda7
    |-- sda8 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda8
    |-- sda9 -> ../../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda9
    `-- sr0 -> ../../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0

    /sys/block/
    |-- sda -> ../devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda
    `-- sr0 -> ../devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     
  • Dynamically create the kset instead of declaring it statically. We also
    rename block_subsys to block_kset to catch all users of this symbol
    with a build error instead of an easy-to-ignore build warning.

    Cc: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • We don't need a "default" ktype for a kset. We should set this
    explicitly every time for each kset. This change is needed so that we
    can make ksets dynamic, and cleans up one of the odd, undocumented
    assumptions that the kset/kobject/ktype model has.

    This patch is based on a lot of help from Kay Sievers.

    Nasty bug in the block code was found by Dave Young

    Cc: Kay Sievers
    Cc: Dave Young
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

12 Jan, 2008

2 commits

  • The purpose of this is to allow stacked alignment settings, with the
    ultimate queue alignment being set to the largest alignment requirement
    in the stack.

    The reason for this is so that the SCSI mid-layer can relax the default
    alignment requirements (which are basically causing a lot of superfluous
    copying to go on in the SG_IO interface) while allowing transports,
    devices or HBAs to add stricter limits if they need them.

    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    James Bottomley
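
    The stacking rule described in the entry above, where the final queue
    alignment is the largest requirement in the stack, amounts to a
    "raise only, never lower" update. A minimal sketch, with the
    hypothetical names struct queue_model and update_dma_alignment():

```c
#include <assert.h>

/* Hypothetical stand-in for the part of a request queue that holds the
 * DMA alignment mask (for example 511 for 512-byte alignment). */
struct queue_model { unsigned int dma_alignment; };

/* Raise the queue's alignment mask if the caller needs a stricter one;
 * a weaker request leaves the current setting untouched. */
void update_dma_alignment(struct queue_model *q, unsigned int mask)
{
    if (mask > q->dma_alignment)
        q->dma_alignment = mask;
}
```

    Each layer (mid-layer default, transport, device, HBA) can call the
    update in any order and the strictest mask wins.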
     
  • Currently in BSG, errors returned in req->errors aren't passed back to
    the calling programme (either via SG_IO or via read/write). Fix this,
    while preserving the SCSI convention of returning status in
    req->errors.

    Now update libsas to return errors correctly instead of ignoring
    them.

    Acked-by: FUJITA Tomonori
    Signed-off-by: James Bottomley

    James Bottomley
     

11 Jan, 2008

2 commits

  • It just inits the mutex, we can do that with DEFINE_MUTEX() instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • David Dillow reported broken blktrace timestamps. The reason
    is cpu_clock() which is not a global time source.

    Fix blktrace timestamps by using ktime_get() like the networking
    code does for packet timestamps. This also removes a whole lot
    of complexity from blktrace.c and shrinks the code by 500 bytes:

       text    data     bss     dec     hex filename
       2888     124      44    3056     bf0 blktrace.o.before
       2390     116      44    2550     9f6 blktrace.o.after

    Signed-off-by: Ingo Molnar
    Signed-off-by: Jens Axboe

    Ingo Molnar
     

18 Dec, 2007

4 commits


27 Nov, 2007

3 commits

  • This was a temporary debugging thing for sg chaining testing, revert
    it now as it has served its purpose.

    This reverts commit 563063a808de6b2004d5b8a09ddcb6125481f4b2.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Fix a memory leak in alloc_disk_node(). Don't forget to free 'dkstats' when the allocation of 'part' failed.

    Signed-off-by: Jerome Marchand
    Signed-off-by: Jens Axboe

    Jerome Marchand
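
    The kind of leak fixed in the entry above, where an earlier allocation
    is not released when a later one fails, is commonly avoided with an
    unwinding error path. The following user-space sketch shows the pattern;
    struct disk_model, alloc_disk_model() and the allocation counter are
    hypothetical and exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Counter used only to make leaks observable in this sketch. */
int live_allocs = 0;

void *model_alloc(size_t n, int fail)
{
    void *p = fail ? NULL : calloc(1, n);
    if (p)
        live_allocs++;
    return p;
}

void model_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical stand-in for a disk with two separately allocated members. */
struct disk_model { void *dkstats; void *part; };

/* fail_part forces the allocation of 'part' to fail. */
struct disk_model *alloc_disk_model(int fail_part)
{
    struct disk_model *d = model_alloc(sizeof(*d), 0);

    if (!d)
        return NULL;
    d->dkstats = model_alloc(64, 0);
    if (!d->dkstats)
        goto err_disk;
    d->part = model_alloc(64, fail_part);
    if (!d->part)
        goto err_stats;   /* release dkstats on this path too */
    return d;

err_stats:
    model_free(d->dkstats);
err_disk:
    model_free(d);
    return NULL;
}

void free_disk_model(struct disk_model *d)
{
    model_free(d->part);
    model_free(d->dkstats);
    model_free(d);
}
```

    Unwinding in reverse order of allocation keeps every failure path
    symmetrical with the success path.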
     
    If the blktrace program segfaults, it will not be able
    to call BLKTRACETEARDOWN. If we then run blktrace
    again, it fails to create the
    block/ debugfs directory. This results
    in blk_remove_root() being called, which sets
    blk_tree_root to NULL. But the debugfs block dir
    still exists because it contains a subdirectory.

    Now if we try to fix it using BLKTRACETEARDOWN
    it won't work because blk_tree_root is NULL.

    Fix the same.

    Tested as below

    root@qemu-image:/home/kvaneesh/blktrace# ./blktrace -d /dev/hdc
    Segmentation fault
    root@qemu-image:/home/kvaneesh/blktrace# ./blktrace -d /dev/hdc
    BLKTRACESETUP: No such file or directory
    Failed to start trace on /dev/hdc
    root@qemu-image:/home/kvaneesh/blktrace# ./blktrace -k /dev/hdc
    root@qemu-image:/home/kvaneesh/blktrace# ./blktrace -d /dev/hdc

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Jens Axboe

    Aneesh Kumar K.V
     

09 Nov, 2007

2 commits

  • Added blk_unplug interface, allowing all invocations of unplugs to result
    in a generated blktrace UNPLUG.

    Signed-off-by: Alan D. Brunelle
    Signed-off-by: Jens Axboe

    Alan D. Brunelle
     
  • Credit goes to juergen.kadidlo@exasol.com for diagnosing this issue
    and supplying the initial patch.

    blk_queue_invalidate_tags() must use the proper requeueing paths instead
    of open coding the re-add of the request, otherwise we bug out in rq
    accounting. Just switch to using blk_requeue_request(), that takes care
    of end-tag handling as well and also adds the blktrace REQUEUE notify
    event that is also appropriate here.

    Signed-off-by: Jens Axboe

    Jens Axboe