17 Nov, 2006

1 commit

  • ATAPI devices transfer a fixed number of bytes for CDBs (12 or 16).
    Some ATAPI devices choke when a shorter CDB is used and the leftover
    bytes contain garbage. Block SG_IO cleared the leftover bytes but SCSI
    SG_IO didn't. This patch makes SCSI SG_IO clear them and simplifies CDB
    clearing in block SG_IO.
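    A minimal userspace sketch of the clearing (fill_cdb() is an
    illustrative name, not the kernel's; MAX_COMMAND_SIZE is the fixed
    16-byte CDB length): copy the short command, then zero the tail so the
    device never sees stack garbage:

```c
#include <string.h>

#define MAX_COMMAND_SIZE 16   /* fixed CDB buffer length */

/* Copy a (possibly shorter) command and zero the leftover bytes. */
static void fill_cdb(unsigned char *dst, const unsigned char *src,
                     unsigned int cmd_len)
{
    memcpy(dst, src, cmd_len);
    memset(dst + cmd_len, 0, MAX_COMMAND_SIZE - cmd_len);
}
```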

    Signed-off-by: Tejun Heo
    Cc: Mathieu Fluhr
    Cc: James Bottomley
    Cc: Douglas Gilbert
    Acked-by: Jens Axboe
    Cc:
    Acked-by: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

14 Nov, 2006

1 commit

  • Contrary to what the name misleads you to believe, SG_DXFER_TO_FROM_DEV
    is really just a normal read seen from the device side.
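    The SG_DXFER_* values below are as defined in <scsi/sg.h>; the mapping
    function is an illustrative sketch of the point being made, not the
    actual kernel code:

```c
/* SG_DXFER_* values as defined in <scsi/sg.h>. */
#define SG_DXFER_NONE        (-1)
#define SG_DXFER_TO_DEV      (-2)
#define SG_DXFER_FROM_DEV    (-3)
#define SG_DXFER_TO_FROM_DEV (-4)

enum data_dir { DIR_NONE, DIR_TO_DEVICE, DIR_FROM_DEVICE };

/* Despite the name, TO_FROM_DEV is a plain device-to-host transfer. */
static enum data_dir sg_dxfer_to_dir(int dxfer)
{
    switch (dxfer) {
    case SG_DXFER_TO_DEV:
        return DIR_TO_DEVICE;
    case SG_DXFER_FROM_DEV:
    case SG_DXFER_TO_FROM_DEV:
        return DIR_FROM_DEVICE;
    default:
        return DIR_NONE;
    }
}
```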

    This patch fixes http://lkml.org/lkml/2006/10/13/100

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

04 Nov, 2006

1 commit


01 Nov, 2006

2 commits

  • In very rare circumstances, we could prune a merged request and at the
    same time delete the implicated cfqq from the rr_list, without re-adding
    it when the merged request was added. This could cause I/O stalls until
    that process issued I/O again.

    Fix it up by putting the rr_list add handling into cfq_add_rq_rb(),
    identical to how pruning is handled in cfq_del_rq_rb(). This fixes a
    hang reproducible with fsx-linux.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     
  • Partitions are not limited to living within a device, so we should
    range check after partition mapping.

    Note that 'maxsector' was being used for two different things. I have
    split off the second usage into 'old_sector' so that maxsector can
    still be used for its primary purpose later in the function.
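    A self-contained sketch of the ordering (function and parameter names
    are illustrative, not the kernel's): remap the partition-relative
    sector to the whole device, keeping the original value around, and only
    then range-check against the device capacity:

```c
/* Remap a partition-relative sector to the whole device and check the
 * result.  Checking before the remap would let an out-of-range partition
 * pass. */
static int remap_and_check(unsigned long long part_start,
                           unsigned long long sector,
                           unsigned long long nr_sectors,
                           unsigned long long maxsector)
{
    unsigned long long old_sector = sector; /* pre-remap value, kept */

    sector += part_start;                   /* map into the whole device */
    (void)old_sector;

    /* The range check comes AFTER the mapping. */
    return sector + nr_sectors <= maxsector;
}
```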

    Cc: Jens Axboe
    Signed-off-by: Neil Brown
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

31 Oct, 2006

2 commits

  • When the ioprio code recently got juggled a bit, a bug was introduced.
    changed_ioprio() is no longer called with interrupts disabled, so using
    plain spin_lock() on the queue_lock is a bug.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     
  • If cfq_set_request() is called for a new process AND a non-fs io
    request (so that __GFP_WAIT may not be set), cfq_cic_link() may
    use spin_lock_irq() and spin_unlock_irq() with interrupts already
    disabled.

    The fix is to always use irq-safe locking in cfq_cic_link().

    Acked-By: Arjan van de Ven
    Acked-by: Ingo Molnar
    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

21 Oct, 2006

2 commits

  • Separate out the concept of "queue congestion" from "backing-dev congestion".
    Congestion is a backing-dev concept, not a queue concept.

    The blk_* congestion functions are retained, as wrappers around the core
    backing-dev congestion functions.

    This proper layering is needed so that NFS can cleanly use the congestion
    functions, and so that CONFIG_BLOCK=n actually links.
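    A condensed sketch of the layering (the structures and flag values are
    simplified stand-ins for the real ones): the retained blk_* helper is
    a thin wrapper that forwards to the backing-dev congestion state:

```c
/* Simplified stand-ins for the real structures. */
struct backing_dev_info { unsigned long state; };
struct request_queue    { struct backing_dev_info backing_dev_info; };

enum { BDI_read_congested, BDI_write_congested };

/* Core check: congestion is backing-dev state. */
static int bdi_congested(struct backing_dev_info *bdi, int bit)
{
    return (bdi->state & (1UL << bit)) != 0;
}

/* Retained blk_* wrapper around the backing-dev function. */
static int blk_queue_congested(struct request_queue *q, int bit)
{
    return bdi_congested(&q->backing_dev_info, bit);
}
```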

    Cc: "Thomas Maier"
    Cc: "Jens Axboe"
    Cc: Trond Myklebust
    Cc: David Howells
    Cc: Peter Osterlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Export the clear_queue_congested() and set_queue_congested() functions
    located in ll_rw_blk.c

    The functions are renamed to blk_clear_queue_congested() and
    blk_set_queue_congested().

    (needed in the pktcdvd driver's bio write congestion control)

    Signed-off-by: Thomas Maier
    Cc: Peter Osterlund
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Maier
     

12 Oct, 2006

2 commits


05 Oct, 2006

1 commit


03 Oct, 2006

1 commit

  • Export blkdev_driver_ioctl for device-mapper.

    If we get as far as the device-mapper ioctl handler, we know the ioctl is not
    a standard block layer BLK* one, so we don't need to check for them a second
    time and can call blkdev_driver_ioctl() directly.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     

01 Oct, 2006

27 commits

  • All on stack DECLARE_COMPLETIONs should be replaced by:
    DECLARE_COMPLETION_ONSTACK

    Signed-off-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • It's too easy for people to shoot themselves in the foot, and it
    only makes sense for embedded folks anyway.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • If we share the tag map between two or more queues, then we cannot
    use __set_bit() to set the bit. In fact we need to make sure we
    atomically acquire this tag, so loop using test_and_set_bit() to
    protect from that.
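    A userspace sketch of the claim loop; the bitop helpers below are
    simplified stand-ins for the kernel's, using a GCC atomic builtin:

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's atomic test_and_set_bit(). */
static int test_and_set_bit(int nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    unsigned long old = __sync_fetch_and_or(&map[nr / BITS_PER_LONG], mask);
    return (old & mask) != 0;
}

/* Userspace stand-in for find_first_zero_bit(). */
static int find_first_zero_bit(const unsigned long *map, int max)
{
    for (int i = 0; i < max; i++)
        if (!(map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))))
            return i;
    return max;
}

/* Claim a free tag atomically.  A plain __set_bit() here would let two
 * queues sharing the map grab the same tag; looping on
 * test_and_set_bit() retries if we lose the race. */
static int claim_tag(unsigned long *map, int max_tags)
{
    int tag;
    do {
        tag = find_first_zero_bit(map, max_tags);
        if (tag >= max_tags)
            return -1;                      /* no free tag */
    } while (test_and_set_bit(tag, map));   /* raced, retry */
    return tag;
}
```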

    Noticed by Mike Christie

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • As people often look for the copyright in files to see who to mail,
    update the link to a neutral one.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

    (*) Block I/O tracing.

    (*) Disk partition code.

    (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

    (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
    block layer to do scheduling. Some drivers that use SCSI facilities -
    such as USB storage - end up disabled indirectly from this.

    (*) Various block-based device drivers, such as IDE and the old CDROM
    drivers.

    (*) MTD blockdev handling and FTL.

    (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
    taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

    (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

    (*) /proc/devices does not list any blockdevs.

    (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • This patch kills a few lines of code in blktrace by making use of
    on_each_cpu().

    Signed-off-by: Martin Peschke
    Signed-off-by: Jens Axboe

    Martin Peschke
     
  • We don't need to disable irqs to clear current->io_context, it is protected
    by ->alloc_lock. Even IF it was possible to submit I/O from IRQ on behalf of
    current this irq_disable() can't help: current_io_context() will re-instantiate
    ->io_context after irq_enable().

    We don't need task_lock() or local_irq_disable() to clear ioc->task. This can't
    prevent other CPUs from playing with our io_context anyway.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Jens Axboe

    Oleg Nesterov
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Give meta data reads preference over regular reads, as the process
    often needs to get that out of the way to do the io it was actually
    interested in.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We can use this information for making more intelligent priority
    decisions, and it will also be useful for blktrace.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It can make sense to set read-ahead larger than a single request; we
    should not be enforcing such a policy on the user. The BLKRASET ioctl
    doesn't impose this restriction either, so the two now expose identical
    behaviour.

    Issue also reported by Anton

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Don't touch the current queues, just make sure that the wanted queue
    is selected next. Simplifies the logic.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • CFQ implements this on its own now, but it's really block layer
    knowledge. Tells a device queue to start dispatching requests to
    the driver, taking care to unplug if needed. Also fixes the issue
    where as/cfq will invoke a stopped queue, which we really don't
    want.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • No point in having a place holder list just for empty queues, so remove
    it. It's not used for anything other than to keep ->cfq_list busy.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently it scales with number of processes in that priority group,
    which is potentially not very nice as it's called quite often.
    Basically we always need to do tail inserts, except for the case of a
    new process. So just mark/detect a queue as such.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Some were kmalloc_node(), some were still kmalloc(). Change them all to
    kmalloc_node().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Kill a few inlines that bring in too much code at more than one call
    site. Shrinks kernel text by about 300 bytes on 32-bit x86.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's ok if the read path is a lot more costly, as long as inc/dec is
    really cheap. The inc/dec will happen for each created/freed io context,
    while the reading only happens when a disk queue exits.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's ok if the read path is a lot more costly, as long as inc/dec is
    really cheap. The inc/dec will happen for each created/freed io context,
    while the reading only happens when a disk queue exits.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • cfq_exit_lock is protecting two things now:

    - The per-ioc rbtree of cfq_io_contexts

    - The per-cfqd linked list of cfq_io_contexts

    The per-cfqd linked list can be protected by the queue lock, as it is (by
    definition) per cfqd as the queue lock is.

    The per-ioc rbtree is mainly used and updated by the process itself only.
    The only outside use is the io priority changing. If we move the
    priority changing to not browsing the rbtree, we can remove any locking
    from the rbtree updates and lookup completely. Let the sys_ioprio syscall
    just mark processes as having the iopriority changed and lazily update
    the private cfq io contexts the next time io is queued, and we can
    remove this locking as well.
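    A sketch of the lazy-update idea (struct layout and function names are
    illustrative): the syscall path only sets a flag, and the next time the
    task queues io we notice the flag from its own context, so no cross-task
    rbtree locking is needed:

```c
/* Simplified io context: just the priority and a dirty flag. */
struct io_context {
    int ioprio;
    int ioprio_changed;
};

/* Syscall side: record the new priority and mark it for lazy pickup. */
static void set_task_ioprio(struct io_context *ioc, int ioprio)
{
    ioc->ioprio = ioprio;
    ioc->ioprio_changed = 1;   /* picked up at the next io submission */
}

/* Queueing side: refresh the cached per-queue priority if flagged. */
static int current_effective_ioprio(struct io_context *ioc, int cached)
{
    if (ioc->ioprio_changed) {
        cached = ioc->ioprio;  /* re-read from the task's own context */
        ioc->ioprio_changed = 0;
    }
    return cached;
}
```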

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • A collection of little fixes and cleanups:

    - We don't use the 'queued' sysfs exported attribute, since the
    may_queue() logic was rewritten. So kill it.

    - Remove dead defines.

    - cfq_set_active_queue() can be rewritten cleaner with else if conditions.

    - Several places had cfq_exit_cfqq() like logic, abstract that out and
    use that.

    - Annotate the cfqq kmem_cache_alloc() so the allocator knows that this
    is a repeat allocation if it fails with __GFP_WAIT set. Allows the
    allocator to start freeing some memory, if needed. CFQ already loops for
    this condition, so might as well pass the hint down.

    - Remove cfqd->rq_starved logic. It's not needed anymore after we dropped
    the crq allocation in cfq_set_request().

    - Remove unneeded parameter passing.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • - Don't assign variables that are only used once.

    - Kill spin_lock() prefetching, it's opportunistic at best.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's not needed for anything, so kill the bio passing.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • After Christoph's SCSI change, the only usages left are RQ_ACTIVE and
    RQ_INACTIVE. The block layer sets RQ_INACTIVE right before freeing the
    request, so any check for RQ_INACTIVE in a driver is a bug and
    indicates use-after-free.

    So kill/clean the remaining users; straightforward.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It is always identical to &q->rq, and we only use it for detecting
    whether this request came out of our mempool or not. So replace it
    with an additional ->flags bit flag.
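    A minimal sketch of the replacement (the flag value and struct layout
    are illustrative): a single bit records whether the request came from
    the queue's mempool, instead of a pointer that was always &q->rq:

```c
/* Illustrative flag bit marking a mempool-backed request. */
#define REQ_ALLOCED (1UL << 0)

struct request { unsigned long flags; };

/* The old test was "rq->rl != NULL"; now it is a flag check. */
static int rq_from_mempool(const struct request *rq)
{
    return (rq->flags & REQ_ALLOCED) != 0;
}
```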

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • As the comment in blkdev.h indicates, we can fold it into ->end_io_data
    usage, as that is really what ->waiting is. Fix up the users of
    blk_end_sync_rq().

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Get rid of the as_rq request type. With the added elevator_private2, we
    have enough room in struct request to get rid of any arq allocation/free
    for each request.

    Signed-off-by: Jens Axboe
    Signed-off-by: Nick Piggin

    Jens Axboe