11 Dec, 2006

1 commit

  • The only time it is safe to call aio_complete() is when the ->ki_retry
    function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has
    historically done this by relying on its caller to translate positive return
    codes into -EIOCBQUEUED for the aio case. It did this by trying to keep
    conditionals in sync. direct_io_worker() knew when finished_one_bio() was
    going to call aio_complete(). It would reverse the test and wait and free the
    dio in the cases it thought that finished_one_bio() wasn't going to.

    Not surprisingly, it ended up getting it wrong. 'ret' could be a negative
    errno from the submission path but it failed to communicate this to
    finished_one_bio(). direct_io_worker() would return < 0, it's callers
    wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the
    future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
    would be called for a second time which can manifest as an oops.

    The previous cleanups have whittled the sync and async completion paths down
    to the point where we can collapse them and clearly reassert the invariant
    that we must only call aio_complete() after returning -EIOCBQUEUED.
    direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
    drop the dio refcount and the aio bio completion path will only call
    aio_complete() when it is the last to drop the dio refcount.
    direct_io_worker() can ensure that it is the last to drop the reference count
    by waiting for bios to drain. It does this for sync ops, of course, and for
    partial dio writes that must fall back to buffered and for aio ops that saw
    errors during submission.

    This means that operations that end up waiting, even if they were issued as
    aio ops, will not call aio_complete() from dio. Instead we return the return
    code of the operation and let the aio core call aio_complete(). This is
    purposely done to fix a bug where AIO DIO file extensions would call
    aio_complete() before their callers have a chance to update i_size.

    Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
    no longer have to translate for it. XFS needs to be careful not to free
    resources that will be used during AIO completion if -EIOCBQUEUED is returned.
    We maintain the previous behaviour of trying to write fs metadata for O_SYNC
    aio+dio writes.

    Signed-off-by: Zach Brown
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Acked-by: Jeff Moyer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

09 Dec, 2006

1 commit


08 Dec, 2006

2 commits


22 Nov, 2006

1 commit


21 Nov, 2006

2 commits


11 Nov, 2006

7 commits


21 Oct, 2006

1 commit

  • Separate out the concept of "queue congestion" from "backing-dev congestion".
    Congestion is a backing-dev concept, not a queue concept.

    The blk_* congestion functions are retained, as wrappers around the core
    backing-dev congestion functions.

    This proper layering is needed so that NFS can cleanly use the congestion
    functions, and so that CONFIG_BLOCK=n actually links.

    Cc: "Thomas Maier"
    Cc: "Jens Axboe"
    Cc: Trond Myklebust
    Cc: David Howells
    Cc: Peter Osterlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

04 Oct, 2006

1 commit


01 Oct, 2006

4 commits

  • This patch cleans up generic_file_*_read/write() interfaces. Christoph
    Hellwig gave me the idea for this clean ups.

    In a nutshell, all filesystems should set .aio_read/.aio_write methods and use
    do_sync_read/ do_sync_write() as their .read/.write methods. This allows us
    to cleanup all variants of generic_file_* routines.

    Final available interfaces:

    generic_file_aio_read() - read handler
    generic_file_aio_write() - write handler
    generic_file_aio_write_nolock() - no lock write handler

    __generic_file_aio_write_nolock() - internal worker routine

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch removes readv() and writev() methods and replaces them with
    aio_read()/aio_write() methods.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This patch vectorizes aio_read() and aio_write() methods to prepare for
    collapsing all aio & vectored operations into one interface - which is
    aio_read()/aio_write().

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Christoph Hellwig
    Cc: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

    (*) Block I/O tracing.

    (*) Disk partition code.

    (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

    (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
    block layer to do scheduling. Some drivers that use SCSI facilities -
    such as USB storage - end up disabled indirectly from this.

    (*) Various block-based device drivers, such as IDE and the old CDROM
    drivers.

    (*) MTD blockdev handling and FTL.

    (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
    taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

    (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

    (*) /proc/devices does not list any blockdevs.

    (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

29 Sep, 2006

1 commit


28 Sep, 2006

19 commits