29 May, 2011

1 commit

  • Replace the arbitrary calculation of an initial io struct mempool size
    with a constant.

    The code calculated the number of reserved structures based on the request
    size and used a "magic" multiplication constant of 4. This patch changes
    it to reserve a fixed number - itself still chosen quite arbitrarily.
    Further testing might show if there is a better number to choose.

    Note that if there is no memory pressure, we can still allocate an
    arbitrary number of "struct io" structures. One structure is enough to
    process the whole request.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

10 Mar, 2011

1 commit

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

10 Sep, 2010

1 commit

  • This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
    the now-deprecated REQ_HARDBARRIER.

    * -EOPNOTSUPP handling logic dropped.

    * Preflush is handled as before, but postflush is dropped and replaced
    with passing down REQ_FUA to member request_queues. This replaces
    one array-wide cache flush with member-specific FUA writes.

    * __split_and_process_bio() now calls __clone_and_map_flush() directly
    for flushes and guarantees all FLUSH bios going to targets are zero
    length.

    * It's now guaranteed that all FLUSH bios passed onto dm targets are
    zero length. bio_empty_barrier() tests are replaced with REQ_FLUSH
    tests.

    * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

    * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
    enough to be marked with unlikely().

    * The block layer now filters out REQ_FLUSH/FUA bios if the request_queue
    doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
    capability.

    * Request-based dm isn't converted yet. dm_init_request_based_queue()
    resets flush support to 0 for now. To avoid disturbing request-based
    dm code, dm->flush_error is added for bio-based dm while
    request-based dm continues to use dm->barrier_error.

    Lightly tested linear, stripe, raid1, snap and crypt targets. Please
    proceed with caution as I'm not familiar with the code base.

    Signed-off-by: Tejun Heo
    Cc: dm-devel@redhat.com
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing from the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. I've also
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name: since
    blkdev.h includes bio.h, that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

11 Dec, 2009

3 commits

  • Accept empty barriers in dm-io.

    dm-io will process empty write barrier requests just like the other
    read/write requests.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Remove the hack where we allocate an extra bi_io_vec to store additional
    private data. This hack prevents us from supporting barriers in
    dm-raid1 without first making another little block layer change.
    Instead of doing that, this patch eliminates the bi_io_vec abuse by
    storing the region number directly in the low bits of bi_private.

    We need to store two things for each bio, the pointer to the main io
    structure and, if parallel writes were requested, an index indicating
    which of these writes this bio belongs to. There can be at most
    BITS_PER_LONG regions - 32 or 64.

    The index (region number) was stored in the last (hidden) bio vector and
    the pointer to struct io was stored in bi_private.

    This patch now aligns "struct io" on BITS_PER_LONG bytes and stores the
    region number in the low bits of bi_private that the alignment frees up.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Allocate "struct io" from a slab.

    This patch changes dm-io so that "struct io" is allocated from a slab
    cache instead of with kmalloc. Allocating from a slab will be needed
    for the next patch, which requires a special alignment of "struct io"
    that kmalloc cannot provide.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

22 Jun, 2009

2 commits

  • If -EOPNOTSUPP was returned and the request was a barrier request, retry it
    without barrier.

    Retry all regions for now. Barriers are submitted only for one-region requests,
    so it doesn't matter. (In the future, retries can be limited to the actual
    regions that failed.)

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Add another field, eopnotsupp_bits. It is a subset of error_bits, representing
    regions that returned -EOPNOTSUPP. (The bit is set in both error_bits and
    eopnotsupp_bits.)

    This value will be used in further patches.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

03 Apr, 2009

1 commit

  • If someone sends signal to a process performing synchronous dm-io call,
    the kernel may crash.

    The function sync_io attempts to exit with -EINTR if it has a pending
    signal; however, the structure "io" is allocated on the stack, so
    already-submitted io requests end up touching stack space that is no
    longer valid, corrupting kernel memory.

    sync_io sets its state to TASK_UNINTERRUPTIBLE, so a signal can't break out
    of io_schedule() --- however, if the signal was pending before sync_io entered
    its while (1) loop, kernel memory will be corrupted.

    There is no way to cancel in-progress IOs, so the best solution is to ignore
    signals at this point.

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

17 Mar, 2009

1 commit

  • dm-io calls bio_get_nr_vecs to get the maximum number of pages to use
    for a given device. It allocates one additional bio_vec for internal
    use but fails to respect BIO_MAX_PAGES; fix this.

    This was the likely cause of:
    https://bugzilla.redhat.com/show_bug.cgi?id=173153

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

29 Dec, 2008

1 commit

  • Instead of having a global bio slab cache, add a reference to one
    in each bio_set that is created. This allows for personalized slabs
    in each bio_set, so that they can have bios of different sizes.

    This means we can personalize the bios we return. File systems may
    want to embed the bio inside another structure, to avoid allocating
    more items (and stuffing them in ->bi_private) after they get a bio.
    Or we may want to embed a number of bio_vecs directly at the end
    of a bio, to avoid doing two allocations to return a bio. This is now
    possible.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Apr, 2008

4 commits

  • Remove an avoidable 3ms delay on some dm-raid1 and kcopyd I/O.

    It is specified that any submitted bio without the BIO_RW_SYNC flag may plug
    the queue (i.e. block the requests from being dispatched to the physical
    device).

    The queue is unplugged when the caller calls the blk_unplug() function.
    Usually the sequence is that someone calls submit_bh to submit IO on a
    buffer. The IO plugs the queue and waits (to be possibly joined with other
    adjacent bios). Then, when the caller calls wait_on_buffer(), the queue is
    unplugged and the IOs are submitted to the disk.

    This is what was happening:

    When doing O_SYNC writes, function fsync_buffers_list() submits a list of
    bios to dm_raid1, the bios are added to dm_raid1 write queue and kmirrord is
    woken up.

    fsync_buffers_list() calls wait_on_buffer(). That unplugs the queue, but
    there are no bios on the device queue as they are still in the dm_raid1 queue.

    wait_on_buffer() starts waiting until the IO is finished.

    kmirrord is scheduled, kmirrord takes bios and submits them to the devices.

    The submitted bio plugs the hard disk queue, but there is no one to unplug
    it. (The process that called wait_on_buffer() is already sleeping.)

    So there is a 3ms timeout, after which the queues on the hard disks are
    unplugged and the requests are processed.

    This 3ms timeout meant that in certain workloads (e.g. O_SYNC, 8kb writes),
    dm-raid1 was 10 times slower than md raid1.

    Every time we submit something asynchronously via dm_io, we must unplug the
    queue to actually send the request to the device.

    This patch adds an unplug call to kmirrord - while processing requests, it keeps
    the queue plugged (so that adjacent bios can be merged); when it finishes
    processing all the bios, it unplugs the queue to submit the bios.

    It also fixes kcopyd which has the same potential problem. All kcopyd requests
    are submitted with BIO_RW_SYNC.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon
    Acked-by: Jens Axboe

    Mikulas Patocka
     
  • Publish the dm-io, dm-log and dm-kcopyd headers in include/linux.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Clean up the dm-io interface to prepare for publishing it in include/linux.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Alasdair G Kergon

    Heinz Mauelshagen
     
  • Rename 'error' to 'error_bits' for clarity.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     

10 Oct, 2007

1 commit

  • As bi_end_io is only called once, when the request is complete,
    the 'size' argument is now redundant. Remove it.

    Now there is no need for bio_endio to subtract the size completed
    from bi_size. So don't do that either.

    While we are at it, change bi_end_io to return void.

    Signed-off-by: Neil Brown
    Signed-off-by: Jens Axboe

    NeilBrown
     

13 Jul, 2007

1 commit

  • bio_alloc_bioset() will return NULL if 'num_vecs' is too large.
    Use bio_get_nr_vecs() to get an estimate of the maximum number.

    Cc: stable@kernel.org
    Signed-off-by: "Jun'ichi Nomura"
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
     

10 May, 2007

4 commits

  • Remove old dm-io interface.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milan Broz
     
  • Add a new API to dm-io.c that uses a private mempool and bio_set for each
    client.

    The new functions to use are dm_io_client_create(), dm_io_client_destroy(),
    dm_io_client_resize() and dm_io().

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Alasdair G Kergon
    Cc: Milan Broz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinz Mauelshagen
     
  • Introduce struct dm_io_client to prepare for per-client mempools and bio_sets.

    Temporary functions bios() and io_pool() choose between the per-client
    structures and the global ones so the old and new interfaces can co-exist.

    Make error_bits optional.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Alasdair G Kergon
    Cc: Milan Broz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinz Mauelshagen
     
  • Delay decrementing the 'struct io' reference count until after the bio has
    been freed so that a bio destructor function may reference it. Required by a
    later patch.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Alasdair G Kergon
    Cc: Milan Broz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinz Mauelshagen
     

30 Apr, 2007

1 commit

  • Currently we scale the mempool sizes depending on memory installed
    in the machine, except for the bio pool itself which sits at a fixed
    256 entry pre-allocation.

    There's really no point in "optimizing" this OOM path; we just need
    enough preallocated to make progress. A single unit is enough; let's
    scale it down to 2 just to be on the safe side.

    This patch saves ~150kb of pinned kernel memory on a 32-bit box.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Dec, 2006

1 commit

  • The existing code allocates an extra slot in bi_io_vec[] and uses it to store
    the region number.

    This patch hides the extra slot from bio_add_page() so the region number can't
    get overwritten.

    Also remove a hard-coded SECTOR_SHIFT and fix a typo in a comment.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Alasdair G Kergon
    Cc: Milan Broz
    Cc: dm-devel@redhat.com
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinz Mauelshagen
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change the
    generated code (from gcc's point of view we replaced unsigned int with
    a typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

08 Sep, 2005

1 commit

  • Jens:

    ->bi_set is totally unnecessary bloat in struct bio. Just define a proper
    destructor for the bio and it already knows what bio_set it belongs to.

    Peter:

    Fixed the bugs.

    Signed-off-by: Jens Axboe
    Signed-off-by: Peter Osterlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Osterlund
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds