01 Nov, 2011

1 commit

  • This patch introduces dm_kcopyd_zero() to make it easy to use
    kcopyd to write zeros into the requested areas instead of copying.
    It is implemented by passing a NULL copying source to dm_kcopyd_copy().

    The forthcoming thin provisioning target uses this.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
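
    As an illustration only (not part of the patch), a caller wanting a
    destination region filled with zeros might look roughly like this; the
    client, device and callback names are hypothetical, while dm_kcopyd_zero(),
    dm_kcopyd_notify_fn and struct dm_io_region come from the exported
    include/linux/dm-kcopyd.h and dm-io.h interfaces:

        #include <linux/kernel.h>
        #include <linux/dm-io.h>
        #include <linux/dm-kcopyd.h>

        /* Hypothetical completion callback; kcopyd calls it from its thread. */
        static void zero_done(int read_err, unsigned long write_err, void *context)
        {
                /* read_err stays 0 here: there is no source to read from. */
                if (write_err)
                        pr_err("zeroing failed: 0x%lx\n", write_err);
        }

        /* Hypothetical helper: ask kcopyd to zero 'len' sectors at 'start'. */
        static void zero_area(struct dm_kcopyd_client *kc, struct block_device *bdev,
                              sector_t start, sector_t len, void *context)
        {
                struct dm_io_region dest = {
                        .bdev   = bdev,
                        .sector = start,
                        .count  = len,
                };

                /* No source region is passed: kcopyd writes zeros into 'dest'. */
                dm_kcopyd_zero(kc, 1, &dest, 0, zero_done, context);
        }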
     

24 Oct, 2011

1 commit

  • Fix memory leak introduced by commit a6e50b409d3f9e0833e69c3c9cca822e8fa4adbb
    (dm snapshot: skip reading origin when overwriting complete chunk).

    When allocating a set of jobs from kc->job_pool, job->master_job must be
    set (to point to itself) so that the mempool item gets freed when the
    master_job completes.

    master_job was introduced by commit c6ea41fbbe08f270a8edef99dc369faf809d1bd6
    (dm kcopyd: preallocate sub jobs to avoid deadlock)

    Reported-by: Michael Leun
    Cc: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
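
    The shape of the fix, paraphrased from the description above (the
    surrounding function body is abridged, not quoted from the patch):

        /* In dm_kcopyd_prepare_callback(), paraphrased: */
        struct kcopyd_job *job = mempool_alloc(kc->job_pool, GFP_NOIO);

        memset(job, 0, sizeof(struct kcopyd_job));
        job->kc = kc;
        job->fn = fn;
        job->context = context;

        /*
         * A job allocated straight from kc->job_pool is its own master:
         * run_complete_job() frees the mempool item via job->master_job,
         * so without this assignment the item is never returned and the
         * mempool leaks, which is exactly the bug being fixed here.
         */
        job->master_job = job;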
     

02 Aug, 2011

3 commits

  • If we write a full chunk in the snapshot, skip reading the origin device
    because the whole chunk will be overwritten anyway.

    This patch changes the snapshot write logic when a full chunk is written.
    In this case:
    1. allocate the exception
    2. dispatch the bio (but don't report the bio completion to device mapper)
    3. write the exception record
    4. report bio completed

    Callbacks must be done through the kcopyd thread, because callbacks must not
    race with each other. So we create two new functions:

    dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
    (This function must not be called from interrupt context.)

    dm_kcopyd_do_callback: submit callback.
    (This function may be called from interrupt context.)

    Performance test (on snapshots with 4k chunk size):
    without the patch:
    non-direct-io sequential write (dd): 17.7MB/s
    direct-io sequential write (dd): 20.9MB/s
    non-direct-io random write (mkfs.ext2): 0.44s

    with the patch:
    non-direct-io sequential write (dd): 26.5MB/s
    direct-io sequential write (dd): 33.2MB/s
    non-direct-io random write (mkfs.ext2): 0.27s

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
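
    A rough sketch of how a target can use the two new hooks; the
    snapshot-side names (s->kcopyd_client, commit_callback, pe) are
    illustrative:

        /* Step 2 prep, in process context: reserve the callback job up front. */
        void *job = dm_kcopyd_prepare_callback(s->kcopyd_client,
                                               commit_callback, pe);

        /* ... dispatch the bio, then write the exception record (steps 2-3) ... */

        /*
         * Step 4: may be called from interrupt context (e.g. on write
         * completion).  The callback itself still runs in the kcopyd
         * thread, so it cannot race with other kcopyd callbacks.
         */
        dm_kcopyd_do_callback(job, 0 /* read_err */, 0 /* write_err */);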
     
  • The nr_pages field in struct kcopyd_job is only used temporarily in
    run_pages_job() to count the number of required pages.
    We can use a local variable instead.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • The offset field in struct kcopyd_job is always zero so remove it.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
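
    For reference, the helper named above is typically used for opportunistic
    reference counting; a small illustrative example (the object type is
    hypothetical):

        #include <linux/atomic.h>
        #include <linux/types.h>

        struct my_obj {                 /* hypothetical object */
                atomic_t refcount;
        };

        /* Take a reference only if the object is not already being freed. */
        static bool my_obj_tryget(struct my_obj *obj)
        {
                return atomic_inc_not_zero(&obj->refcount) != 0;
        }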
     

29 May, 2011

8 commits

  • Return client directly from dm_kcopyd_client_create, not through a
    parameter, making it consistent with dm_io_client_create.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
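
    After this change, setup code follows the same pattern as
    dm_io_client_create(); a minimal sketch (the constructor name is
    hypothetical, errors come back as an ERR_PTR):

        #include <linux/dm-kcopyd.h>
        #include <linux/err.h>

        static int my_target_ctr(void)          /* hypothetical constructor */
        {
                struct dm_kcopyd_client *kc;

                /* The client is now the return value ... */
                kc = dm_kcopyd_client_create();

                /* ... and failure comes back as an ERR_PTR, not via a parameter. */
                if (IS_ERR(kc))
                        return PTR_ERR(kc);

                /* ... use kc with dm_kcopyd_copy() ... */

                dm_kcopyd_client_destroy(kc);
                return 0;
        }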
     
  • Reserve just the minimum of pages needed to process one job.

    Because we allocate pages from the page allocator, we don't need to reserve
    a large number of pages. The maximum job size is SUB_JOB_SIZE and we
    calculate the number of reserved pages based on this.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Replace the arbitrary calculation of an initial io struct mempool size
    with a constant.

    The code calculated the number of reserved structures based on the request
    size and used a "magic" multiplication constant of 4. This patch changes
    it to reserve a fixed number - itself still chosen quite arbitrarily.
    Further testing might show if there is a better number to choose.

    Note that if there is no memory pressure, we can still allocate an
    arbitrary number of "struct io" structures. One structure is enough to
    process the whole request.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • This patch changes dm-kcopyd so that it allocates pages from the main
    page allocator with __GFP_NOWARN | __GFP_NORETRY flags (so that it can
    fail in case of memory pressure). If the allocation fails, dm-kcopyd
    allocates pages from its own reserve.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
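
    The allocation strategy, sketched in simplified form; take_page_from_reserve()
    is an illustrative stand-in for kcopyd's reserve handling, and the GFP_NOIO
    base flag is an assumption (the commit text only names
    __GFP_NOWARN | __GFP_NORETRY):

        static struct page *kcopyd_get_page(struct dm_kcopyd_client *kc)
        {
                /*
                 * Ask the page allocator first, but let it fail quietly under
                 * memory pressure instead of retrying hard or warning.
                 */
                struct page *p = alloc_page(GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);

                if (!p)
                        /* Fall back to kcopyd's own small reserve. */
                        p = take_page_from_reserve(kc);         /* illustrative */

                return p;       /* NULL: the job is retried later */
        }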
     
  • Introduce a parameter for gfp flags to alloc_pl() for use in following
    patches.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Remove the spinlock protecting the pages allocation. The spinlock is only
    taken on initialization or from the single-threaded workqueue. Therefore, the
    spinlock is useless.

    The spinlock is taken in kcopyd_get_pages and kcopyd_put_pages.

    kcopyd_get_pages is only called from run_pages_job, which is only
    called from process_jobs, which in turn is called from do_work.

    kcopyd_put_pages is called from client_alloc_pages (an initialization
    function) or from run_complete_job. run_complete_job is only called from
    process_jobs, which in turn is called from do_work.

    Another spinlock, kc->job_lock is taken each time someone pushes or pops
    some work for the worker thread. Once we take kc->job_lock, we
    guarantee that any written memory is visible to the other CPUs.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • There's a possible theoretical deadlock in dm-kcopyd because multiple
    allocations from the same mempool are required to finish a request.
    Avoid this by preallocating sub jobs.

    There is a mempool of 512 entries. Each request requires up to 9
    entries from the mempool. If we have at least 57 concurrent requests
    running, the mempool may be exhausted and mempool allocations may start
    blocking until another entry is freed to the mempool. Because the same
    thread is used to free entries to the mempool and allocate entries from
    the mempool, this may result in a deadlock.

    This patch changes it so that one mempool entry contains all 9 "struct
    kcopyd_job" required to fulfill the whole request. The allocation is
    done only once in dm_kcopyd_copy and no further mempool allocations are
    done during request processing.

    If dm_kcopyd_copy is not run in the completion thread, this
    implementation is deadlock-free.

    MIN_JOBS needs reducing accordingly and we've chosen to reduce it
    further to 8.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
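
    Paraphrasing the scheme (struct kcopyd_job and kc->job_pool are dm-kcopyd
    internals, so the code below is illustrative rather than quoted): each
    mempool object is now a block of SPLIT_COUNT + 1 jobs, the master followed
    by its sub jobs, so the only mempool allocation per request happens in
    dm_kcopyd_copy().

        #define SPLIT_COUNT 8   /* sub jobs per request, as described above */

        static struct kmem_cache *job_cache;

        /* One slab object = master job + all of its sub jobs. */
        static int jobs_init(void)
        {
                job_cache = kmem_cache_create("kcopyd_job",
                                sizeof(struct kcopyd_job) * (SPLIT_COUNT + 1),
                                __alignof__(struct kcopyd_job), 0, NULL);
                return job_cache ? 0 : -ENOMEM;
        }

        /* dm_kcopyd_copy() performs the single mempool allocation ... */
        static struct kcopyd_job *alloc_master_job(struct dm_kcopyd_client *kc)
        {
                struct kcopyd_job *master = mempool_alloc(kc->job_pool, GFP_NOIO);

                master->master_job = master;
                return master;
        }

        /* ... and sub job i simply lives right after the master, so splitting
         * never goes back to the mempool and cannot block on itself. */
        static struct kcopyd_job *sub_job(struct kcopyd_job *master, int i)
        {
                struct kcopyd_job *sub = master + 1 + i;

                sub->master_job = master;
                return sub;
        }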
     
  • Don't split SUB_JOB_SIZE jobs

    If the job size equals SUB_JOB_SIZE, there is no point in splitting it.
    Splitting it just unnecessarily wastes time, because the split job size
    is SUB_JOB_SIZE too.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
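
    Roughly, the dispatch decision in dm_kcopyd_copy() becomes (paraphrased
    from the description; internal helpers shown as named in dm-kcopyd.c):

        if (job->source.count <= SUB_JOB_SIZE) {
                /* Already no bigger than one sub job: run it as-is. */
                dispatch_job(job);
        } else {
                /* Only genuinely large jobs pay the cost of splitting. */
                mutex_init(&job->lock);
                job->progress = 0;
                split_job(job);
        }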
     

10 Mar, 2011

2 commits

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
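
    For context, explicit on-stack plugging puts the plug on the submitter's
    stack; a generic illustration (not dm-specific, using this era's
    two-argument submit_bio()):

        #include <linux/blkdev.h>
        #include <linux/fs.h>

        /* Hypothetical helper submitting a batch of already-built bios. */
        static void submit_batch(struct bio **bios, int n)
        {
                struct blk_plug plug;
                int i;

                blk_start_plug(&plug);          /* plugging is now the submitter's job */

                for (i = 0; i < n; i++)
                        submit_bio(WRITE, bios[i]);

                blk_finish_plug(&plug);         /* ... and so is unplugging */
        }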
     

14 Jan, 2011

4 commits

  • kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
    serve only a single work item from the dm instance, so non-reentrant
    workqueues would provide the same ordering guarantees as ordered ones
    while allowing CPU affinity and use of the workqueues for other
    purposes. Switch them to non-reentrant workqueues.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
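
    Roughly what the switch looks like for kcopyd's queue (flag names as of
    this series; error handling abridged):

        /*
         * A non-reentrant workqueue still never runs kcopyd's single,
         * self-requeued work item concurrently with itself, but unlike an
         * ordered/single-threaded queue it is not pinned to one CPU.
         * WQ_MEM_RECLAIM keeps a rescuer thread for the I/O path.
         */
        kc->kcopyd_wq = alloc_workqueue("kcopyd",
                                        WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
        if (!kc->kcopyd_wq)
                goto bad_workqueue;             /* error path abridged */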
     
  • Convert all create[_singlethread]_workqueue() users to the new
    alloc[_ordered]_workqueue(). This conversion is mechanical and
    doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
     
  • Make kcopyd merge more I/O requests by using device unplugging.

    Without this patch, each I/O request is dispatched separately to the device.
    If the device supports tagged queuing, there are many small requests sent
    to the device. To improve performance, this patch will batch as many requests
    as possible, allowing the queue to merge consecutive requests, and send them
    to the device at once.

    In my tests (15k SCSI disk), this patch improves sequential write throughput:

    Sequential write throughput (chunksize of 4k, 32k, 512k)
    unpatched: 15.2, 18.5, 17.5 MB/s
    patched: 14.4, 22.6, 23.0 MB/s

    In most common uses (snapshot or two-way mirror), kcopyd is only used for
    two devices, one for reading and the other for writing, thus this optimization
    is implemented only for two devices. The optimization may be extended to n-way
    mirrors with some code complexity increase.

    We keep track of two block devices to unplug (one for read and the
    other for write) and unplug them when leaving the "do_work" function. If
    more devices are used (in theory it could happen, in practice it
    is rare), we unplug immediately.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
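
    A rough paraphrase of the bookkeeping described above; the field and helper
    names are illustrative, while blk_unplug() and bdev_get_queue() are the
    block-layer interfaces of this era:

        /* Called after submitting a job's I/O: remember at most one queue per
         * direction while do_work() runs ... */
        static void remember_unplug(struct dm_kcopyd_client *kc, int rw,
                                    struct block_device *bdev)
        {
                struct request_queue *q = bdev_get_queue(bdev);

                if (!kc->unplug[rw])
                        kc->unplug[rw] = q;     /* first device: track it */
                else if (kc->unplug[rw] != q)
                        blk_unplug(q);          /* rare extra device: unplug now */
        }

        /* ... and unplug both tracked queues when leaving do_work(). */
        static void unplug_queues(struct dm_kcopyd_client *kc)
        {
                if (kc->unplug[READ])
                        blk_unplug(kc->unplug[READ]);
                if (kc->unplug[WRITE])
                        blk_unplug(kc->unplug[WRITE]);
                kc->unplug[READ] = kc->unplug[WRITE] = NULL;
        }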
     
  • Remove the REQ_SYNC flag to improve write throughput when writing
    to the origin with a snapshot on the same device (using the CFQ I/O
    scheduler).

    Sequential write throughput (chunksize of 4k, 32k, 512k)
    unpatched: 8.5, 8.6, 9.3 MB/s
    patched: 15.2, 18.5, 17.5 MB/s

    Snapshot exception reallocations are triggered by writes that are
    usually async, so mark the associated dm_io_request as async as well.
    This helps when using the CFQ I/O scheduler because it has separate
    queues for sync and async I/O. Async is optimized for throughput; sync
    for latency. With this change we're consciously favoring throughput over
    latency.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
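
    In dm-io terms the change is just in how the request's rw flags are set up
    when kcopyd submits the write; a paraphrase (the job fields and
    complete_io are dm-kcopyd internals, shown for illustration):

        struct dm_io_request io_req = {
                .bi_rw          = WRITE,                /* was WRITE | REQ_SYNC */
                .mem.type       = DM_IO_PAGE_LIST,
                .mem.ptr.pl     = job->pages,
                .notify.fn      = complete_io,          /* kcopyd's completion hook */
                .notify.context = job,
                .client         = job->kc->io_client,
        };

        dm_io(&io_req, job->num_dests, job->dests, NULL);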
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name; as
    blkdev.h includes bio.h, that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
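
    After the unification a bio submitter just uses the shared REQ_* flags; a
    tiny generic illustration (this era's two-argument submit_bio(), with the
    bio assumed to be already set up):

        /* bio->bi_rw and request->cmd_flags now draw from one REQ_* namespace,
         * so a synchronous write is simply: */
        submit_bio(WRITE | REQ_SYNC, bio);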
     

11 Dec, 2009

1 commit

  • dm-kcopyd: accept zero-size jobs

    This patch changes dm-kcopyd so that it accepts zero-size jobs and completes
    them immediately via its completion thread.

    It is needed for snapshot resizing in the multisnapshot target. When we are
    writing to a chunk beyond the origin end, no copying is done. To simplify
    the code, we submit an empty request to kcopyd and let kcopyd complete it.
    If we didn't submit a request to kcopyd and instead called the completion
    routine immediately, it would violate the principle that completions are
    called only from one thread, and it would need additional locking.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
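
    Illustratively, the caller can hand even a no-op request to kcopyd so that
    the completion still arrives from kcopyd's single completion thread (the
    device and callback names here are hypothetical):

        struct dm_io_region from = {
                .bdev   = origin_bdev,          /* hypothetical */
                .sector = chunk_sector,
                .count  = 0,                    /* nothing actually needs copying */
        };
        struct dm_io_region dest = {
                .bdev   = cow_bdev,             /* hypothetical */
                .sector = exception_sector,
                .count  = 0,
        };

        /* kcopyd performs no I/O but still invokes copy_done() from its own
         * completion thread, preserving the single-completion-thread rule. */
        dm_kcopyd_copy(kc, &from, 1, &dest, 0, copy_done, context);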
     

09 Apr, 2009

2 commits

  • If the thread calling dm_kcopyd_copy is delayed due to scheduling inside
    split_job/segment_complete and the subjobs complete before the loop in
    split_job completes, the kcopyd callback could be invoked from the
    thread that called dm_kcopyd_copy instead of the kcopyd workqueue.

    dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()

    Snapshots depend on the fact that callbacks are called from the
    single-threaded kcopyd workqueue and expect that there is no racing between
    individual callbacks. Racing between callbacks can lead to corruption of the
    exception store, and it can also mean that exception store callbacks are
    called twice for the same exception - a likely reason for the crashes
    reported inside pending_complete() / remove_exception().

    This patch fixes two problems:

    1. job->fn being called from the thread that submitted the job (see above).

    - Fix: hand over the completion callback to the kcopyd thread.

    2. job->fn(read_err, write_err, job->context); in segment_complete
    reports the error of the last subjob, not the union of all errors.

    - Fix: pass job->write_err to the callback to report all error bits
    (it is done already in run_complete_job)

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
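
    The first fix boils down to queueing the finished master job for kcopyd's
    completion handling instead of calling the callback in the submitter's
    context; paraphrased (push() and wake() are dm-kcopyd's internal helpers):

        /* In segment_complete(), once the last sub job has finished: */

        /* Before (racy): the callback ran right here, possibly in the thread
         * that called dm_kcopyd_copy():
         *
         *      job->fn(read_err, write_err, job->context);
         */

        /* After: hand the job to the kcopyd thread, the only context allowed
         * to run callbacks; run_complete_job() then reports job->write_err,
         * i.e. the union of all sub job errors (the second fix). */
        push(&kc->complete_jobs, job);
        wake(kc);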
     
  • Use a variable in segment_complete() to point to the dm_kcopyd_client
    struct and only release job->pages in run_complete_job() if any are
    defined. These changes are needed by the next patch.

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

22 Oct, 2008

2 commits

  • Change #include "dm.h" to #include <linux/device-mapper.h> in all targets.
    Targets should not need direct access to internal DM structures.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Write throughput to an LVM snapshot origin volume is an order
    of magnitude slower than that to an LV without snapshots or to
    snapshot target volumes, especially in the case of sequential
    writes with O_SYNC on.

    The following patch, originally written by Kevin Jamieson and
    Jan Blunck and slightly modified for the current RCs by myself,
    tries to improve the performance by modifying the behaviour
    of kcopyd so that it pushes back an I/O job to the head of
    the job queue, instead of the tail as process_jobs() currently
    does, when it has to wait for free pages. This way, write
    requests aren't shuffled to cause extra seeks.

    I tested the patch against 2.6.27-rc5 and got the following results.
    The test is a dd command writing to snapshot origin followed by fsync
    to the file just created/updated. A couple of filesystem benchmarks
    gave me similar results in case of sequential writes, while random
    writes didn't suffer much.

    dd if=/dev/zero of= bs=4096 count=...
    [conv=notrunc when updating]

    1) linux 2.6.27-rc5 without the patch, write to snapshot origin,
    average throughput (MB/s)
    10M 100M 1000M
    create,dd 511.46 610.72 11.81
    create,dd+fsync 7.10 6.77 8.13
    update,dd 431.63 917.41 12.75
    update,dd+fsync 7.79 7.43 8.12

    Compared with the write throughput to an LV without any snapshots
    (shown below), all dd+fsync and 1000 MiB writes perform very poorly.

    10M 100M 1000M
    create,dd 555.03 608.98 123.29
    create,dd+fsync 114.27 72.78 76.65
    update,dd 152.34 1267.27 124.04
    update,dd+fsync 130.56 77.81 77.84

    2) linux 2.6.27-rc5 with the patch, write to snapshot origin,
    average throughput (MB/s)

    10M 100M 1000M
    create,dd 537.06 589.44 46.21
    create,dd+fsync 31.63 29.19 29.23
    update,dd 487.59 897.65 37.76
    update,dd+fsync 34.12 30.07 26.85

    Although this is still not on par with plain LV performance (that
    cannot be avoided because it's copy-on-write anyway), this simple
    patch successfully improves the throughput of dd+fsync while not
    affecting the rest.

    Signed-off-by: Jan Blunck
    Signed-off-by: Kazuo Ito
    Signed-off-by: Alasdair G Kergon
    Cc: stable@kernel.org

    Kazuo Ito
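
    Paraphrasing the change to kcopyd's queueing (lock and field names as in
    dm-kcopyd.c, body simplified):

        /*
         * New helper mirroring push(), but adding at the head of the list so a
         * job that had to wait for free pages keeps its place in line instead
         * of being shuffled behind later jobs (and causing extra seeks).
         */
        static void push_head(struct list_head *jobs, struct kcopyd_job *job)
        {
                unsigned long flags;
                struct dm_kcopyd_client *kc = job->kc;

                spin_lock_irqsave(&kc->job_lock, flags);
                list_add(&job->list, jobs);             /* head, not tail */
                spin_unlock_irqrestore(&kc->job_lock, flags);
        }

    process_jobs() then calls push_head() rather than push() when
    run_pages_job() reports that no pages are free.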
     

25 Apr, 2008

3 commits

  • Remove an avoidable 3ms delay on some dm-raid1 and kcopyd I/O.

    It is specified that any submitted bio without the BIO_RW_SYNC flag may plug
    the queue (i.e. block the requests from being dispatched to the physical
    device).

    The queue is unplugged when the caller calls blk_unplug(). Usually, the
    sequence is that someone calls submit_bh to submit IO on a buffer. The IO
    plugs the queue and waits (to be possibly joined with other adjacent bios).
    Then, when the caller calls wait_on_buffer(), it unplugs the queue and
    submits the IOs to the disk.

    This is what was happening:

    When doing O_SYNC writes, fsync_buffers_list() submits a list of bios to
    dm-raid1; the bios are added to the dm-raid1 write queue and kmirrord is
    woken up.

    fsync_buffers_list() calls wait_on_buffer(). That unplugs the queue, but
    there are no bios on the device queue as they are still in the dm_raid1 queue.

    wait_on_buffer() starts waiting until the IO is finished.

    kmirrord is scheduled; it takes the bios and submits them to the devices.

    The submitted bio plugs the hard disk queue but there is no one to unplug it.
    (The process that called wait_on_buffer() is already sleeping.)

    So there is a 3ms timeout, after which the queues on the hard disks are
    unplugged and the requests are processed.

    This 3ms timeout meant that in certain workloads (e.g. O_SYNC, 8kb writes),
    dm-raid1 was 10 times slower than md raid1.

    Every time we submit something asynchronously via dm_io, we must unplug the
    queue to actually send the request to the device.

    This patch adds an unplug call to kmirrord: while processing requests, it
    keeps the queue plugged (so that adjacent bios can be merged); when it
    finishes processing all the bios, it unplugs the queue to submit them.

    It also fixes kcopyd, which has the same potential problem. All kcopyd
    requests are submitted with BIO_RW_SYNC.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon
    Acked-by: Jens Axboe

    Mikulas Patocka
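
    Schematically, the fix is to unplug once per processed batch instead of
    waiting for the 3ms timer; blk_unplug() and generic_make_request() are the
    interfaces of this era, while the queue and helper names are illustrative:

        /* kmirrord / kcopyd worker, once per batch: */
        while ((bio = pop_next_write()))        /* illustrative helper */
                generic_make_request(bio);      /* queue stays plugged, bios can merge */

        /*
         * The process that will wait on these buffers has already unplugged
         * and gone to sleep, so nobody else will unplug for us: do it here
         * before the worker goes back to sleep.
         */
        blk_unplug(bdev_get_queue(dest_bdev));  /* dest_bdev: illustrative */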
     
  • Publish the dm-io, dm-log and dm-kcopyd headers in include/linux.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Rename kcopyd.[ch] to dm-kcopyd.[ch].

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon