01 Nov, 2011

1 commit

  • This patch introduces dm_kcopyd_zero() to make it easy to use
    kcopyd to write zeros into the requested areas instead of copying.
    It is implemented by passing a NULL copying source to dm_kcopyd_copy().

    The forthcoming thin provisioning target uses this.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
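
    As an illustration only (not part of the patch), a caller wanting a
    destination region filled with zeros might look roughly like this; the
    client, device and callback names are hypothetical, while dm_kcopyd_zero(),
    dm_kcopyd_notify_fn and struct dm_io_region come from the exported
    include/linux/dm-kcopyd.h and dm-io.h interfaces:

        #include <linux/kernel.h>
        #include <linux/dm-io.h>
        #include <linux/dm-kcopyd.h>

        /* Hypothetical completion callback; kcopyd calls it from its thread. */
        static void zero_done(int read_err, unsigned long write_err, void *context)
        {
                /* read_err stays 0 here: there is no source to read from. */
                if (write_err)
                        pr_err("zeroing failed: 0x%lx\n", write_err);
        }

        /* Hypothetical helper: ask kcopyd to zero 'len' sectors at 'start'. */
        static void zero_area(struct dm_kcopyd_client *kc, struct block_device *bdev,
                              sector_t start, sector_t len, void *context)
        {
                struct dm_io_region dest = {
                        .bdev   = bdev,
                        .sector = start,
                        .count  = len,
                };

                /* No source region is passed: kcopyd writes zeros into 'dest'. */
                dm_kcopyd_zero(kc, 1, &dest, 0, zero_done, context);
        }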
     

24 Oct, 2011

1 commit

  • Fix memory leak introduced by commit a6e50b409d3f9e0833e69c3c9cca822e8fa4adbb
    (dm snapshot: skip reading origin when overwriting complete chunk).

    When allocating a set of jobs from kc->job_pool, job->master_job must be
    set (to point to itself) so that the mempool item gets freed when the
    master_job completes.

    master_job was introduced by commit c6ea41fbbe08f270a8edef99dc369faf809d1bd6
    (dm kcopyd: preallocate sub jobs to avoid deadlock)

    Reported-by: Michael Leun
    Cc: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
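
    The shape of the fix, paraphrased from the description above (the
    surrounding function body is abridged, not quoted from the patch):

        /* In dm_kcopyd_prepare_callback(), paraphrased: */
        struct kcopyd_job *job = mempool_alloc(kc->job_pool, GFP_NOIO);

        memset(job, 0, sizeof(struct kcopyd_job));
        job->kc = kc;
        job->fn = fn;
        job->context = context;

        /*
         * A job allocated straight from kc->job_pool is its own master:
         * run_complete_job() frees the mempool item via job->master_job,
         * so without this assignment the item is never returned and the
         * mempool leaks, which is exactly the bug being fixed here.
         */
        job->master_job = job;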
     

02 Aug, 2011

3 commits

  • If we write a full chunk in the snapshot, skip reading the origin device
    because the whole chunk will be overwritten anyway.

    This patch changes the snapshot write logic when a full chunk is written.
    In this case:
    1. allocate the exception
    2. dispatch the bio (but don't report the bio completion to device mapper)
    3. write the exception record
    4. report bio completed

    Callbacks must be done through the kcopyd thread, because callbacks must not
    race with each other. So we create two new functions:

    dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
    (This function must not be called from interrupt context.)

    dm_kcopyd_do_callback: submit callback.
    (This function may be called from interrupt context.)

    Performance test (on snapshots with 4k chunk size):
    without the patch:
    non-direct-io sequential write (dd): 17.7MB/s
    direct-io sequential write (dd): 20.9MB/s
    non-direct-io random write (mkfs.ext2): 0.44s

    with the patch:
    non-direct-io sequential write (dd): 26.5MB/s
    direct-io sequential write (dd): 33.2MB/s
    non-direct-io random write (mkfs.ext2): 0.27s

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
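
    A rough sketch of how a target can use the two new hooks; the
    snapshot-side names (s->kcopyd_client, commit_callback, pe) are
    illustrative:

        /* Step 2 prep, in process context: reserve the callback job up front. */
        void *job = dm_kcopyd_prepare_callback(s->kcopyd_client,
                                               commit_callback, pe);

        /* ... dispatch the bio, then write the exception record (steps 2-3) ... */

        /*
         * Step 4: may be called from interrupt context (e.g. on write
         * completion).  The callback itself still runs in the kcopyd
         * thread, so it cannot race with other kcopyd callbacks.
         */
        dm_kcopyd_do_callback(job, 0 /* read_err */, 0 /* write_err */);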
     
  • The nr_pages field in struct kcopyd_job is only used temporarily in
    run_pages_job() to count the number of required pages.
    We can use a local variable instead.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • The offset field in struct kcopyd_job is always zero so remove it.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
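
    For reference, the helper named above is typically used for opportunistic
    reference counting; a small illustrative example (the object type is
    hypothetical):

        #include <linux/atomic.h>
        #include <linux/types.h>

        struct my_obj {                 /* hypothetical object */
                atomic_t refcount;
        };

        /* Take a reference only if the object is not already being freed. */
        static bool my_obj_tryget(struct my_obj *obj)
        {
                return atomic_inc_not_zero(&obj->refcount) != 0;
        }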
     

29 May, 2011

8 commits

  • Return client directly from dm_kcopyd_client_create, not through a
    parameter, making it consistent with dm_io_client_create.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
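
    After this change, setup code follows the same pattern as
    dm_io_client_create(); a minimal sketch (the constructor name is
    hypothetical, errors come back as an ERR_PTR):

        #include <linux/dm-kcopyd.h>
        #include <linux/err.h>

        static int my_target_ctr(void)          /* hypothetical constructor */
        {
                struct dm_kcopyd_client *kc;

                /* The client is now the return value ... */
                kc = dm_kcopyd_client_create();

                /* ... and failure comes back as an ERR_PTR, not via a parameter. */
                if (IS_ERR(kc))
                        return PTR_ERR(kc);

                /* ... use kc with dm_kcopyd_copy() ... */

                dm_kcopyd_client_destroy(kc);
                return 0;
        }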
     
  • Reserve just the minimum of pages needed to process one job.

    Because we allocate pages from the page allocator, we don't need to reserve
    a large number of pages. The maximum job size is SUB_JOB_SIZE and we
    calculate the number of reserved pages based on this.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Replace the arbitrary calculation of an initial io struct mempool size
    with a constant.

    The code calculated the number of reserved structures based on the request
    size and used a "magic" multiplication constant of 4. This patch changes
    it to reserve a fixed number - itself still chosen quite arbitrarily.
    Further testing might show if there is a better number to choose.

    Note that if there is no memory pressure, we can still allocate an
    arbitrary number of "struct io" structures. One structure is enough to
    process the whole request.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • This patch changes dm-kcopyd so that it allocates pages from the main
    page allocator with __GFP_NOWARN | __GFP_NORETRY flags (so that it can
    fail in case of memory pressure). If the allocation fails, dm-kcopyd
    allocates pages from its own reserve.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
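
    The allocation strategy, sketched in simplified form; take_page_from_reserve()
    is an illustrative stand-in for kcopyd's reserve handling, and the GFP_NOIO
    base flag is an assumption (the commit text only names
    __GFP_NOWARN | __GFP_NORETRY):

        static struct page *kcopyd_get_page(struct dm_kcopyd_client *kc)
        {
                /*
                 * Ask the page allocator first, but let it fail quietly under
                 * memory pressure instead of retrying hard or warning.
                 */
                struct page *p = alloc_page(GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY);

                if (!p)
                        /* Fall back to kcopyd's own small reserve. */
                        p = take_page_from_reserve(kc);         /* illustrative */

                return p;       /* NULL: the job is retried later */
        }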
     
  • Introduce a parameter for gfp flags to alloc_pl() for use in following
    patches.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Remove the spinlock protecting the pages allocation. The spinlock is only
    taken on initialization or from the single-threaded workqueue. Therefore, the
    spinlock is useless.

    The spinlock is taken in kcopyd_get_pages and kcopyd_put_pages.

    kcopyd_get_pages is only called from run_pages_job, which is only
    called from process_jobs, which in turn is called from do_work.

    kcopyd_put_pages is called from client_alloc_pages (an initialization
    function) or from run_complete_job. run_complete_job is only called from
    process_jobs, which in turn is called from do_work.

    Another spinlock, kc->job_lock is taken each time someone pushes or pops
    some work for the worker thread. Once we take kc->job_lock, we
    guarantee that any written memory is visible to the other CPUs.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • There's a possible theoretical deadlock in dm-kcopyd because multiple
    allocations from the same mempool are required to finish a request.
    Avoid this by preallocating sub jobs.

    There is a mempool of 512 entries. Each request requires up to 9
    entries from the mempool. If we have at least 57 concurrent requests
    running, the mempool may be exhausted and mempool allocations may start
    blocking until another entry is freed to the mempool. Because the same
    thread is used to free entries to the mempool and allocate entries from
    the mempool, this may result in a deadlock.

    This patch changes it so that one mempool entry contains all 9 "struct
    kcopyd_job" required to fulfill the whole request. The allocation is
    done only once in dm_kcopyd_copy and no further mempool allocations are
    done during request processing.

    If dm_kcopyd_copy is not run in the completion thread, this
    implementation is deadlock-free.

    MIN_JOBS needs reducing accordingly and we've chosen to reduce it
    further to 8.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
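
    Paraphrasing the scheme (struct kcopyd_job and kc->job_pool are dm-kcopyd
    internals, so the code below is illustrative rather than quoted): each
    mempool object is now a block of SPLIT_COUNT + 1 jobs, the master followed
    by its sub jobs, so the only mempool allocation per request happens in
    dm_kcopyd_copy().

        #define SPLIT_COUNT 8   /* sub jobs per request, as described above */

        static struct kmem_cache *job_cache;

        /* One slab object = master job + all of its sub jobs. */
        static int jobs_init(void)
        {
                job_cache = kmem_cache_create("kcopyd_job",
                                sizeof(struct kcopyd_job) * (SPLIT_COUNT + 1),
                                __alignof__(struct kcopyd_job), 0, NULL);
                return job_cache ? 0 : -ENOMEM;
        }

        /* dm_kcopyd_copy() performs the single mempool allocation ... */
        static struct kcopyd_job *alloc_master_job(struct dm_kcopyd_client *kc)
        {
                struct kcopyd_job *master = mempool_alloc(kc->job_pool, GFP_NOIO);

                master->master_job = master;
                return master;
        }

        /* ... and sub job i simply lives right after the master, so splitting
         * never goes back to the mempool and cannot block on itself. */
        static struct kcopyd_job *sub_job(struct kcopyd_job *master, int i)
        {
                struct kcopyd_job *sub = master + 1 + i;

                sub->master_job = master;
                return sub;
        }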
     
  • Don't split SUB_JOB_SIZE jobs

    If the job size equals SUB_JOB_SIZE, there is no point in splitting it.
    Splitting it just unnecessarily wastes time, because the split job size
    is SUB_JOB_SIZE too.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
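
    Roughly, the dispatch decision in dm_kcopyd_copy() becomes (paraphrased
    from the description; internal helpers shown as named in dm-kcopyd.c):

        if (job->source.count <= SUB_JOB_SIZE) {
                /* Already no bigger than one sub job: run it as-is. */
                dispatch_job(job);
        } else {
                /* Only genuinely large jobs pay the cost of splitting. */
                mutex_init(&job->lock);
                job->progress = 0;
                split_job(job);
        }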
     

10 Mar, 2011

2 commits

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
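
    For context, explicit on-stack plugging puts the plug on the submitter's
    stack; a generic illustration (not dm-specific, using this era's
    two-argument submit_bio()):

        #include <linux/blkdev.h>
        #include <linux/fs.h>

        /* Hypothetical helper submitting a batch of already-built bios. */
        static void submit_batch(struct bio **bios, int n)
        {
                struct blk_plug plug;
                int i;

                blk_start_plug(&plug);          /* plugging is now the submitter's job */

                for (i = 0; i < n; i++)
                        submit_bio(WRITE, bios[i]);

                blk_finish_plug(&plug);         /* ... and so is unplugging */
        }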
     

14 Jan, 2011

4 commits

  • kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
    serve only a single work item from the dm instance, so non-reentrant
    workqueues would provide the same ordering guarantees as ordered ones
    while allowing CPU affinity and use of the workqueues for other
    purposes. Switch them to non-reentrant workqueues.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
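
    Roughly what the switch looks like for kcopyd's queue (flag names as of
    this series; error handling abridged):

        /*
         * A non-reentrant workqueue still never runs kcopyd's single,
         * self-requeued work item concurrently with itself, but unlike an
         * ordered/single-threaded queue it is not pinned to one CPU.
         * WQ_MEM_RECLAIM keeps a rescuer thread for the I/O path.
         */
        kc->kcopyd_wq = alloc_workqueue("kcopyd",
                                        WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
        if (!kc->kcopyd_wq)
                goto bad_workqueue;             /* error path abridged */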
     
  • Convert all create[_singlethread]_workqueue() users to the new
    alloc[_ordered]_workqueue(). This conversion is mechanical and
    doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
     
  • Make kcopyd merge more I/O requests by using device unplugging.

    Without this patch, each I/O request is dispatched separately to the device.
    If the device supports tagged queuing, there are many small requests sent
    to the device. To improve performance, this patch will batch as many requests
    as possible, allowing the queue to merge consecutive requests, and send them
    to the device at once.

    In my tests (15k SCSI disk), this patch improves sequential write throughput:

    Sequential write throughput (chunksize of 4k, 32k, 512k)
    unpatched: 15.2, 18.5, 17.5 MB/s
    patched: 14.4, 22.6, 23.0 MB/s

    In most common uses (snapshot or two-way mirror), kcopyd is only used for
    two devices, one for reading and the other for writing, thus this optimization
    is implemented only for two devices. The optimization may be extended to n-way
    mirrors with some code complexity increase.

    We keep track of two block devices to unplug (one for read and the
    other for write) and unplug them when leaving the "do_work" function. If
    more devices are used (in theory it could happen, in practice it
    is rare), we unplug immediately.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
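
    A rough paraphrase of the bookkeeping described above; the field and helper
    names are illustrative, while blk_unplug() and bdev_get_queue() are the
    block-layer interfaces of this era:

        /* Called after submitting a job's I/O: remember at most one queue per
         * direction while do_work() runs ... */
        static void remember_unplug(struct dm_kcopyd_client *kc, int rw,
                                    struct block_device *bdev)
        {
                struct request_queue *q = bdev_get_queue(bdev);

                if (!kc->unplug[rw])
                        kc->unplug[rw] = q;     /* first device: track it */
                else if (kc->unplug[rw] != q)
                        blk_unplug(q);          /* rare extra device: unplug now */
        }

        /* ... and unplug both tracked queues when leaving do_work(). */
        static void unplug_queues(struct dm_kcopyd_client *kc)
        {
                if (kc->unplug[READ])
                        blk_unplug(kc->unplug[READ]);
                if (kc->unplug[WRITE])
                        blk_unplug(kc->unplug[WRITE]);
                kc->unplug[READ] = kc->unplug[WRITE] = NULL;
        }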
     
  • Remove the REQ_SYNC flag to improve write throughput when writing
    to the origin with a snapshot on the same device (using the CFQ I/O
    scheduler).

    Sequential write throughput (chunksize of 4k, 32k, 512k)
    unpatched: 8.5, 8.6, 9.3 MB/s
    patched: 15.2, 18.5, 17.5 MB/s

    Snapshot exception reallocations are triggered by writes that are
    usually async, so mark the associated dm_io_request as async as well.
    This helps when using the CFQ I/O scheduler because it has separate
    queues for sync and async I/O. Async is optimized for throughput; sync
    for latency. With this change we're consciously favoring throughput over
    latency.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
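
    In dm-io terms the change is just in how the request's rw flags are set up
    when kcopyd submits the write; a paraphrase (the job fields and
    complete_io are dm-kcopyd internals, shown for illustration):

        struct dm_io_request io_req = {
                .bi_rw          = WRITE,                /* was WRITE | REQ_SYNC */
                .mem.type       = DM_IO_PAGE_LIST,
                .mem.ptr.pl     = job->pages,
                .notify.fn      = complete_io,          /* kcopyd's completion hook */
                .notify.context = job,
                .client         = job->kc->io_client,
        };

        dm_io(&io_req, job->num_dests, job->dests, NULL);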
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name; as
    blkdev.h includes bio.h, that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
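
    After the unification a bio submitter just uses the shared REQ_* flags; a
    tiny generic illustration (this era's two-argument submit_bio(), with the
    bio assumed to be already set up):

        /* bio->bi_rw and request->cmd_flags now draw from one REQ_* namespace,
         * so a synchronous write is simply: */
        submit_bio(WRITE | REQ_SYNC, bio);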
     

11 Dec, 2009

1 commit

  • dm-kcopyd: accept zero-size jobs

    This patch changes dm-kcopyd so that it accepts zero-size jobs and completes
    them immediately via its completion thread.

    It is needed for snapshot resizing in the multisnapshot target. When we are
    writing to a chunk beyond the origin end, no copying is done. To simplify
    the code, we submit an empty request to kcopyd and let kcopyd complete it.
    If we didn't submit a request to kcopyd and instead called the completion
    routine immediately, it would violate the principle that completions are
    called only from one thread, and it would need additional locking.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
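
    Illustratively, the caller can hand even a no-op request to kcopyd so that
    the completion still arrives from kcopyd's single completion thread (the
    device and callback names here are hypothetical):

        struct dm_io_region from = {
                .bdev   = origin_bdev,          /* hypothetical */
                .sector = chunk_sector,
                .count  = 0,                    /* nothing actually needs copying */
        };
        struct dm_io_region dest = {
                .bdev   = cow_bdev,             /* hypothetical */
                .sector = exception_sector,
                .count  = 0,
        };

        /* kcopyd performs no I/O but still invokes copy_done() from its own
         * completion thread, preserving the single-completion-thread rule. */
        dm_kcopyd_copy(kc, &from, 1, &dest, 0, copy_done, context);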
     

09 Apr, 2009

2 commits

  • If the thread calling dm_kcopyd_copy is delayed due to scheduling inside
    split_job/segment_complete and the subjobs complete before the loop in
    split_job completes, the kcopyd callback could be invoked from the
    thread that called dm_kcopyd_copy instead of the kcopyd workqueue.

    dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()

    Snapshots depend on the fact that callbacks are called from the
    single-threaded kcopyd workqueue and expect that there is no racing between
    individual callbacks. Racing between callbacks can lead to corruption of the
    exception store, and it can also mean that exception store callbacks are
    called twice for the same exception - a likely reason for the crashes
    reported inside pending_complete() / remove_exception().

    This patch fixes two problems:

    1. job->fn being called from the thread that submitted the job (see above).

    - Fix: hand over the completion callback to the kcopyd thread.

    2. job->fn(read_err, write_err, job->context); in segment_complete
    reports the error of the last subjob, not the union of all errors.

    - Fix: pass job->write_err to the callback to report all error bits
    (it is done already in run_complete_job)

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
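
    The first fix boils down to queueing the finished master job for kcopyd's
    completion handling instead of calling the callback in the submitter's
    context; paraphrased (push() and wake() are dm-kcopyd's internal helpers):

        /* In segment_complete(), once the last sub job has finished: */

        /* Before (racy): the callback ran right here, possibly in the thread
         * that called dm_kcopyd_copy():
         *
         *      job->fn(read_err, write_err, job->context);
         */

        /* After: hand the job to the kcopyd thread, the only context allowed
         * to run callbacks; run_complete_job() then reports job->write_err,
         * i.e. the union of all sub job errors (the second fix). */
        push(&kc->complete_jobs, job);
        wake(kc);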
     
  • Use a variable in segment_complete() to point to the dm_kcopyd_client
    struct and only release job->pages in run_complete_job() if any are
    defined. These changes are needed by the next patch.

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

22 Oct, 2008

2 commits

  • Change #include "dm.h" to #include <linux/device-mapper.h> in all targets.
    Targets should not need direct access to internal DM structures.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Write throughput to an LVM snapshot origin volume is an order
    of magnitude slower than that to an LV without snapshots or to
    snapshot target volumes, especially in the case of sequential
    writes with O_SYNC on.

    The following patch, originally written by Kevin Jamieson and
    Jan Blunck and slightly modified for the current RCs by myself,
    tries to improve the performance by modifying the behaviour
    of kcopyd so that it pushes back an I/O job to the head of
    the job queue, instead of the tail as process_jobs() currently
    does, when it has to wait for free pages. This way, write
    requests aren't shuffled to cause extra seeks.

    I tested the patch against 2.6.27-rc5 and got the following results.
    The test is a dd command writing to snapshot origin followed by fsync
    to the file just created/updated. A couple of filesystem benchmarks
    gave me similar results in case of sequential writes, while random
    writes didn't suffer much.

    dd if=/dev/zero of= bs=4096 count=...
    [conv=notrunc when updating]

    1) linux 2.6.27-rc5 without the patch, write to snapshot origin,
    average throughput (MB/s)
    10M 100M 1000M
    create,dd 511.46 610.72 11.81
    create,dd+fsync 7.10 6.77 8.13
    update,dd 431.63 917.41 12.75
    update,dd+fsync 7.79 7.43 8.12

    Compared with the write throughput to an LV without any snapshots
    (shown below), all dd+fsync and 1000 MiB writes perform very poorly.

    10M 100M 1000M
    create,dd 555.03 608.98 123.29
    create,dd+fsync 114.27 72.78 76.65
    update,dd 152.34 1267.27 124.04
    update,dd+fsync 130.56 77.81 77.84

    2) linux 2.6.27-rc5 with the patch, write to snapshot origin,
    average throughput (MB/s)

    10M 100M 1000M
    create,dd 537.06 589.44 46.21
    create,dd+fsync 31.63 29.19 29.23
    update,dd 487.59 897.65 37.76
    update,dd+fsync 34.12 30.07 26.85

    Although this is still not on par with plain LV performance (that
    cannot be avoided because it's copy-on-write anyway), this simple
    patch successfully improves the throughput of dd+fsync while not
    affecting the rest.

    Signed-off-by: Jan Blunck
    Signed-off-by: Kazuo Ito
    Signed-off-by: Alasdair G Kergon
    Cc: stable@kernel.org

    Kazuo Ito
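
    Paraphrasing the change to kcopyd's queueing (lock and field names as in
    dm-kcopyd.c, body simplified):

        /*
         * New helper mirroring push(), but adding at the head of the list so a
         * job that had to wait for free pages keeps its place in line instead
         * of being shuffled behind later jobs (and causing extra seeks).
         */
        static void push_head(struct list_head *jobs, struct kcopyd_job *job)
        {
                unsigned long flags;
                struct dm_kcopyd_client *kc = job->kc;

                spin_lock_irqsave(&kc->job_lock, flags);
                list_add(&job->list, jobs);             /* head, not tail */
                spin_unlock_irqrestore(&kc->job_lock, flags);
        }

    process_jobs() then calls push_head() rather than push() when
    run_pages_job() reports that no pages are free.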
     

25 Apr, 2008

3 commits

  • Remove an avoidable 3ms delay on some dm-raid1 and kcopyd I/O.

    It is specified that any submitted bio without the BIO_RW_SYNC flag may plug
    the queue (i.e. block the requests from being dispatched to the physical
    device).

    The queue is unplugged when the caller calls blk_unplug(). Usually, the
    sequence is that someone calls submit_bh to submit IO on a buffer. The IO
    plugs the queue and waits (to be possibly joined with other adjacent bios).
    Then, when the caller calls wait_on_buffer(), it unplugs the queue and
    submits the IOs to the disk.

    This is what was happening:

    When doing O_SYNC writes, fsync_buffers_list() submits a list of bios to
    dm-raid1; the bios are added to the dm-raid1 write queue and kmirrord is
    woken up.

    fsync_buffers_list() calls wait_on_buffer(). That unplugs the queue, but
    there are no bios on the device queue as they are still in the dm_raid1 queue.

    wait_on_buffer() starts waiting until the IO is finished.

    kmirrord is scheduled; it takes the bios and submits them to the devices.

    The submitted bio plugs the hard disk queue but there is no one to unplug it.
    (The process that called wait_on_buffer() is already sleeping.)

    So there is a 3ms timeout, after which the queues on the hard disks are
    unplugged and the requests are processed.

    This 3ms timeout meant that in certain workloads (e.g. O_SYNC, 8kb writes),
    dm-raid1 was 10 times slower than md raid1.

    Every time we submit something asynchronously via dm_io, we must unplug the
    queue to actually send the request to the device.

    This patch adds an unplug call to kmirrord: while processing requests, it
    keeps the queue plugged (so that adjacent bios can be merged); when it
    finishes processing all the bios, it unplugs the queue to submit them.

    It also fixes kcopyd, which has the same potential problem. All kcopyd
    requests are submitted with BIO_RW_SYNC.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon
    Acked-by: Jens Axboe

    Mikulas Patocka
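
    Schematically, the fix is to unplug once per processed batch instead of
    waiting for the 3ms timer; blk_unplug() and generic_make_request() are the
    interfaces of this era, while the queue and helper names are illustrative:

        /* kmirrord / kcopyd worker, once per batch: */
        while ((bio = pop_next_write()))        /* illustrative helper */
                generic_make_request(bio);      /* queue stays plugged, bios can merge */

        /*
         * The process that will wait on these buffers has already unplugged
         * and gone to sleep, so nobody else will unplug for us: do it here
         * before the worker goes back to sleep.
         */
        blk_unplug(bdev_get_queue(dest_bdev));  /* dest_bdev: illustrative */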
     
  • Publish the dm-io, dm-log and dm-kcopyd headers in include/linux.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Rename kcopyd.[ch] to dm-kcopyd.[ch].

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon