24 Jul, 2009

3 commits

  • Incorrect device area lengths are being passed to device_area_is_valid().

    The regression appeared in 2.6.31-rc1 through commit
    754c5fc7ebb417b23601a6222a6005cc2e7f2913.

    With the dm-stripe target, the size of the whole target (ti->len)
    was used instead of the stripe_width (ti->len / #stripes). An
    example of the resulting incorrect error message is:

    device-mapper: table: 254:0: sdb too small for target
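
    For illustration, a minimal standalone sketch of the distinction
    (the values here are invented; only the division mirrors the fix):

        #include <stdio.h>

        typedef unsigned long long sector_t;

        int main(void)
        {
                sector_t ti_len = 2097152;      /* whole target: 1 GiB in 512-byte sectors */
                unsigned int stripes = 4;

                /* Each stripe device only has to hold its share of the
                 * target, so validation must use the stripe width. */
                sector_t stripe_width = ti_len / stripes;

                printf("validate %llu sectors per device, not %llu\n",
                       stripe_width, ti_len);
                return 0;
        }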

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • This patch removes DM's bio-based vs request-based conditional setting
    of next_ordered. For bio-based DM the next_ordered check is no longer a
    concern (as that check is now in the __make_request path). For
    request-based DM the default of QUEUE_ORDERED_NONE is now appropriate.

    bio-based DM was changed to work around the previously misplaced
    next_ordered check with this commit:
    99360b4c18f7675b50d283301d46d755affe75fd

    request-based DM does not yet support barriers but reacted to the above
    bio-based DM change with this commit:
    5d67aa2366ccb8257d103d0b43df855605c3c086

    The above changes are no longer needed given Neil Brown's recent fix to
    put the next_ordered check in the __make_request path:
    db64f680ba4b5c56c4be59f0698000df89ff0281

    Signed-off-by: Mike Snitzer
    Cc: Jun'ichi Nomura
    Cc: NeilBrown
    Acked-by: Kiyoshi Ueda
    Acked-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • The recent commit 7513c2a761d69d2a93f17146b3563527d3618ba0 (dm raid1:
    add is_remote_recovering hook for clusters) changed do_writes() to
    update the ms->writes list but forgot to wake up kmirrord to process it.

    The rule is that whenever anything is added to ms->reads, ms->writes
    or ms->failures and the list was empty beforehand, we must call
    wakeup_mirrord (for immediate processing) or delayed_wake (for delayed
    processing). Otherwise the bios could sit on the list indefinitely.

    Signed-off-by: Mikulas Patocka
    CC: stable@kernel.org
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

09 Jul, 2009

1 commit

    Commit 5fd29d6ccbc98884569d6f3105aeca70858b3e0f ("printk: clean up
    handling of log-levels and newlines") changed printk semantics. printk
    lines with multiple KERN_<level> prefixes are no longer emitted as
    before the patch.

    <level> is now included in the output on each additional use.

    Remove all uses of multiple KERN_<level>s in formats.

    Signed-off-by: Joe Perches
    Signed-off-by: Linus Torvalds

    Joe Perches
     

02 Jul, 2009

2 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request()
    block: Restore barrier support for md and probably other virtual devices.
    block: get rid of queue-private command filter
    block: Create bip slabs with embedded integrity vectors
    cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()
    cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue()
    Trivial typo fixes in Documentation/block/data-integrity.txt.

    Linus Torvalds
     
  • * 'for-linus' of git://neil.brown.name/md:
    md: use interruptible wait when duration is controlled by userspace.
    md/raid5: suspend shouldn't affect read requests.
    md: tidy up error paths in md_alloc
    md: fix error path when duplicate name is found on md device creation.
    md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.
    md: Use new topology calls to indicate alignment and I/O sizes

    Linus Torvalds
     

30 Jun, 2009

2 commits

    The offset passed to blk_stack_limits() must be in bytes, not sectors.
    Fixes false warnings like the following:
    device-mapper: table: 254:1: target device sda6 is misaligned
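
    For illustration, a minimal standalone sketch of the conversion
    (the partition start value is invented):

        #include <stdint.h>
        #include <stdio.h>

        #define SECTOR_SHIFT 9  /* 512-byte sectors */

        int main(void)
        {
                uint64_t start = 16065; /* hypothetical data start, in sectors */

                /* blk_stack_limits() expects the offset in bytes; passing
                 * the raw sector count makes correctly aligned devices
                 * look misaligned. */
                uint64_t offset = start << SECTOR_SHIFT;

                printf("offset: %llu bytes, not %llu\n",
                       (unsigned long long)offset,
                       (unsigned long long)start);
                return 0;
        }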

    Signed-off-by: Mike Snitzer
    Reported-by: Frans Pop
    Tested-by: Frans Pop
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Fix exception store name handling.

    We need to reference the exception store by a zero-terminated string.

    Fixes a regression introduced in commit f6bd4eb73cdf2a5bf954e497972842f39cabb7e3

    Cc: Yi Yang
    Cc: Jonathan Brassow
    Cc: stable@kernel.org
    Cc: Andrew Morton
    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     

22 Jun, 2009

24 commits

  • This patch converts dm-multipath target to request-based from bio-based.

    Basically, the patch just converts the I/O unit from struct bio
    to struct request.
    In the course of the conversion, it also changes the I/O queueing
    mechanism. The change in the I/O queueing is described in detail
    below.

    I/O queueing mechanism change
    -----------------------------
    In I/O submission, map_io(), there is no mechanism change from
    bio-based, since the clone request is ready for retry as it is.
    However, in I/O completion, do_end_io(), there is a mechanism change
    from bio-based, since the clone request is not ready for retry.

    In do_end_io() of bio-based, the clone bio has all needed memory
    for resubmission. So the target driver can queue it and resubmit
    it later without memory allocations.
    The mechanism has almost no overhead.

    On the other hand, in do_end_io() of request-based, the clone request
    doesn't have clone bios, so the target driver can't resubmit it
    as it is. To resubmit the clone request, memory allocation for
    clone bios is needed, which incurs some overhead.
    To avoid that overhead just for queueing, the target driver doesn't
    queue the clone request inside itself.
    Instead, the target driver asks dm core to queue and remap
    the original request of the clone request, since the only overhead
    for queueing is then freeing the memory of the clone request.

    As a result, the target driver doesn't need to record/restore
    the information of the original request for resubmitting
    the clone request. So dm_bio_details in dm_mpath_io is removed.

    multipath_busy()
    ---------------------
    The target driver returns "busy" only when both of the following
    hold at the time map() would be called:
    o the target driver would actually map the I/O, and
    o the mapped I/O would wait on the underlying device's queue due to
    congestion.

    In all other cases, the target driver doesn't return "busy".
    Otherwise, dm core would keep hold of the I/Os and the target driver
    couldn't do what it wants with them (e.g. when the target driver
    can't map I/Os now, it wants to fail them instead). A sketch of this
    rule follows.
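
    A minimal standalone sketch of the rule, assuming a simplified
    two-flag state (the real code inspects priority groups and the
    underlying request queues):

        #include <stdbool.h>
        #include <stdio.h>

        struct mpath_state {
                bool would_map;         /* map() would actually map the I/O */
                bool path_congested;    /* mapped I/O would wait on a busy queue */
        };

        /* Report "busy" only when the target would map the I/O and the
         * mapped I/O would merely wait on a congested underlying queue;
         * otherwise let dm core hand the I/O over. */
        static bool multipath_busy(const struct mpath_state *s)
        {
                return s->would_map && s->path_congested;
        }

        int main(void)
        {
                struct mpath_state s = { .would_map = true, .path_congested = true };

                printf("busy: %s\n", multipath_busy(&s) ? "yes" : "no");
                return 0;
        }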

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Hannes Reinecke
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
    This patch disables interrupts when taking map_lock to avoid
    lockdep warnings in request-based dm.

    request-based dm takes map_lock after taking queue_lock with
    interrupts disabled:
        spin_lock_irqsave(queue_lock)
        q->request_fn() == dm_request_fn()
           => dm_get_table()
              => read_lock(map_lock)
    while queue_lock could be (but currently isn't) taken in interrupt
    context.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Christof Schmitt
    Acked-by: Hannes Reinecke
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • Request-based dm doesn't have barrier support yet.
    So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm.
    Since the device type is decided at the first table loading time,
    setting the flag is deferred until then.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Hannes Reinecke
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch enables request-based dm.

    o Request-based dm and bio-based dm coexist, since there are
    some target drivers which are more fitting to bio-based dm.
    Also, there are other bio-based devices in the kernel
    (e.g. md, loop).
    Since bio-based device can't receive struct request,
    there are some limitations on device stacking between
    bio-based and request-based.

                         type of underlying device
                         bio-based     request-based
      ----------------------------------------------
      bio-based             OK              OK
      request-based         --              OK

    The device type is recognized by the queue flag in the kernel,
    so dm follows that.
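
    A minimal standalone sketch of the stacking rule in the table above
    (types and names are simplified stand-ins):

        #include <stdbool.h>
        #include <stdio.h>

        enum dm_queue_type { BIO_BASED, REQUEST_BASED };

        /* bio-based dm can sit on either kind of device; request-based
         * dm needs a request-based device underneath. */
        static bool can_stack(enum dm_queue_type dm, enum dm_queue_type under)
        {
                return dm == BIO_BASED || under == REQUEST_BASED;
        }

        int main(void)
        {
                printf("request-based on bio-based: %s\n",
                       can_stack(REQUEST_BASED, BIO_BASED) ? "OK" : "--");
                return 0;
        }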

    o The type of a dm device is decided at the first table binding time.
    Once the type of a dm device is decided, the type can't be changed.

    o Mempool allocations are deferred until table loading time, since
    mempools for request-based dm are different from those for bio-based
    dm and the needed mempool type is fixed by the type of the table.

    o Currently, request-based dm supports only tables that have a single
    target. To support multiple targets, we need to support request
    splitting or prevent bio/request from spanning multiple targets.
    The former needs lots of changes in the block layer, and the latter
    requires that all target drivers support a merge() function.
    Both will take time.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch adds core functions for request-based dm.

    When struct mapped device (md) is initialized, md->queue has
    an I/O scheduler and the following functions are used for
    request-based dm as the queue functions:
    make_request_fn: dm_make_request()
    prep_rq_fn:      dm_prep_fn()
    request_fn:      dm_request_fn()
    softirq_done_fn: dm_softirq_done()
    lld_busy_fn:     dm_lld_busy()
    Actual initializations are done in another patch (PATCH 2).

    Below is a brief summary of how request-based dm behaves, including:
    - making request from bio
    - cloning, mapping and dispatching request
    - completing request and bio
    - suspending md
    - resuming md

    bio to request
    ==============
    md->queue->make_request_fn() (dm_make_request()) calls __make_request()
    for a bio submitted to the md.
    Then, the bio is kept in the queue as a new request or merged into
    another request in the queue if possible.

    Cloning and Mapping
    ===================
    Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
    when requests are dispatched after they are sorted by the I/O scheduler.

    dm_request_fn() checks the busy state of underlying devices using
    the target's busy() function and, if they are busy, stops dispatching
    requests so they stay on the dm device's queue.
    This helps I/O merging, since no merging is done for a request
    once it has been dispatched to underlying devices.

    Actual cloning and mapping are done in dm_prep_fn() and map_request()
    called from dm_request_fn().
    dm_prep_fn() clones not only the request but also its bios,
    so that dm can hold back bio completion in error cases and prevent
    the bio submitter from noticing the error.
    (See the "Completion" section below for details.)

    After the cloning, the clone is mapped by target's map_rq() function
    and inserted to underlying device's queue using
    blk_insert_cloned_request().

    Completion
    ==========
    Request completion can be hooked by rq->end_io(), but by then all
    bios in the request will have been completed, even in error cases,
    and the bio submitter will have noticed the error.
    To prevent the bio completion in error cases, request-based dm clones
    both bio and request and hooks both bio->bi_end_io() and rq->end_io():
    bio->bi_end_io(): end_clone_bio()
    rq->end_io(): end_clone_request()

    Summary of the request completion flow is below:

    blk_end_request() for a clone request
      => blk_update_request()
         => bio->bi_end_io() == end_clone_bio() for each clone bio
            => Free the clone bio
            => Success: Complete the original bio (blk_update_request())
               Error:   Don't complete the original bio
      => blk_finish_request()
         => rq->end_io() == end_clone_request()
            => blk_complete_request()
               => dm_softirq_done()
                  => Free the clone request
                  => Success: Complete the original request
                              (blk_end_request())
                     Error:   Requeue the original request

    end_clone_bio() completes the original request by the size of
    the original bio in the success case.
    Even if all bios in the original request are completed by that
    completion, the original request must not be completed yet, to keep
    the ordering of request completion for the stacking.
    So end_clone_bio() uses blk_update_request() instead of
    blk_end_request().
    In error cases, end_clone_bio() doesn't complete the original bio.
    It just frees the clone bio and hands the error handling over to
    end_clone_request(). (This rule is sketched below.)
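
    A minimal standalone sketch of that rule, assuming simplified
    stand-ins for the bio and request structures (the real code works
    on struct bio/struct request and calls blk_update_request()):

        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>

        struct orig_request {
                bool error_seen;
                unsigned int bytes_done;
        };

        struct clone_bio {
                int error;
                unsigned int size;
                struct orig_request *orig;
        };

        /* On success, advance the original request by this bio's size
         * only (a partial completion, preserving completion ordering);
         * on error, record it and leave the original untouched for
         * end_clone_request() to handle. */
        static void end_clone_bio(struct clone_bio *clone)
        {
                struct orig_request *orig = clone->orig;

                if (clone->error)
                        orig->error_seen = true;    /* defer to end_clone_request() */
                else if (!orig->error_seen)
                        orig->bytes_done += clone->size;    /* blk_update_request() analogue */

                free(clone);    /* the clone bio is always freed here */
        }

        int main(void)
        {
                struct orig_request orig = { false, 0 };
                struct clone_bio *c = malloc(sizeof(*c));

                if (!c)
                        return 1;
                *c = (struct clone_bio){ .error = 0, .size = 4096, .orig = &orig };
                end_clone_bio(c);
                printf("bytes done: %u, error seen: %d\n",
                       orig.bytes_done, orig.error_seen);
                return 0;
        }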

    end_clone_request(), which is called with the queue lock held,
    completes the clone request and the original request in a softirq
    context (dm_softirq_done()), where the queue lock is not held, to
    avoid a deadlock on submission of another request during the
    completion:
    - the submitted request may be mapped to the same device
    - request submission requires the queue lock, but it is already
      held by the completing context, which has no way of knowing that

    The clone request has no clone bios by the time dm_softirq_done()
    is called, so target drivers can't resubmit it, even in error cases.
    Instead, they can ask dm core to requeue and remap the original
    request in that case.

    suspend
    =======
    Request-based dm suspends the md by stopping md->queue.
    For a noflush suspend, it just stops md->queue.

    For a flush suspend, it inserts a marker request at the tail of
    md->queue and dispatches all requests in md->queue until the marker
    reaches the front of md->queue. Then it stops dispatching requests
    and waits for all dispatched requests to complete.
    After that, it completes the marker request, stops md->queue and
    wakes up the waiter on the suspend queue, md->wait.

    resume
    ======
    Starts md->queue.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch contains a device-mapper mirror log module that forwards
    requests to userspace for processing.

    The structures used for communication between kernel and userspace are
    located in include/linux/dm-log-userspace.h. Due to the frequency,
    diversity, and 2-way communication nature of the exchanges between
    kernel and userspace, 'connector' was chosen as the interface for
    communication.

    The first log implementations written in userspace - "clustered-disk"
    and "clustered-core" - support clustered shared storage. A userspace
    daemon (in the LVM2 source code repository) uses openAIS/corosync to
    process requests in an ordered fashion with the rest of the nodes in the
    cluster so as to prevent log state corruption. Other implementations,
    with no association to LVM or openAIS/corosync, are certainly possible.

    (Imagine if two machines are writing to the same region of a mirror.
    They would both mark the region dirty, but you need a cluster-aware
    entity that can handle properly marking the region clean when they are
    done. Otherwise, you might clear the region when the first machine is
    done, not the second.)

    Signed-off-by: Jonathan Brassow
    Cc: Evgeniy Polyakov
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • Currently, device-mapper maintains a separate instance of 'struct
    queue_limits' for each table of each device. When the configuration of
    a device is to be changed, first its table is loaded and this structure
    is populated, then the device is 'resumed' and the calculated
    queue_limits are applied.

    This places restrictions on how userspace may process related devices,
    where it is often advantageous to 'load' tables for several devices
    at once before 'resuming' them together. As the new queue_limits
    only take effect after the 'resume', if they are changing and one
    device uses another, the latter must be 'resumed' before the former
    may be 'loaded'.

    This patch moves the calculation of these queue_limits out of
    the 'load' operation into 'resume'. Since we are no longer
    pre-calculating this struct, we no longer need to maintain copies
    within our dm structs.

    dm_set_device_limits() now passes the 'start' of the device's
    data area (aka pe_start) as the 'offset' to blk_stack_limits().

    init_valid_queue_limits() is replaced by blk_set_default_limits().

    Signed-off-by: Mike Snitzer
    Cc: martin.petersen@oracle.com
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • create_log_context() must use the logical_block_size from the log disk,
    where the I/O happens, not the target's logical_block_size.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
    Add .iterate_devices to 'struct target_type' to allow a function to
    be called for all devices in a DM target. It is implemented for all
    targets except those in dm-snap.c (origin and snapshot).

    (The raid1 version number jumps to 1.12 because we originally reserved
    1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
    instead.)
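
    A minimal standalone sketch of the hook's shape, assuming
    simplified stand-in types (the real callback also receives the
    device's start and length):

        #include <stdio.h>

        struct dm_dev_sketch {
                const char *name;
        };

        typedef int (*iterate_devices_fn)(struct dm_dev_sketch *dev, void *data);

        /* Invoke the callback once per underlying device, stopping on
         * the first non-zero return. */
        static int iterate_devices(struct dm_dev_sketch *devs, int n,
                                   iterate_devices_fn fn, void *data)
        {
                for (int i = 0; i < n; i++) {
                        int r = fn(&devs[i], data);

                        if (r)
                                return r;
                }
                return 0;
        }

        static int print_dev(struct dm_dev_sketch *dev, void *data)
        {
                (void)data;
                printf("device: %s\n", dev->name);
                return 0;
        }

        int main(void)
        {
                struct dm_dev_sketch devs[] = { { "sda" }, { "sdb" } };

                return iterate_devices(devs, 2, print_dev, NULL);
        }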

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon
    Cc: martin.petersen@oracle.com

    Mike Snitzer
     
    Copy the table's queue_limits to the DM device's request_queue. This
    properly initializes the queue's topology limits and also avoids having
    to track the evolution of 'struct queue_limits' in
    dm_table_set_restrictions().

    Also fixes a bug that was introduced in dm_table_set_restrictions() via
    commit ae03bf639a5027d27270123f5f6e3ee6a412781d. In addition to
    establishing 'bounce_pfn' in the queue's limits, blk_queue_bounce_limit()
    also performs an allocation to set up the ISA DMA pool. This allocation
    resulted in "sleeping function called from invalid context" when called
    from dm_table_set_restrictions().

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Use blk_stack_limits() to stack block limits (including topology) rather
    than duplicate the equivalent within Device Mapper.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
    Impose necessary and sufficient conditions on a device's table such
    that any incoming bio which respects its logical_block_size can be
    processed successfully.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Ensure I/O is aligned to the logical block size of target devices.

    Rename check_device_area() to device_area_is_valid() for clarity and
    establish the device limits including the logical block size prior to
    calling it.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Add support for passing a 32 bit "cookie" into the kernel with the
    DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls. The (unsigned)
    value of this cookie is returned to userspace alongside the uevents
    issued by these ioctls in the variable DM_COOKIE.

    This means the userspace process issuing these ioctls can be notified
    by udev after udev has completed any actions triggered.

    To minimise the interface extension, we pass the cookie into the
    kernel in the event_nr field which is otherwise unused when calling
    these ioctls. Incrementing the version number allows userspace to
    determine in advance whether or not the kernel supports the cookie.
    If the kernel does support this but userspace does not, there should
    be no impact as the new variable will just get ignored.
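
    For illustration, a minimal user-space sketch of where the cookie
    travels (the struct is trimmed to the one relevant field; the value
    is arbitrary):

        #include <stdint.h>
        #include <stdio.h>

        /* Hypothetical, trimmed-down stand-in for struct dm_ioctl. */
        struct dm_ioctl_sketch {
                uint32_t event_nr;      /* otherwise unused by these ioctls */
        };

        int main(void)
        {
                struct dm_ioctl_sketch io = { 0 };
                uint32_t cookie = 0xd00d;       /* chosen by the caller */

                /* The cookie rides into the kernel in event_nr and comes
                 * back to udev as DM_COOKIE in the uevent environment. */
                io.event_nr = cookie;
                printf("event_nr carries cookie %#x\n", io.event_nr);
                return 0;
        }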

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • Add a file named 'suspended' to each device-mapper device directory in
    sysfs. It holds the value 1 while the device is suspended. Otherwise
    it holds 0.
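
    A minimal sketch of the attribute's show logic (the real code is a
    sysfs show function; the names here are illustrative):

        #include <stdbool.h>
        #include <stdio.h>

        static int suspended_show(bool suspended, char *buf, size_t len)
        {
                /* "1" while the device is suspended, otherwise "0". */
                return snprintf(buf, len, "%d\n", suspended ? 1 : 0);
        }

        int main(void)
        {
                char buf[4];

                suspended_show(true, buf, sizeof(buf));
                fputs(buf, stdout);
                return 0;
        }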

    Signed-off-by: Peter Rajnoha
    Signed-off-by: Alasdair G Kergon

    Peter Rajnoha
     
    Report any devices that were not freed before a table is destroyed.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • This patch adds a service time oriented dynamic load balancer,
    dm-service-time, which selects the path with the shortest estimated
    service time for the incoming I/O.
    The service time is estimated by dividing the in-flight I/O size
    by a performance value of each path.

    The performance value can be given as a table argument at table
    load time. If no performance value is given, all paths are
    considered equal.
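
    A minimal standalone sketch of the selection rule (structures and
    numbers are invented; the cross-multiplication avoids division):

        #include <stdio.h>

        struct path {
                const char *name;
                unsigned long long in_flight_bytes;
                unsigned int perf;      /* relative throughput, from the table argument */
        };

        /* Estimated service time is in_flight_bytes / perf; pick the
         * path with the smallest estimate. a/b < c/d iff a*d < c*b. */
        static struct path *choose_path(struct path *paths, int n)
        {
                struct path *best = &paths[0];

                for (int i = 1; i < n; i++)
                        if (paths[i].in_flight_bytes * best->perf <
                            best->in_flight_bytes * paths[i].perf)
                                best = &paths[i];
                return best;
        }

        int main(void)
        {
                struct path p[] = {
                        { "sda", 1 << 20, 1 },  /* 1 MiB queued, slow path */
                        { "sdb", 3 << 20, 4 },  /* 3 MiB queued, 4x faster */
                };

                printf("selected: %s\n", choose_path(p, 2)->name);
                return 0;
        }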

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch adds a dynamic load balancer, dm-queue-length, which
    balances the number of in-flight I/Os across the paths.

    The code is based on the patch posted by Stefan Bader:
    https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html
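
    A minimal standalone sketch of the policy (structures and numbers
    are invented):

        #include <stdio.h>

        struct path {
                const char *name;
                unsigned int in_flight; /* I/Os currently on this path */
        };

        /* Always pick the path with the fewest in-flight I/Os. */
        static struct path *choose_path(struct path *paths, int n)
        {
                struct path *best = &paths[0];

                for (int i = 1; i < n; i++)
                        if (paths[i].in_flight < best->in_flight)
                                best = &paths[i];
                return best;
        }

        int main(void)
        {
                struct path p[] = { { "sda", 7 }, { "sdb", 2 } };

                printf("selected: %s\n", choose_path(p, 2)->name);
                return 0;
        }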

    Signed-off-by: Stefan Bader
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch makes two additions to the dm path selector interface for
    dynamic load balancers:
    o a new hook, start_io()
    o a new parameter 'nr_bytes' to select_path()/start_io()/end_io()
    to pass the size of the I/O

    start_io() is called when a target driver actually submits I/O
    to the selected path.
    Path selectors can use it to start accounting of the I/O
    (e.g. counting the number of in-flight I/Os).
    The start_io hook is based on the patch posted by Stefan Bader:
    https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html

    nr_bytes, the size of the I/O, is passed so that path selectors can
    take the size of the I/O into account when deciding which path to use.
    dm-service-time uses it to estimate service time, for example.
    (The nr_bytes member was added to dm_mpath_io instead of using the
    existing details.bi_size, since the request-based dm patch deletes it.)
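
    A minimal standalone sketch of the extended interface and the
    accounting it enables (the signatures are simplified stand-ins, not
    the real dm path-selector prototypes):

        #include <stddef.h>
        #include <stdio.h>

        struct dm_path;

        struct path_selector_sketch {
                struct dm_path *(*select_path)(void *ctx, size_t nr_bytes);
                void (*start_io)(void *ctx, struct dm_path *path, size_t nr_bytes);
                void (*end_io)(void *ctx, struct dm_path *path, size_t nr_bytes);
        };

        struct ctx_sketch {
                size_t in_flight_bytes; /* what a selector can now account */
        };

        static void start_io_sketch(void *ctx, struct dm_path *p, size_t nr_bytes)
        {
                (void)p;
                ((struct ctx_sketch *)ctx)->in_flight_bytes += nr_bytes;
        }

        static void end_io_sketch(void *ctx, struct dm_path *p, size_t nr_bytes)
        {
                (void)p;
                ((struct ctx_sketch *)ctx)->in_flight_bytes -= nr_bytes;
        }

        int main(void)
        {
                struct ctx_sketch c = { 0 };

                start_io_sketch(&c, NULL, 4096);
                printf("in flight: %zu bytes\n", c.in_flight_bytes);
                end_io_sketch(&c, NULL, 4096);
                return 0;
        }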

    Signed-off-by: Stefan Bader
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • Send barrier requests when updating the exception area.

    Exception area updates need to be ordered with respect to data writes,
    so that the writes are not reordered by the hardware disk cache.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
    If -EOPNOTSUPP was returned and the request was a barrier request,
    retry it without the barrier.

    Retry all regions for now. Barriers are submitted only for one-region requests,
    so it doesn't matter. (In the future, retries can be limited to the actual
    regions that failed.)
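
    A minimal standalone sketch of the retry rule (the flag value is a
    stand-in, not the real bio flag):

        #include <stdio.h>

        #define EOPNOTSUPP      95              /* Linux errno value */
        #define BARRIER_FLAG    (1u << 0)       /* stand-in flag value */

        /* A barrier request that failed with -EOPNOTSUPP is resubmitted
         * (all regions, for now) with the barrier flag stripped. */
        static unsigned int flags_for_retry(int error, unsigned int flags)
        {
                if (error == -EOPNOTSUPP && (flags & BARRIER_FLAG))
                        return flags & ~BARRIER_FLAG;
                return flags;
        }

        int main(void)
        {
                printf("retry flags: %#x\n",
                       flags_for_retry(-EOPNOTSUPP, BARRIER_FLAG));
                return 0;
        }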

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
    Add another field, eopnotsupp_bits. It is a subset of error_bits,
    representing regions that returned -EOPNOTSUPP. (Each such bit is set
    in both error_bits and eopnotsupp_bits.)

    This value will be used in further patches.
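
    A minimal standalone sketch of the bookkeeping (the region number
    and error value are illustrative):

        #include <stdio.h>

        #define EOPNOTSUPP 95   /* Linux errno value */

        int main(void)
        {
                unsigned long error_bits = 0, eopnotsupp_bits = 0;
                int region = 3, error = -EOPNOTSUPP;

                /* An -EOPNOTSUPP failure sets the region's bit in both
                 * masks, so eopnotsupp_bits stays a subset of error_bits. */
                if (error) {
                        error_bits |= 1UL << region;
                        if (error == -EOPNOTSUPP)
                                eopnotsupp_bits |= 1UL << region;
                }

                printf("error: %#lx, eopnotsupp: %#lx\n",
                       error_bits, eopnotsupp_bits);
                return 0;
        }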

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Flush support for dm-snapshot target.

    This patch just forwards the flush request to either the origin or the snapshot
    device. (It doesn't flush exception store metadata.)

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Flush support for dm-multipath target.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka