Doug / smarc-fsl-linux-kernel | Embedian Git Server

12 Aug, 2010

3 commits

56a67df76 dm: factor out max_io_len_target_boundary

Split max_io_len_target_boundary out of max_io_len so that the discard
support can make use of it without duplicating max_io_len code.

Avoiding max_io_len's split_io logic enables DM's discard support to
submit the entire discard request to a target. But discards must still
be split on target boundaries.
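A minimal sketch of the boundary calculation described above, written as userspace C. The names target_sketch, sectors_to_target_boundary and first_discard_len are hypothetical stand-ins, not the kernel's definitions; the only assumption is that a target covers the sector range [begin, begin + len).

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for the kernel's struct dm_target. */
struct target_sketch {
	sector_t begin; /* first sector of the target within the mapped device */
	sector_t len;   /* length of the target in sectors */
};

/* Sectors from 'sector' up to the end of the target that contains it. */
static sector_t sectors_to_target_boundary(const struct target_sketch *ti,
					   sector_t sector)
{
	sector_t offset = sector - ti->begin; /* position inside the target */

	return ti->len - offset;
}

/* Length of the first piece of a discard: it may be large, but it must
 * not cross the target boundary. */
static sector_t first_discard_len(const struct target_sketch *ti,
				  sector_t sector, sector_t nr_sectors)
{
	sector_t max = sectors_to_target_boundary(ti, sector);

	return nr_sectors < max ? nr_sectors : max;
}
```

With a target starting at sector 100 and 50 sectors long, a 100-sector discard starting at sector 120 would be cut to 30 sectors here, and the remainder would go to the next target.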

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-08-12 11:14:10 +0800
5ae89a872 dm: linear support discard

Allow discards to be passed through to linear mappings if at least one
underlying device supports it. Discards will be forwarded only to
devices that support them.

A target that supports discards should set num_discard_requests to
indicate how many times each discard request must be submitted to it.

Verify table's underlying devices support discards prior to setting the
associated DM device as capable of discards (via QUEUE_FLAG_DISCARD).
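The "at least one underlying device" rule above can be sketched as follows. This is an illustration in userspace C, not the kernel code; dev_sketch and table_supports_discards_sketch are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one underlying device of a table. */
struct dev_sketch {
	bool supports_discard;
};

/* The mapped device is marked discard-capable if any underlying
 * device of the table supports discards. */
static bool table_supports_discards_sketch(const struct dev_sketch *devs,
					   size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].supports_discard)
			return true;
	return false;
}
```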

Signed-off-by: Mike Snitzer
Signed-off-by: Mikulas Patocka
Reviewed-by: Joe Thornber
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-08-12 11:14:08 +0800
57cba5d36 dm: rename map_info flush_request to target_request_nr

'target_request_nr' is a more generic name that reflects the fact that
it will be used for both flush and discard support.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-08-12 11:14:04 +0800

06 Mar, 2010

1 commit

8215d6ec5 dm table: remove unused dm_get_device range parameters

Remove the unused parameters (start and len) of dm_get_device()
and fix the callers.

Signed-off-by: Nikanth Karthikesan
Signed-off-by: Alasdair G Kergon

Nikanth Karthikesan
2010-03-06 10:32:27 +0800

11 Dec, 2009

4 commits

64dbce580 dm: export suspended state to targets

This patch adds the exported dm_suspended() function so that targets
can check whether or not they are suspended.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: Mike Anderson
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-12-11 07:52:27 +0800
4f186f8bb dm: rename dm_suspended to dm_suspended_md

This patch renames dm_suspended() to dm_suspended_md() and
keeps it internal to dm.
No functional change.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: Mike Anderson
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-12-11 07:52:26 +0800
042d2a9bc dm: keep old table until after resume succeeded

When swapping a new table into place, retain the old table until
its replacement is in place.

An old check for an empty table is removed because this is enforced
in populate_table().

__unbind() becomes redundant when followed by __bind().

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2009-12-11 07:52:24 +0800
7c6664114 dm: rename dm_get_table to dm_get_live_table

Rename dm_get_table to dm_get_live_table.

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2009-12-11 07:52:19 +0800

05 Sep, 2009

1 commit

40bea4312 dm stripe: expose correct io hints

Set sensible I/O hints for striped DM devices in the topology
infrastructure added for 2.6.31 for userspace tools to
obtain via sysfs.

Add .io_hints to 'struct target_type' to allow the I/O hints portion
(io_min and io_opt) of the 'struct queue_limits' to be set by each
target and implement this for dm-stripe.
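As a rough illustration of what an .io_hints callback for a striped target might compute, here is a userspace sketch. It assumes, which the message above does not state, that io_min is one chunk and io_opt is one full stripe (chunk size times number of stripes); limits_sketch and stripe_io_hints_sketch are hypothetical names.

```c
/* Hypothetical stand-in for the io_min/io_opt part of struct queue_limits. */
struct limits_sketch {
	unsigned int io_min; /* bytes */
	unsigned int io_opt; /* bytes */
};

/* Assumption: minimum I/O is one chunk, optimal I/O is a full stripe. */
static void stripe_io_hints_sketch(struct limits_sketch *limits,
				   unsigned int chunk_sectors,
				   unsigned int stripes)
{
	unsigned int chunk_bytes = chunk_sectors << 9; /* 512-byte sectors */

	limits->io_min = chunk_bytes;
	limits->io_opt = chunk_bytes * stripes;
}
```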

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2009-09-05 03:40:25 +0800

24 Jul, 2009

1 commit

5dea271b6 dm table: pass correct dev area size to device_area_is_valid

Incorrect device area lengths are being passed to device_area_is_valid().

The regression appeared in 2.6.31-rc1 through commit
754c5fc7ebb417b23601a6222a6005cc2e7f2913.

With the dm-stripe target, the size of the target (ti->len) was used
instead of the stripe_width (ti->len/#stripes). An example of a
consequent incorrect error message is:

device-mapper: table: 254:0: sdb too small for target
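The effect of the wrong length can be sketched in userspace C. The helper names below are hypothetical and the range check is simplified; it only illustrates why checking ti->len instead of ti->len/#stripes rejects a device that is actually large enough.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Simplified check: does [start, start + len) fit on the device? */
static int device_area_fits(sector_t dev_size, sector_t start, sector_t len)
{
	return start <= dev_size && len <= dev_size - start;
}

/* Area each stripe device must provide for a striped target. */
static sector_t stripe_width_sketch(sector_t ti_len, unsigned int stripes)
{
	return ti_len / stripes;
}
```

For a 2000-sector target striped over two 1000-sector devices, checking the stripe width (1000) succeeds, while checking the full target length (2000) fails and would produce the "too small for target" message.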

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2009-07-24 03:30:42 +0800

22 Jun, 2009

5 commits

cec47e3d4 dm: prepare for request based option

This patch adds core functions for request-based dm.

When struct mapped device (md) is initialized, md->queue has
an I/O scheduler and the following functions are used for
request-based dm as the queue functions:
    make_request_fn: dm_make_request()
    prep_rq_fn:      dm_prep_fn()
    request_fn:      dm_request_fn()
    softirq_done_fn: dm_softirq_done()
    lld_busy_fn:     dm_lld_busy()
Actual initializations are done in another patch (PATCH 2).

Below is a brief summary of how request-based dm behaves, including:
- making request from bio
- cloning, mapping and dispatching request
- completing request and bio
- suspending md
- resuming md

bio to request
==============
md->queue->make_request_fn() (dm_make_request()) calls __make_request()
for a bio submitted to the md.
Then, the bio is kept in the queue as a new request or merged into
another request in the queue if possible.

Cloning and Mapping
===================
Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
when requests are dispatched after they are sorted by the I/O scheduler.

dm_request_fn() checks the busy state of underlying devices using the
target's busy() function and, if they are busy, stops dispatching
requests so that they stay on the dm device's queue.
This improves I/O merging, since a request can no longer be merged
once it has been dispatched to the underlying devices.

Actual cloning and mapping are done in dm_prep_fn() and map_request()
called from dm_request_fn().
dm_prep_fn() clones not only the request but also the bios of the request
so that dm can hold bio completion in error cases and prevent
the bio submitter from noticing the error.
(See the "Completion" section below for details.)

After the cloning, the clone is mapped by the target's map_rq() function
and inserted into the underlying device's queue using
blk_insert_cloned_request().

Completion
==========
Request completion can be hooked by rq->end_io(), but by then all bios
in the request will have been completed, even in error cases, and the bio
submitter will have noticed the error.
To prevent bio completion in error cases, request-based dm clones
both bio and request and hooks both bio->bi_end_io() and rq->end_io():
    bio->bi_end_io(): end_clone_bio()
    rq->end_io():     end_clone_request()

Summary of the request completion flow is below:
blk_end_request() for a clone request
  => blk_update_request()
     => bio->bi_end_io() == end_clone_bio() for each clone bio
        => Free the clone bio
        => Success: Complete the original bio (blk_update_request())
           Error:   Don't complete the original bio
  => blk_finish_request()
     => rq->end_io() == end_clone_request()
        => blk_complete_request()
           => dm_softirq_done()
              => Free the clone request
              => Success: Complete the original request (blk_end_request())
                 Error:   Requeue the original request

end_clone_bio() completes the original request by the size of
the original bio in successful cases.
Even if all bios in the original request are completed by that
completion, the original request must not be completed yet, to keep
the ordering of request completion for the stacking.
So end_clone_bio() uses blk_update_request() instead of
blk_end_request().
In error cases, end_clone_bio() doesn't complete the original bio.
It just frees the cloned bio and hands the error handling over to
end_clone_request().

end_clone_request(), which is called with the queue lock held, completes
the clone request and the original request in a softirq context
(dm_softirq_done()), which holds no queue lock, to avoid a deadlock
on submission of another request during the completion:
    - The submitted request may be mapped to the same device
    - Request submission requires the queue lock, but the submitter
      would already hold it without knowing that

The clone request has no clone bio when dm_softirq_done() is called.
So target drivers can't resubmit it, even in error cases.
Instead, they can ask dm core to requeue and remap
the original request in such cases.
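The two-level completion described in this section can be modelled in a few lines of userspace C. This is only a sketch of the decision logic (success updates the original partially, an error is deferred to request completion); the types and function names are hypothetical and no block-layer API is used.

```c
/* Hypothetical model of the original request's completion state. */
struct orig_request_sketch {
	unsigned int bytes_left; /* bytes of the original not yet completed */
	int error;               /* first error seen on any clone bio, or 0 */
};

enum orig_action { ORIG_COMPLETE, ORIG_REQUEUE };

/* Analogue of end_clone_bio(): on success, complete only the matching
 * part of the original; on error, leave the original bio untouched and
 * record the error for the request-level handler. */
static void clone_bio_done(struct orig_request_sketch *orig,
			   unsigned int bytes, int error)
{
	if (error) {
		if (!orig->error)
			orig->error = error;
		return;
	}
	orig->bytes_left -= bytes;
}

/* Analogue of the decision made after end_clone_request(): finish the
 * original on success, requeue it on error. */
static enum orig_action clone_request_done(const struct orig_request_sketch *orig)
{
	return orig->error ? ORIG_REQUEUE : ORIG_COMPLETE;
}
```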

suspend
=======
Request-based dm suspends the md by stopping md->queue.
For noflush suspend, it just stops md->queue.

For flush suspend, it inserts a marker request at the tail of md->queue
and dispatches all requests in md->queue until the marker reaches
the front of md->queue. Then it stops dispatching requests and waits
for all the dispatched requests to complete.
After that, it completes the marker request, stops md->queue and
wakes up the waiter on the suspend queue, md->wait.
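The marker technique for flush suspend can be sketched as a simple scan over a queue. This is a userspace illustration with hypothetical names; the queue is modelled as an array of request kinds.

```c
#include <stddef.h>

enum req_kind_sketch { REQ_NORMAL, REQ_MARKER };

/* Dispatch requests from the front of the queue until the marker
 * reaches the front. Returns how many requests were dispatched.
 * Requests queued behind the marker stay on the queue. */
static size_t dispatch_until_marker(const enum req_kind_sketch *queue,
				    size_t n)
{
	size_t dispatched = 0;

	while (dispatched < n && queue[dispatched] != REQ_MARKER)
		dispatched++;
	return dispatched;
}
```

Everything queued before the suspend is flushed, while anything that arrives after the marker is held back until resume.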

resume
======
Starts md->queue.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-06-22 17:12:35 +0800
754c5fc7e dm: calculate queue limits during resume not load

Currently, device-mapper maintains a separate instance of 'struct
queue_limits' for each table of each device. When the configuration of
a device is to be changed, first its table is loaded and this structure
is populated, then the device is 'resumed' and the calculated
queue_limits are applied.

This places restrictions on how userspace may process related devices,
where it is often advantageous to 'load' tables for several devices
at once before 'resuming' them together. As the new queue_limits
only take effect after the 'resume', if they are changing and one
device uses another, the latter must be 'resumed' before the former
may be 'loaded'.

This patch moves the calculation of these queue_limits out of
the 'load' operation into 'resume'. Since we are no longer
pre-calculating this struct, we no longer need to maintain copies
within our dm structs.

dm_set_device_limits() now passes the 'start' of the device's
data area (aka pe_start) as the 'offset' to blk_stack_limits().

init_valid_queue_limits() is replaced by blk_set_default_limits().

Signed-off-by: Mike Snitzer
Cc: martin.petersen@oracle.com
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2009-06-22 17:12:34 +0800
af4874e03 dm targets: introduce iterate devices fn

Add .iterate_devices to 'struct target_type' to allow a function to be
called for all devices in a DM target. Implemented it for all targets
except those in dm-snap.c (origin and snapshot).

(The raid1 version number jumps to 1.12 because we originally reserved
1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
instead.)

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon
Cc: martin.petersen@oracle.com

Mike Snitzer
2009-06-22 17:12:33 +0800
5ab97588f dm table: replace struct io_restrictions with struct queue_limits

Use blk_stack_limits() to stack block limits (including topology) rather
than duplicate the equivalent within Device Mapper.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2009-06-22 17:12:32 +0800
f9ab94cee dm: introduce num_flush_requests

Introduce num_flush_requests for a target to set to say how many flush
instructions (empty barriers) it wants to receive. These are sent by
__clone_and_map_empty_barrier with map_info->flush_request going from 0
to (num_flush_requests - 1).

Old targets without flush support won't receive any flush requests.
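The numbering of flush requests described above amounts to a simple loop. The sketch below records which request numbers a target would see; flush_log_sketch and send_flushes_sketch are hypothetical names, and the fixed-size log exists only to keep the example self-contained.

```c
#define MAX_FLUSHES_SKETCH 8

/* Records the flush_request numbers delivered to one target. */
struct flush_log_sketch {
	unsigned int seen[MAX_FLUSHES_SKETCH];
	unsigned int count;
};

/* Send one flush per requested slot, numbered 0 .. num_flush_requests - 1.
 * A target that leaves num_flush_requests at 0 receives nothing. */
static void send_flushes_sketch(unsigned int num_flush_requests,
				struct flush_log_sketch *log)
{
	log->count = 0;
	for (unsigned int nr = 0;
	     nr < num_flush_requests && nr < MAX_FLUSHES_SKETCH; nr++)
		log->seen[log->count++] = nr;
}
```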

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-06-22 17:12:20 +0800

23 May, 2009

1 commit

e1defc4ff block: Do away with the notion of hardsect_size

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block size used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case. The
sector size will be 4KB but the logical block size will remain
512-bytes. Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Martin K. Petersen
2009-05-23 05:22:54 +0800

09 Apr, 2009

1 commit

692d0eb9e dm: remove limited barrier support

Prepare for full barrier implementation: first remove the restricted support.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-04-09 07:27:13 +0800

03 Apr, 2009

1 commit

45194e4f8 dm target: remove struct tt_internal

The tt_internal struct is really just a list_head used to manage registered
target_types in a doubly linked list.

Embed the list_head into target_type directly, which:
1. avoids kmalloc/kfree;
2. makes tt_internal unnecessary.
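The embedding technique can be illustrated with a small userspace intrusive list. This is a sketch, not the kernel's list.h: list_node, target_type_sketch and the helper names are hypothetical, and the container macro is a simplified take on the kernel's container_of idea.

```c
#include <stddef.h>
#include <string.h>

/* Minimal doubly linked list node, embedded in the object it links. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical target type with the list node embedded directly,
 * so registering it needs no separate allocation. */
struct target_type_sketch {
	const char *name;
	struct list_node list;
};

/* Recover the enclosing structure from a pointer to its member. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init_sketch(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail_sketch(struct list_node *node,
				 struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Walk the list and return the registered type with this name. */
static struct target_type_sketch *find_target_sketch(struct list_node *head,
						      const char *name)
{
	for (struct list_node *p = head->next; p != head; p = p->next) {
		struct target_type_sketch *tt =
			container_of_sketch(p, struct target_type_sketch, list);

		if (strcmp(tt->name, name) == 0)
			return tt;
	}
	return NULL;
}
```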

Cc: stable@kernel.org
Signed-off-by: Cheng Renquan
Signed-off-by: Alasdair G Kergon
Reviewed-by: Alasdair G Kergon

Cheng Renquan
2009-04-03 02:55:28 +0800

06 Jan, 2009

3 commits

ab4c14248 dm: support barriers on simple devices

Implement barrier support for single device DM devices

This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.

NB. Any DM device might cease to support barriers if it gets
reconfigured so code must continue to allow for a possible
-EOPNOTSUPP on every barrier bio submitted. - agk

Signed-off-by: Andi Kleen
Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Andi Kleen
2009-01-06 11:05:09 +0800
7d76345da dm request: extend target interface

This patch adds the following target interfaces for request-based dm.

  map_rq    : for mapping a request

  rq_end_io : for finishing a request

  busy      : for avoiding performance regression from bio-based dm.
              Target can tell dm core not to map requests now, and
              that may help requests in the block layer queue to be
              bigger by I/O merging.
              In bio-based dm, this behavior is done by device
              drivers managing the block layer queue.
              But in request-based dm, dm core has to do that
              since dm core manages the block layer queue.
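How a core might consult busy before calling map_rq can be sketched with a table of function pointers. The signatures below are simplified placeholders, not the kernel's; rq_target_ops_sketch, demo_ctx and the demo callbacks are hypothetical.

```c
#include <stddef.h>

/* Simplified, hypothetical view of the request-based callbacks. */
struct rq_target_ops_sketch {
	int (*map_rq)(void *ctx); /* map one request */
	int (*busy)(void *ctx);   /* non-zero: do not map requests now */
};

/* Demo target state: busy once 'limit' requests are in flight. */
struct demo_ctx {
	int in_flight;
	int limit;
};

static int demo_busy(void *ctx)
{
	struct demo_ctx *c = ctx;

	return c->in_flight >= c->limit;
}

static int demo_map_rq(void *ctx)
{
	struct demo_ctx *c = ctx;

	c->in_flight++;
	return 0;
}

/* Returns 1 if a request was mapped, 0 if it was left on the queue
 * (where it remains eligible for merging). */
static int try_dispatch_sketch(const struct rq_target_ops_sketch *ops,
			       void *ctx)
{
	if (ops->busy != NULL && ops->busy(ctx))
		return 0;
	ops->map_rq(ctx);
	return 1;
}
```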

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-01-06 11:05:07 +0800
10d3bd09a dm: consolidate target deregistration error handling

Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of a programming bug in the
target driver. It can't fail because of user behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister a non-registered target or a target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-01-06 11:04:58 +0800

24 Oct, 2008

1 commit

224848564 Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev

* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
[PATCH] kill the rest of struct file propagation in block ioctls
[PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
[PATCH] get rid of blkdev_locked_ioctl()
[PATCH] get rid of blkdev_driver_ioctl()
[PATCH] sanitize blkdev_get() and friends
[PATCH] remember mode of reiserfs journal
[PATCH] propagate mode through swsusp_close()
[PATCH] propagate mode through open_bdev_excl/close_bdev_excl
[PATCH] pass fmode_t to blkdev_put()
[PATCH] kill the unused bsize on the send side of /dev/loop
[PATCH] trim file propagation in block/compat_ioctl.c
[PATCH] end of methods switch: remove the old ones
[PATCH] switch sr
[PATCH] switch sd
[PATCH] switch ide-scsi
[PATCH] switch tape_block
[PATCH] switch dcssblk
[PATCH] switch dasd
[PATCH] switch mtd_blkdevs
[PATCH] switch mmc
...

Linus Torvalds
2008-10-24 01:23:07 +0800

22 Oct, 2008

1 commit

d63a5ce3c dm: publish array_too_big

Move array_too_big to include/linux/device-mapper.h because it is
used by targets.

Remove the test from dm-raid1 as the number of mirror legs is limited
such that it can never fail. (Even for stripes it seems rather
unlikely.)
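For context, a check of this kind guards a size calculation of the form fixed + obj * num against overflow. The sketch below shows one way to write such a test in userspace C; it is offered as an illustration of the idea under that assumption, with a hypothetical name, rather than as the exact kernel helper.

```c
#include <limits.h>

/* Returns non-zero if fixed + obj * num would not fit in an
 * unsigned long. Assumes obj is non-zero and fixed <= ULONG_MAX. */
static int array_too_big_sketch(unsigned long fixed, unsigned long obj,
				unsigned long num)
{
	return num > (ULONG_MAX - fixed) / obj;
}
```

With a small, bounded num (such as a limited number of mirror legs) the test can never trigger, which matches the reasoning given for dropping it from dm-raid1.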

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-22 00:44:57 +0800

21 Oct, 2008

2 commits

647b3d008 [PATCH] lose unused arguments in dm ioctl callbacks

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:47:18 +0800
aeb5d7270 [PATCH] introduce fmode_t, do annotations

Signed-off-by: Al Viro

Al Viro
2008-10-21 19:47:06 +0800

10 Oct, 2008

4 commits

541609042 dm: publish dm_vcalloc

Publish dm_vcalloc in include/linux/device-mapper.h because this function is
used by targets.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:12 +0800
ea0ec6409 dm: publish dm_table_unplug_all

Publish dm_table_unplug_all in include/linux/device-mapper.h because this
function is used by targets.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:11 +0800
89343da07 dm: publish dm_get_mapinfo

Publish dm_get_mapinfo in include/linux/device-mapper.h because this function
is used by targets.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:10 +0800
82b1519b3 dm: export struct dm_dev

Split struct dm_dev in two and publish the part that other targets need in
include/linux/device-mapper.h.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:09 +0800

21 Jul, 2008

1 commit

f6fccb121 dm: introduce merge_bvec_fn

Introduce a bvec merge function for device mapper devices
for dynamic size restrictions.

This code ensures the requested biovec lies within a single
target and then calls a target-specific function to check
against any constraints imposed by underlying devices.

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2008-07-21 19:00:37 +0800

30 Apr, 2008

1 commit

735643ee6 Remove "#ifdef __KERNEL__" checks from unexported headers

Remove the "#ifdef __KERNEL__" tests from unexported header files in
linux/include whose entire contents are wrapped in that preprocessor
test.

Signed-off-by: Robert P. J. Day
Cc: David Woodhouse
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robert P. J. Day
2008-04-30 23:29:54 +0800

25 Apr, 2008

3 commits

4fdfe401e dm table: remove unused dm_create_error_table

dm_create_error_table() was added in kernel 2.6.18 and never used...

Signed-off-by: Adrian Bunk
Signed-off-by: Alasdair G Kergon

Adrian Bunk
2008-04-25 20:27:00 +0800
0da336e5f dm: expose macros

Make dm.h macros and inlines available in include/linux/device-mapper.h

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2008-04-25 20:26:53 +0800
416cd17b1 dm log: clean interface

Clean up the dm-log interface to prepare for publishing it in include/linux.

Signed-off-by: Heinz Mauelshagen
Signed-off-by: Alasdair G Kergon

Heinz Mauelshagen
2008-04-25 20:26:46 +0800

08 Feb, 2008

1 commit

4f41b09f8 dm table: remove unused variable

Save some bytes.

Signed-off-by: Vasily Averin
Signed-off-by: Alasdair G Kergon

Vasily Averin
2008-02-08 10:10:01 +0800

21 Dec, 2007

1 commit

91212507f dm: merge max_hw_sector

Make sure dm honours max_hw_sectors of underlying devices.

We still have no firm testing evidence in support of this patch but
believe it may help to resolve some bug reports. - agk
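One way to honour the limits of all underlying devices is to take the smallest non-zero value across them. The sketch below illustrates that idea in userspace C; treating 0 as "no limit set" and the helper names are assumptions made for the example, not details taken from the patch.

```c
#include <stddef.h>

/* Smaller of two limits, treating 0 as "not set". */
static unsigned int min_not_zero_sketch(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Combined max_hw_sectors for a device stacked on several others. */
static unsigned int stack_max_hw_sectors_sketch(const unsigned int *dev_limits,
						size_t n)
{
	unsigned int limit = 0;

	for (size_t i = 0; i < n; i++)
		limit = min_not_zero_sketch(limit, dev_limits[i]);
	return limit;
}
```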

Signed-off-by: Neil Brown
Signed-off-by: Alasdair G Kergon

Neil Brown
2007-12-21 01:32:12 +0800

02 Nov, 2007

1 commit

5ec140e60 dm: bounce_pfn limit added

Device mapper uses its own bounce_pfn, which may differ from that of the
underlying device. As a result, dm can build incorrect requests containing sg
elements larger than the underlying device is able to handle.

This caused slab corruption in the i2o layer on the i386 architecture when
very long direct I/O requests were addressed to a dm-over-i2o device.

Signed-off-by: Vasily Averin
Cc:
Cc: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe

Vasily Averin
2007-11-02 15:47:25 +0800

20 Oct, 2007

2 commits

7a8c3d3b9 dm: uevent generate events

This patch adds support for the dm_path_event and dm_send_event functions,
which create and send udev events.

Signed-off-by: Mike Anderson
Signed-off-by: Alasdair G Kergon

Mike Anderson
2007-10-20 09:01:26 +0800
96a1f7dba dm: export name and uuid

This patch adds a function to obtain a copy of a mapped device's name and uuid.

Signed-off-by: Mike Anderson
Signed-off-by: Alasdair G Kergon

Mike Anderson
2007-10-20 09:01:23 +0800

09 Dec, 2006

1 commit

2e93ccc19 [PATCH] dm: suspend: add noflush pushback

In device-mapper, I/O is sometimes queued within targets for later processing.
For example, the multipath target can be configured to store I/O when no paths
are available instead of failing it with -EIO.

This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core. This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it. Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application. In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.

DMF_NOFLUSH_SUSPENDING
----------------------

If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It
is always cleared before dm_suspend() returns.

The flag must be visible while the target is flushing pending I/Os so it
is set before presuspend where the flush starts and unset after the wait
for md->pending where the flush ends.

Target drivers can check this flag by calling dm_noflush_suspending().

DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
-----------------------------------

A target's map() function can now return DM_MAPIO_REQUEUE to request that the
device mapper core queue the bio.

Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same. This has been labelled 'pushback'.

The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.

dec_pending
-----------

dec_pending() saves the pushback request in struct dm_io->error. Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list. Note that this supersedes any I/O errors.

It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt). dec_pending() checks for this and
returns -EIO if it happened.
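The bookkeeping described for dec_pending() can be modelled roughly as below. This is a userspace sketch with hypothetical names and result codes; it only captures the two rules stated above: a pushback request supersedes I/O errors, and an aborted noflush suspend turns the pushback into an error.

```c
/* Hypothetical result codes for one clone of a split bio. */
enum { IO_OK_SKETCH = 0, IO_ERR_SKETCH = -5, IO_REQUEUE_SKETCH = 1 };

/* Hypothetical stand-in for the error field of struct dm_io. */
struct dm_io_sketch {
	int error;
};

enum final_action_sketch { FINISH_OK, FINISH_ERROR, FINISH_PUSHBACK };

/* Record one clone's result; a pushback request overrides any error. */
static void record_result_sketch(struct dm_io_sketch *io, int result)
{
	if (result == IO_REQUEUE_SKETCH)
		io->error = IO_REQUEUE_SKETCH;
	else if (result != IO_OK_SKETCH && io->error != IO_REQUEUE_SKETCH)
		io->error = result;
}

/* Decide what happens once all clones have ended. If the noflush
 * suspend is no longer in progress, a pushback becomes an error. */
static enum final_action_sketch finish_io_sketch(const struct dm_io_sketch *io,
						 int noflush_suspending)
{
	if (io->error == IO_REQUEUE_SKETCH)
		return noflush_suspending ? FINISH_PUSHBACK : FINISH_ERROR;
	return io->error ? FINISH_ERROR : FINISH_OK;
}
```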

pushback list and pushback_lock
-------------------------------

The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.

md->pushback_lock protects md->pushback.
The lock should be held with irq disabled because dec_pending() can be
called from interrupt context.

Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is
held when checking the flag. Otherwise dec_pending() may queue a bio to
md->pushback after the interrupted dm_suspend() flushes md->pushback.
Then the bio would be left in md->pushback.

Flag setting in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend and the set value is already
made visible via the target's presuspend function.

The flag can be checked without md->pushback_lock (e.g. the first part of
the dec_pending() or target drivers), because the flag is checked again
with md->pushback_lock held when the bio is really queued to md->pushback
as described above. So even if the flag is cleared after the lockless
checks, the bio isn't left in md->pushback but returned to applications
with -EIO.

Other notes on the current patch
--------------------------------

- md->pushback is added to the struct mapped_device instead of using
md->deferred directly because md->io_lock which protects md->deferred is
rw_semaphore and can't be used in interrupt context like dec_pending(),
and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too.

- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
ioctl option is specified, because I/Os generated by lock_fs() would be
pushed back and never return if there were no valid devices.

- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
flag is set, md->pushback must be flushed because I/Os may be queued to
the list already. (flush_and_out label in dm_suspend())

Test results
------------

I have tested using multipath target with the next patch.

The following tests are for regression/compatibility:
- I/Os succeed when valid paths exist;
- I/Os fail when there are no valid paths and queue_if_no_path is not
set;
- I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set;
- The queued I/Os above fail when suspend is issued without the
DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also
fail.

The following tests are for the normal code path of new pushback feature:
- Queued I/Os in the multipath target are flushed from the target
but don't return when suspend is issued with the DM_NOFLUSH_FLAG
ioctl option;
- The I/Os above are queued in the multipath target again when
resume is issued without path recovery;
- The I/Os above succeed when resume is issued after path recovery
or table load;
- Queued I/Os in the multipath target succeed when resume is issued
with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
spanning 2 multipath targets also succeed.

The following tests are for the error paths of the new pushback feature:
- When the bdget_disk() fails in dm_suspend(), the
DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
pushback list are flushed properly.
- When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
  o I/Os which had already been queued to the pushback list
    at the time don't return, and are re-issued at resume time;
  o I/Os which hadn't been returned at the time return with EIO.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kiyoshi Ueda
2006-12-09 00:29:09 +0800