Eric Lee / smarc-fsl-linux-kernel

01 Nov, 2011

1 commit

36a0456fb dm table: add immutable feature ... Browse Code »

Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed
with any other target type, and once loaded into a device, it cannot be
replaced with a table containing a different type.

The thin provisioning pool device will use this.

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2011-11-01 04:19:04 +0800

02 Aug, 2011

1 commit

d5b9dd04b dm: ignore merge_bvec for snapshots when safe ... Browse Code »

Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
whether the device can accept bios larger than the size its merge
function returns. When set, use this to send large bios to snapshots
which can split them if necessary. Snapshot I/O may be significantly
fragmented and this approach seems to improve peformance.

Before the patch, dm_set_device_limits restricted bio size to page size
if the underlying device had a merge function and the target didn't
provide a merge function. After the patch, dm_set_device_limits
restricts bio size to page size if the underlying device has a merge
function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
provide a merge function.

The snapshot target can't provide a merge function because when the merge
function is called, it is impossible to determine where the bio will be
remapped. Previously this led us to impose a 4k limit, which we can
now remove if the snapshot store is located on a device without a merge
function. Together with another patch for optimizing full chunk writes,
it improves performance from 29MB/s to 40MB/s when writing to the
filesystem on snapshot store.

If the snapshot store is placed on a non-dm device with a merge function
(such as md-raid), device mapper still limits all bios to page size.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-08-02 19:32:04 +0800

17 Mar, 2011

1 commit

a91a2785b block: Require subsystems to explicitly allocate bio_set integrity mempool ... Browse Code »

MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.

Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.

Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.

Signed-off-by: Martin K. Petersen
Reported-by: Zdenek Kabelac
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe

Martin K. Petersen
2011-03-17 18:11:05 +0800

12 Aug, 2010

5 commits

5ae89a872 dm: linear support discard ... Browse Code »

Allow discards to be passed through to linear mappings if at least one
underlying device supports it. Discards will be forwarded only to
devices that support them.

A target that supports discards should set num_discard_requests to
indicate how many times each discard request must be submitted to it.

Verify table's underlying devices support discards prior to setting the
associated DM device as capable of discards (via QUEUE_FLAG_DISCARD).

Signed-off-by: Mike Snitzer
Signed-off-by: Mikulas Patocka
Reviewed-by: Joe Thornber
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-08-12 11:14:08 +0800
26803b9f0 dm ioctl: refactor dm_table_complete ... Browse Code »

This change unifies the various checks and finalization that occurs on a
table prior to use. By doing so, it allows table construction without
traversing the dm-ioctl interface.

Signed-off-by: Will Drewry
Signed-off-by: Alasdair G Kergon

Will Drewry
2010-08-12 11:14:03 +0800
4a0b4ddf2 dm: do not initialise full request queue when bio based ... Browse Code »

Change bio-based mapped devices no longer to have a fully initialized
request_queue (request_fn, elevator, etc). This means bio-based DM
devices no longer register elevator sysfs attributes ('iosched/' tree
or 'scheduler' other than "none").

In contrast, a request-based DM device will continue to have a full
request_queue and will register elevator sysfs attributes. Therefore
a user can determine a DM device's type by checking if elevator sysfs
attributes exist.

First allocate a minimalist request_queue structure for a DM device
(needed for both bio and request-based DM).

Initialization of a full request_queue is deferred until it is known
that the DM device is request-based, at the end of the table load
sequence.

Factor DM device's request_queue initialization:
- common to both request-based and bio-based into dm_init_md_queue().
- specific to request-based into dm_init_request_based_queue().

The md->type_lock mutex is used to protect md->queue, in addition to
md->type, during table_load().

A DM device's first table_load will establish the immutable md->type.
But md->queue initialization, based on md->type, may fail at that time
(because blk_init_allocated_queue cannot allocate memory). Therefore
any subsequent table_load must (re)try dm_setup_md_queue independently of
establishing md->type.

Signed-off-by: Mike Snitzer
Acked-by: Kiyoshi Ueda
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-08-12 11:14:02 +0800
a5664dad7 dm ioctl: make bio or request based device type immutable ... Browse Code »

Determine whether a mapped device is bio-based or request-based when
loading its first (inactive) table and don't allow that to be changed
later.

This patch performs different device initialisation in each of the two
cases. (We don't think it's necessary to add code to support changing
between the two types.)

Allowed md->type transitions:
DM_TYPE_NONE to DM_TYPE_BIO_BASED
DM_TYPE_NONE to DM_TYPE_REQUEST_BASED

We now prevent table_load from replacing the inactive table with a
conflicting type of table even after an explicit table_clear.

Introduce 'type_lock' into the struct mapped_device to protect md->type
and to prepare for the next patch that will change the queue
initialization and allocate memory while md->type_lock is held.

Signed-off-by: Mike Snitzer
Acked-by: Kiyoshi Ueda
Signed-off-by: Alasdair G Kergon

drivers/md/dm-ioctl.c | 15 +++++++++++++++
drivers/md/dm.c | 37 ++++++++++++++++++++++++++++++-------
drivers/md/dm.h | 5 +++++
include/linux/dm-ioctl.h | 4 ++--
4 files changed, 52 insertions(+), 9 deletions(-)

Mike Snitzer
2010-08-12 11:14:01 +0800
3f77316de dm: separate device deletion from dm_put ... Browse Code »

This patch separates the device deletion code from dm_put()
to make sure the deletion happens in the process context.

By this patch, device deletion always occurs in an ioctl (process)
context and dm_put() can be called in interrupt context.
As a result, the request-based dm's bad dm_put() usage pointed out
by Mikulas below disappears.
http://marc.info/?l=dm-devel&m=126699981019735&w=2

Without this patch, I confirmed there is a case to crash the system:
dm_put() => dm_table_destroy() => vfree() => BUG_ON(in_interrupt())

Some more backgrounds and details:
In request-based dm, a device opener can remove a mapped_device
while the last request is still completing, because bios in the last
request complete first and then the device opener can close and remove
the mapped_device before the last request completes:
CPU0 CPU1
=================================================================
<>
blk_end_request_all(clone_rq)
blk_update_request(clone_rq)
bio_endio(clone_bio) == end_clone_bio
blk_update_request(orig_rq)
bio_endio(orig_bio)
<>
dm_blk_close()
dev_remove()
dm_put(md)
<>
blk_finish_request(clone_rq)
....
dm_end_request(clone_rq)
free_rq_clone(clone_rq)
blk_end_request_all(orig_rq)
rq_completed(md)

So request-based dm used dm_get()/dm_put() to hold md for each I/O
until its request completion handling is fully done.
However, the final dm_put() can call the device deletion code which
must not be run in interrupt context and may cause kernel panic.

To solve the problem, this patch moves the device deletion code,
dm_destroy(), to predetermined places that is actually deleting
the mapped_device in ioctl (process) context, and changes dm_put()
just to decrement the reference count of the mapped_device.
By this change, dm_put() can be used in any context and the symmetric
model below is introduced:
dm_create(): create a mapped_device
dm_destroy(): destroy a mapped_device
dm_get(): increment the reference count of a mapped_device
dm_put(): decrement the reference count of a mapped_device

dm_destroy() waits for all references of the mapped_device to disappear,
then deletes the mapped_device.

dm_destroy() uses active waiting with msleep(1), since deleting
the mapped_device isn't performance-critical task.
And since at this point, nobody opens the mapped_device and no new
reference will be taken, the pending counts are just for racing
completing activity and will eventually decrease to zero.

For the unlikely case of the forced module unload, dm_destroy_immediate(),
which doesn't wait and forcibly deletes the mapped_device, is also
introduced and used in dm_hash_remove_all(). Otherwise, "rmmod -f"
may be stuck and never return.
And now, because the mapped_device is deleted at this point, subsequent
accesses to the mapped_device may cause NULL pointer references.

Cc: stable@kernel.org
Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2010-08-12 11:13:56 +0800

06 Mar, 2010

1 commit

3abf85b5b dm ioctl: introduce flag indicating uevent was generated ... Browse Code »

Set a new DM_UEVENT_GENERATED_FLAG when returning from ioctls to
indicate that a uevent was actually generated. This tells the userspace
caller that it may need to wait for the event to be processed.

Signed-off-by: Peter Rajnoha
Signed-off-by: Alasdair G Kergon

Peter Rajnoha
2010-03-06 10:32:31 +0800

11 Dec, 2009

3 commits

4f186f8bb dm: rename dm_suspended to dm_suspended_md ... Browse Code »

This patch renames dm_suspended() to dm_suspended_md() and
keeps it internal to dm.
No functional change.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: Mike Anderson
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-12-11 07:52:26 +0800

432a212c0 dm: add dm_deleting_md function ... Browse Code »

Add dm_deleting_md to check whether or not a given mapped
device is currently being deleted.

Signed-off-by: Mike Anderson
Signed-off-by: Alasdair G Kergon

Mike Anderson
2009-12-11 07:52:20 +0800

952b35576 dm io: use slab for struct io ... Browse Code »

Allocate "struct io" from a slab.

This patch changes dm-io, so that "struct io" is allocated from a slab cache.
It used to be allocated with kmalloc. Allocating from a slab will be needed
for the next patch, because it requires a special alignment of "struct io"
and kmalloc cannot meet this alignment.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:51:57 +0800

24 Jul, 2009

1 commit

a732c207d dm: remove queue next_ordered workaround for barriers ... Browse Code »

This patch removes DM's bio-based vs request-based conditional setting
of next_ordered. For bio-based DM the next_ordered check is no longer a
concern (as that check is now in the __make_request path). For
request-based DM the default of QUEUE_ORDERED_NONE is now appropriate.

bio-based DM was changed to work-around the previously misplaced
next_ordered check with this commit:
99360b4c18f7675b50d283301d46d755affe75fd

request-based DM does not yet support barriers but reacted to the above
bio-based DM change with this commit:
5d67aa2366ccb8257d103d0b43df855605c3c086

The above changes are no longer needed given Neil Brown's recent fix to
put the next_ordered check in the __make_request path:
db64f680ba4b5c56c4be59f0698000df89ff0281

Signed-off-by: Mike Snitzer
Cc: Jun'ichi Nomura
Cc: NeilBrown
Acked-by: Kiyoshi Ueda
Acked-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2009-07-24 03:30:40 +0800

22 Jun, 2009

5 commits

5d67aa236 dm: do not set QUEUE_ORDERED_DRAIN if request based ... Browse Code »

Request-based dm doesn't have barrier support yet.
So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm.
Since the device type is decided at the first table loading time,
the flag set is deferred until then.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Acked-by: Hannes Reinecke
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-06-22 17:12:36 +0800

e6ee8c0b7 dm: enable request based option ... Browse Code »

This patch enables request-based dm.

o Request-based dm and bio-based dm coexist, since there are
some target drivers which are more fitting to bio-based dm.
Also, there are other bio-based devices in the kernel
(e.g. md, loop).
Since bio-based device can't receive struct request,
there are some limitations on device stacking between
bio-based and request-based.

type of underlying device
bio-based request-based
----------------------------------------------
bio-based OK OK
request-based -- OK

The device type is recognized by the queue flag in the kernel,
so dm follows that.

o The type of a dm device is decided at the first table binding time.
Once the type of a dm device is decided, the type can't be changed.

o Mempool allocations are deferred to at the table loading time, since
mempools for request-based dm are different from those for bio-based
dm and needed mempool type is fixed by the type of table.

o Currently, request-based dm supports only tables that have a single
target. To support multiple targets, we need to support request
splitting or prevent bio/request from spanning multiple targets.
The former needs lots of changes in the block layer, and the latter
needs that all target drivers support merge() function.
Both will take a time.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-06-22 17:12:36 +0800

cec47e3d4 dm: prepare for request based option ... Browse Code »

This patch adds core functions for request-based dm.

When struct mapped device (md) is initialized, md->queue has
an I/O scheduler and the following functions are used for
request-based dm as the queue functions:
make_request_fn: dm_make_request()
pref_fn: dm_prep_fn()
request_fn: dm_request_fn()
softirq_done_fn: dm_softirq_done()
lld_busy_fn: dm_lld_busy()
Actual initializations are done in another patch (PATCH 2).

Below is a brief summary of how request-based dm behaves, including:
- making request from bio
- cloning, mapping and dispatching request
- completing request and bio
- suspending md
- resuming md

bio to request
==============
md->queue->make_request_fn() (dm_make_request()) calls __make_request()
for a bio submitted to the md.
Then, the bio is kept in the queue as a new request or merged into
another request in the queue if possible.

Cloning and Mapping
===================
Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
when requests are dispatched after they are sorted by the I/O scheduler.

dm_request_fn() checks busy state of underlying devices using
target's busy() function and stops dispatching requests to keep them
on the dm device's queue if busy.
It helps better I/O merging, since no merge is done for a request
once it is dispatched to underlying devices.

Actual cloning and mapping are done in dm_prep_fn() and map_request()
called from dm_request_fn().
dm_prep_fn() clones not only request but also bios of the request
so that dm can hold bio completion in error cases and prevent
the bio submitter from noticing the error.
(See the "Completion" section below for details.)

After the cloning, the clone is mapped by target's map_rq() function
and inserted to underlying device's queue using
blk_insert_cloned_request().

Completion
==========
Request completion can be hooked by rq->end_io(), but then, all bios
in the request will have been completed even error cases, and the bio
submitter will have noticed the error.
To prevent the bio completion in error cases, request-based dm clones
both bio and request and hooks both bio->bi_end_io() and rq->end_io():
bio->bi_end_io(): end_clone_bio()
rq->end_io(): end_clone_request()

Summary of the request completion flow is below:
blk_end_request() for a clone request
=> blk_update_request()
=> bio->bi_end_io() == end_clone_bio() for each clone bio
=> Free the clone bio
=> Success: Complete the original bio (blk_update_request())
Error: Don't complete the original bio
=> blk_finish_request()
=> rq->end_io() == end_clone_request()
=> blk_complete_request()
=> dm_softirq_done()
=> Free the clone request
=> Success: Complete the original request (blk_end_request())
Error: Requeue the original request

end_clone_bio() completes the original request on the size of
the original bio in successful cases.
Even if all bios in the original request are completed by that
completion, the original request must not be completed yet to keep
the ordering of request completion for the stacking.
So end_clone_bio() uses blk_update_request() instead of
blk_end_request().
In error cases, end_clone_bio() doesn't complete the original bio.
It just frees the cloned bio and gives over the error handling to
end_clone_request().

end_clone_request(), which is called with queue lock held, completes
the clone request and the original request in a softirq context
(dm_softirq_done()), which has no queue lock, to avoid a deadlock
issue on submission of another request during the completion:
- The submitted request may be mapped to the same device
- Request submission requires queue lock, but the queue lock
has been held by itself and it doesn't know that

The clone request has no clone bio when dm_softirq_done() is called.
So target drivers can't resubmit it again even error cases.
Instead, they can ask dm core for requeueing and remapping
the original request in that cases.

suspend
=======
Request-based dm uses stopping md->queue as suspend of the md.
For noflush suspend, just stops md->queue.

For flush suspend, inserts a marker request to the tail of md->queue.
And dispatches all requests in md->queue until the marker comes to
the front of md->queue. Then, stops dispatching request and waits
for the all dispatched requests to complete.
After that, completes the marker request, stops md->queue and
wake up the waiter on the suspend queue, md->wait.

resume
======
Starts md->queue.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2009-06-22 17:12:35 +0800

754c5fc7e dm: calculate queue limits during resume not load ... Browse Code »

Currently, device-mapper maintains a separate instance of 'struct
queue_limits' for each table of each device. When the configuration of
a device is to be changed, first its table is loaded and this structure
is populated, then the device is 'resumed' and the calculated
queue_limits are applied.

This places restrictions on how userspace may process related devices,
where it is often advantageous to 'load' tables for several devices
at once before 'resuming' them together. As the new queue_limits
only take effect after the 'resume', if they are changing and one
device uses another, the latter must be 'resumed' before the former
may be 'loaded'.

This patch moves the calculation of these queue_limits out of
the 'load' operation into 'resume'. Since we are no longer
pre-calculating this struct, we no longer need to maintain copies
within our dm structs.

dm_set_device_limits() now passes the 'start' of the device's
data area (aka pe_start) as the 'offset' to blk_stack_limits().

init_valid_queue_limits() is replaced by blk_set_default_limits().

Signed-off-by: Mike Snitzer
Cc: martin.petersen@oracle.com
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2009-06-22 17:12:34 +0800

60935eb21 dm ioctl: support cookies for udev ... Browse Code »

Add support for passing a 32 bit "cookie" into the kernel with the
DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls. The (unsigned)
value of this cookie is returned to userspace alongside the uevents
issued by these ioctls in the variable DM_COOKIE.

This means the userspace process issuing these ioctls can be notified
by udev after udev has completed any actions triggered.

To minimise the interface extension, we pass the cookie into the
kernel in the event_nr field which is otherwise unused when calling
these ioctls. Incrementing the version number allows userspace to
determine in advance whether or not the kernel supports the cookie.
If the kernel does support this but userspace does not, there should
be no impact as the new variable will just get ignored.

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2009-06-22 17:12:30 +0800

09 Apr, 2009

1 commit

692d0eb9e dm: remove limited barrier support ... Browse Code »

Prepare for full barrier implementation: first remove the restricted support.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-04-09 07:27:13 +0800

03 Apr, 2009

1 commit

45194e4f8 dm target: remove struct tt_internal ... Browse Code »

The tt_internal is really just a list_head to manage registered target_type
in a double linked list,

Here embed the list_head into target_type directly,
1. to avoid kmalloc/kfree;
2. then tt_internal is really unneeded;

Cc: stable@kernel.org
Signed-off-by: Cheng Renquan
Signed-off-by: Alasdair G Kergon
Reviewed-by: Alasdair G Kergon

Cheng Renquan
2009-04-03 02:55:28 +0800

06 Jan, 2009

3 commits

784aae735 dm: add name and uuid to sysfs ... Browse Code »

Implement simple read-only sysfs entry for device-mapper block device.

This patch adds a simple sysfs directory named "dm" under block device
properties and implements
- name attribute (string containing mapped device name)
- uuid attribute (string containing UUID, or empty string if not set)

The kobject is embedded in mapped_device struct, so no additional
memory allocation is needed for initializing sysfs entry.

During the processing of sysfs attribute we need to lock mapped device
which is done by a new function dm_get_from_kobj, which returns the md
associated with kobject and increases the usage count.

Each 'show attribute' function is responsible for its own locking.

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2009-01-06 11:05:12 +0800

d58168763 dm table: rework reference counting ... Browse Code »

Rework table reference counting.

The existing code uses a reference counter. When the last reference is
dropped and the counter reaches zero, the table destructor is called.
Table reference counters are acquired/released from upcalls from other
kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
If the reference counter reaches zero in one of the upcalls, the table
destructor is called from almost random kernel code.

This leads to various problems:
* dm_any_congested being called under a spinlock, which calls the
destructor, which calls some sleeping function.
* the destructor attempting to take a lock that is already taken by the
same process.
* stale reference from some other kernel code keeps the table
constructed, which keeps some devices open, even after successful
return from "dmsetup remove". This can confuse lvm and prevent closing
of underlying devices or reusing device minor numbers.

The patch changes reference counting so that the table destructor can be
called only at predetermined places.

The table has always exactly one reference from either mapped_device->map
or hash_cell->new_map. After this patch, this reference is not counted
in table->holders. A pair of dm_create_table/dm_destroy_table functions
is used for table creation/destruction.

Temporary references from the other code increase table->holders. A pair
of dm_table_get/dm_table_put functions is used to manipulate it.

When the table is about to be destroyed, we wait for table->holders to
reach 0. Then, we call the table destructor. We use active waiting with
msleep(1), because the situation happens rarely (to one user in 5 years)
and removing the device isn't performance-critical task: the user doesn't
care if it takes one tick more or not.

This way, the destructor is called only at specific points
(dm_table_destroy function) and the above problems associated with lazy
destruction can't happen.

Finally remove the temporary protection added to dm_any_congested().

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-01-06 11:05:10 +0800

ab4c14248 dm: support barriers on simple devices ... Browse Code »

Implement barrier support for single device DM devices

This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.

NB. Any DM device might cease to support barriers if it gets
reconfigured so code must continue to allow for a possible
-EOPNOTSUPP on every barrier bio submitted. - agk

Signed-off-by: Andi Kleen
Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Andi Kleen
2009-01-06 11:05:09 +0800

22 Oct, 2008

1 commit

d63a5ce3c dm: publish array_too_big ... Browse Code »

Move array_too_big to include/linux/device-mapper.h because it is
used by targets.

Remove the test from dm-raid1 as the number of mirror legs is limited
such that it can never fail. (Even for stripes it seems rather
unlikely.)

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-22 00:44:57 +0800

10 Oct, 2008

4 commits

541609042 dm: publish dm_vcalloc ... Browse Code »

Publish dm_vcalloc in include/linux/device-mapper.h because this function is
used by targets.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:12 +0800

ea0ec6409 dm: publish dm_table_unplug_all ... Browse Code »

Publish dm_table_unplug_all in include/linux/device-mapper.h because this
function is used by targets.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:11 +0800

89343da07 dm: publish dm_get_mapinfo ... Browse Code »

Publish dm_get_mapinfo in include/linux/device-mapper.h because this function
is used by targets.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:10 +0800

82b1519b3 dm: export struct dm_dev ... Browse Code »

Split struct dm_dev in two and publish the part that other targets need in
include/linux/device-mapper.h.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-10-10 20:37:09 +0800

21 Jul, 2008

1 commit

c8da2f8dd dm log: make dm_dirty_log init and exit static ... Browse Code »

dm_dirty_log_{init,exit}() can now become static.

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Alasdair G Kergon

Adrian Bunk
2008-07-21 19:00:27 +0800

25 Apr, 2008

3 commits

0da336e5f dm: expose macros ... Browse Code »

Make dm.h macros and inlines available in include/linux/device-mapper.h

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2008-04-25 20:26:53 +0800

945fa4d28 dm kcopyd: remove redundant client counting ... Browse Code »

Remove client counting code that is no longer needed.

Initialization and destruction is made globally from dm_init and dm_exit and is
not based on client counts. Initialization allocates only one empty slab cache,
so there is no negative impact from performing the initialization always,
regardless of whether some client uses kcopyd or not.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2008-04-25 20:26:52 +0800

416cd17b1 dm log: clean interface ... Browse Code »

Clean up the dm-log interface to prepare for publishing it in include/linux.

Signed-off-by: Heinz Mauelshagen
Signed-off-by: Alasdair G Kergon

Heinz Mauelshagen
2008-04-25 20:26:46 +0800

21 Dec, 2007

2 commits

69267a30b dm: trigger change uevent on rename ... Browse Code »

Insert a missing KOBJ_CHANGE notification when a device is renamed.

Cc: Scott James Remnant
Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2007-12-21 01:32:11 +0800

512875bd9 dm: table detect io beyond device ... Browse Code »

This patch fixes a panic on shrinking a DM device if there is
outstanding I/O to the part of the device that is being removed.
(Normally this doesn't happen - a filesystem would be resized first,
for example.)

The bug is that __clone_and_map() assumes dm_table_find_target()
always returns a valid pointer. It may fail if a bio arrives from the
block layer but its target sector is no longer included in the DM
btree.

This patch appends an empty entry to table->targets[] which will
be returned by a lookup beyond the end of the device.

After calling dm_table_find_target(), __clone_and_map() and target_message()
check for this condition using
dm_target_is_valid().

Sample test script to trigger oops:

Jun'ichi Nomura
2007-12-21 01:32:08 +0800

16 Oct, 2007

1 commit

fd5d80626 block: convert blkdev_issue_flush() to use empty barriers ... Browse Code »

Then we can get rid of ->issue_flush_fn() and all the driver private
implementations of that.

Signed-off-by: Jens Axboe

Jens Axboe
2007-10-16 17:05:02 +0800

13 Jul, 2007

1 commit

d0d444c7d dm: add ratelimit logging macros ... Browse Code »

Add ratelimit extension to dm logging macros.

Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon
Signed-off-by: Linus Torvalds

Jonathan Brassow
2007-07-13 06:01:08 +0800

09 Dec, 2006

4 commits

2e93ccc19 [PATCH] dm: suspend: add noflush pushback ... Browse Code »

In device-mapper I/O is sometimes queued within targets for later processing.
For example the multipath target can be configured to store I/O when no paths
are available instead of returning it -EIO.

This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core. This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it. Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application. In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.

DMF_NOFLUSH_SUSPENDING
----------------------

If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It
is always cleared before dm_suspend() returns.

The flag must be visible while the target is flushing pending I/Os so it
is set before presuspend where the flush starts and unset after the wait
for md->pending where the flush ends.

Target drivers can check this flag by calling dm_noflush_suspending().

DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
-----------------------------------

A target's map() function can now return DM_MAPIO_REQUEUE to request the
device mapper core queue the bio.

Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same. This has been labelled 'pushback'.

The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.

dec_pending
-----------

dec_pending() saves the pushback request in struct dm_io->error. Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list. Note that this supercedes any I/O errors.

It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt). dec_pending() checks for this and
returns -EIO if it happened.

pushdback list and pushback_lock
--------------------------------

The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.

md->pushback_lock protects md->pushback.
The lock should be held with irq disabled because dec_pending() can be
called from interrupt context.

Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is
held when checking the flag. Otherwise dec_pending() may queue a bio to
md->pushback after the interrupted dm_suspend() flushes md->pushback.
Then the bio would be left in md->pushback.

Flag setting in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend and the set value is already
made visible via the target's presuspend function.

The flag can be checked without md->pushback_lock (e.g. the first part of
the dec_pending() or target drivers), because the flag is checked again
with md->pushback_lock held when the bio is really queued to md->pushback
as described above. So even if the flag is cleared after the lockless
checkings, the bio isn't left in md->pushback but returned to applications
with -EIO.

Other notes on the current patch
--------------------------------

- md->pushback is added to the struct mapped_device instead of using
md->deferred directly because md->io_lock which protects md->deferred is
rw_semaphore and can't be used in interrupt context like dec_pending(),
and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too.

- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
ioctl option is specified, because I/Os generated by lock_fs() would be
pushed back and never return if there were no valid devices.

- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
flag is set, md->pushback must be flushed because I/Os may be queued to
the list already. (flush_and_out label in dm_suspend())

Test results
------------

I have tested using multipath target with the next patch.

The following tests are for regression/compatibility:
- I/Os succeed when valid paths exist;
- I/Os fail when there are no valid paths and queue_if_no_path is not
set;
- I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set;
- The queued I/Os above fail when suspend is issued without the
DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also
fail.

The following tests are for the normal code path of new pushback feature:
- Queued I/Os in the multipath target are flushed from the target
but don't return when suspend is issued with the DM_NOFLUSH_FLAG
ioctl option;
- The I/Os above are queued in the multipath target again when
resume is issued without path recovery;
- The I/Os above succeed when resume is issued after path recovery
or table load;
- Queued I/Os in the multipath target succeed when resume is issued
with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
spanning 2 multipath targets also succeed.

The following tests are for the error paths of the new pushback feature:
- When the bdget_disk() fails in dm_suspend(), the
DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
pushback list are flushed properly.
- When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
o I/Os which had already been queued to the pushback list
at the time don't return, and are re-issued at resume time;
o I/Os which hadn't been returned at the time return with EIO.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kiyoshi Ueda
2006-12-09 00:29:09 +0800

81fdb096d [PATCH] dm: ioctl: add noflush suspend ... Browse Code »

Provide a dm ioctl option to request noflush suspending. (See next patch for
what this is for.) As the interface is extended, the version number is
incremented.

Other than accepting the new option through the interface, There is no change
to existing behaviour.

Test results:
Confirmed the option is given from user-space correctly.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kiyoshi Ueda
2006-12-09 00:29:09 +0800

45cbcd798 [PATCH] dm: map and endio return code clarification ... Browse Code »

Tighten the use of return values from the target map and end_io functions.
Values of 2 and above are now explictly reserved for future use. There are no
existing targets using such values.

The patch has no effect on existing behaviour.

o Reserve return values of 2 and above from target map functions.
Any positive value currently indicates "mapping complete", but all
existing drivers use the value 1. We now make that a requirement
so we can assign new meaning to higher values in future.

The new definition of return values from target map functions is:
< 0 : error
= 0 : The target will handle the io (DM_MAPIO_SUBMITTED).
= 1 : Mapping completed (DM_MAPIO_REMAPPED).
> 1 : Reserved (undefined). Previously this was the same as '= 1'.

o Reserve return values of 2 and above from target end_io functions
for similar reasons.
DM_ENDIO_INCOMPLETE is introduced for a return value of 1.

Test results:

I have tested by using the multipath target.

I/Os succeed when valid paths exist.

I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set.

I/Os fail when there are no valid paths and queue_if_no_path is not set.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kiyoshi Ueda
2006-12-09 00:29:09 +0800

a3d77d35b [PATCH] dm: suspend: parameter change ... Browse Code »

Change the interface of dm_suspend() so that we can pass several options
without increasing the number of parameters. The existing 'do_lockfs' integer
parameter is replaced by a flag DM_SUSPEND_LOCKFS_FLAG.

There is no functional change to the code.

Test results:
I have tested 'dmsetup suspend' command with/without the '--nolockfs'
option and confirmed the do_lockfs value is correctly set.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kiyoshi Ueda
2006-12-09 00:29:09 +0800