Eric Lee / smarc-fsl-linux-kernel

29 May, 2011

3 commits

fa34ce730 dm kcopyd: return client directly and not through a pointer

Return client directly from dm_kcopyd_client_create, not through a
parameter, making it consistent with dm_io_client_create.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-05-29 20:03:13 +0800
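
The calling-convention change in fa34ce730 can be illustrated with a minimal userspace sketch. The ERR_PTR helpers below are simplified stand-ins for the kernel ones, and demo_client, demo_client_create_old and demo_client_create are hypothetical names, not the dm-kcopyd code itself:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical client type; the real struct dm_kcopyd_client differs. */
struct demo_client { int nr_reserved_pages; };

/* Old style: status code returned, object handed back through a pointer. */
static inline int demo_client_create_old(int fail, struct demo_client **result)
{
	struct demo_client *c;

	if (fail)
		return -ENOMEM;
	c = calloc(1, sizeof(*c));
	if (!c)
		return -ENOMEM;
	*result = c;
	return 0;
}

/* New style: return the object directly, or an errno encoded as a pointer. */
static inline struct demo_client *demo_client_create(int fail)
{
	struct demo_client *c;

	if (fail)
		return ERR_PTR(-ENOMEM);
	c = calloc(1, sizeof(*c));
	if (!c)
		return ERR_PTR(-ENOMEM);
	return c;
}
```

With the second form a caller checks IS_ERR() on the returned pointer and extracts the error with PTR_ERR(), so no separate output parameter is needed.
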
5f43ba295 dm kcopyd: reserve fewer pages

Reserve just the minimum of pages needed to process one job.

Because we allocate pages from page allocator, we don't need to reserve
a large number of pages. The maximum job size is SUB_JOB_SIZE and we
calculate the number of reserved pages based on this.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-05-29 20:03:11 +0800
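
As a rough worked example of the calculation described in 5f43ba295, the sketch below rounds one sub-job up to whole pages. The 512-byte sector size is standard, but the sub-job size of 128 sectors used in the usage note is an assumption for illustration, and demo_reserve_pages is a hypothetical helper rather than the kernel macro:

```c
#include <assert.h>

#define DEMO_SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Number of pages needed to hold one sub-job of the given size,
 * rounded up to a whole page.
 */
static inline unsigned long demo_reserve_pages(unsigned long sub_job_sectors,
					       unsigned long page_size)
{
	unsigned long bytes = sub_job_sectors << DEMO_SECTOR_SHIFT;

	assert(page_size > 0);
	return (bytes + page_size - 1) / page_size;
}
```

Under these assumptions a 128-sector sub-job is 65536 bytes, which is 16 pages with 4 KiB pages and a single page with 64 KiB pages.
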
bda8efec5 dm io: use fixed initial mempool size

Replace the arbitrary calculation of an initial io struct mempool size
with a constant.

The code calculated the number of reserved structures based on the request
size and used a "magic" multiplication constant of 4. This patch changes
it to reserve a fixed number - itself still chosen quite arbitrarily.
Further testing might show if there is a better number to choose.

Note that if there is no memory pressure, we can still allocate an
arbitrary number of "struct io" structures. One structure is enough to
process the whole request.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2011-05-29 20:03:09 +0800

10 Mar, 2011

1 commit

7eaceacca block: remove per-queue plugging

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:07 +0800

14 Jan, 2011

4 commits

9c4376de9 dm: use non reentrant workqueues if equivalent

kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
serve only a single work item from the dm instance, so non-reentrant
workqueues would provide the same ordering guarantees as ordered ones
while allowing CPU affinity and use of the workqueues for other
purposes. Switch them to non-reentrant workqueues.

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:58 +0800
4d4d66ab5 dm: convert workqueues to alloc_ordered

Convert all create[_singlethread]_workqueue() users to the new
alloc[_ordered]_workqueue(). This conversion is mechanical and
doesn't introduce any behavior change.

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:57 +0800
d5ffa387e dm: dont use flush_scheduled_work

flush_scheduled_work() is being deprecated. Flush the used work
directly instead. In all dm targets, the only work which uses
system_wq is ->trigger_event.

Signed-off-by: Tejun Heo
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Tejun Heo
2011-01-14 03:59:56 +0800
5fc2ffeab dm raid1: support discard

Enable discard support in the DM mirror target.
Also change an existing use of 'bvec' to 'addr' in the union.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2011-01-14 03:59:48 +0800

10 Sep, 2010

1 commit

d87f4c14f dm: implement REQ_FLUSH/FUA support for bio-based dm

This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
now deprecated REQ_HARDBARRIER.

* -EOPNOTSUPP handling logic dropped.

* Preflush is handled as before but postflush is dropped and replaced
with passing down REQ_FUA to member request_queues. This replaces
one array wide cache flush w/ member specific FUA writes.

* __split_and_process_bio() now calls __clone_and_map_flush() directly
for flushes and guarantees all FLUSH bio's going to targets are zero
length.

* It's now guaranteed that all FLUSH bio's which are passed onto dm
targets are zero length. bio_empty_barrier() tests are replaced
with REQ_FLUSH tests.

* Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

* Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
enough to be marked with unlikely().

* Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
capability.

* Request based dm isn't converted yet. dm_init_request_based_queue()
resets flush support to 0 for now. To avoid disturbing request
based dm code, dm->flush_error is added for bio based dm while
request based dm continues to use dm->barrier_error.

Lightly tested linear, stripe, raid1, snap and crypt targets. Please
proceed with caution as I'm not familiar with the code base.

Signed-off-by: Tejun Heo
Cc: dm-devel@redhat.com
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe

Tejun Heo
2010-09-10 18:35:38 +0800

12 Aug, 2010

1 commit

b441a262e dm: use dm_target_offset macro

Use new dm_target_offset() macro to avoid most references to ti->begin
in dm targets.

Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2010-08-12 11:14:11 +0800
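
A minimal sketch of what such a macro does: it converts an absolute sector on the mapped device into an offset relative to the start of the target. The struct below is a cut-down, hypothetical stand-in for struct dm_target, and the macro body is written from that description rather than copied from the kernel header:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_sector_t;

/* Cut-down stand-in for struct dm_target: only the fields used here. */
struct demo_target {
	demo_sector_t begin; /* first sector of the target in the table */
	demo_sector_t len;   /* length of the target in sectors */
};

/* Offset of 'sector' from the start of target 'ti'. */
#define demo_target_offset(ti, sector) ((sector) - (ti)->begin)
```

A target that starts at sector 1000 would then see a bio aimed at sector 1024 as offset 24, without open-coding the subtraction of ti->begin at every call site.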

08 Aug, 2010

1 commit

7b6d91dae block: unify flags for struct bio and struct request

Remove the current bio flags and reuse the request flags for the bio, too.
This makes it easier to trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superfluous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:20:39 +0800

06 Mar, 2010

3 commits

f07030409 dm raid1: fix deadlock when suspending failed device

To prevent deadlock, bios in the hold list should be flushed before
dm_rh_stop_recovery() is called in mirror_suspend().

The recovery can't start because there are pending bios and therefore
dm_rh_stop_recovery deadlocks.

When there are pending bios in the hold list, the recovery waits for
the completion of the bios after recovery_count is acquired.
The recovery_count is released when the recovery finished, however,
the bios in the hold list are processed after dm_rh_stop_recovery() in
mirror_presuspend(). dm_rh_stop_recovery() also acquires recovery_count,
then deadlock occurs.

Signed-off-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon
Reviewed-by: Mikulas Patocka

Takahiro Yasui
2010-03-06 10:32:35 +0800
8215d6ec5 dm table: remove unused dm_get_device range parameters

Remove unused parameters (start and len) of dm_get_device()
and fix the callers.

Signed-off-by: Nikanth Karthikesan
Signed-off-by: Alasdair G Kergon

Nikanth Karthikesan
2010-03-06 10:32:27 +0800
ede5ea0b8 dm raid1: always return error if all legs fail

If all mirror legs fail, always return an error instead of holding the
bio, even if the handle_errors option was set. At present it is the
responsibility of the driver underneath us to deal with retries,
multipath etc.

The patch adds the bio to the failures list instead of holding it
directly. do_failures tests first if all legs failed and, if so,
returns the bio with -EIO. If any leg is still alive and handle_errors
is set, do_failures calls hold_bio.

Reviewed-by: Takahiro Yasui
Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2010-03-06 10:32:22 +0800
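
The decision order that ede5ea0b8 describes for do_failures can be sketched as a small pure function. The first two branches follow the commit message; the final branch (complete with success) is taken from the code path quoted under 5528d17de in this log. The enum and function names are hypothetical, and the real function operates on bios and a mirror_set rather than plain integers:

```c
#include <assert.h>
#include <stdbool.h>

/* Possible fates of a bio taken off the failures list (hypothetical names). */
enum demo_outcome {
	DEMO_FAIL_EIO,    /* complete the bio with -EIO */
	DEMO_HOLD,        /* hold the bio until userspace reconfigures the mirror */
	DEMO_COMPLETE_OK  /* complete the bio with success */
};

static inline enum demo_outcome demo_do_failures(int valid_legs,
						 bool errors_handled)
{
	/* All legs failed: always report an error, never hold. */
	if (valid_legs == 0)
		return DEMO_FAIL_EIO;

	/* Some leg is alive and handle_errors is set: hold the bio. */
	if (errors_handled)
		return DEMO_HOLD;

	/* Some leg is alive and errors are not handled. */
	return DEMO_COMPLETE_OK;
}
```

The point of the ordering is that the "all legs failed" test comes first, so handle_errors can no longer cause a bio to be held when nothing is left to write to.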

17 Feb, 2010

1 commit

5528d17de dm raid1: fail writes if errors are not handled and log fails

If the mirror log fails when the handle_errors option was not selected
and there is no remaining valid mirror leg, writes return success even
though they weren't actually written to any device. This patch
completes them with EIO instead.

This code path is taken:
  do_writes:
    bio_list_merge(&ms->failures, &sync);
  do_failures:
    if (!get_valid_mirror(ms)) (false)
    else if (errors_handled(ms)) (false)
    else bio_endio(bio, 0);

The logic in do_failures is based on presuming that the write was already
tried: if it succeeded at least on one leg (without handle_errors) it
is reported as success.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=555197

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2010-02-17 02:42:55 +0800

11 Dec, 2009

11 commits

5339fc2d4 dm raid1: explicitly initialise bio_lists

Explicitly initialize bio lists instead of relying on kzalloc.

Signed-off-by: Mikulas Patocka
Reviewed-by: Takahiro Yasui
Tested-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:06 +0800
929be8fcb dm raid1: hold all write bios when leg fails

Hold all write bios when leg fails and errors are handled

When using a userspace daemon such as dmeventd to handle errors, we must
delay completing bios until it has done its job.
This patch prevents the following race:
- primary leg fails
- write "1" fails, the write is held, secondary leg is set default
- write "2" goes straight to the secondary leg

Signed-off-by: Mikulas Patocka
Reviewed-by: Takahiro Yasui
Tested-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:06 +0800
60f355ead dm raid1: hold write bios when errors are handled

Hold all write bios when errors are handled.

Previously the failures list was used only when handling errors with
a userspace daemon such as dmeventd. Now, it is always used for all bios.
The regions where some writes failed must be marked as nosync. This can only
be done in process context (i.e. in raid1 workqueue), not in the
write_callback function.

Previously the write would succeed if writing to at least one leg
succeeded. This is wrong because data from the failed leg may be
replicated to the correct leg. Now, if using a userspace daemon, the
write with some failures will be held until the daemon has done its job
and reconfigured the array. If not using a daemon, the write still
succeeds if at least one leg succeeds. This is bad, but it is consistent
with current behavior.

Signed-off-by: Mikulas Patocka
Reviewed-by: Takahiro Yasui
Tested-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:05 +0800
c58098be9 dm raid1: remove bio_endio from dm_rh_mark_nosync

Move bio completion out of dm_rh_mark_nosync in preparation for the
next patch.

Signed-off-by: Mikulas Patocka
Reviewed-by: Takahiro Yasui
Tested-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:05 +0800
87968ddd2 dm raid1: abstract get_valid_mirror function

Move the logic to get a valid mirror leg into a function for re-use
in a later patch.

Signed-off-by: Mikulas Patocka
Reviewed-by: Takahiro Yasui
Tested-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:04 +0800
0f398a840 dm raid1: use hold framework in do_failures

Use the hold framework in do_failures.

This patch doesn't change the bio processing logic, it just simplifies
failure handling and avoids periodically polling the failures list.

Signed-off-by: Mikulas Patocka
Reviewed-by: Takahiro Yasui
Tested-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:04 +0800
047885076 dm raid1: add framework to hold bios during suspend

Add framework to delay bios until a suspend and then resubmit them with
either DM_ENDIO_REQUEUE (if the suspend was noflush) or complete them
with -EIO. I/O barrier support will use this.

Signed-off-by: Mikulas Patocka
Reviewed-by: Takahiro Yasui
Tested-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:03 +0800
64b30c46e dm raid1: report flush errors separately in status

Report flush errors as 'F' instead of 'D' for log and mirror devices.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:02 +0800
c0da3748b dm raid1: implement mirror_flush

Implement flush callee. It uses dm_io to send zero-size barrier synchronously
and concurrently to all the mirror legs.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:02 +0800
87a8f240e dm log: add flush callback fn

Introduce a callback pointer from the log to dm-raid1 layer.

Before some region is set as "in-sync", we need to flush hardware cache on
all the disks. But the log module doesn't have access to the mirror_set
structure. So it will use this callback.

So far the callback is unused; it will be used in further patches.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:52:01 +0800
4184153f9 dm raid1: support flush

Flush support for dm-raid1.

When it receives an empty barrier, submit it to all the devices via dm-io.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-12-11 07:51:59 +0800

11 Sep, 2009

1 commit

1f98a13f6 bio: first step in sanitizing the bio->bi_rw flag testing

Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.

Signed-off-by: Jens Axboe

Jens Axboe
2009-09-11 20:33:31 +0800

05 Sep, 2009

1 commit

d2b698644 dm raid1: do not allow log_failure variable to unset after being set

This patch fixes a bug which was triggering a case where the primary leg
could not be changed on failure even when the mirror was in-sync.

The case involves the failure of the primary device along with
the transient failure of the log device. The problem is that
bios can be put on the 'failures' list (due to log failure)
before 'fail_mirror' is called due to the primary device failure.
Normally, this is fine, but if the log device failure is transient,
a subsequent iteration of the work thread, 'do_mirror', will
reset 'log_failure'. The 'do_failures' function then resets
the 'in_sync' variable when processing bios on the failures list.
The 'in_sync' variable is what is used to determine if the
primary device can be switched in the event of a failure. Since
this has been reset, the primary device is incorrectly assumed
to be not switchable.

The case has been seen in the cluster mirror context, where one
machine realizes the log device is dead before the other machines.
As the responsibilities of the server migrate from one node to
another (because the mirror is being reconfigured due to the failure),
the new server may think for a moment that the log device is fine -
thus resetting the 'log_failure' variable.

In any case, it is inappropriate for us to reset the 'log_failure'
variable. The above bug simply illustrates that it can actually
hurt us.

Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2009-09-05 03:40:32 +0800
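
The idea behind d2b698644 is that 'log_failure' should behave as a sticky flag. A minimal sketch of the difference, using a hypothetical one-field struct and hypothetical function names rather than the real mirror_set code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the mirror set. */
struct demo_mirror_set {
	bool log_failure;
};

/* Problematic pattern: a later successful flush clears the flag again. */
static inline void demo_record_flush_resetting(struct demo_mirror_set *ms,
					       bool flush_failed)
{
	ms->log_failure = flush_failed;
}

/* Sticky pattern: the flag can be set but is never cleared here. */
static inline void demo_record_flush_sticky(struct demo_mirror_set *ms,
					    bool flush_failed)
{
	if (flush_failed)
		ms->log_failure = true;
}
```

With the sticky version, a transient log failure followed by a successful flush still leaves the flag set, so later processing of the failures list does not mistake the situation for a healthy log.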

24 Jul, 2009

2 commits

5dea271b6 dm table: pass correct dev area size to device_area_is_valid

Incorrect device area lengths are being passed to device_area_is_valid().

The regression appeared in 2.6.31-rc1 through commit
754c5fc7ebb417b23601a6222a6005cc2e7f2913.

With the dm-stripe target, the size of the target (ti->len) was used
instead of the stripe_width (ti->len/#stripes). An example of a
consequent incorrect error message is:

device-mapper: table: 254:0: sdb too small for target

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2009-07-24 03:30:42 +0800
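
The arithmetic behind the spurious "too small for target" message in 5dea271b6 can be sketched as follows. Both helpers are hypothetical simplifications; the real device_area_is_valid() performs further checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the area [start, start + len) fits on a device of dev_sectors. */
static inline bool demo_area_fits(uint64_t dev_sectors, uint64_t start,
				  uint64_t len)
{
	return start <= dev_sectors && len <= dev_sectors - start;
}

/* Sectors that a striped target of target_len places on each device. */
static inline uint64_t demo_stripe_width(uint64_t target_len,
					 unsigned int stripes)
{
	assert(stripes > 0);
	return target_len / stripes;
}
```

For a 4000-sector target striped across four 1000-sector devices, checking each device against the full target length (4000) fails, while checking against the stripe width (1000) passes.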
69885683d dm raid1: wake kmirrord when requeueing delayed bios after remote recovery

The recent commit 7513c2a761d69d2a93f17146b3563527d3618ba0 (dm raid1:
add is_remote_recovering hook for clusters) changed do_writes() to
update the ms->writes list but forgot to wake up kmirrord to process it.

The rule is that when anything is being added on ms->reads, ms->writes
or ms->failures and the list was empty before we must call
wakeup_mirrord (for immediate processing) or delayed_wake (for delayed
processing). Otherwise the bios could sit on the list indefinitely.

Signed-off-by: Mikulas Patocka
CC: stable@kernel.org
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-07-24 03:30:37 +0800
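
The rule stated in 69885683d (wake the worker whenever something is added to a list that was empty) can be sketched with a counter standing in for the list and another for the wakeups. All names here are hypothetical; the real code uses bio lists under a spinlock and wakeup_mirrord() or delayed_wake():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue: counts pending items and how often the worker was woken. */
struct demo_queue {
	int pending;
	int wakeups;
};

/* Add one item; wake the worker only on the empty to non-empty transition. */
static inline void demo_queue_add(struct demo_queue *q)
{
	bool was_empty = (q->pending == 0);

	q->pending++;
	if (was_empty)
		q->wakeups++; /* stands in for waking the worker thread */
}

/* Worker drains everything it finds. */
static inline void demo_queue_drain(struct demo_queue *q)
{
	q->pending = 0;
}
```

Skipping the wakeup on the empty to non-empty transition is exactly the failure mode the commit describes: items sit on the list with nothing scheduled to process them.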

22 Jun, 2009

1 commit

af4874e03 dm targets: introduce iterate devices fn

Add .iterate_devices to 'struct target_type' to allow a function to be
called for all devices in a DM target. Implemented it for all targets
except those in dm-snap.c (origin and snapshot).

(The raid1 version number jumps to 1.12 because we originally reserved
1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
instead.)

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon
Cc: martin.petersen@oracle.com

Mike Snitzer
2009-06-22 17:12:33 +0800

15 Apr, 2009

1 commit

8f3d8ba20 block: move bio list helpers into bio.h

It's used by DM and MD and generally useful, so move the bio list
helpers into bio.h.

Signed-off-by: Christoph Hellwig
Acked-by: Alasdair G Kergon
Signed-off-by: Jens Axboe

Christoph Hellwig
2009-04-15 14:28:09 +0800

03 Apr, 2009

2 commits

7513c2a76 dm raid1: add is_remote_recovering hook for clusters

The logging API needs an extra function to make cluster mirroring
possible. This new function allows us to check whether a mirror
region is being recovered on another machine in the cluster. This
helps us prevent simultaneous recovery I/O and process I/O to the
same locations on disk.

Cluster-aware log modules will implement this function. Single
machine log modules will not. So, there is no performance
penalty for single machine mirrors.

Signed-off-by: Jonathan Brassow
Acked-by: Heinz Mauelshagen
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2009-04-03 02:55:30 +0800
95f8fac8d dm raid1: switch read_record from kmalloc to slab to save memory

With my previous patch to save bi_io_vec, the size of dm_raid1_read_record
is significantly increased (the vector list takes 3072 bytes on 32-bit machines
and 4096 bytes on 64-bit machines).

The structure dm_raid1_read_record used to be allocated with kmalloc,
but kmalloc aligns the size on the next power-of-two so an object
slightly greater than 4096 will allocate 8192 bytes of memory and half of
that memory will be wasted.

This patch turns kmalloc into a slab cache which doesn't have this
padding so it will reduce the memory consumed.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-04-03 02:55:24 +0800
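
The waste described in 95f8fac8d can be worked through numerically. The sketch below assumes, as the commit message does, that the allocator rounds a request up to the next power of two; demo_pow2_bucket and demo_wasted are hypothetical helpers, and the 4112-byte object size in the usage note is an illustrative figure, not the real size of dm_raid1_read_record:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= size (simplified model of kmalloc buckets). */
static inline size_t demo_pow2_bucket(size_t size)
{
	size_t bucket = 1;

	assert(size <= SIZE_MAX / 2); /* keep the shift below from overflowing */
	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Bytes lost to rounding when an object of 'size' bytes is allocated. */
static inline size_t demo_wasted(size_t size)
{
	return demo_pow2_bucket(size) - size;
}
```

Under this model an object of 4112 bytes lands in the 8192-byte bucket and wastes 4080 bytes, roughly half of the allocation, whereas a dedicated slab cache sized for the object avoids that padding.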

06 Jan, 2009

3 commits

2045e88ed dm log: move region_size validation

Move log size validation from mirror target to log constructor.

Removed PAGE_SIZE restriction we no longer think necessary.

Signed-off-by: Milan Broz
Signed-off-by: Alasdair G Kergon

Milan Broz
2009-01-06 11:05:01 +0800
10d3bd09a dm: consolidate target deregistration error handling

Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of a programming bug in the
target driver. It can't fail because of user's behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister a non-registered target or a target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2009-01-06 11:04:58 +0800
d460c65a6 dm raid1: fix error count

Always increase the error count when I/O on a leg of a mirror fails.

The error count is used to decide whether to select an alternative
mirror leg. If the target doesn't use the "handle_errors" feature, the
error count is not updated and the bio can get requeued forever by the
read callback.

Fix it by increasing error_count before the handle_errors feature
checking.

Cc: stable@kernel.org
Signed-off-by: Milan Broz
Signed-off-by: Jonathan Brassow
Signed-off-by: Alasdair G Kergon

Jonathan Brassow
2009-01-06 11:04:57 +0800

14 Nov, 2008

1 commit

18776c731 dm raid1: flush workqueue before destruction

We queue work on keventd queue --- so this queue must be flushed in the
destructor. Otherwise, keventd could access mirror_set after it was freed.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon
Cc: stable@kernel.org

Mikulas Patocka
2008-11-14 07:38:52 +0800

30 Oct, 2008

1 commit

b34578a48 dm raid1: fix do_failures

Missing braces. Commit 1f965b1943 (dm raid1: separate region_hash interface
part1) broke it.

Signed-off-by: Ilpo Jarvinen
Signed-off-by: Alasdair G Kergon
Cc: Heinz Mauelshagen

Ilpo Jarvinen
2008-10-30 21:33:07 +0800

22 Oct, 2008

1 commit

1f965b194 dm raid1: separate region_hash interface part1

Separate the region hash code from raid1 so it can be shared by forthcoming
targets. Use BUG_ON() for failed async dm_io() calls.

Signed-off-by: Heinz Mauelshagen
Signed-off-by: Alasdair G Kergon

Heinz Mauelshagen
2008-10-22 00:45:06 +0800