Eric Lee / smarc-fsl-linux-kernel

24 Feb, 2011

1 commit

f0b4f7e2f md: Fix - again - partition detection when array becomes active ... Browse Code »

Revert
b821eaa572fd737faaf6928ba046e571526c36c6
and
f3b99be19ded511a1bf05a148276239d9f13eefa

When I wrote the first of these I had a wrong idea about the
lifetime of 'struct block_device'. It can disappear at any time that
the block device is not open if it falls out of the inode cache.

So relying on the 'size' recorded with it to detect when the
device size has changed and so we need to revalidate, is wrong.

Rather, we really do need the 'changed' attribute stored directly in
the mddev and set/tested as appropriate.

Without this patch, a sequence of:
mknod / open / close / unlink

(which can cause a block_device to be created and then destroyed)
will result in a rescan of the partition table and consequence removal
and addition of partitions.
Several of these in a row can get udev racing to create and unlink and
other code can get confused.

With the patch, the rescan is only performed when needed and so there
are no races.

This is suitable for any stable kernel from 2.6.35.

Reported-by: "Wojcik, Krzysztof"
Signed-off-by: NeilBrown
Cc: stable@kernel.org

NeilBrown
2011-02-24 14:26:41 +0800

21 Feb, 2011

1 commit

da9cf5050 md: avoid spinlock problem in blk_throtl_exit ... Browse Code »

blk_throtl_exit assumes that ->queue_lock still exists,
so make sure that it does.
To do this, we stop redirecting ->queue_lock to conf->device_lock
and leave it pointing where it is initialised - __queue_lock.

As the blk_plug functions check the ->queue_lock is held, we now
take that spin_lock explicitly around the plug functions. We don't
need the locking, just the warning removal.

This is needed for any kernel with the blk_throtl code, which is
which is 2.6.37 and later.

Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-02-21 15:25:57 +0800

16 Feb, 2011

2 commits

8f5f02c46 md: correctly handle probe of an 'mdp' device. ... Browse Code »

'mdp' devices are md devices with preallocated device numbers
for partitions. As such it is possible to mknod and open a partition
before opening the whole device.

this causes md_probe() to be called with a device number of a
partition, which in-turn calls mddev_find with such a number.

However mddev_find expects the number of a 'whole device' and
does the wrong thing with partition numbers.

So add code to mddev_find to remove the 'partition' part of
a device number and just work with the 'whole device'.

This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652

Reported-by: hkmaly@bigfoot.com
Signed-off-by: NeilBrown
Cc:

NeilBrown
2011-02-16 10:58:51 +0800
cbe6ef1d2 md: don't set_capacity before array is active. ... Browse Code »

If the desired size of an array is set (via sysfs) before the array is
active (which is the normal sequence), we currrently call set_capacity
immediately.
This means that a subsequent 'open' (as can be caused by some
udev-triggers program) will notice the new size and try to probe for
partitions. However as the array isn't quite ready yet the read will
fail. Then when the array is read, as the size doesn't change again
we don't try to re-probe.

So when setting array size via sysfs, only call set_capacity if the
array is already active.

Signed-off-by: NeilBrown

NeilBrown
2011-02-16 10:58:38 +0800

14 Feb, 2011

1 commit

f7bee8094 md: Fix raid1->raid0 takeover ... Browse Code »

Takeover raid1->raid0 not succeded. Kernel message is shown:
"md/raid0:md126: too few disks (1 of 2) - aborting!"

Problem was that we weren't updating ->raid_disks for that
takeover, unlike all the others.

Signed-off-by: Krzysztof Wojcik
Signed-off-by: NeilBrown

Krzysztof Wojcik
2011-02-14 07:01:41 +0800

08 Feb, 2011

2 commits

02214dc54 FIX: md: process hangs at wait_barrier after 0->10 takeover ... Browse Code »

Following symptoms were observed:
1. After raid0->raid10 takeover operation we have array with 2
missing disks.
When we add disk for rebuild, recovery process starts as expected
but it does not finish- it stops at about 90%, md126_resync process
hangs in "D" state.
2. Similar behavior is when we have mounted raid0 array and we
execute takeover to raid10. After this when we try to unmount array-
it causes process umount hangs in "D"

In scenarios above processes hang at the same function- wait_barrier
in raid10.c.
Process waits in macro "wait_event_lock_irq" until the
"!conf->barrier" condition will be true.
In scenarios above it never happens.

Reason was that at the end of level_store, after calling pers->run,
we call mddev_resume. This calls pers->quiesce(mddev, 0) with
RAID10, that calls lower_barrier.
However raise_barrier hadn't been called on that 'conf' yet,
so conf->barrier becomes negative, which is bad.

This patch introduces setting conf->barrier=1 after takeover
operation. It prevents to become barrier negative after call
lower_barrier().

Signed-off-by: Krzysztof Wojcik
Signed-off-by: NeilBrown

Krzysztof Wojcik
2011-02-08 08:49:02 +0800
e91ece559 md_make_request: don't touch the bio after calling make_request ... Browse Code »

md_make_request was calling bio_sectors() for part_stat_add
after it was calling the make_request function. This is
bad because the make_request function can free the bio and
because the bi_size field can change around.

The fix here was suggested by Jens Axboe. It saves the
sector count before the make_request call. I hit this
with CONFIG_DEBUG_PAGEALLOC turned on while trying to break
his pretty fusionio card.

Cc:
Signed-off-by: Chris Mason
Signed-off-by: NeilBrown

Chris Mason
2011-02-08 06:53:28 +0800

02 Feb, 2011

1 commit

c6751b2bd md: Don't allow slot_store while resync/recovery is happening. ... Browse Code »

Activating a spare in an array while resync/recovery is already
happening can lead the that spare being marked in-sync when it isn't
really.
So don't allow the 'slot' to be set (this activating the device)
while resync/recovery is happening.

Signed-off-by: NeilBrown

NeilBrown
2011-02-02 08:57:13 +0800

31 Jan, 2011

8 commits

7281f8129 md: don't clear curr_resync_completed at end of resync. ... Browse Code »

There is no need to set this to zero at this point. It will be
set to zero by remove_and_add_spares or at the start of
md_do_sync at the latest.
And setting it to zero before MD_RECOVERY_RUNNING is cleared can
make a 'zero' appear briefly in the 'sync_completed' sysfs attribute
just as resync is finishing.

So simply remove this setting to zero.

Signed-off-by: NeilBrown

NeilBrown
2011-01-31 11:30:27 +0800
a8c42c7f4 md: Don't use remove_and_add_spares to remove failed devices from a read-only array ... Browse Code »

remove_and_add_spares is called in two places where the needs really
are very different.
remove_and_add_spares should not be called on an array which is about
to be reshaped as some extra devices might have been manually added
and that would remove them. However if the array is 'read-auto',
that will currently happen, which is bad.

So in the 'ro != 0' case don't call remove_and_add_spares but simply
remove the failed devices as the comment suggests is needed.

Signed-off-by: NeilBrown

NeilBrown
2011-01-31 10:47:13 +0800
fc3a08b85 Add raid1->raid0 takeover support ... Browse Code »

This patch introduces raid 1 to raid0 takeover operation
in kernel space.

Signed-off-by: Krzysztof Wojcik
Signed-off-by: Neil Brown

Krzysztof Wojcik
2011-01-31 10:47:13 +0800
f21e9ff7f md: Remove the AllReserved flag for component devices. ... Browse Code »

This flag is not needed and is used badly.

Devices that are included in a native-metadata array are reserved
exclusively for that array - and currently have AllReserved set.
They all are bd_claimed for the rdev and so cannot be shared.

Devices that are included in external-metadata arrays can be shared
among multiple arrays - providing there is no overlap.
These are bd_claimed for md in general - not for a particular rdev.

When changing the amount of a device that is used in an array we need
to check for overlap. This currently includes a check on AllReserved
So even without overlap, sharing with an AllReserved device is not
allowed.
However the bd_claim usage already precludes sharing with these
devices, so the test on AllReserved is not needed. And in fact it is
wrong.

As this is the only use of AllReserved, simply remove all usage and
definition of AllReserved.

Signed-off-by: NeilBrown

NeilBrown
2011-01-31 09:10:09 +0800
50da08409 md: don't abort checking spares as soon as one cannot be added. ... Browse Code »

As spares can be added manually before a reshape starts, we need to
find them all to mark some of them as in_sync.

Previously we would abort looking for spares when we found an
unallocated spare what could not be added to the array (implying there
was no room for new spares). However already-added spares could be
later in the list, so we need to keep searching.

Signed-off-by: NeilBrown

NeilBrown
2011-01-31 08:57:43 +0800
469518a34 md: fix the test for finding spares in raid5_start_reshape. ... Browse Code »

As spares can be added to the array before the reshape is started,
we need to find and count them when checking there are enough.
The array could have been degraded, so we need to check all devices,
no just those out side of the range of devices in the array before
the reshape.

So instead of checking the index, check the In_sync flag as that
reliably tells if the device is a spare or this purpose.

Signed-off-by: NeilBrown

NeilBrown
2011-01-31 08:57:43 +0800
87a8dec91 md: simplify some 'if' conditionals in raid5_start_reshape. ... Browse Code »

There are two consecutive 'if' statements.

if (mddev->delta_disks >= 0)
....
if (mddev->delta_disks > 0)

The code in the second is equally valid if delta_disks == 0, and these
two statements are the only place that 'added_devices' is used.

So make them a single if statement, make added_devices a local
variable, and re-indent it all.

No functional change.

Signed-off-by: NeilBrown

NeilBrown
2011-01-31 08:57:43 +0800
de171cb9a md: revert change to raid_disks on failure. ... Browse Code »

If we try to update_raid_disks and it fails, we should put
'delta_disks' back to zero. This is important because some code,
such as slot_store, assumes that delta_disks has been validated.

Signed-off-by: NeilBrown

NeilBrown
2011-01-31 08:57:42 +0800

15 Jan, 2011

1 commit

49731baa4 block: restore multiple bd_link_disk_holder() support ... Browse Code »

Commit e09b457b (block: simplify holder symlink handling) incorrectly
assumed that there is only one link at maximum. dm may use multiple
links and expects block layer to track reference count for each link,
which is different from and unrelated to the exclusive device holder
identified by @holder when the device is opened.

Remove the single holder assumption and automatic removal of the link
and revive the per-link reference count tracking. The code
essentially behaves the same as before commit e09b457b sans the
unnecessary kobject reference count dancing.

While at it, note that this facility should not be used by anyone else
than the current ones. Sysfs symlinks shouldn't be abused like this
and the whole thing doesn't belong in the block layer at all.

Signed-off-by: Tejun Heo
Reported-by: Milan Broz
Cc: Jun'ichi Nomura
Cc: Neil Brown
Cc: linux-raid@vger.kernel.org
Cc: Kay Sievers
Signed-off-by: Jens Axboe

Tejun Heo
2011-01-15 01:44:22 +0800

14 Jan, 2011

23 commits

f6bcfd94c Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (32 commits)
dm: raid456 basic support
dm: per target unplug callback support
dm: introduce target callbacks and congestion callback
dm mpath: delay activate_path retry on SCSI_DH_RETRY
dm: remove superfluous irq disablement in dm_request_fn
dm log: use PTR_ERR value instead of ENOMEM
dm snapshot: avoid storing private suspended state
dm snapshot: persistent make metadata_wq multithreaded
dm: use non reentrant workqueues if equivalent
dm: convert workqueues to alloc_ordered
dm stripe: switch from local workqueue to system_wq
dm: dont use flush_scheduled_work
dm snapshot: remove unused dm_snapshot queued_bios_work
dm ioctl: suppress needless warning messages
dm crypt: add loop aes iv generator
dm crypt: add multi key capability
dm crypt: add post iv call to iv generator
dm crypt: use io thread for reads only if mempool exhausted
dm crypt: scale to multiple cpus
dm crypt: simplify compatible table output
...

Linus Torvalds
2011-01-14 09:30:47 +0800
509e4aef4 Merge branch 'for-linus' of git://neil.brown.name/md ... Browse Code »

* 'for-linus' of git://neil.brown.name/md:
md: Fix removal of extra drives when converting RAID6 to RAID5
md: range check slot number when manually adding a spare.
md/raid5: handle manually-added spares in start_reshape.
md: fix sync_completed reporting for very large drives (>2TB)
md: allow suspend_lo and suspend_hi to decrease as well as increase.
md: Don't let implementation detail of curr_resync leak out through sysfs.
md: separate meta and data devs
md-new-param-to_sync_page_io
md-new-param-to-calc_dev_sboffset
md: Be more careful about clearing flags bit in ->recovery
md: md_stop_writes requires mddev_lock.
md/raid5: use sysfs_notify_dirent_safe to avoid NULL pointer
md: Ensure no IO request to get md device before it is properly initialised.
md: Fix single printks with multiple KERN_s
md: fix regression resulting in delays in clearing bits in a bitmap
md: fix regression with re-adding devices to arrays with no metadata

Linus Torvalds
2011-01-14 09:30:20 +0800
bf2cb0dab md: Fix removal of extra drives when converting RAID6 to RAID5 ... Browse Code »

When a RAID6 is converted to a RAID5, the extra drive should
be discarded. However it isn't due to a typo in a comparison.

This bug was introduced in commit e93f68a1fc6 in 2.6.35-rc4
and is suitable for any -stable since than.

As the extra drive is not removed, the 'degraded' counter is wrong and
so the RAID5 will not respond correctly to a subsequent failure.

Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
ba1b41b6b md: range check slot number when manually adding a spare. ... Browse Code »

When adding a spare to an active array, we should check the slot
number, but allow it to be larger than raid_disks if a reshape
is being prepared.

Apply the same test when adding a device to an
array-under-construction. It already had most of the test in place,
but not quite all.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
1a940fcee md/raid5: handle manually-added spares in start_reshape. ... Browse Code »

It is possible to manually add spares to specific slots before
starting a reshape.
raid5_start_reshape should recognised this possibility and include
it in the accounting.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
13ae864bc md: fix sync_completed reporting for very large drives (>2TB) ... Browse Code »

The values exported in the sync_completed file are unsigned long, which
overflows with very large drives, resulting in wrong values reported.

Since sync_completed uses sectors as unit, we'll start getting wrong
values with components larger than 2TB.

This patch simply replaces the use of unsigned long by unsigned long long.

Signed-off-by: Rémi Rérolle
Signed-off-by: NeilBrown

Rémi Rérolle
2011-01-14 06:14:34 +0800
23ddff379 md: allow suspend_lo and suspend_hi to decrease as well as increase. ... Browse Code »

The sysfs attributes 'suspend_lo' and 'suspend_hi' describe a region
to which read/writes are suspended so that the under lying data can be
manipulated without user-space noticing.
Currently the window they describe can only move forwards along the
device. However this is an unnecessary restriction which will cause
problems with planned developments.
So relax this restriction and allow these endpoints to move
arbitrarily.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
75d3da43c md: Don't let implementation detail of curr_resync leak out through sysfs. ... Browse Code »

mddev->curr_resync has artificial values of '1' and '2' which are used
by the code which ensures only one resync is happening at a time on
any given device.

These values are internal and should never be exposed to user-space
(except when translated appropriately as in the 'pending' status in
/proc/mdstat).

Unfortunately they are as ->curr_resync is assigned to
->curr_resync_completed and that value is directly visible through
sysfs.

So change the assignments to ->curr_resync_completed to get the same
valued from elsewhere in a form that doesn't have the magic '1' or '2'
values.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:34 +0800
a6ff7e089 md: separate meta and data devs ... Browse Code »

Allow the metadata to be on a separate device from the
data.

This doesn't mean the data and metadata will by on separate
physical devices - it simply gives device-mapper and userspace
tools more flexibility.

Signed-off-by: NeilBrown

Jonathan Brassow
2011-01-14 06:14:34 +0800
ccebd4c41 md-new-param-to_sync_page_io ... Browse Code »

Add new parameter to 'sync_page_io'.

The new parameter allows us to distinguish between metadata and data
operations. This becomes important later when we add the ability to
use separate devices for data and metadata.

Signed-off-by: Jonathan Brassow

Jonathan Brassow
2011-01-14 06:14:33 +0800
57b2caa39 md-new-param-to-calc_dev_sboffset ... Browse Code »

When we allow for separate devices for data and metadata
in a later patch, we will need to be able to calculate
the superblock offset based on more than the bdev.

Signed-off-by: Jonathan Brassow

Jonathan Brassow
2011-01-14 06:14:33 +0800
7ebc0be7f md: Be more careful about clearing flags bit in ->recovery ... Browse Code »

Setting ->recovery to 0 is generally not a good idea as it could clear
bits that shouldn't be cleared. In particular, MD_RECOVERY_FROZEN
should only be cleared on explicit request from user-space.

So when we need to clear things, just clear the bits that need
clearing.

As there are a few different places which reap a resync process - and
some do an incomplte job - factor out the code for doing the from
md_check_recovery and call that function instead of open coding part
of it.

Signed-off-by: NeilBrown
Reported-by: Jonathan Brassow

NeilBrown
2011-01-14 06:14:33 +0800
defad61a5 md: md_stop_writes requires mddev_lock. ... Browse Code »

As md_stop_writes manipulates the sync_thread and calls md_update_sb,
it need to be called with mddev_lock held.

In all internal cases it is, but the symbol is exported for dm-raid to
call and in that case the lock won't be help.
Do make an exported version which takes the lock, and an internal
version which does not.

Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:14:33 +0800
43c73ca43 md/raid5: use sysfs_notify_dirent_safe to avoid NULL pointer ... Browse Code »

With the module parameter 'start_dirty_degraded' set,
raid5_spare_active() previously called sysfs_notify_dirent() with a NULL
argument (rdev->sysfs_state) when a rebuild finished.

Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer

Jonathan Brassow
2011-01-14 06:14:33 +0800
0ca69886a md: Ensure no IO request to get md device before it is properly initialised. ... Browse Code »

When an md device is in the process of coming on line it is possible
for an IO request (typically a partition table probe) to get through
before the array is fully initialised, which can cause unexpected
behaviour (e.g. a crash).

So explicitly record when the array is ready for IO and don't allow IO
through until then.

There is no possibility for a similar problem when the array is going
off-line as there must only be one 'open' at that time, and it is busy
off-lining the array and so cannot send IO requests. So no memory
barrier is needed in md_stop()

This has been a bug since commit 409c57f3801 in 2.6.30 which
introduced md_make_request. Before then, each personality would
register its own make_request_fn when it was ready.
This is suitable for any stable kernel from 2.6.30.y onwards.

Cc:
Signed-off-by: NeilBrown
Reported-by: "Hawrylewicz Czarnowski, Przemyslaw"

NeilBrown
2011-01-14 06:14:33 +0800
067032bc6 md: Fix single printks with multiple KERN_<level>s ... Browse Code »

Noticed-by: Russell King
Signed-off-by: Joe Perches
Signed-off-by: NeilBrown

Joe Perches
2011-01-14 06:14:33 +0800
6c9879101 md: fix regression resulting in delays in clearing bits in a bitmap ... Browse Code »

commit 589a594be1fb (2.6.37-rc4) fixed a problem were md_thread would
sometimes call the ->run function at a bad time.

If an error is detected during array start up after the md_thread has
been started, the md_thread is killed. This resulted in the ->run
function being called once. However the array may not be in a state
that it is safe to call ->run.

However the fix imposed meant that ->run was not called on a timeout.
This means that when an array goes idle, bitmap bits do not get
cleared promptly. While the array is busy the bits will still be
cleared when appropriate so this is not very serious. There is no
risk to data.

Change the test so that we only avoid calling ->run when the thread
is being stopped. This more explicitly addresses the problem situation.

This is suitable for 2.6.37-stable and any -stable kernel to which
589a594be1fb was applied.

Cc: stable@kernel.org
Signed-off-by: NeilBrown

NeilBrown
2011-01-14 06:13:53 +0800
9d09e663d dm: raid456 basic support ... Browse Code »

This patch is the skeleton for the DM target that will be
the bridge from DM to MD (initially RAID456 and later RAID1). It
provides a way to use device-mapper interfaces to the MD RAID456
drivers.

As with all device-mapper targets, the nominal public interfaces are the
constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
and STATUSTYPE_TABLE). The CTR table looks like the following:

1: raid \
2: \
3: ..

Line 1 contains the standard first three arguments to any device-mapper
target - the start, length, and target type fields. The target type in
this case is "raid".

Line 2 contains the arguments that define the particular raid
type/personality/level, the required arguments for that raid type, and
any optional arguments. Possible raid types include: raid4, raid5_la,
raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc. (again, raid1 is
planned for the future.) The list of required and optional parameters
is the same for all the current raid types. The required parameters are
positional, while the optional parameters are given as key/value pairs.
The possible parameters are as follows:
Chunk size in sectors.
[[no]sync] Force/Prevent RAID initialization
[rebuild ] Rebuild the drive indicated by the index
[daemon_sleep ] Time between bitmap daemon work to clear bits
[min_recovery_rate ] Throttle RAID initialization
[max_recovery_rate ] Throttle RAID initialization
[max_write_behind ] See '-write-behind=' (man mdadm)
[stripe_cache ] Stripe cache size for higher RAIDs

Line 3 contains the list of devices that compose the array in
metadata/data device pairs. If the metadata is stored separately, a '-'
is given for the metadata device position. If a drive has failed or is
missing at creation time, a '-' can be given for both the metadata and
data drives for a given position.

Examples:
# RAID4 - 4 data drives, 1 parity
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)
0 1960893648 raid \
raid4 1 2048 \
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

# RAID4 - 4 data drives, 1 parity (no metadata devices)
# Chunk size of 1MiB, force RAID initialization,
# min recovery rate at 20 kiB/sec/disk
0 1960893648 raid \
raid4 4 2048 min_recovery_rate 20 sync\
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

Performing a 'dmsetup table' should display the CTR table used to
construct the mapping (with possible reordering of optional
parameters).

Performing a 'dmsetup status' will yield information on the state and
health of the array. The output is as follows:
1: raid \
2:

Line 1 is standard DM output. Line 2 is best shown by example:
0 1960893648 raid raid4 5 AAAAA 2/490221568
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with recovery.

Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown
Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

NeilBrown
2011-01-14 04:00:02 +0800
99d03c141 dm: per target unplug callback support ... Browse Code »

Add per-target unplug callback support.

Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown
Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

NeilBrown
2011-01-14 04:00:02 +0800
9d357b078 dm: introduce target callbacks and congestion callback ... Browse Code »

DM currently implements congestion checking by checking on congestion
in each component device. For raid456 we need to also check if the
stripe cache is congested.

Add per-target congestion checker callback support.

Extending the target_callbacks structure with additional callback
functions allows for establishing multiple callbacks per-target (a
callback is also needed for unplug).

Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown
Signed-off-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

NeilBrown
2011-01-14 04:00:01 +0800
4e2d19e46 dm mpath: delay activate_path retry on SCSI_DH_RETRY ... Browse Code »

This patch adds a user-configurable 'pg_init_delay_msecs' feature. Use
this feature to specify the number of milliseconds to delay before
retrying scsi_dh_activate, when SCSI_DH_RETRY is returned.

SCSI Device Handlers return SCSI_DH_IMM_RETRY if we could retry
activation immediately and SCSI_DH_RETRY in cases where it is better to
retry after some delay.

Currently we immediately retry scsi_dh_activate irrespective of
SCSI_DH_IMM_RETRY and SCSI_DH_RETRY.

The 'pg_init_delay_msecs' feature may be provided during table create or
load, e.g.:
dmsetup create --table "0 20971520 multipath 3 queue_if_no_path \
pg_init_delay_msecs 2500 ..." mpatha

The default for 'pg_init_delay_msecs' is 2000 milliseconds.
Maximum configurable delay is 60000 milliseconds. Specifying a
'pg_init_delay_msecs' of 0 will cause immediate retry.

Signed-off-by: Nikanth Karthikesan
Signed-off-by: Chandra Seetharaman
Acked-by: Mike Christie
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Chandra Seetharaman
2011-01-14 04:00:01 +0800
052189a2e dm: remove superfluous irq disablement in dm_request_fn ... Browse Code »

This patch changes spin_lock_irq() to spin_lock() in dm_request_fn().
This patch is just a clean-up and no functional change.

The spin_lock_irq() was leftover from the early request-based dm code,
where map_request() used to enable interrupts.
Since current map_request() never enables interrupts, we can change it
to spin_lock() to match the prior spin_unlock().

Auditing through the dm and block-layer code called from
map_request(), I confirmed all functions save/restore interrupt
status, so no function returning with interrupts enabled.
Also I haven't observed any problem on my test environment which
uses scsi and lpfc driver after heavy I/O testing with occasional
path down/up.

Added BUG_ON() to detect breakage in future.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2011-01-14 04:00:00 +0800
dbc883f15 dm log: use PTR_ERR value instead of ENOMEM ... Browse Code »

It's nicer to return the PTR_ERR() value instead of just returning
-ENOMEM. In the current code the PTR_ERR() value is always equal to
-ENOMEM so this doesn't actually affect anything, but still...

In addition, dm_dirty_log_create() doesn't check for a specific -ENOMEM
return. So this change is safe relative to potential for a non -ENOMEM
return in the future.

Signed-off-by: Dan Carpenter
Acked-by: Jonathan Brassow
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Dan Carpenter
2011-01-14 04:00:00 +0800