Doug / smarc-fsl-linux-kernel | Embedian Git Server

28 Jul, 2011

1 commit

2699b6722 md: load/store badblock list from v1.x metadata

Space must have been allocated when the array was created.
A feature flag is set when the badblock list is non-empty, to
ensure old kernels don't load and trust the whole device.

We only update the on-disk badblocklist when it has changed.
If the badblocklist (or other metadata) is stored on a bad block, we
don't cope very well.

If the metadata has no room for a bad block list, flag bad blocks as disabled,
and do the same for 0.90 metadata.

Signed-off-by: NeilBrown

NeilBrown
2011-07-28 09:31:47 +0800
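A minimal sketch of the two rules in this message: the feature bit is set only while the bad block list is non-empty, and the on-disk list is rewritten only when it has changed. The bit value and all struct and function names are hypothetical, not the md definitions.

```c
#include <stdint.h>

/* Hypothetical feature bit; the real value lives in the md superblock header. */
#define SKETCH_FEATURE_BAD_BLOCKS (1u << 3)

struct sketch_sb {
    uint32_t feature_map;   /* on-disk feature flags */
};

struct sketch_badblocks {
    int count;              /* number of recorded bad ranges */
    int changed;            /* set whenever the in-memory list is modified */
};

/* Update the superblock flags; returns 1 if the on-disk list must be rewritten. */
static int sketch_sync_badblocks(struct sketch_sb *sb, struct sketch_badblocks *bb)
{
    int must_write = bb->changed;

    /* An older kernel that does not recognise the bit is expected to refuse
     * the device, so the bit is set only while there is a list to honour. */
    if (bb->count > 0)
        sb->feature_map |= SKETCH_FEATURE_BAD_BLOCKS;
    else
        sb->feature_map &= ~SKETCH_FEATURE_BAD_BLOCKS;

    bb->changed = 0;        /* unchanged lists are not written again */
    return must_write;
}
```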

31 Mar, 2011

1 commit

25985edce Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

12 Aug, 2010

1 commit

d5302fe41 Make lib/raid6/test build correctly.

Some bit-rot needs to be cleaned out.

Signed-off-by: NeilBrown

NeilBrown
2010-08-12 04:38:24 +0800

14 Dec, 2009

1 commit

7820f9e1d md: remove sparse warning:symbol XXX was not declared.

Signed-off-by: NeilBrown

NeilBrown
2009-12-14 09:49:47 +0800

18 Jun, 2009

1 commit

cdc2ae6d6 md: fix some comments.

1/ Raid5 has also learned to take over raid4 and raid6 arrays.
2/ new_chunk in mdp_superblock_1 is in sectors, not bytes.

Signed-off-by: NeilBrown

Andre Noll
2009-06-18 06:46:47 +0800

31 Mar, 2009

7 commits

f701d589a md/raid6: move raid6 data processing to raid6_pq.ko

Move the raid6 data processing routines into a standalone module
(raid6_pq) to prepare them to be called from async_tx wrappers and other
non-md drivers/modules. This precludes a circular dependency of raid456
needing the async modules for data processing while those modules in
turn depend on raid456 for the base level synchronous raid6 routines.

To support this move:
1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
2/ The raid6_call, recovery calls, and table symbols are exported
3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to
compile

Signed-off-by: Dan Williams
Signed-off-by: NeilBrown

Dan Williams
2009-03-31 12:09:39 +0800
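The routines that moved into raid6_pq compute the two RAID-6 syndromes, P (plain XOR) and Q (a weighted sum in GF(2^8)). A minimal byte-at-a-time sketch, assuming the usual RAID-6 field with generator 2 and reduction polynomial 0x11d; the function names are hypothetical, and the real module chooses an optimised implementation at load time rather than using a loop like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by the generator {02} in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/*
 * data[i] points at 'bytes' bytes of data disk i (0 <= i < ndata).
 * P = D_0 ^ D_1 ^ ...;  Q = sum over i of g^i * D_i, evaluated with Horner's rule
 * starting from the highest-numbered disk.
 */
static void sketch_gen_syndrome(int ndata, size_t bytes, const uint8_t *const *data,
                                uint8_t *p, uint8_t *q)
{
    for (size_t b = 0; b < bytes; b++) {
        uint8_t wp = data[ndata - 1][b];
        uint8_t wq = wp;

        for (int i = ndata - 2; i >= 0; i--) {
            wp ^= data[i][b];
            wq = gf_mul2(wq) ^ data[i][b];
        }
        p[b] = wp;
        q[b] = wq;
    }
}
```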
43b2e5d86 md: move md_k.h from include/linux/raid/ to drivers/md/

It really is nicer to keep related code together.

Signed-off-by: NeilBrown

NeilBrown
2009-03-31 11:33:13 +0800
bff61975b md: move lots of #include lines out of .h files and into .c

This makes the includes more explicit, and is preparation for moving
md_k.h to drivers/md/md.h

Remove include/raid/md.h as its only remaining use was to #include
other files.

Signed-off-by: NeilBrown

NeilBrown
2009-03-31 11:33:13 +0800
92022950c md: move most content from md.h to md_k.h

The extern function definitions are kernel-internal definitions, so
they belong in md_k.h

The MD_*_VERSION values could reasonably go in a number of places,
but md_u.h seems most reasonable.

This leaves almost nothing in md.h. It will go soon.

Signed-off-by: NeilBrown

NeilBrown
2009-03-31 11:33:13 +0800
8b2b5c217 md: move LEVEL_* definition from md_k.h to md_u.h

.. as they are part of the user-space interface.
Also move MdpMinorShift into there so we can remove duplication.

Lastly move mdp_major in. It is less obviously part of the user-space
interface, but do_mounts_md.c uses it, and it is acting a bit like
user-space.

Signed-off-by: NeilBrown

NeilBrown
2009-03-31 11:27:03 +0800
ef740c372 md: move headers out of include/linux/raid/

Move the headers with the local structures for the disciplines and
bitmap.h into drivers/md/ so that they are more easily grepable for
hacking and not far away. md.h is left where it is for now as there
are some uses from the outside.

Signed-off-by: Christoph Hellwig
Signed-off-by: NeilBrown

Christoph Hellwig
2009-03-31 11:27:03 +0800
eea1bf384 md: Fix is_mddev_idle test (again).

There are two problems with is_mddev_idle.

1/ sync_io is 'atomic_t' and hence 'int'. curr_events and all the
rest are 'long'.
So if sync_io were to wrap on a 64bit host, the value of
curr_events would go very negative suddenly, and take a very
long time to return to positive.

So do all calculations as 'int'. That gives us plenty of precision
for what we need.

2/ To initialise rdev->last_events we simply call is_mddev_idle, on
the assumption that it will make sure that last_events is in a
suitable range. It used to do this, but now it does not.
So now we need to be more explicit about initialisation.

Signed-off-by: NeilBrown

NeilBrown
2009-03-31 11:27:02 +0800
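A minimal sketch of both fixes, under stated assumptions: the counters are handled as 32-bit unsigned values and their difference is reinterpreted as a signed 32-bit number, so the difference between two snapshots stays correct even when a counter wraps; and an explicit init flag seeds last_events instead of relying on a side effect. Unsigned arithmetic is used because signed overflow is undefined in standard C. The struct, the function name and the threshold of 64 sectors are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical per-device state; plays the role of rdev->last_events. */
struct sketch_rdev {
    uint32_t last_events;
};

/*
 * disk_sectors: total sectors transferred by the underlying disk.
 * sync_io:      the part of that total issued by resync itself.
 * Returns 1 if no foreign I/O was seen since the snapshot, 0 otherwise.
 */
static int sketch_is_idle(struct sketch_rdev *rdev, uint32_t disk_sectors,
                          uint32_t sync_io, int init)
{
    /* Wraps modulo 2^32; differences between snapshots remain exact. */
    uint32_t curr_events = disk_sectors - sync_io;
    /* Conversion of out-of-range values is implementation-defined;
     * GCC and Clang reduce modulo 2^32. */
    int32_t delta = (int32_t)(curr_events - rdev->last_events);

    if (init || delta > 64) {
        rdev->last_events = curr_events;  /* explicit (re)initialisation */
        return 0;
    }
    return 1;
}
```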

31 Jan, 2009

1 commit

bcf74582a headers_check fix: raid/md_p.h

fix the following 'make headers_check' warning:

usr/include/linux/raid/md_p.h:85: found __[us]{8,16,32,64} type without #include

Signed-off-by: Jaswinder Singh Rajput

Jaswinder Singh Rajput
2009-01-31 02:02:45 +0800

09 Jan, 2009

10 commits

4044ba58d md: don't retry recovery of raid1 that fails due to error on source drive.

If a raid1 has only one working drive and it has a sector which
gives an error on read, then an attempt to recover onto a spare will
fail, but as the single remaining drive is not removed from the
array, the recovery will be immediately re-attempted, resulting
in an infinite recovery loop.

So detect this situation and don't retry recovery once an error
on the lone remaining drive is detected.

Allow recovery to be retried once every time a spare is added
in case the problem wasn't actually a media error.

Signed-off-by: NeilBrown

NeilBrown
2009-01-09 05:31:11 +0800
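A minimal model of the retry policy described above. The struct and function names are hypothetical (this is not the raid1 code); the sketch only shows the rule: a read error on the lone remaining source disables further attempts, and adding a spare re-enables recovery so that one more attempt is made.

```c
#include <stdbool.h>

/* Hypothetical array state used only for this sketch. */
struct sketch_raid1 {
    int working_disks;        /* in-sync mirrors */
    bool have_spare;          /* a spare is available for recovery */
    bool recovery_disabled;   /* set after a read error on the lone source */
};

/* Called when recovery fails because the source mirror returned a read error. */
static void sketch_recovery_read_error(struct sketch_raid1 *c)
{
    if (c->working_disks == 1)
        c->recovery_disabled = true;  /* retrying would fail the same way */
}

/* Adding a spare allows another attempt, in case it was not a media error. */
static void sketch_add_spare(struct sketch_raid1 *c)
{
    c->have_spare = true;
    c->recovery_disabled = false;
}

static bool sketch_should_start_recovery(const struct sketch_raid1 *c)
{
    return c->have_spare && !c->recovery_disabled;
}
```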
efeb53c0e md: Allow md devices to be created by name.

Using sequential numbers to identify md devices is somewhat artificial.
Using names can be a lot more user-friendly.

Also, creating md devices by opening the device special file is a bit
awkward.

So this patch provides a new option for creating and naming devices.

Writing a name such as "md_home" to
/sys/module/md_mod/parameters/new_array
will cause an array with that name to be created. It will appear in
/sys/block/, /proc/partitions and /proc/mdstat as 'md_home'.
It will have an arbitrary minor number allocated.

md devices that are created by an open are destroyed on the last
close when the device is inactive.
For named md devices, they will not be destroyed until the array
is explicitly stopped, either with the STOP_ARRAY ioctl or by
writing 'clear' to /sys/block/md_XXXX/md/array_state.

The name of the array must start 'md_' to avoid conflict with
other devices.

Signed-off-by: NeilBrown

NeilBrown
2009-01-09 05:31:10 +0800
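A minimal userspace sketch of the naming rule stated above: accept a name only if it starts with "md_", then write it to the new_array parameter, normally /sys/module/md_mod/parameters/new_array. The helper names are hypothetical; the path is a parameter so the helper can also be exercised against an ordinary file.

```c
#include <stdio.h>
#include <string.h>

/* Names must start with "md_" to avoid clashing with other devices. */
static int sketch_valid_array_name(const char *name)
{
    return name != NULL && strncmp(name, "md_", 3) == 0;
}

/* Returns 0 on success, -1 on an invalid name or an I/O error. */
static int sketch_create_named_array(const char *param_path, const char *name)
{
    FILE *f;
    int rc = 0;

    if (!sketch_valid_array_name(name))
        return -1;
    f = fopen(param_path, "w");
    if (f == NULL)
        return -1;
    if (fputs(name, f) == EOF)
        rc = -1;
    /* stdio buffers the write, so an error from the kernel surfaces here */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Stopping such an array is then done as the message describes, with STOP_ARRAY or by writing 'clear' to its array_state attribute.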
d3374825c md: make devices disappear when they are no longer needed.

Currently md devices, once created, never disappear until the module
is unloaded. This is essentially because the gendisk holds a
reference to the mddev, and the mddev holds a reference to the
gendisk: a circular reference.

If we drop the reference from mddev to gendisk, then we need to ensure
that the mddev is destroyed when the gendisk is destroyed. However it
is not possible to hook into the gendisk destruction process to enable
this.

So we drop the reference from the gendisk to the mddev and destroy the
gendisk when the mddev gets destroyed. However this has a
complication.
Between the call
__blkdev_get->get_gendisk->kobj_lookup->md_probe
and the call
__blkdev_get->md_open

there is no obvious way to hold a reference on the mddev any more, so
unless something is done, it will disappear and gendisk will be
destroyed prematurely.

Also, once we decide to destroy the mddev, there will be an unlockable
moment before the gendisk is unlinked (blk_unregister_region) during
which a new reference to the gendisk can be created. We need to
ensure that this reference can not be used. i.e. the ->open must
fail.

So:
1/ in md_probe we set a flag in the mddev (hold_active) which
indicates that the array should be treated as active, even
though there are no references, and no appearance of activity.
This is cleared by md_release when the device is closed if it
is no longer needed.
This ensures that the gendisk will survive between md_probe and
md_open.

2/ In md_open we check if the mddev we expect to open matches
the gendisk that we did open.
If there is a mismatch we return -ERESTARTSYS and modify
__blkdev_get to retry from the top in that case.
In the -ERESTARTSYS case we make sure to wait until
the old gendisk (that we succeeded in opening) is really gone so
we loop at most once.

Some udev configurations will always open an md device when it first
appears. If we allow an md device that was just created by an open
to disappear on an immediate close, then this can race with such udev
configurations and result in an infinite loop: the device is opened
and closed, then re-opened due to the 'ADD' event from the first open,
then closed again, and so on.
So we make sure an md device, once created by an open, remains active
at least until some md 'ioctl' has been made on it. This means that
all normal usage of md devices will allow them to disappear promptly
when not needed, but the worst that incorrect usage can do is
cause an inactive md device to be left in existence (it can easily be
removed).

As an array can be stopped by writing to a sysfs attribute
echo clear > /sys/block/mdXXX/md/array_state
we need to use scheduled work for deleting the gendisk and other
kobjects. This allows us to wait for any pending gendisk deletion to
complete by simply calling flush_scheduled_work().

Signed-off-by: NeilBrown

NeilBrown
2009-01-09 05:31:10 +0800
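The lifetime rules above can be modelled in a few lines. This is a hypothetical userspace sketch, not md code: the struct, the reference counting and the single-retry open only illustrate how hold_active keeps a freshly probed array alive until it has been used, and how an open that reaches a dying array is retried once after a fresh probe.

```c
#include <stdbool.h>

#define SKETCH_ERESTARTSYS 512  /* kernel-internal errno value, defined here for the sketch */

/* Hypothetical model of an mddev together with its gendisk. */
struct sketch_mddev {
    int active;          /* references held on the mddev */
    bool hold_active;    /* set by probe, cleared once the array has been used */
    bool destroyed;      /* mddev and gendisk have been torn down */
};

static void sketch_put(struct sketch_mddev *m)
{
    /* Tear down only when unreferenced and probe's hold has been released. */
    if (--m->active == 0 && !m->hold_active)
        m->destroyed = true;
}

/* probe: create the array and keep it alive across the gap until open. */
static void sketch_probe(struct sketch_mddev *m)
{
    m->active = 0;
    m->hold_active = true;
    m->destroyed = false;
}

/* open: refuse a gendisk whose mddev is already gone. */
static int sketch_open(struct sketch_mddev *m)
{
    if (m->destroyed)
        return -SKETCH_ERESTARTSYS;
    m->active++;
    return 0;
}

/* release: the hold is dropped only once some md ioctl has been seen. */
static void sketch_release(struct sketch_mddev *m, bool ioctl_seen)
{
    if (ioctl_seen)
        m->hold_active = false;
    sketch_put(m);
}

/* Caller side: on -ERESTARTSYS, probe again and retry, at most once. */
static int sketch_blkdev_get(struct sketch_mddev *m)
{
    int rv = sketch_open(m);

    if (rv == -SKETCH_ERESTARTSYS) {
        sketch_probe(m);
        rv = sketch_open(m);
    }
    return rv;
}
```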
cd2ac9321 md: need another print_sb for mdp_superblock_1

md_print_devices is called from two code paths: MD_BUG(...), and md_ioctl
with PRINT_RAID_DEBUG. It dumps information about all md devices in
use.

However, it wrongly processed two types of superblock as one.

The header file defines two types of superblock:
struct mdp_superblock_s (typedefed as mdp_super_t) for md with
metadata 0.90, and struct mdp_superblock_1 for md with metadata
1.0 and later.

These two types of superblock are very different.

The md_print_devices code processed both as mdp_super_t, which
led to a wrong information dump like:

[ 6742.345877]
[ 6742.345887] md: **********************************
[ 6742.345890] md: * *
[ 6742.345892] md: **********************************
[ 6742.345896] md1:
[ 6742.345907] md: rdev ram7, SZ:00065472 F:0 S:1 DN:3
[ 6742.345909] md: rdev superblock:
[ 6742.345914] md: SB: (V:0.90.0) ID: CT:4919856d
[ 6742.345918] md: L5 S00065472 ND:4 RD:4 md1 LO:2 CS:65536
[ 6742.345922] md: UT:4919856d ST:1 AD:4 WD:4 FD:0 SD:0 CSUM:b7992907 E:00000001
[ 6742.345924] D 0: DISK
[ 6742.345930] D 1: DISK
[ 6742.345933] D 2: DISK
[ 6742.345937] D 3: DISK
[ 6742.345942] md: THIS: DISK
...
[ 6742.346058] md0:
[ 6742.346067] md: rdev ram3, SZ:00065472 F:0 S:1 DN:3
[ 6742.346070] md: rdev superblock:
[ 6742.346073] md: SB: (V:1.0.0) ID: CT:9a322a9c
[ 6742.346077] md: L-1507699579 S976570180 ND:48 RD:0 md0 LO:65536 CS:196610
[ 6742.346081] md: UT:00000018 ST:0 AD:131048 WD:0 FD:8 SD:0 CSUM:00000000 E:00000000
[ 6742.346084] D 0: DISK
[ 6742.346089] D 1: DISK
[ 6742.346092] D 2: DISK
[ 6742.346096] D 3: DISK
[ 6742.346102] md: THIS: DISK
...
[ 6742.346219] md: **********************************
[ 6742.346221]

Here md1 is metadata 0.90.0, and md0 is metadata 1.2.

With the additional code in this patch to distinguish the two types
of superblock, the dump looks like:

[ 7906.755790]
[ 7906.755799] md: **********************************
[ 7906.755802] md: * *
[ 7906.755804] md: **********************************
[ 7906.755808] md1:
[ 7906.755819] md: rdev ram7, SZ:00065472 F:0 S:1 DN:3
[ 7906.755821] md: rdev superblock (MJ:0):
[ 7906.755826] md: SB: (V:0.90.0) ID: CT:491989f3
[ 7906.755830] md: L5 S00065472 ND:4 RD:4 md1 LO:2 CS:65536
[ 7906.755834] md: UT:491989f3 ST:1 AD:4 WD:4 FD:0 SD:0 CSUM:00fb52ad E:00000001
[ 7906.755836] D 0: DISK
[ 7906.755842] D 1: DISK
[ 7906.755845] D 2: DISK
[ 7906.755849] D 3: DISK
[ 7906.755855] md: THIS: DISK
...
[ 7906.755972] md0:
[ 7906.755981] md: rdev ram3, SZ:00065472 F:0 S:1 DN:3
[ 7906.755984] md: rdev superblock (MJ:1):
[ 7906.755989] md: SB: (V:1) (F:0) Array-ID:
[ 7906.755990] md: Name: "DG5:0" CT:1226410480
[ 7906.755998] md: L5 SZ130944 RD:4 LO:2 CS:128 DO:24 DS:131048 SO:8 RO:0
[ 7906.755999] md: Dev:00000003 UUID: 9194d744:87f7:a448:85f2:7497b84ce30a
[ 7906.756001] md: (F:0) UT:1226410480 Events:0 ResyncOffset:-1 CSUM:0dbcd829
[ 7906.756003] md: (MaxDev:384)
...
[ 7906.756113] md: **********************************
[ 7906.756116]

This md0 (metadata 1.2) information dump follows struct
mdp_superblock_1 exactly.

Signed-off-by: Cheng Renquan
Cc: Neil Brown
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: NeilBrown

Cheng Renquan
2009-01-09 05:31:08 +0800
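The fix amounts to dispatching on the superblock's major version before decoding it. A minimal sketch, assuming simplified hypothetical structs that carry only the fields printed here; the real mdp_super_t and struct mdp_superblock_1 have many more fields, and the output format only imitates the dumps above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the two on-disk layouts (a few fields only). */
struct sketch_sb_090 { int major, minor, patch; };
struct sketch_sb_1   { unsigned int feature_map; };

/*
 * Format one rdev's superblock summary, choosing the decoder from the
 * major version instead of always reading the bytes as a 0.90 superblock.
 * Returns 0 on success, -1 if the output failed or was truncated.
 */
static int sketch_print_sb(char *buf, size_t len, int major_version, const void *sb)
{
    int n = snprintf(buf, len, "md: rdev superblock (MJ:%d): ", major_version);
    int m;

    if (n < 0 || (size_t)n >= len)
        return -1;
    switch (major_version) {
    case 0: {
        const struct sketch_sb_090 *s = sb;
        m = snprintf(buf + n, len - n, "SB: (V:%d.%d.%d)", s->major, s->minor, s->patch);
        break;
    }
    case 1: {
        const struct sketch_sb_1 *s = sb;
        m = snprintf(buf + n, len - n, "SB: (V:1) (F:%u)", s->feature_map);
        break;
    }
    default:
        m = snprintf(buf + n, len - n, "unknown superblock format");
        break;
    }
    return (m < 0 || (size_t)m >= len - n) ? -1 : 0;
}
```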
159ec1fc0 md: use list_for_each_entry macro directly

The rdev_for_each macro is identical to list_for_each_entry_safe, so it
should be defined in terms of list_for_each_entry_safe instead of
reinventing the wheel.

But many rdev_for_each loops don't really need the safe version; a
plain list_for_each_entry is enough, and it saves a temporary variable
(tmp) in every function that used rdev_for_each.

In this patch, most rdev_for_each loops are replaced by list_for_each_entry,
saving many tmp variables; the safe version is kept only where the loop
calls list_del to delete an entry.

Signed-off-by: Cheng Renquan
Signed-off-by: NeilBrown

Cheng Renquan
2009-01-09 05:31:08 +0800
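The difference between the two iterators matters only when the loop body unlinks the current entry. The sketch below re-creates just enough of the kernel's intrusive list in userspace C to show this; the macros mirror the shape of the kernel ones but are local re-creations (using the GCC/Clang __typeof__ extension), and struct sketch_rdev is a hypothetical stand-in for an rdev on mddev->disks.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry(pos, head, member)                          \
    for (pos = container_of((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                        \
         pos = container_of(pos->member.next, __typeof__(*pos), member))

/* The _safe variant caches the successor in n, so pos may be unlinked. */
#define list_for_each_entry_safe(pos, n, head, member)                  \
    for (pos = container_of((head)->next, __typeof__(*pos), member),    \
         n = container_of(pos->member.next, __typeof__(*pos), member);  \
         &pos->member != (head);                                        \
         pos = n, n = container_of(n->member.next, __typeof__(*n), member))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;  /* a non-safe walk would now follow NULL */
}

struct sketch_rdev { int faulty; struct list_head same_set; };

/* Read-only walk: the plain iterator needs no temporary. */
static int count_rdevs(struct list_head *disks)
{
    struct sketch_rdev *rdev;
    int n = 0;

    list_for_each_entry(rdev, disks, same_set)
        n++;
    return n;
}

/* Deleting walk: tmp holds the successor across list_del. */
static void remove_faulty(struct list_head *disks)
{
    struct sketch_rdev *rdev, *tmp;

    list_for_each_entry_safe(rdev, tmp, disks, same_set)
        if (rdev->faulty)
            list_del(&rdev->same_set);
}
```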
ccacc7d2c md: raid0: make hash_spacing and preshift sector-based.

This patch renames the hash_spacing and preshift members of struct
raid0_private_data to spacing and sector_shift respectively and
changes the semantics as follows:

We always have spacing = 2 * hash_spacing. In case
sizeof(sector_t) > sizeof(u32) we also have sector_shift = preshift + 1
while sector_shift = preshift = 0 otherwise.

Note that the values of nb_zone and zone are unaffected by these changes
because in the sector_div() preceding the assignment of these two
variables both arguments double.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2009-01-09 05:31:08 +0800
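The claim about nb_zone and zone follows from the fact that doubling both operands of an integer division leaves the truncated quotient unchanged and doubles the remainder. For a non-negative dividend a and a positive divisor b, writing a = qb + r with 0 <= r < b gives 2a = q(2b) + 2r with 0 <= 2r < 2b, hence:

```latex
\left\lfloor \frac{2a}{2b} \right\rfloor = \left\lfloor \frac{a}{b} \right\rfloor,
\qquad
2a \bmod 2b = 2\,(a \bmod b).
```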
83838ed87 md: raid0: Represent the size of strip zones in sectors.

This completes the block -> sector conversion of struct strip_zone.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2009-01-09 05:31:07 +0800
6199d3db0 md: raid0: Represent zone->zone_offset in sectors.

For the same reason as in the previous patch, rename it from zone_offset
to zone_start.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2009-01-09 05:31:07 +0800
019c4e2f3 md: raid0: Represent device offset in sectors.

Rename zone->dev_offset to zone->dev_start to make sure all users
have been converted.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2009-01-09 05:31:06 +0800
0c3573f19 md: use sysfs_notify_dirent to notify changes to md/sync_action.

There is no compelling need for this, but sysfs_notify_dirent is a
nicer interface and the change is good for consistency.

Signed-off-by: NeilBrown

NeilBrown
2009-01-09 05:31:05 +0800

21 Oct, 2008

2 commits

3c0ee63a6 md: use sysfs_notify_dirent to notify changes to md/dev-xxx/state

The 'state' file for a device reports, for example, when the device
has failed. Changes should be reported to userspace ASAP without
the possibility of blocking on low-memory. sysfs_notify does
have that possibility (as it takes a mutex which can be held
across a kmalloc) so use sysfs_notify_dirent instead.

Signed-off-by: NeilBrown

NeilBrown
2008-10-21 10:25:28 +0800
b62b75905 md: use sysfs_notify_dirent to notify changes to md/array_state

Now that we have sysfs_notify_dirent, use it to notify changes
to md/array_state.
As sysfs_notify_dirent can be called in atomic context, we can
remove the delayed notify and the MD_NOTIFY_ARRAY_STATE flag.

Signed-off-by: NeilBrown

NeilBrown
2008-10-21 10:25:21 +0800
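On the userspace side, a notification like this is normally consumed by reading the attribute once, calling poll() on it for POLLPRI|POLLERR, and re-reading it from offset 0 after each wakeup. A minimal sketch of that pattern; the helper names are hypothetical, and the caller supplies the attribute path (for example /sys/block/md0/md/array_state).

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Re-read an attribute from the start and strip the trailing newline. */
static int read_attr(int fd, char *buf, size_t len)
{
    ssize_t n;

    if (len == 0 || lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    n = read(fd, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    if (n > 0 && buf[n - 1] == '\n')
        buf[n - 1] = '\0';
    return 0;
}

/*
 * Block until the attribute is notified (sysfs reports POLLPRI and POLLERR),
 * then fetch its new value. Read the attribute once after open() first so
 * that only later changes wake the caller.
 * Returns 1 on change, 0 on timeout, -1 on error.
 */
static int wait_attr_change(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;
    return read_attr(fd, buf, len) == 0 ? 1 : -1;
}
```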

13 Oct, 2008

4 commits

d710e1381 md: remove space after function name in declaration and call.

Having
function (args)
instead of
function(args)

makes it harder to search for calls of particular functions.
So remove all those spaces.

Signed-off-by: NeilBrown

NeilBrown
2008-10-13 08:55:12 +0800
fb4d8c76e md: Remove unnecessary #includes, #defines, and function declarations.

A lot of cruft has gathered over the years. Time to remove it.

Signed-off-by: NeilBrown

NeilBrown
2008-10-13 08:55:12 +0800
ab5bd5cbc md: Convert remaining 1k representations in linear.c to sectors.

This patch renames hash_spacing and preshift to spacing and
sector_shift respectively with the following change of semantics:

Case 1: (sizeof(sector_t) > sizeof(u32)).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(aka the shifting dance case). Here we have sector_shift = preshift + 1 and

spacing = 2 * hash_spacing

during the computation of nb_zone and curr_sector, but

spacing = hash_spacing

in which_dev() because in the last hunk of the patch for linear.c we
shift down conf->spacing (= 2 * hash_spacing) by one more bit than
in the old code.

Hence in the computation of nb_zone, sz and base have the same value
as before, so nb_zone is not affected. Also curr_sector in the next
hunk stays the same.

In which_dev() the hash table index is computed as

(sector >> sector_shift) / spacing

In view of sector_shift = preshift + 1 and spacing = hash_spacing,
this equals

((sector/2) >> preshift) / hash_spacing

which is the value computed by the old code.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2008-10-13 08:55:12 +0800
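The equivalence derived above can be checked numerically. This sketch restates the old and new index formulas from the text as two hypothetical C functions; the old code works on 1k block numbers, so the block matching a 512-byte sector number is sector / 2.

```c
#include <stdint.h>

/* Old which_dev(): hash index from a 1k block number. */
static uint64_t old_index(uint64_t block, unsigned preshift, uint64_t hash_spacing)
{
    return (block >> preshift) / hash_spacing;
}

/* New which_dev(): the same index from a sector number, with
 * sector_shift = preshift + 1 and spacing = hash_spacing. */
static uint64_t new_index(uint64_t sector, unsigned preshift, uint64_t hash_spacing)
{
    unsigned sector_shift = preshift + 1;
    uint64_t spacing = hash_spacing;

    /* sector >> (preshift + 1) == (sector / 2) >> preshift */
    return (sector >> sector_shift) / spacing;
}
```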
6283815d1 md: linear: Represent dev_info->size and dev_info->offset in sectors.

Rename them to num_sectors and start_sector, which are more descriptive.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2008-10-13 08:55:12 +0800

24 Jul, 2008

1 commit

d8e64406a md: delay notification of 'active_idle' to the recovery thread

sysfs_notify might sleep, so do not call it from md_safemode_timeout.

Signed-off-by: Dan Williams

Dan Williams
2008-07-24 04:09:48 +0800

21 Jul, 2008

4 commits

4b80991c6 md: Protect access to mddev->disks list using RCU

All modifications and most access to the mddev->disks list are made
under the reconfig_mutex lock. However there are three places where
the list is walked without any locking. If a reconfig happens at this
time, havoc (and oops) can ensue.

So use RCU to protect these accesses:
- wrap them in rcu_read_{,un}lock()
- use list_for_each_entry_rcu
- add to the list with list_add_rcu
- delete from the list with list_del_rcu
- delay the 'free' with call_rcu rather than schedule_work

Note that export_rdev did a list_del_init on this list. In almost all
cases the entry was not in the list anymore so it was a no-op and so
safe. It is no longer safe as after list_del_rcu we may not touch
the list_head.
An audit shows that export_rdev is called:
- after unbind_rdev_from_array, in which case the delete has
already been done,
- after bind_rdev_to_array fails, in which case the delete isn't needed.
- before the device has been put on a list at all (e.g. in
add_new_disk where reading the superblock fails).
- and in autorun devices after a failure when the device is on a
different list.

So remove the list_del_init call from export_rdev, and add it back
immediately before the call to export_rdev for that last case.

Note also that ->same_set is sometimes used for lists other than
mddev->list (e.g. candidates). In these cases rcu is not needed.

Signed-off-by: NeilBrown

NeilBrown
2008-07-21 15:05:25 +0800
f2ea68cf4 md: only count actual openers as access which prevent a 'stop'

Open isn't the only thing that increments ->active. e.g. reading
/proc/mdstat will increment it briefly. So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of current openers of the md device.

Signed-off-by: NeilBrown

NeilBrown
2008-07-21 15:05:25 +0800
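A minimal model of why a separate counter helps: a stop request should be refused only if someone other than the caller has the device open, and brief references such as a /proc/mdstat read must not count. The struct and function are hypothetical illustrations, not the md code.

```c
#include <errno.h>

/* Hypothetical counters: active counts every reference, openers only opens. */
struct sketch_open_counts {
    int active;    /* also bumped briefly by e.g. reading /proc/mdstat */
    int openers;   /* bumped by open(), dropped by release() */
};

/* is_open: 1 if the caller itself holds the device open (e.g. via ioctl). */
static int sketch_can_stop(const struct sketch_open_counts *m, int is_open)
{
    /* Testing active here would give false positives; openers does not. */
    return m->openers > is_open ? -EBUSY : 0;
}
```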
d6e221505 md: linear: Make array_size sector-based and rename it to array_sectors.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2008-07-21 15:05:25 +0800
f233ea5c9 md: Make mddev->array_size sector-based.

This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.

Signed-off-by: Andre Noll
Signed-off-by: NeilBrown

Andre Noll
2008-07-21 15:05:22 +0800
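The unit change itself is a factor of two, since a 1k block holds two 512-byte sectors. A minimal sketch with hypothetical helper names; sketch_sector_t stands in for the kernel's sector_t and is assumed here to be 64 bits wide.

```c
#include <stdint.h>

typedef uint64_t sketch_sector_t;   /* stand-in for sector_t */

#define SKETCH_SECTOR_SHIFT 9       /* 512-byte sectors */

/* 1024 / 512 = 2 sectors per 1k block. */
static sketch_sector_t blocks_1k_to_sectors(uint64_t blocks)
{
    return (sketch_sector_t)blocks << 1;
}

static uint64_t sectors_to_bytes(sketch_sector_t sectors)
{
    return sectors << SKETCH_SECTOR_SHIFT;
}
```

For example, 65472 blocks become 130944 sectors, which is the same 67043328 bytes.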

11 Jul, 2008

2 commits

7e93a8925 md: Remove some unused macros.

Signed-off-by: Andre Noll
Signed-off-by: Neil Brown

Andre Noll
2008-07-11 20:02:23 +0800
0f420358e md: Turn rdev->sb_offset into a sector-based quantity.

Rename it to sb_start to make sure all users have been converted.

Signed-off-by: Andre Noll
Signed-off-by: Neil Brown

Andre Noll
2008-07-11 20:02:23 +0800

01 Jul, 2008

1 commit

b5470dc5f md: resolve external metadata handling deadlock in md_allow_write

md_allow_write() marks the metadata dirty while holding mddev->lock and then
waits for the write to complete. For externally managed metadata this causes a
deadlock as userspace needs to take the lock to communicate that the metadata
update has completed.

Change md_allow_write() in the 'external' case to start the 'mark active'
operation and then return -EAGAIN. The expected side effects while waiting for
userspace to write 'active' to 'array_state' are: reshape is held off (the code
currently handles -ENOMEM), some 'stripe_cache_size' change requests fail,
some GET_BITMAP_FILE ioctl requests fall back to GFP_NOIO, and updates to
'raid_disks' fail. Except for 'stripe_cache_size' changes,
these failures can be mitigated by coordinating with mdmon.

md_write_start() still prevents writes from occurring until the metadata
handler has had a chance to take action as it unconditionally waits for
MD_CHANGE_CLEAN to be cleared.

[neilb@suse.de: return -EAGAIN, try GFP_NOIO]
Signed-off-by: Dan Williams

Dan Williams
2008-07-01 08:18:19 +0800

28 Jun, 2008

3 commits

d8ee0728b md: replace R5_WantPrexor with R5_WantDrain, add 'prexor' reconstruct_states

From: Dan Williams

Currently ops_run_biodrain and other locations have extra logic to determine
which blocks are processed in the prexor and non-prexor cases. This can be
eliminated if handle_write_operations5 flags the blocks to be processed in all
cases via R5_WantDrain. The presence of the prexor operation is tracked in
sh->reconstruct_state.

Signed-off-by: Dan Williams
Signed-off-by: Neil Brown

Dan Williams
2008-06-28 06:32:06 +0800
600aa1099 md: replace STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} with 'reconstruct_states'

From: Dan Williams

Track the state of reconstruct operations (recalculating the parity block
usually due to incoming writes, or as part of array expansion). This reduces the
scope of the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags to only tracking whether
a reconstruct operation has been requested via the ops_request field of struct
stripe_head_state.

This is the final step in the removal of ops.{pending,ack,complete,count}, i.e.
the STRIPE_OP_{BIODRAIN,PREXOR,POSTXOR} flags only request an operation and do
not track the state of the operation.

Signed-off-by: Dan Williams
Signed-off-by: Neil Brown

Dan Williams
2008-06-28 06:32:05 +0800
ecc65c9b3 md: replace STRIPE_OP_CHECK with 'check_states'

From: Dan Williams

The STRIPE_OP_* flags record the state of stripe operations which are
performed outside the stripe lock. Their use in indicating which
operations need to be run is straightforward; however, interpolating what
the next state of the stripe should be based on a given combination of
these flags is not straightforward, and has led to bugs. An easier-to-read
implementation with minimal degrees of freedom is needed.

Towards this goal, this patch introduces explicit states to replace what was
previously interpolated from the STRIPE_OP_* flags. For now this only converts
the handle_parity_checks5 path, removing a user of the
ops.{pending,ack,complete,count} fields of struct stripe_operations.

This conversion also found a remaining issue with the current code. There is
a small window for a drive to fail between when we schedule a repair and when
the parity calculation for that repair completes. When this happens we will
write back to 'failed_num' when we really want to write back to 'pd_idx'.

Signed-off-by: Dan Williams
Signed-off-by: Neil Brown

Dan Williams
2008-06-28 06:31:57 +0800