Eric Lee / smarc-fsl-linux-kernel

02 May, 2006

5 commits

5e7dd2ab6 [PATCH] md: Fix 'rdev->nr_pending' count when retrying barrier requests

When retrying a failed BIO_RW_BARRIER request, we need to keep the reference
in ->nr_pending over the whole retry. Currently, we only hold the reference
if the failed request is the *last* one to finish - which is silly, because it
would normally be the first to finish.

So move the rdev_dec_pending call up into the didn't-fail branch. As the rdev
isn't used in the later code, calling rdev_dec_pending earlier doesn't hurt.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-05-02 09:17:42 +0800
62de608da [PATCH] md: Improve detection of lack of barrier support in raid1

Move the test for 'do barrier work' down a bit so that if the first write to a
raid1 is a BIO_RW_BARRIER write, the checking done by superblock writes will
cause the right thing to happen.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-05-02 09:17:42 +0800
bea277187 [PATCH] md: Change ENOTSUPP to EOPNOTSUPP

Because that is what you get if a BIO_RW_BARRIER isn't supported!

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-05-02 09:17:42 +0800
e0a33270e [PATCH] md: Fixed refcounting/locking when attempting read error correction in raid10

We need to hold a reference to rdevs while reading and writing to attempt to
correct read errors. This reference must be taken under an rcu lock.

Signed-off-by: Neil Brown
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-05-02 09:17:42 +0800
df30d0f4c [PATCH] md: Avoid oops when attempting to fix read errors on raid10

We should add to the counter for the rdev *after* checking if the rdev is
NULL!!!
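
The ordering described above can be illustrated with a small user-space
model. This is only a sketch: struct rdev_model and take_rdev_ref() are
hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the
rcu locking used by the real raid10 code is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's rdev; only the counter is modelled. */
struct rdev_model {
    atomic_int nr_pending;
};

/*
 * Take a reference on rdev if there is one.
 * The NULL check comes first, so the counter of a missing device is
 * never touched. Returns rdev with a reference held, or NULL.
 */
struct rdev_model *take_rdev_ref(struct rdev_model *rdev)
{
    if (rdev == NULL)
        return NULL;
    atomic_fetch_add(&rdev->nr_pending, 1);
    return rdev;
}
```

Doing the increment before the check would dereference a NULL pointer
whenever the slot is empty, which is the oops the title refers to.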

Signed-off-by: Neil Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-05-02 09:17:42 +0800

20 Apr, 2006

1 commit

5dc5cf7dd [PATCH] md: locking fix

- fix mddev_lock() usage bugs in md_attr_show() and md_attr_store().
[they did not anticipate the possibility of getting a signal]

- remove mddev_lock_uninterruptible() [unused]

Signed-off-by: Ingo Molnar
Acked-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-04-20 22:54:04 +0800

15 Apr, 2006

1 commit

4508a7a73 [PATCH] sysfs: Allow sysfs attribute files to be pollable

It works like this:
  Open the file.
  Read all the contents.
  Call poll requesting POLLERR or POLLPRI (so select/exceptfds works).
  When poll returns, either
    close the file and go to the top of the loop,
    or lseek to the start of the file and go back to the 'read'.

Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);

If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).

This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.

The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?

This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
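
From user space, one pass of the loop described above might look like the
following sketch. The function name wait_for_attr_change() and its return
convention are assumptions made for illustration; only open, read, poll and
close are real calls. It uses the "close the file and go to top of loop"
variant, and a single read() is assumed to be enough for a short attribute.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Read the attribute at 'path' into buf, then wait in poll() until a
 * change is signalled or timeout_ms elapses (-1 waits forever).
 * Returns 1 if poll reported an event, 0 on timeout, -1 on error.
 * buf holds the contents as they were before the wait.
 */
int wait_for_attr_change(const char *path, char *buf, size_t len,
                         int timeout_ms)
{
    struct pollfd pfd;
    ssize_t n;
    int fd, ret;

    if (len == 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The contents are read before polling, as in the steps above. */
    n = read(fd, buf, len - 1);
    if (n < 0) {
        close(fd);
        return -1;
    }
    buf[n] = '\0';

    pfd.fd = fd;
    pfd.events = POLLPRI | POLLERR;
    pfd.revents = 0;
    ret = poll(&pfd, 1, timeout_ms);

    close(fd);
    if (ret < 0)
        return -1;
    return ret > 0 ? 1 : 0;
}
```

A monitoring tool could call this repeatedly on a path such as
/sys/block/md0/md/sync_action (md0 is just an example device) and act on
the new contents each time it returns 1.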

Signed-off-by: Neil Brown
Signed-off-by: Greg Kroah-Hartman

NeilBrown
2006-04-15 02:41:24 +0800

11 Apr, 2006

1 commit

6f91fe88e [PATCH] md: make sure 64bit fields in version-1 metadata are 64-bit aligned

reshape_position is a 64bit field that was not 64bit aligned. So swap with
new_level.
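
The effect of such a swap can be shown with a reduced example. The two
structs below are hypothetical fragments, not the real version-1 superblock
layout, and the packed attribute (a GCC/Clang extension) is used only so
that the offsets are the same on every architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" fragment: the 64-bit field follows one 32-bit field. */
struct layout_before {
    uint32_t feature_map;
    uint64_t reshape_position;   /* offset 4: not 64-bit aligned */
    uint32_t new_level;
} __attribute__((packed));

/* Hypothetical "after" fragment: the two fields have swapped places. */
struct layout_after {
    uint32_t feature_map;
    uint32_t new_level;
    uint64_t reshape_position;   /* offset 8: 64-bit aligned */
} __attribute__((packed));

/* Compile-time check that the swapped layout keeps the field aligned. */
_Static_assert(offsetof(struct layout_after, reshape_position) % 8 == 0,
               "reshape_position must be 64-bit aligned");
```

Both fragments are 16 bytes long, so only the position of the two fields
changes. Without the packed attribute, a compiler is free to pad the first
layout differently on 32-bit and 64-bit targets, which is why an on-disk
format wants 64-bit fields on 64-bit boundaries.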

NOTE: this is a user-visible change. However:
- The bad code has not appeared in a released kernel
- This code is still marked 'experimental'
- This only affects version-1 superblocks, which are not in wide use
- These fields are only used (rather than simply reported) by user-space
tools in extremely rare circumstances: after a reshape crashes in the
first second of the reshape process.

So I believe that, at this stage, the change is safe. Especially if people
heed the 'help' message and use mdadm-2.4.1.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-04-11 21:18:30 +0800

02 Apr, 2006

3 commits

b63854838 BUG_ON() Conversion in md/raid10.c

This changes if() BUG(); constructs to BUG_ON(), which is
cleaner and can be better optimized away.
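
The shape of this conversion can be sketched in user space. The BUG() and
BUG_ON() macros below are stand-ins written for this example (the kernel's
own definitions differ), and checked_index_old() and checked_index_new()
are hypothetical helpers.

```c
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-ins; these are not the kernel's definitions. */
#define BUG() \
    do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)

/* __builtin_expect (GCC/Clang) marks the failure path as unlikely. */
#define BUG_ON(cond) \
    do { \
        if (__builtin_expect(!!(cond), 0)) \
            BUG(); \
    } while (0)

/* Before the conversion: an open-coded test followed by BUG(). */
int checked_index_old(int idx, int n)
{
    if (idx < 0 || idx >= n)
        BUG();
    return idx;
}

/* After the conversion: the same check expressed with BUG_ON(). */
int checked_index_new(int idx, int n)
{
    BUG_ON(idx < 0 || idx >= n);
    return idx;
}
```

Both helpers behave identically. The second form keeps the condition inside
a single macro, so a build that defines BUG_ON() differently can change or
drop the check in one place.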

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk

Eric Sesterhenn
2006-04-02 19:34:29 +0800
43dab9bbe BUG_ON() Conversion in md/raid6main.c

This changes if() BUG(); constructs to BUG_ON(), which is
cleaner and can be better optimized away.

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk

Eric Sesterhenn
2006-04-02 19:33:30 +0800
78bafebd4 BUG_ON() Conversion in md/raid5.c

This changes if() BUG(); constructs to BUG_ON(), which is
cleaner and can be better optimized away.

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk

Eric Sesterhenn
2006-04-02 19:31:42 +0800

01 Apr, 2006

5 commits

9e77c485f BUG_ON() Conversion in md/raid1.c

This changes if() BUG(); constructs to BUG_ON(), which is
cleaner and can be better optimized away.

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk

Eric Sesterhenn
2006-04-01 07:08:49 +0800
543cb2a45 BUG_ON() Conversion in md/dm-target.c

This changes if() BUG(); constructs to BUG_ON(), which is
cleaner and can be better optimized away.

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk

Eric Sesterhenn
2006-04-01 07:08:12 +0800
ec350a7fc [PATCH] md: Raid-6 did not create sysfs entries for stripe cache

Signed-off-by: Brad Campbell
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-04-01 04:19:01 +0800
926ce2d8a [PATCH] md: Remove some code that can sleep from under a spinlock

And remove the comments that were put in instead of a fix too....

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-04-01 04:19:01 +0800
6b1117d50 [PATCH] md: Don't clear bits in bitmap when writing to one device fails during recovery

Currently a device failure during recovery leaves bits set in the bitmap.
This normally isn't a problem as the offending device will be rejected because
of errors. However if device re-adding is being used with non-persistent
bitmaps, this can be a problem.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-04-01 04:19:01 +0800

28 Mar, 2006

24 commits

df5b89b32 [PATCH] md: Convert reconfig_sem to reconfig_mutex

... being careful that mutex_trylock is inverted wrt down_trylock
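
The inversion mentioned above concerns return values: down_trylock()
returns 0 when the semaphore was acquired, while mutex_trylock() returns 1
when the mutex was acquired. A user-space model makes the difference
visible. The model_* names are hypothetical and a C11 atomic_flag stands in
for the real lock.

```c
#include <stdatomic.h>

/* Hypothetical lock; the flag is set while the lock is held. */
struct model_lock {
    atomic_flag held;
};

/* Semaphore convention: 0 means acquired, 1 means it was already held. */
int model_down_trylock(struct model_lock *l)
{
    return atomic_flag_test_and_set(&l->held) ? 1 : 0;
}

/* Mutex convention: 1 means acquired, 0 means it was already held. */
int model_mutex_trylock(struct model_lock *l)
{
    return atomic_flag_test_and_set(&l->held) ? 0 : 1;
}

/* Release the lock for either convention. */
void model_unlock(struct model_lock *l)
{
    atomic_flag_clear(&l->held);
}
```

A caller written as "if (down_trylock(...)) fail" therefore has to become
"if (!mutex_trylock(...)) fail" after the conversion. A purely mechanical
rename would silently invert the test.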

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:03 +0800
48c9c27b8 [PATCH] sem2mutex: drivers/md

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arjan van de Ven
2006-03-28 00:45:03 +0800
2f889129d [PATCH] md: Restore 'remaining' count when retrying a write operation

When retrying a write due to barrier failure, we don't reset 'remaining', so
it goes negative and never hits 0 again.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:03 +0800
8ddeeae51 [PATCH] md: Fix md grow/size code to correctly find the maximum available space

An md array can be asked to change the amount of each device that it is using,
and in particular can be asked to use the maximum available space. This
currently only works if the first device is not larger than the rest. As
'size' gets changed and so 'fit' becomes wrong. So check if a 'fit' is
required early and don't corrupt it.

Signed-off-by: Doug Ledford
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:03 +0800
f6344757a [PATCH] md: Remove bi_end_io call out from under a spinlock

raid5 overloads bi_phys_segments to count the number of blocks that the
request was broken into so that it knows when the bio is completely handled.

Accessing this must always be done under a spinlock. In one case we also call
bi_end_io under that spinlock, which probably isn't ideal as bi_end_io could
be expensive (even though it isn't allowed to sleep).

So we reduce the range of the spinlock to just accessing bi_phys_segments.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:03 +0800
b3b46be38 [PATCH] md: Remove some stray semi-colons after functions called in macro..

wait_event_lock_irq puts a ';' after its usage of the 4th arg, so we don't
need to.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:02 +0800
df8e7f763 [PATCH] md: Improve comments about locking situation in raid5 make_request

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:02 +0800
e464eafdb [PATCH] md: Support suspending of IO to regions of an md array

This allows user-space to access data safely. This is needed for raid5
reshape as user-space needs to take a backup of the first few stripes before
allowing reshape to commence.

It will also be useful in cluster-aware raid1 configurations so that all
cluster members can leave a section of the array untouched while a
resync/recovery happens.

A 'start' and 'end' of the suspended range are written to 2 sysfs attributes.
Note that only one range can be suspended at a time.
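
The check an IO path needs against a single suspended range can be sketched
as below. This is an illustration under stated assumptions, not the md
implementation: struct suspend_range and io_must_wait() are hypothetical
names, the range is treated as half-open [lo, hi) in sectors, and lo >= hi
is taken to mean that nothing is suspended.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two values written via sysfs. */
struct suspend_range {
    uint64_t lo;   /* first suspended sector */
    uint64_t hi;   /* first sector after the suspended range */
};

/*
 * Return true if an IO covering [start, start + nr_sectors) touches the
 * suspended range and therefore has to wait.
 */
bool io_must_wait(const struct suspend_range *r, uint64_t start,
                  uint64_t nr_sectors)
{
    if (r->lo >= r->hi || nr_sectors == 0)
        return false;
    return start < r->hi && start + nr_sectors > r->lo;
}
```

Because only one range exists, the test is a single interval-overlap
comparison rather than a search through a list of ranges.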

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:02 +0800
16484bf59 [PATCH] md: Make 'reshape' a possible sync_action action

This allows reshape to be triggered via sysfs (which is the only way to start
it happening).

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:02 +0800
63c70c4f3 [PATCH] md: Split reshape handler in check_reshape and start_reshape

check_reshape checks validity and does things that can be done instantly -
like adding devices to raid1. start_reshape initiates a restriping process to
convert the whole array.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:02 +0800
b578d55fd [PATCH] md: Only checkpoint expansion progress occasionally

Instead of checkpointing at each stripe, only checkpoint when a new write
would overwrite uncheckpointed data. Block any write to the uncheckpointed
area. Arbitrarily checkpoint at least every 3Meg.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:02 +0800
f67055780 [PATCH] md: Checkpoint and allow restart of raid5 reshape

We allow the superblock to record an 'old' and a 'new' geometry, and a
position where any conversion is up to. The geometry allows for changing
chunksize, layout and level as well as number of devices.

When using a version-0.90 superblock, we convert the version to 0.91 while the
conversion is happening so that an old kernel will refuse to assemble the
array. For version-1, we use a feature bit for the same effect.

When starting an array we check for an incomplete reshape and restart the
reshape process if needed. If the reshape stopped at an awkward time (like
when updating the first stripe) we refuse to assemble the array, and let
user-space worry about it.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:01 +0800
292695531 [PATCH] md: Final stages of raid5 expand code

This patch adds raid5_reshape and end_reshape which will start and finish the
reshape processes.

raid5_reshape is only enabled if CONFIG_MD_RAID5_RESHAPE is set, to discourage
accidental use.

Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry.

And make sure that you have backups, just in case.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:01 +0800
ccfcc3c10 [PATCH] md: Core of raid5 resize process

This patch provides the core of the resize/expand process.

sync_request notices if a 'reshape' is happening and acts accordingly.

It allocates new stripe_heads for the next chunk-wide-stripe in the target
geometry, marking them STRIPE_EXPANDING.

Then it finds which stripe heads in the old geometry can provide data needed
by these and marks them STRIPE_EXPAND_SOURCE. This causes stripe_handle to
read all blocks on those stripes.

Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are
needed are copied into the corresponding STRIPE_EXPANDING stripe_head. Once a
STRIPE_EXPANDING stripe_head is full, it is marked STRIPE_EXPAND_READY and then
is written out and released.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:01 +0800
7ecaa1e6a [PATCH] md: Infrastructure to allow normal IO to continue while array is expanding

We need to allow that different stripes are of different effective sizes, and
use the appropriate size. Also, when a stripe is being expanded, we must
block any IO attempts until the stripe is stable again.

Key elements in this change are:
- each stripe_head gets a 'disk' field which is part of the key,
thus there can sometimes be two stripe heads of the same area of
the array, but covering different numbers of devices. One of these
will be marked STRIPE_EXPANDING and so won't accept new requests.
- conf->expand_progress tracks how the expansion is progressing and
is used to determine whether the target part of the array has been
expanded yet or not.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:01 +0800
ad01c9e37 [PATCH] md: Allow stripes to be expanded in preparation for expanding an array

Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache
data structure.

This requires allocating new stripes in a new kmem_cache. If this succeeds,
we copy cache pages over and release the old stripes and kmem_cache.

We then allocate new pages. If that fails, we leave the stripe cache at its
new size. It isn't worth the effort to shrink it back again.

Unfortunately this means we need two kmem_cache names as, for a short
period of time, we have two kmem_caches. So they are raid5/%s and
raid5/%s-alt.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:01 +0800
b55e6bfcd [PATCH] md: Split disks array out of raid5 conf structure so it is easier to grow

The remainder of this batch implements raid5 reshaping. Currently the only
shape change that is supported is adding a device, but it is envisioned that
changing the chunksize and layout will also be supported, as well as changing
the level (e.g. 1->5, 5->6).

The reshape process naturally has to move all of the data in the array, and so
should be used with caution. It is believed to work, and some testing does
support this, but wider testing would be great for increasing my confidence.

You will need a version of mdadm newer than 2.3.1 to make use of raid5 growth.
This is because mdadm needs to take a copy of a 'critical section' at the
start of the array in case there is a crash at an awkward moment. On restart,
mdadm will restore the critical section and allow reshape to continue.

I hope to release a 2.4-pre by early next week - it still needs a little more
polishing.

This patch:

Previously the array of disk information was included in the raid5 'conf'
structure which was allocated to an appropriate size. This makes it awkward
to change the size of that array. So we split it off into a separate
kmalloced array which will require a little extra indexing, but is much easier
to grow.
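
The data-structure change can be sketched in user space as follows. The
struct and function names are hypothetical and calloc/free stand in for the
kernel allocators. The point of the sketch is only that a separately
allocated array can be replaced without reallocating the structure that
points to it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-disk record; the real one holds more state. */
struct disk_info {
    void *rdev;
};

/* Hypothetical conf: the disk array hangs off a pointer. */
struct conf_model {
    int raid_disks;
    struct disk_info *disks;   /* separately allocated array */
};

/* Allocate the initial array. raid_disks must be greater than zero. */
int conf_init(struct conf_model *conf, int raid_disks)
{
    if (raid_disks <= 0)
        return -1;
    conf->disks = calloc((size_t)raid_disks, sizeof(*conf->disks));
    if (conf->disks == NULL)
        return -1;
    conf->raid_disks = raid_disks;
    return 0;
}

/* Grow the array in place of the old one; conf itself does not move. */
int conf_grow(struct conf_model *conf, int new_disks)
{
    struct disk_info *bigger;

    if (new_disks <= conf->raid_disks)
        return -1;
    bigger = calloc((size_t)new_disks, sizeof(*bigger));
    if (bigger == NULL)
        return -1;   /* the old array is left untouched */
    memcpy(bigger, conf->disks,
           (size_t)conf->raid_disks * sizeof(*bigger));
    free(conf->disks);
    conf->disks = bigger;
    conf->raid_disks = new_disks;
    return 0;
}
```

The cost is the extra pointer dereference on each access (the "little extra
indexing" noted above). In exchange, every pointer to the conf structure
stays valid when a device is added.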

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:01 +0800
4588b42e9 [PATCH] md: Update status_resync to handle LARGE devices

status_resync, used by /proc/mdstat to report the status of a resync, assumes
that device sizes will always fit into an 'unsigned long'. This is no longer
the case...
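
One way to report progress without assuming that sizes fit into a 32-bit
value is sketched below. This is not the code from the patch:
resync_per_mille() is a hypothetical helper that scales both quantities
down together before multiplying, so the intermediate product cannot
overflow.

```c
#include <stdint.h>

/*
 * Return resync progress in tenths of a percent (0..1000).
 * 'done' and 'total' are 64-bit block counts. A total of 0 reports 0.
 */
unsigned int resync_per_mille(uint64_t done, uint64_t total)
{
    if (total == 0)
        return 0;
    if (done > total)
        done = total;

    /*
     * Drop low-order bits from both values until total fits in 32 bits.
     * The ratio is preserved closely enough for a progress display and
     * done * 1000 then stays far below 2^64.
     */
    while (total > UINT32_MAX) {
        total >>= 1;
        done >>= 1;
    }
    return (unsigned int)(done * 1000 / total);
}
```

The precision lost by shifting is far below the resolution of a
one-decimal percentage, so the displayed figure is unaffected.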

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:01 +0800
1be7892ff [PATCH] md: Fix the 'failed' count for version-0 superblocks

We are counting failed devices twice, once for the device that is failed, and
once for the hole that has been left in the array. Remove the former so
'failed' matches 'missing'. Storing these counts in the superblock is a bit
silly anyway....

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:00 +0800
c5a10f62c [PATCH] md: Add '4' to the list of levels for which bitmaps are supported

I really should make this a function of the personality....

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:00 +0800
89e5c8b5b [PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md.

This flag should be set for a virtual device iff it is set for all underlying
devices.
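
The "iff" rule above is a logical AND over the member devices. A minimal
sketch, with a hypothetical helper name and plain booleans standing in for
the per-queue flag:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true only if every member device has the flag set.
 * member_flags[i] models whether device i has the flag.
 */
bool stacked_cluster_flag(const bool *member_flags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!member_flags[i])
            return false;   /* one member without it clears it */
    }
    return true;
}
```

With no members at all the sketch returns true. How a real array with no
devices should behave is a separate question that this sketch does not
address.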

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-03-28 00:45:00 +0800
a22c96c73 [PATCH] dm: remove unnecessary typecast

Signed-off-by: Kevin Corry
Cc: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kevin Corry
2006-03-28 00:45:00 +0800
f165921df [PATCH] dm/md dependency tree in sysfs: dm to use bd_claim_by_disk

Use bd_claim_by_disk.

The following symlinks are created if dm-0 maps to sda:
/sys/block/dm-0/slaves/sda --> /sys/block/sda
/sys/block/sda/holders/dm-0 --> /sys/block/dm-0

Signed-off-by: Jun'ichi Nomura
Cc: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jun'ichi Nomura
2006-03-28 00:45:00 +0800
5463c7904 [PATCH] dm/md dependency tree in sysfs: md to use bd_claim_by_disk

Use bd_claim_by_disk.

The following symlinks are created if md0 is built from sda and sdb:
/sys/block/md0/slaves/sda --> /sys/block/sda
/sys/block/md0/slaves/sdb --> /sys/block/sdb
/sys/block/sda/holders/md0 --> /sys/block/md0
/sys/block/sdb/holders/md0 --> /sys/block/md0

Signed-off-by: Jun'ichi Nomura
Cc: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jun'ichi Nomura
2006-03-28 00:45:00 +0800