Eric Lee / smarc-fsl-linux-kernel

15 May, 2008

1 commit

e7e72bf64 Remove blkdev warning triggered by using md

As setting and clearing queue flags now requires that we hold a spinlock
on the queue, and as blk_queue_stack_limits is called without that lock,
get the lock inside blk_queue_stack_limits.

For blk_queue_stack_limits to be able to find the right lock, each md
personality needs to set q->queue_lock to point to the appropriate lock.
Those personalities which didn't previously use a spin_lock use
q->__queue_lock. So always initialise that lock when allocated.

With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no
longer cause warnings as it will be clear that the proper lock is held.

Thanks to Dan Williams for review and fixing the silly bugs.

Signed-off-by: NeilBrown
Cc: Dan Williams
Cc: Jens Axboe
Cc: Alistair John Strachan
Cc: Nick Piggin
Cc: "Rafael J. Wysocki"
Cc: Jacek Luczak
Cc: Prakash Punnoor
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Brown
2008-05-15 10:11:15 +0800

30 Apr, 2008

1 commit

6bfe0b499 md: support blocking writes to an array on device failure

Allows a userspace metadata handler to take action upon detecting a device
failure.

Based on an original patch by Neil Brown.

Changes:
-added blocked_wait waitqueue to rdev
-don't qualify Blocked with Faulty; always let userspace block writes
-added md_wait_for_blocked_rdev to wait for the block device to be clear; if
userspace misses the notification, another one is sent every 5 seconds
-set MD_RECOVERY_NEEDED after clearing "blocked"
-kill DoBlock flag, just test mddev->external

Signed-off-by: Dan Williams
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Williams
2008-04-30 23:29:33 +0800

28 Apr, 2008

1 commit

d7a420c94 raid: remove leading TAB on printk messages

MD drivers use one printk() call to print 2 log messages and the second line
may be prefixed by a TAB character. It may also output a trailing space
before newline. klogd (I think) turns the TAB character into the 2 characters
'^I' when logging to a file. This looks ugly.

Instead of a leading TAB to indicate continuation, prefix both output lines
with 'raid:' or similar. Also remove any trailing space in the vicinity of
the affected code and consistently end the sentences with a period.

Signed-off-by: Nick Andrew
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Andrew
2008-04-28 23:58:42 +0800

05 Mar, 2008

2 commits

1c830532f md: fix possible raid1/raid10 deadlock on read error during resync

Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for
another possible deadlock in raid1/raid10 error handling.

If a read request returns an error while a resync is happening and a resync
request is pending, the attempt to fix the error will block until the resync
progresses, and the resync will block until the read request completes. Thus
a deadlock.

This patch fixes the problem.

Cc: "K.Tanaka"
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2008-03-05 08:35:18 +0800

a35e63efa md: fix deadlock in md/raid1 and md/raid10 when handling a read error

When handling a read error, we freeze the array to stop any other IO while
attempting to over-write with correct data.

This is done in the raid1d(raid10d) thread and must wait for all submitted IO
to complete (except for requests that failed and are sitting in the retry
queue - these are counted in ->nr_queue and will stay there during a freeze).

However write requests need attention from raid1d as bitmap updates might be
required. This can cause a deadlock as raid1 is waiting for requests to
finish that themselves need attention from raid1d.

So we create a new function 'flush_pending_writes' to give that attention, and
call it in freeze_array to be sure that we aren't waiting on raid1d.

Thanks to "K.Tanaka" for finding and reporting this
problem.

Cc: "K.Tanaka"
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2008-03-05 08:35:17 +0800

07 Feb, 2008

3 commits

d089c6af1 md: change ITERATE_RDEV to rdev_for_each

As this is more in line with common practice in the kernel. Also swap the
args around to be more like list_for_each.
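
The argument order this commit moves toward can be illustrated with a small userspace sketch. The list type, the macro `sketch_for_each`, and the helper `sketch_count` are hypothetical stand-ins, not the kernel's `rdev_for_each` definition; they only show the iterator-first order that `list_for_each` uses.

```c
#include <stddef.h>

/* Hypothetical singly linked node standing in for one device entry. */
struct sketch_node {
    int id;
    struct sketch_node *next;
};

/* Iterator first, container second, in the style of list_for_each(pos, head). */
#define sketch_for_each(pos, head) \
    for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/* Count the entries on a list by walking it with the macro. */
int sketch_count(struct sketch_node *head)
{
    struct sketch_node *pos;
    int n = 0;

    sketch_for_each(pos, head)
        n++;
    return n;
}
```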

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2008-02-07 02:41:19 +0800

c62072777 md: allow a maximum extent to be set for resyncing

This allows userspace to control resync/reshape progress and synchronise it
with other activities, such as shared access in a SAN, or backing up critical
sections during a tricky reshape.

Writing a number of sectors (which must be a multiple of the chunk size if
such is meaningful) causes a resync to pause when it gets to that point.
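
The chunk-size constraint described above could be checked roughly as follows. This is a userspace sketch under stated assumptions: the function name `sketch_sync_limit_ok` is hypothetical, and a chunk size of zero is assumed to mean "no chunking" (for example a plain mirror), in which case any sector count is accepted.

```c
#include <stdbool.h>

/*
 * Return true if a requested resync limit (in sectors) is acceptable.
 * chunk_sectors == 0 is assumed to mean the array has no meaningful chunk size.
 */
bool sketch_sync_limit_ok(unsigned long long sectors, unsigned int chunk_sectors)
{
    if (chunk_sectors == 0)
        return true;                      /* no alignment requirement */
    return sectors % chunk_sectors == 0;  /* must land on a chunk boundary */
}
```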

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2008-02-07 02:41:18 +0800

b47490c9b md: Update md bitmap during resync.

Currently an md array with a write-intent bitmap does not update that bitmap
to reflect successful partial resync. Rather the entire bitmap is updated
when the resync completes.

This is because there is no guarantee that resync requests will complete in
order, and tracking each request individually is unnecessarily burdensome.

However there is value in regularly updating the bitmap, so add code to
periodically pause while all pending sync requests complete, then update the
bitmap. Doing this only every few seconds (the same as the bitmap update
time) does not noticeably affect resync performance.

[snitzer@gmail.com: export bitmap_cond_end_sync]
Signed-off-by: Neil Brown
Cc: "Mike Snitzer"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2008-02-07 02:41:18 +0800

09 Nov, 2007

1 commit

2ad8b1ef1 Add UNPLUG traces to all appropriate places

Added blk_unplug interface, allowing all invocations of unplugs to result
in a generated blktrace UNPLUG.

Signed-off-by: Alan D. Brunelle
Signed-off-by: Jens Axboe

Alan D. Brunelle
2007-11-09 20:41:32 +0800

20 Oct, 2007

1 commit

96de0e252 Convert files to UTF-8 and some cleanups

* Convert files to UTF-8.

* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
That the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)

* Correct town names (Goettingen -> Göttingen)

* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)

Signed-off-by: Jan Engelhardt
Signed-off-by: Adrian Bunk

Jan Engelhardt
2007-10-20 05:21:04 +0800

17 Oct, 2007

1 commit

cf7a44168 md: make sure read errors are auto-corrected during a 'check' resync in raid1

Whenever a read error is found, we should attempt to overwrite with correct
data to 'fix' it.

However, when doing a 'check' pass (which compares data blocks that are
successfully read, but doesn't normally overwrite) we don't do that. We
should.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-10-17 23:43:03 +0800

16 Oct, 2007

1 commit

fd5d80626 block: convert blkdev_issue_flush() to use empty barriers

Then we can get rid of ->issue_flush_fn() and all the driver private
implementations of that.

Signed-off-by: Jens Axboe

Jens Axboe
2007-10-16 17:05:02 +0800

10 Oct, 2007

1 commit

6712ecf8f Drop 'size' argument from bio_endio and bi_end_io

As bi_end_io is only called once when the request is complete,
the 'size' argument is now redundant. Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.

While we are at it, change bi_end_io to return void.

Signed-off-by: Neil Brown
Signed-off-by: Jens Axboe

NeilBrown
2007-10-10 15:25:57 +0800

23 Aug, 2007

2 commits

a88aa7865 md: correctly update sysfs when a raid1 is reshaped

When a raid1 array is reshaped (number of drives changed), the list of devices
is compacted, so that slots for missing devices are filled with working
devices from later slots. This requires the "rd%d" symlinks in sysfs to be
updated.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-08-23 10:52:46 +0800

918f02383 md: make sure a re-add after a restart honours bitmap when resyncing

Commit 1757128438d41670ded8bc3bc735325cc07dc8f9 was slightly bad. If an array
has a write-intent bitmap, and you remove a drive, then readd it, only the
changed parts should be resynced. However after the above commit, this only
works if the array has not been shut down and restarted.

This is because it sets 'fullsync' a little more often than it should. This
patch is more careful.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-08-23 10:52:46 +0800

24 Jul, 2007

1 commit

165125e1e [BLOCK] Get rid of request_queue_t typedef

Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweep of
the kernel and get rid of this typedef and replace its uses with
the proper type.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-24 15:28:11 +0800

18 Jul, 2007

1 commit

4ad136637 md: change bitmap_unplug and others to void functions

bitmap_unplug only ever returns 0, so it may as well be void. Two callers try
to print a message if it returns non-zero, but that message is already printed
by bitmap_file_kick.

write_page returns an error which is not consistently checked. It always
causes BITMAP_WRITE_ERROR to be set on an error, and that can more
conveniently be checked.

When the return of write_page is checked, an error causes bitmap_file_kick to
be called - so move that call into write_page - and protect against recursive
calls into bitmap_file_kick.

bitmap_update_sb returns an error that is never checked.

So make these 'void' and be consistent about checking the bit.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-07-18 01:23:15 +0800

17 Jun, 2007

1 commit

ed4566627 md: fix bug in error handling during raid1 repair

If raid1/repair (which reads all blocks and fixes any differences it finds)
hits a read error, it doesn't reset the bio for writing before writing
correct data back, so the read error isn't fixed, and the device probably
gets a zero-length write which it might complain about.

Signed-off-by: Neil Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Accetta
2007-06-17 04:16:15 +0800

11 May, 2007

1 commit

dd00a99e7 md: avoid a possibility that a read error can wrongly propagate through md/raid1 to a filesystem.

When a raid1 has only one working drive, we want read errors to propagate up
to the filesystem as there is no point failing the last drive in an array.

Currently the code that performs this check is racy. If a write and a read are
both submitted to a device on a 2-drive raid1, and the write fails followed
by the read failing, the read will see that there is only one working drive
and will pass the failure up, even though the one working drive is actually
the *other* one.

So, tighten up the locking.

Signed-off-by: Neil Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-05-11 00:26:53 +0800

10 May, 2007

2 commits

44ce6294d Revert "md: improve partition detection in md array"

This reverts commit 5b479c91da90eef605f851508744bfe8269591a0.

Quoth Neil Brown:

"It causes an oops when auto-detecting raid arrays, and it doesn't
seem easy to fix.

The array may not be 'open' when do_md_run is called, so
bdev->bd_disk might be NULL, so bd_set_size can oops.

This whole approach of opening an md device before it has been
assembled just seems to get more and more painful. I think I'm going
to have to come up with something clever to provide both backward
compatibility with usage expectation, and sane integration into the
rest of the kernel."

Signed-off-by: Linus Torvalds

Linus Torvalds
2007-05-10 09:51:36 +0800

5b479c91d md: improve partition detection in md array

md currently uses ->media_changed to make sure rescan_partitions
is called on md arrays after they are assembled.

However that doesn't happen until the array is opened, which is later
than some people would like.

So use blkdev_ioctl to do the rescan immediately that the
array has been assembled.

This means we can remove all the ->change infrastructure as it was only used
to trigger a partition rescan.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-05-10 03:30:57 +0800

27 Jan, 2007

2 commits

2a2275d63 [PATCH] md: fix potential memalloc deadlock in md

If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held,
it is possible for a deadlock to eventuate.

This happens if the array was marked 'clean', and the memalloc triggers a
write-out to the md device.

For the writeout to succeed, the array must be marked 'dirty', and that
requires getting the mddev_lock.

So, before attempting a GFP_KERNEL allocation while holding the lock, make
sure the array is marked 'dirty' (unless it is currently read-only).

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-01-27 05:51:00 +0800

3eda22d19 [PATCH] md: make 'repair' actually work for raid1

When 'repair' finds a block that is different on the various parts of the
mirror, it is meant to write a chosen good version to the others. However it
currently writes out the original data to each. The memcpy to make all the
data the same is missing.
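
The missing step can be pictured with a userspace sketch. It is not the raid1 code: the function `sketch_repair` and its buffer layout are hypothetical, and plain byte arrays stand in for the pages attached to each mirror's bio. The point is only that the chosen good data must be copied into each differing buffer before that buffer is written back.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare every mirror copy against the chosen good copy and overwrite
 * any copy that differs. Returns the number of copies that were fixed.
 */
int sketch_repair(const unsigned char *good, unsigned char **copies,
                  size_t ncopies, size_t len)
{
    int repaired = 0;

    for (size_t i = 0; i < ncopies; i++) {
        if (memcmp(good, copies[i], len) != 0) {
            /* Without this copy the stale data would be written back. */
            memcpy(copies[i], good, len);
            repaired++;
        }
    }
    return repaired;
}
```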

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-01-27 05:50:59 +0800

12 Jan, 2007

1 commit

e3881a681 [PATCH] md: pass down BIO_RW_SYNC in raid{1,10}

md raidX make_request functions strip off the BIO_RW_SYNC flag, thus
introducing additional latency.

Fixing this in raid1 and raid10 seems to be straightforward enough.

For our particular usage case in DRBD, passing this flag improved some
initialization time from ~5 minutes to ~5 seconds.

Acked-by: NeilBrown
Signed-off-by: Lars Ellenberg
Acked-by: Jens Axboe
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lars Ellenberg
2007-01-12 10:18:21 +0800

14 Dec, 2006

1 commit

802ba064c [PATCH] md: Don't assume that READ==0 and WRITE==1 - use the names explicitly

Thanks Jens for alerting me to this.

Cc: Jens Axboe
Cc:
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-12-14 01:05:48 +0800

11 Dec, 2006

1 commit

175712843 [PATCH] md: assorted md and raid1 one-liners

Fix a few bugs that meant that:
- superblocks weren't always written at exactly the right time (this
could show up if the array was not written to - writing to the array
causes lots of superblock updates and so hides these errors).

- restarting device recovery after a clean shutdown (version-1 metadata
only) didn't work as intended (or at all).

1/ Ensure superblock is updated when a new device is added.
2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
The body of this if takes one of two branches depending on whether
MD_RECOVERY_SYNC is set, so testing it in the clause of the if
is wrong.
3/ Flag superblock for updating after a resync/recovery finishes.
4/ If we find the need to restart a recovery in the middle (version-1
metadata only) make sure a full recovery (not just as guided by
bitmaps) does get done.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-12-11 01:57:21 +0800

29 Oct, 2006

1 commit

969b755aa [PATCH] md: fix printk format warnings, seen on powerpc64:

drivers/md/raid1.c:1479: warning: long long unsigned int format, long unsigned int arg (arg 4)
drivers/md/raid10.c:1475: warning: long long unsigned int format, long unsigned int arg (arg 4)
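
These warnings arise when an `unsigned long` argument is passed where the format string expects `unsigned long long`. One common way to silence this class of warning is an explicit cast. The commit's actual diff is not shown here, so the helper `sketch_format_sectors` below is only a hypothetical userspace illustration, with `snprintf` standing in for `printk`.

```c
#include <stdio.h>

/*
 * Format a sector number with %llu. The explicit cast makes the argument
 * match the format on platforms where unsigned long and unsigned long long
 * are distinct types of different rank.
 */
int sketch_format_sectors(char *buf, size_t len, unsigned long sectors)
{
    return snprintf(buf, len, "%llu", (unsigned long long)sectors);
}
```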

Signed-off-by: Randy Dunlap
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2006-10-29 02:30:52 +0800

03 Oct, 2006

5 commits

0d1292282 [PATCH] md: define ->congested_fn for raid1, raid10, and multipath

raid1, raid10 and multipath don't report their 'congested' status through
bdi_*_congested, but should.

This patch adds the appropriate functions which just check the 'congested'
status of all active members (with appropriate locking).

raid1 read_balance should be modified to prefer devices where
bdi_read_congested returns false. Then we could use the '&' branch rather
than the '|' branch. However that would need some benchmarking first
to make sure it is actually a good idea.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:18 +0800

c04be0aa8 [PATCH] md: Improve locking around error handling

The error handling routines don't use proper locking, and so two concurrent
errors could trigger a problem.

So:
- use test-and-set and test-and-clear to synchronise
the In_sync bits with the ->degraded count
- use the spinlock to protect updates to the
degraded count (could use an atomic_t but that
would be a bigger change in code, and isn't
really justified)
- remove un-necessary locking in raid5
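
The test-and-clear idea in the first item can be sketched in userspace with C11 atomics. Everything named `sketch_*` or `SKETCH_*` is hypothetical, a single device is modelled for brevity, and the plain `degraded` counter is not thread-safe here; in the kernel, per the list above, that count is updated under a spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_IN_SYNC 0x1UL   /* hypothetical bit value */

struct sketch_array {
    atomic_ulong dev_flags;    /* flags for one member device */
    int degraded;              /* guarded by a spinlock in the kernel */
};

/*
 * Atomically clear the in-sync bit. Only the caller that observed the bit
 * set bumps the degraded count, so two concurrent errors count once.
 */
bool sketch_mark_failed(struct sketch_array *a)
{
    unsigned long old = atomic_fetch_and(&a->dev_flags, ~SKETCH_IN_SYNC);

    if (old & SKETCH_IN_SYNC) {
        a->degraded++;
        return true;
    }
    return false;
}
```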

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:18 +0800

11ce99e62 [PATCH] md: Remove working_disks from raid1 state data

It is equivalent to conf->raid_disks - conf->mddev->degraded.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:17 +0800

867868fb5 [PATCH] md: Factor out part of raid1d into a separate function

raid1d has too many nested blocks, so take the fix_read_error functionality
out into a separate function.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:17 +0800

850b2b420 [PATCH] md: replace magic numbers in sb_dirty with well defined bit flags

Instead of magic numbers (0,1,2,3) in sb_dirty, we have
some flags instead:
MD_CHANGE_DEVS
Some device state has changed requiring superblock update
on all devices.
MD_CHANGE_CLEAN
The array has transitioned from 'clean' to 'dirty' or back,
requiring a superblock update on active devices, but possibly
not on spares
MD_CHANGE_PENDING
A superblock update is underway.

We wait for an update to complete by waiting for all flags to be clear. A
flag can be set at any time, even during an update, without risk that the
change will be lost.
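
A minimal single-threaded sketch of the flag scheme follows. The flag names come from the message above, but the bit numbers, the `sketch_mddev` structure, and the helper functions are assumptions made for illustration; the kernel would use its atomic bit operations rather than plain bitwise arithmetic.

```c
#include <stdbool.h>

/* Flag names from the commit message; the bit numbers are assumed. */
enum sketch_sb_flag {
    MD_CHANGE_DEVS    = 0,
    MD_CHANGE_CLEAN   = 1,
    MD_CHANGE_PENDING = 2,
};

/* Hypothetical stand-in for the array state that holds the flags. */
struct sketch_mddev {
    unsigned long flags;
};

void sketch_set_flag(struct sketch_mddev *m, enum sketch_sb_flag bit)
{
    m->flags |= 1UL << bit;
}

void sketch_clear_flag(struct sketch_mddev *m, enum sketch_sb_flag bit)
{
    m->flags &= ~(1UL << bit);
}

/* An update is considered complete once every flag is clear. */
bool sketch_update_complete(const struct sketch_mddev *m)
{
    return m->flags == 0;
}
```

Because each kind of change has its own bit, a flag set while an update is in flight stays visible after the in-flight update clears its own bits, which a single overwritten counter could not guarantee.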

Stop exporting md_update_sb, as it isn't needed.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:17 +0800

02 Sep, 2006

1 commit

ddac7c7e3 [PATCH] md: Fix issues with referencing rdev in md/raid1

We need to be careful when referencing mirrors[i].rdev. It can disappear
under us at various times.

So:
- fix a couple of problem places
- comment a couple of non-problem places
- move an 'atomic_add' which dereferences rdev down a little
way to somewhere where it is sure to not be NULL.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-09-02 02:39:08 +0800

28 Aug, 2006

1 commit

6394cca54 [PATCH] md: fix recent breakage of md/raid1 array checking

A recent patch broke the ability to do a user-requested check of a raid1.
This patch fixes the breakage and also moves a comment that was dislocated
by the same patch.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-08-28 02:01:31 +0800

11 Jul, 2006

2 commits

d69504325 [PATCH] md: include sector number in messages about corrected read errors

This is generally useful, but particularly helps see if it is the same sector
that always needs correcting, or different ones.

[akpm@osdl.org: fix printk warnings]
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-07-11 04:24:17 +0800

5e3db645f [PATCH] md: fix usage of wrong variable in raid1

Though it rarely matters, we should be using 's' rather than r1_bio->sector
here.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-07-11 04:24:17 +0800

27 Jun, 2006

3 commits

07d84d109 [PATCH] md: Allow re-add to work on array without bitmaps

When an array has a bitmap, a device can be removed and re-added and only
blocks changed since the removal (as recorded in the bitmap) will be resynced.

It should be possible to do a similar thing to arrays without bitmaps. i.e.
if a device is removed and re-added and *no* changes have been made in the
interim, then the add should not require a resync.

This patch allows that option. This means that when assembling an array one
device at a time (e.g. during device discovery) the array can be enabled
read-only as soon as enough devices are available, but extra devices can still
be added without causing a resync.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:39 +0800

5fd6c1dce [PATCH] md: allow checkpoint of recovery with version-1 superblock

For a while we have had checkpointing of resync. The version-1 superblock
allows recovery to be checkpointed as well, and this patch implements that.

Due to early carelessness we need to add a feature flag to signal that the
recovery_offset field is in use, otherwise older kernels would assume that a
partially recovered array is in fact fully recovered.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:37 +0800

c70810b32 [PATCH] md: reformat code in raid1_end_write_request to avoid goto

A recent change made this goto unnecessary, so reformat the code to make it
clearer what is happening.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:36 +0800

02 May, 2006

1 commit

5e7dd2ab6 [PATCH] md: Fix 'rdev->nr_pending' count when retrying barrier requests

When retrying a failed BIO_RW_BARRIER request, we need to keep the reference
in ->nr_pending over the whole retry. Currently, we only hold the reference
if the failed request is the *last* one to finish - which is silly, because it
would normally be the first to finish.

So move the rdev_dec_pending call up into the didn't-fail branch. As the rdev
isn't used in the later code, calling rdev_dec_pending earlier doesn't hurt.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-05-02 09:17:42 +0800