Eric Lee / smarc-fsl-linux-kernel

16 Jan, 2012

1 commit

b3c9dd182 Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block

* 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits)
Revert "block: recursive merge requests"
block: Stop using macro stubs for the bio data integrity calls
blockdev: convert some macros to static inlines
fs: remove unneeded plug in mpage_readpages()
block: Add BLKROTATIONAL ioctl
block: Introduce blk_set_stacking_limits function
block: remove WARN_ON_ONCE() in exit_io_context()
block: an exiting task should be allowed to create io_context
block: ioc_cgroup_changed() needs to be exported
block: recursive merge requests
block, cfq: fix empty queue crash caused by request merge
block, cfq: move icq creation and rq->elv.icq association to block core
block, cfq: restructure io_cq creation path for io_context interface cleanup
block, cfq: move io_cq exit/release to blk-ioc.c
block, cfq: move icq cache management to block core
block, cfq: move io_cq lookup to blk-ioc.c
block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq
block, cfq: reorganize cfq_io_context into generic and cfq specific parts
block: remove elevator_queue->ops
block: reorder elevator switch sequence
...

Fix up conflicts in:
- block/blk-cgroup.c
Switch from can_attach_task to can_attach
- block/cfq-iosched.c
conflict with now removed cic index changes (we now use q->id instead)

Linus Torvalds
2012-01-16 04:24:45 +0800

15 Jan, 2012

1 commit

ec8013bed dm: do not forward ioctls from logical volumes to the underlying device

A logical volume can map to just part of the underlying physical volume.
In this case, it must be treated like a partition.

Based on a patch from Alasdair G Kergon.

Cc: Alasdair G Kergon
Cc: dm-devel@redhat.com
Signed-off-by: Paolo Bonzini
Signed-off-by: Linus Torvalds

Paolo Bonzini
2012-01-15 07:07:24 +0800

12 Jan, 2012

1 commit

c086ae4ed Merge tag 'md-3.3-fixes' of git://neil.brown.name/md

Two bugfixes for md.

One is a recently introduced regression that affects an unusual
configuration with a guaranteed BUG_ON. Has been tagged for -stable.
The other is minor missing functionality.

* tag 'md-3.3-fixes' of git://neil.brown.name/md:
md/raid1: perform bad-block tests for WriteMostly devices too.
md: notify the 'degraded' sysfs attribute on failure.

Linus Torvalds
2012-01-12 10:51:55 +0800

11 Jan, 2012

3 commits

b1bd055d3 block: Introduce blk_set_stacking_limits function

Stacking driver queue limits are typically bounded exclusively by the
capabilities of the low level devices, not by the stacking driver
itself.

This patch introduces blk_set_stacking_limits() which has more liberal
metrics than the default queue limits function. This allows us to
inherit topology parameters from bottom devices without manually
tweaking the default limits in each driver prior to calling the stacking
function.

Since there is now a clear distinction between stacking and low-level
devices, blk_set_default_limits() has been modified to carry the more
conservative values that we used to manually set in
blk_queue_make_request().

Signed-off-by: Martin K. Petersen
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe

Martin K. Petersen
2012-01-11 23:27:11 +0800
307729c8b md/raid1: perform bad-block tests for WriteMostly devices too.

We normally try to avoid reading from write-mostly devices, but when
we do we really have to check for bad blocks and be sure not to
try reading them.

With the current code, best_good_sectors might not get set and that
causes zero-length read requests to be sent down which is very
confusing.

This bug was introduced in commit d2eb35acfdccbe2 and so the patch
is suitable for 3.1.x and 3.2.x

Reported-and-tested-by: Michał Mirosław
Reported-and-tested-by: Art -kwaak- van Breemen
Signed-off-by: NeilBrown
Cc: stable@vger.kernel.org

NeilBrown
2012-01-11 05:35:17 +0800
f2a371c5e md: notify the 'degraded' sysfs attribute on failure.

We currently only 'notify' changes to the 'degraded' attribute
when it decreases, not when it increases.

Notifying on failure is a little awkward as it happens in
interrupt context.
So instead, notify when we remove the failed device from the array,
which is very soon afterwards.

Reported-and-tested-by: Mikhail Balabin
Signed-off-by: NeilBrown

NeilBrown
2012-01-11 05:35:14 +0800

09 Jan, 2012

1 commit

2943c8332 Merge tag 'md-3.3' of git://neil.brown.name/md

md update for 3.3

Big change is new hot-replacement.
A slot in an array can hold 2 devices - one that
wants-replacement and one that is the replacement.
Once the replacement is built - either from the
original or (in the case of errors) from elsewhere,
the wants-replacement device will be removed.

* tag 'md-3.3' of git://neil.brown.name/md: (36 commits)
md/raid1: Mark device want_replacement when we see a write error.
md/raid1: If there is a spare and a want_replacement device, start replacement.
md/raid1: recognise replacements when assembling arrays.
md/raid1: handle activation of replacement device when recovery completes.
md/raid1: Allow a failed replacement device to be removed.
md/raid1: Allocate spare to store replacement devices and their bios.
md/raid1: Replace use of mddev->raid_disks with conf->raid_disks.
md/raid10: If there is a spare and a want_replacement device, start replacement.
md/raid10: recognise replacements when assembling array.
md/raid10: Allow replacement device to be replace old drive.
md/raid10: handle recovery of replacement devices.
md/raid10: Handle replacement devices during resync.
md/raid10: writes should get directed to replacement as well as original.
md/raid10: allow removal of failed replacement devices.
md/raid10: preferentially read from replacement device if possible.
md/raid10: change read_balance to return an rdev
md/raid10: prepare data structures for handling replacement.
md/raid5: Mark device want_replacement when we see a write error.
md/raid5: If there is a spare and a want_replacement device, start replacement.
md/raid5: recognise replacements when assembling array.
...

Linus Torvalds
2012-01-09 05:28:33 +0800

04 Jan, 2012

1 commit

ff01bb483 fs: move code out of buffer.c

Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
kill_bdev as well, so brd doesn't have to open code it. Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving. The small comment replacing it says enough.

Signed-off-by: Nick Piggin
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Al Viro
2012-01-04 11:54:07 +0800

23 Dec, 2011

32 commits

19d671695 md/raid1: Mark device want_replacement when we see a write error.

Now that WantReplacement drives are replaced cleanly, mark a drive
as want_replacement when we see a write error. It might get failed soon so
the WantReplacement flag is irrelevant, but if the write error is recorded
in the bad block log, we still want to activate any spare that might
be available.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:57 +0800
7ef449d1e md/raid1: If there is a spare and a want_replacement device, start replacement.

When attempting to add a spare to a RAID1 array, also consider
adding it as a replacement for a want_replacement device.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:57 +0800
c19d57980 md/raid1: recognise replacements when assembling arrays.

If a Replacement is seen, file it as such.

If we see two replacements (or two normal devices) for the one slot,
abort.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:57 +0800
8c7a2c2bc md/raid1: handle activation of replacement device when recovery completes.

When recovery completes ->spare_active is called.
This checks if the replacement is ready and if so it fails
the original.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:57 +0800
b014f14c8 md/raid1: Allow a failed replacement device to be removed.

Replacement devices are stored at a different offset, so look
there too.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:56 +0800
8f19ccb2f md/raid1: Allocate spare to store replacement devices and their bios.

In RAID1, a replacement is much like a normal device, so we just
double the size of the relevant arrays and look at all possible
devices for reads and writes.

This means that the array looks like it is now double the size in some
way - we need to be careful about that.
In particular, when checking if the array is still degraded while
creating a recovery request we need to only consider the first 'half'
- i.e. the real (non-replacement) devices.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:56 +0800
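
The doubled array can be pictured with a small sketch. It assumes that the replacement for slot `i` is stored at index `raid_disks + i`; the `toy_*` types and functions are hypothetical and only model the indexing, not the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rdev {
    bool in_sync;
};

struct toy_conf {
    int raid_disks;             /* number of real (non-replacement) slots */
    struct toy_rdev **mirrors;  /* 2 * raid_disks entries */
};

/* Assumed layout: the replacement for a slot lives in the second half. */
struct toy_rdev *toy_replacement(const struct toy_conf *conf, int slot)
{
    assert(slot >= 0 && slot < conf->raid_disks);
    return conf->mirrors[conf->raid_disks + slot];
}

/* The degraded check walks only the first half, i.e. the real devices. */
bool toy_is_degraded(const struct toy_conf *conf)
{
    for (int i = 0; i < conf->raid_disks; i++) {
        const struct toy_rdev *rdev = conf->mirrors[i];
        if (rdev == NULL || !rdev->in_sync)
            return true;
    }
    return false;
}
```

A replacement that is not yet in sync therefore never makes the array look degraded, while a missing original always does.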
301946364 md/raid1: Replace use of mddev->raid_disks with conf->raid_disks.

In general mddev->raid_disks can change unexpectedly while
conf->raid_disks will only change in a very controlled way. So change
some uses of one to the other.

The use of mddev->raid_disks will not actually cause problems but
this way is more consistent and safer in the long term.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:56 +0800
b7044d41b md/raid10: If there is a spare and a want_replacement device, start replacement.

When attempting to add a spare to a RAID10 array, also consider
adding it as a replacement for a want_replacement device.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:56 +0800
56a2559bb md/raid10: recognise replacements when assembling array.

If a Replacement is seen, file it as such.

If we see two replacements (or two normal devices) for the one slot,
abort.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:55 +0800
4ca40c2ce md/raid10: Allow replacement device to be replace old drive.

When recovery finishes and spare_active is called, check for a
replacement that might have just become fully synced and mark it
as such, marking the original as failed.

Then when the original is removed, move the replacement into
its position.

This means that 'replacement' can spontaneously become NULL in some
situations. Make sure we check for those.
It also means that 'rdev' and 'replacement' could appear to be
identical - check for that too.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:55 +0800
24afd80d9 md/raid10: handle recovery of replacement devices.

If there is a replacement device, then recover to it,
reading from any drives - maybe the one being replaced, maybe not.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:55 +0800
9ad1aefc8 md/raid10: Handle replacement devices during resync.

If we need to resync an array which has replacement devices,
we always write any block checked to every replacement.

If the resync was bitmap-based resync we will then complete the
replacement normally.
If it was a full resync, we mark the replacements as fully recovered
when the resync finishes so no further recovery is needed.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:55 +0800
475b0321a md/raid10: writes should get directed to replacement as well as original.

When writing, we need to submit two writes, one to the original,
and one to the replacement - if there is a replacement.

If the write to the replacement results in a write error we just
fail the device. We only try to record write errors to the
original.

This only handles writing new data. Writing for resync/recovery
will come later.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:55 +0800
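
The write policy described above can be sketched as follows. This is a simplified model with hypothetical `toy_*` names: it only captures that a write fans out to both devices and that errors are treated differently depending on which device reported them. The kernel's actual error path does more work than incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rdev {
    bool faulty;
    int recorded_write_errors;
};

struct toy_slot {
    struct toy_rdev *rdev;         /* the original device, may be NULL */
    struct toy_rdev *replacement;  /* NULL when there is no replacement */
};

/* How many device writes one logical write to this slot fans out into. */
int toy_writes_needed(const struct toy_slot *slot)
{
    assert(slot != NULL);
    return (slot->rdev != NULL) + (slot->replacement != NULL);
}

/* Write error policy: fail a replacement outright, record errors on the original. */
void toy_write_failed(struct toy_slot *slot, struct toy_rdev *dev)
{
    assert(slot != NULL && dev != NULL);
    if (dev == slot->replacement)
        dev->faulty = true;
    else
        dev->recorded_write_errors++;
}
```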
c8ab903ea md/raid10: allow removal of failed replacement devices.

Enhance raid10_remove_disk to be able to remove ->replacement
as well as ->rdev

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:54 +0800
abbf098e6 md/raid10: preferentially read from replacement device if possible.

When reading (for array reads, not for recovery etc) we read from the
replacement device if it has recovered far enough.
This requires storing the chosen rdev in the 'r10_bio' so we can make
sure to drop the ref on the right device when the read finishes.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:54 +0800
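
A sketch of what "recovered far enough" can mean for a read, assuming a per-device `recovery_offset` below which all sectors are already rebuilt. The `toy_*` names are hypothetical; the function returns the chosen device so the caller can later drop its reference on that same device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_rdev {
    bool faulty;
    uint64_t recovery_offset;  /* sectors below this are already recovered */
};

struct toy_slot {
    struct toy_rdev *rdev;
    struct toy_rdev *replacement;
};

/* Prefer the replacement when the whole read range lies in its recovered area. */
struct toy_rdev *toy_pick_read_dev(const struct toy_slot *slot,
                                   uint64_t sector, uint64_t nr_sectors)
{
    struct toy_rdev *repl = slot->replacement;

    assert(nr_sectors > 0);
    if (repl != NULL && !repl->faulty &&
        sector + nr_sectors <= repl->recovery_offset)
        return repl;
    return slot->rdev;
}
```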
96c3fd1f3 md/raid10: change read_balance to return an rdev

It makes more sense to return an rdev than just an index as
read_balance() gets a reference to the rdev and so returning
the pointer makes this more idiomatic.

This will be needed in a future patch when we might return
a 'replacement' rdev instead of the main rdev.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:54 +0800
69335ef3b md/raid10: prepare data structures for handling replacement.

Allow each slot in the RAID10 to have 2 devices, the want_replacement
and the replacement.

Also allow an r10bio to have 2 bios, and for resync/recovery allocate the
second bio if there are any replacement devices.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:54 +0800
3a6de2924 md/raid5: Mark device want_replacement when we see a write error.

Now that WantReplacement drives are replaced cleanly, mark a drive
as WantReplacement when we see a write error. It might get failed soon so
the WantReplacement flag is irrelevant, but if the write error is recorded
in the bad block log, we still want to activate any spare that might
be available.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:54 +0800
7bfec5f35 md/raid5: If there is a spare and a want_replacement device, start replacement.

When attempting to add a spare to a RAID[456] array, also consider
adding it as a replacement for a want_replacement device.

This requires that common md code attempt hot_add even when the array
is not formally degraded.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:53 +0800
17045f52a md/raid5: recognise replacements when assembling array.

If a Replacement is seen, file it as such.

If we see two replacements (or two normal devices) for the one slot,
abort.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:53 +0800
dd054fce8 md/raid5: handle activation of replacement device when recovery completes.

When recovery completes - as reported by a call to ->spare_active,
we clear In_sync on the original and set it on the replacement.

Then when the original gets removed we move the replacement from
'replacement' to 'rdev'.

This could race with other code that is looking at these pointers,
so we use memory barriers and careful ordering to ensure that
a reader might see one device twice, but never no devices.
Then the readers guard against using both devices, which could
only happen when writing.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:53 +0800
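
The ordering argument can be illustrated in user space with C11 atomics. This is a sketch under assumptions, not the kernel code: the `toy_*` names are hypothetical, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's full memory barrier. The writer publishes the replacement as `rdev` before clearing `replacement`; the reader loads the two pointers in the opposite order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_rdev {
    int id;
};

struct toy_slot {
    _Atomic(struct toy_rdev *) rdev;
    _Atomic(struct toy_rdev *) replacement;
};

/* Writer: make 'rdev' point at the replacement, then clear 'replacement'. */
void toy_promote_replacement(struct toy_slot *slot)
{
    struct toy_rdev *repl =
        atomic_load_explicit(&slot->replacement, memory_order_relaxed);

    assert(repl != NULL);
    atomic_store_explicit(&slot->rdev, repl, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full barrier */
    atomic_store_explicit(&slot->replacement, NULL, memory_order_relaxed);
}

/* Reader: load 'replacement' first, then 'rdev', mirroring the writer. */
void toy_read_slot(struct toy_slot *slot,
                   struct toy_rdev **rdev, struct toy_rdev **repl)
{
    *repl = atomic_load_explicit(&slot->replacement, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full barrier */
    *rdev = atomic_load_explicit(&slot->rdev, memory_order_relaxed);
    if (*repl == *rdev)
        *repl = NULL;  /* same device seen twice: use it only once */
}
```

If the reader observes `replacement == NULL` because promotion has finished, the paired fences ensure its later load of `rdev` sees the promoted device. If it observes the old `replacement`, it may also see the same pointer in `rdev`, which is the "one device twice" case the reader has to guard against.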
9a3e1101b md/raid5: detect and handle replacements during recovery.

During recovery we want to write to the replacement but not
the original. So we have two new flags
- R5_NeedReplace if this stripe has a replacement that needs to
be written at some stage
- R5_WantReplace if NeedReplace, and the data is available, and
a 'sync' has been requested on this stripe.

We also distinguish between 'sync and replace' which need to read
all other devices, and 'replace' which only needs to read the
devices being replaced.

Note that during resync we always write to any replacement device.
It might not need to be written to, but as we don't read to compare,
we have to write to be sure.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:53 +0800
977df3625 md/raid5: writes should get directed to replacement as well as original.

When writing, we need to submit two writes, one to the original, and
one to the replacement - if there is a replacement.

If the write to the replacement results in a write error, we just fail
the device. We only try to record write errors to the original.

When writing for recovery, we shouldn't write to the original. This
will be addressed in a subsequent patch that generally addresses
recovery.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:53 +0800
657e3e4d8 md/raid5: allow removal for failed replacement devices.

Enhance raid5_remove_disk to be able to remove ->replacement
as well as ->rdev.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:52 +0800
14a75d3e0 md/raid5: preferentially read from replacement device if possible.

If a replacement device is present and has been recovered far enough,
then use it for reading into the stripe cache.

If we get an error we don't try to repair it, we just fail the device.
A replacement device that gives errors does not sound sensible.

This requires removing the setting of R5_ReadError when we get
a read error during a read that bypasses the cache. It was probably
a bad idea anyway as we don't know that every block in the read
caused an error, and it could cause ReadError to be set for the
replacement device, which is bad.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:52 +0800
995c4275a md/raid5: remove redundant bio initialisations.

We currently initialise some fields of a bio when preparing a
stripe_head, and again just before submitting the request.

Remove the duplication by only setting the fields that lower level
devices don't touch in raid5_build_block, and only set the changeable
fields in ops_run_io.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:52 +0800
ede7ee8b4 md/raid5: raid5.h cleanup

Remove some #defines that are no longer used, and replace some
others with an enum.
And remove an unused field.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:52 +0800
671488cc2 md/raid5: allow each slot to have an extra replacement device

Just enhance data structures to record a second device per slot to be
used as a 'replacement' device, replacing the original.
We also have a second bio in each slot in each stripe_head. This will
only be used when writing to the array - we need to write to both the
original and the replacement at the same time, so will need two bios.

For now, only try using the replacement drive for aligned-reads.
In this case, we prefer the replacement if it has been recovered far
enough, otherwise use the original.

This includes a small enhancement. Previously we would only do
aligned reads if the target device was fully recovered. Now we also
do them if it has recovered far enough.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:52 +0800
2d78f8c45 md: create externally visible flags for supporting hot-replace.

hot-replace is a feature being added to md which will allow a
device to be replaced without removing it from the array first.

With hot-replace a spare can be activated and recovery can start while
the original device is still in place, thus allowing a transition from
an unreliable device to a reliable device without leaving the array
degraded during the transition. It can also be used when the original
device is still reliable but is not wanted for some reason.

This will eventually be supported in RAID4/5/6 and RAID10.

This patch adds a super-block flag to distinguish the replacement
device. If an old kernel sees this flag it will reject the device.

It also adds two per-device flags which are viewable and settable via
sysfs.
"want_replacement" can be set to request that a device be replaced.
"replacement" is set to show that this device is replacing another
device.

The "rd%d" links in /sys/block/mdXx/md only apply to the original
device, not the replacement. We currently don't make links for the
replacement - there doesn't seem to be a need.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:51 +0800
b8321b68d md: change hot_remove_disk to take an rdev rather than a number.

Soon an array will be able to have multiple devices with the
same raid_disk number (an original and a replacement). So removing
a device based on the number won't work. So pass the actual device
handle instead.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:51 +0800
476a7abb9 md: remove test for duplicate device when setting slot number.

When setting the slot number on a device in an active array we
currently check that the number is not already in use.
We then call into the personality's hot_add_disk function
which performs the same test and returns the same error.

Thus the common test is not needed.

As we will shortly be changing some personalities to allow duplicates
in some cases (to support hot-replace), the common test will become
inconvenient.

So remove the common test.

Reviewed-by: Dan Williams
Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:51 +0800
915c420dd md/bitmap: be more consistent when setting new bits in memory bitmap.

For each active region corresponding to a bit in the bitmap we have
a 14-bit counter (and some flags).
This counts
number of active writes + bit in the on-disk bitmap + delay-needed.

The "delay-needed" is because we always want a delay before clearing a
bit. So the number here is normally number of active writes plus 2.
If there have been no writes for a while, we drop to 1.
If still no writes we clear the bit and drop to 0.

So for consistency, when setting a bit from the on-disk bitmap or by
request from user-space it is best to set the counter to '2' to start
with.

In particular we might also set the NEEDED_MASK flag at this time, and
in all other cases NEEDED_MASK is only set when the counter is 2 or
more.

Signed-off-by: NeilBrown

NeilBrown
2011-12-23 07:17:51 +0800
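
The counter life cycle described in this commit can be modelled with a short sketch. The `toy_*` names and the exact bit positions are assumptions for illustration: a 16-bit word is assumed to hold a 14-bit count in its low bits and a "needed" flag in the top bit, and the idle tick here looks only at the count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NEEDED_MASK  0x8000u  /* assumed flag bit */
#define TOY_COUNTER_MASK 0x3fffu  /* low 14 bits hold the count */

unsigned int toy_count(uint16_t c)
{
    return c & TOY_COUNTER_MASK;
}

/* A write starts: an idle region (0 or 1) jumps to 2, then counts the write. */
uint16_t toy_start_write(uint16_t c)
{
    unsigned int flags = c & ~TOY_COUNTER_MASK;
    unsigned int count = toy_count(c);

    if (count < 2)
        count = 2;
    assert(count < TOY_COUNTER_MASK);
    return (uint16_t)(flags | (count + 1));
}

/* A write completes: drop back towards the resting value of 2. */
uint16_t toy_end_write(uint16_t c)
{
    assert(toy_count(c) > 2);
    return (uint16_t)(c - 1);
}

/* Periodic tick with no writes: 2 -> 1, then 1 -> 0 and clear the disk bit. */
uint16_t toy_idle_tick(uint16_t c, bool *clear_on_disk_bit)
{
    unsigned int count = toy_count(c);

    *clear_on_disk_bit = (count == 1);
    if (count == 1 || count == 2)
        return (uint16_t)(c - 1);
    return c;
}

/* Setting a bit from the on-disk bitmap or user space: start the count at 2. */
uint16_t toy_set_bit(bool needed)
{
    return (uint16_t)(2u | (needed ? TOY_NEEDED_MASK : 0u));
}
```

Starting at 2 keeps the invariant from the commit message: whenever the "needed" flag is set, the count is at least 2, so the bit is not cleared without the usual delay.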