09 May, 2007

1 commit

  • Remove do_sync_file_range() and convert callers to just use
    do_sync_mapping_range().

    Signed-off-by: Mark Fasheh
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

08 May, 2007

1 commit

  • Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
    been used in 6 years (so akpm says).

    find * -name \*.[ch] | xargs grep -l invalidate_bdev |
    while read file; do
            quilt add $file;
            sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
    done

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

30 Apr, 2007

1 commit

  • Currently we scale the mempool sizes depending on memory installed
    in the machine, except for the bio pool itself which sits at a fixed
    256 entry pre-allocation.

    There's really no point in "optimizing" this OOM path; we just need
    enough preallocated to make progress. A single unit is enough, so let's
    scale it down to 2 just to be on the safe side.

    This patch saves ~150kb of pinned kernel memory on a 32-bit box.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

13 Apr, 2007

1 commit


05 Apr, 2007

1 commit

  • A device can be removed from an md array via e.g.
    echo remove > /sys/block/md3/md/dev-sde/state

    This will try to remove the 'dev-sde' subtree which will deadlock
    since
    commit e7b0d26a86943370c04d6833c6edba2a72a6e240

    With this patch we run the kobject_del via schedule_work so as to
    avoid the deadlock.

    Cc: Alan Stern
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

28 Mar, 2007

3 commits


17 Mar, 2007

1 commit

  • When iterating through an array, one must be careful to test one's index
    variable rather than another similarly-named variable.

    The loop will read off the end of conf->disks[] in the following
    (pathological) case:

    % dd bs=1 seek=840716287 if=/dev/zero of=d1 count=1
    % for i in 2 3 4; do dd if=/dev/zero of=d$i bs=1k count=$(($i+150)); done
    % ./vmlinux ubd0=root ubd1=d1 ubd2=d2 ubd3=d3 ubd4=d4
    # mdadm -C /dev/md0 --level=linear --raid-devices=4 /dev/ubd[1234]

    Adding some printks, I saw this:

    [42949374.960000] hash_spacing = 821120
    [42949374.960000] cnt = 4
    [42949374.960000] min_spacing = 801
    [42949374.960000] j=0 size=820928 sz=820928
    [42949374.960000] i=0 sz=820928 hash_spacing=820928
    [42949374.960000] j=1 size=64 sz=64
    [42949374.960000] j=2 size=64 sz=128
    [42949374.960000] j=3 size=64 sz=192
    [42949374.960000] j=4 size=1515870810 sz=1515871002
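
The trace above can be reproduced with a small userspace sketch (names and
layout are illustrative, not the real md code): the loop bound tests the
outer index while the inner index does the walking, so the sum runs one
entry past the end of the array and picks up whatever follows it in memory.

```python
min_spacing = 801
sizes = [820928, 64, 64, 64]        # the four devices from the trace above
garbage = 1515870810                # whatever happens to follow the array
mem = sizes + [garbage]             # simulate conf->disks[] plus adjacent memory
cnt = len(sizes)

def spacing_from(start, test_j):
    """Sum device sizes from `start` until min_spacing is reached."""
    sz, j = 0, start
    # buggy code bounds the loop on the *outer* index (start/i), which
    # never moves; the fix bounds it on j, the index actually advancing
    while sz < min_spacing and (j if test_j else start) < cnt:
        sz += mem[j]
        j += 1
    return sz

buggy = spacing_from(1, test_j=False)   # walks to j=4: 64+64+64+1515870810
fixed = spacing_from(1, test_j=True)    # stops at the real end: 64+64+64
```

With the wrong bound, the run starting at disk 1 reaches the sz=1515871002
seen at j=4 in the printk trace; with the fix it stops at 192.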

    Cc: Gautham R Shenoy
    Acked-by: Neil Brown
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Isaacson
     

05 Mar, 2007

1 commit

  • Recent patch for raid6 reshape had a change missing that showed up in
    subsequent review.

    Many places in the raid5 code used "conf->raid_disks-1" to mean "number of
    data disks". With raid6 that had to be changed to "conf->raid_disks -
    conf->max_degraded" or similar. One place was missed.

    This bug means that if a raid6 reshape were aborted in the middle the
    recorded position would be wrong. On restart it would either fail (as the
    position wasn't on an appropriate boundary) or would leave a section of the
    array unreshaped, causing data corruption.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

02 Mar, 2007

6 commits

  • i.e. one or more drives can be added and the array will re-stripe
    while on-line.

    Most of the interesting work was already done for raid5. This just extends it
    to raid6.

    mdadm newer than 2.6 is needed for complete safety; however, any version of
    mdadm which supports raid5 reshape will do a good enough job in almost all
    cases (an 'echo repair > /sys/block/mdX/md/sync_action' is recommended after a
    reshape that was aborted and had to be restarted with such a version of
    mdadm).

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • An error always aborts any resync/recovery/reshape on the understanding that
    it will immediately be restarted if that still makes sense. However a reshape
    currently doesn't get restarted. With this patch it does.

    To avoid restarting when it is not possible to do work, we call into the
    personality to check that a reshape is ok, and strengthen raid5_check_reshape
    to fail if there are too many failed devices.

    We also break some code out into a separate function: remove_and_add_spares as
    the indent level for that code was getting crazy.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The mddev and queue might be used for another array which does not set these,
    so they need to be cleared.

    Signed-off-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • md tries to warn the user if they e.g. create a raid1 using two partitions of
    the same device, as this does not provide true redundancy.

    However it also warns if a raid0 is created like this, and there is nothing
    wrong with that.

    At the place where the warning is currently printed, we don't necessarily know
    what level the array will be, so move the warning from the point where the
    device is added to the point where the array is started.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • - Use kernel_fpu_begin() and kernel_fpu_end()
    - Use boot_cpu_has() for feature testing even in userspace

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     
    There are two errors that can lead to recovery problems with raid10
    when used in 'far' mode (not the default).

    Due to a '>' instead of '>=' the wrong block is located, which would result in
    garbage being written to some random location, quite possibly outside the
    range of the device, causing the newly reconstructed device to fail.

    The device size calculation had some rounding errors (it didn't round when it
    should) and so recovery would go a few blocks too far, which would again cause
    a write to a random block address and probably a device error.

    The code for working with device sizes was fairly confused and spread out, so
    this has been tidied up a bit.
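
The rounding part of the fix amounts to trimming the usable device size down
to a whole number of chunks, so recovery never walks past the last complete
chunk. A sketch of that arithmetic (the chunk size is a hypothetical example
value, not taken from the patch):

```python
def usable_sectors(dev_sectors, chunk_sectors=64):
    """Round the device size down to a chunk boundary (sketch of the fix)."""
    return dev_sectors - dev_sectors % chunk_sectors

# Without the rounding, recovery of the final, partial chunk would issue
# writes beyond usable_sectors() - i.e. to a block address past the end.
```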

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

15 Feb, 2007

2 commits

    The semantic effect of insert_at_head is that it would allow newly registered
    sysctl entries to override existing sysctl entries of the same name, which is
    a pain for caching and which the proc interface never implemented.

    I have done an audit and discovered that none of the current users of
    register_sysctl care, as (except for directories) they do not register
    duplicate sysctl entries.

    So this patch simply removes the support for overriding existing entries in
    the sys_sysctl interface since no one uses it or cares, and it makes future
    enhancements harder.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ralf Baechle
    Acked-by: Martin Schwidefsky
    Cc: Russell King
    Cc: David Howells
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Andi Kleen
    Cc: Jens Axboe
    Cc: Corey Minyard
    Cc: Neil Brown
    Cc: "John W. Linville"
    Cc: James Bottomley
    Cc: Jan Kara
    Cc: Trond Myklebust
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
    The sysctls used by the md driver have unique binary numbers, so remove the
    insert_at_head flag as it serves no useful purpose.

    Signed-off-by: Eric W. Biederman
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

13 Feb, 2007

1 commit

  • Many struct file_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    [akpm@osdl.org: dvb fix]
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

12 Feb, 2007

1 commit


10 Feb, 2007

2 commits

  • md/bitmap tracks how many active write requests are pending on blocks
    associated with each bit in the bitmap, so that it knows when it can clear
    the bit (when count hits zero).

    The counter has 14 bits of space, so if there are ever more than 16383, we
    cannot cope.

    Currently the code just calls BUG_ON, as "all" drivers have request queue
    limits much smaller than this.

    However it seems that some don't. Apparently some multipath configurations
    can allow more than 16383 concurrent write requests.

    So, in this unlikely situation, instead of calling BUG_ON we now wait
    for the count to drop down a bit. This requires a new wait_queue_head,
    some waiting code, and a wakeup call.

    Tested by limiting the counter to 20 instead of 16383 (writes go a lot slower
    in that case...).
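
The new behaviour can be sketched in userspace terms, with a condition
variable standing in for the kernel's wait_queue_head (names below are
illustrative, not the driver's own):

```python
import threading

COUNTER_MAX = (1 << 14) - 1   # the 14-bit per-bit pending-write counter

class PendingWrites:
    """Sketch of the fix: block new writers at the ceiling instead of BUG_ON."""
    def __init__(self, limit=COUNTER_MAX):
        self.limit = limit
        self.count = 0
        self.cond = threading.Condition()   # stands in for the wait_queue_head

    def start_write(self):
        with self.cond:
            while self.count >= self.limit:  # old code: BUG_ON at this point
                self.cond.wait()             # new code: wait for the count to drop
            self.count += 1

    def end_write(self):
        with self.cond:
            self.count -= 1
            self.cond.notify()               # the wakeup call
```

Running it with a tiny limit (as the commit was tested with 20 instead of
16383) shows extra writers stalling rather than crashing the machine.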

    Signed-off-by: Neil Brown
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Brown
     
    It is possible for raid5 to be sent a bio that is too big for an underlying
    device. So if it is a READ that we pass straight down to a device, it will
    fail and confuse RAID5.

    So in 'chunk_aligned_read' we check that the bio fits within the parameters
    for the target device and if it doesn't fit, fall back on reading through
    the stripe cache and making lots of one-page requests.

    Note that this is the earliest time we can check against the device because
    earlier we don't have a lock on the device, so it could change underneath
    us.

    Also, the code for handling a retry through the cache when a read fails has
    not been tested and was badly broken. This patch fixes that code.
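
The routing decision can be sketched as follows (sector counts and the page
size are illustrative; the real check consults the target queue's limits
while holding the lock on the device):

```python
PAGE_SECTORS = 8   # one 4K page in 512-byte sectors

def route_read(bio_sectors, dev_max_sectors):
    """Pass the read straight down only if the device can take it whole."""
    return "direct" if bio_sectors <= dev_max_sectors else "stripe_cache"

def split_for_cache(sector, bio_sectors):
    """Fallback: turn one big read into page-sized requests for the cache."""
    reqs, end = [], sector + bio_sectors
    while sector < end:
        # take up to one page, never crossing a page boundary
        n = min(PAGE_SECTORS - sector % PAGE_SECTORS, end - sector)
        reqs.append((sector, n))
        sector += n
    return reqs
```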

    Signed-off-by: Neil Brown
    Cc: "Kai"
    Cc:
    Cc:
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Brown
     

27 Jan, 2007

6 commits

  • raid5_mergeable_bvec tries to ensure that raid5 never sees a read request
    that does not fit within just one chunk. However as we must always accept
    a single-page read, that is not always possible.

    So when "in_chunk_boundary" fails, it might be unusual, but it is not a
    problem and printing a message every time is a bad idea.
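
The check itself is a one-liner in spirit: a read fits iff its first and
last sectors land in the same chunk (the chunk size below is an arbitrary
example value):

```python
def in_chunk_boundary(sector, sectors, chunk_sectors=128):
    """True if [sector, sector+sectors) lies within a single chunk."""
    return sector // chunk_sectors == (sector + sectors - 1) // chunk_sectors

# A single-page read can still straddle a chunk edge, so an occasional
# False here is expected and is not worth a log message.
```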

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held,
    it is possible for a deadlock to eventuate.

    This happens if the array was marked 'clean', and the memalloc triggers a
    write-out to the md device.

    For the writeout to succeed, the array must be marked 'dirty', and that
    requires getting the mddev_lock.

    So, before attempting a GFP_KERNEL allocation while holding the lock, make
    sure the array is marked 'dirty' (unless it is currently read-only).

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Allow noflush suspend/resume of device-mapper device only for the case
    where the device size is unchanged.

    Otherwise, dm-multipath devices can stall when resumed if noflush was used
    when suspending them, all paths have failed and queue_if_no_path is set.

    Explanation:
    1. Something is doing fsync() on the block dev,
    holding inode->i_sem
    2. The fsync write is blocked by all-paths-down and queue_if_no_path
    3. Someone requests to suspend the dm device with noflush.
    Pending writes are left in queue.
    4. In the middle of dm_resume(), __bind() tries to get
    inode->i_sem to do __set_size() and waits forever.

    'noflush suspend' is a new device-mapper feature introduced in
    early 2.6.20. So I hope the fix can be included before 2.6.20 is
    released.

    Example of reproducer:
    1. Create a multipath device by dmsetup
    2. Fail all paths during mkfs
    3. Do dmsetup suspend --noflush and load new map with healthy paths
    4. Do dmsetup resume

    Signed-off-by: Jun'ichi Nomura
    Acked-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
     
  • In most cases we check the size of the bitmap file before reading data from
    it. However when reading the superblock, we always read the first PAGE_SIZE
    bytes, which might not always be appropriate. So limit that read to the size
    of the file if appropriate.

    Also, we get the count of available bytes wrong in one place, so that too can
    read past the end of the file.

    Cc: "yang yin"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
    Now that we sometimes step the array events count backwards (when
    transitioning dirty->clean where nothing else interesting has happened - so
    that we don't need to write to spares all the time), it is possible for the
    event count to return to zero, which is potentially confusing and triggers an
    MD_BUG.

    We could possibly remove the MD_BUG, but it is just as easy, and probably
    safer, to make sure we never return to zero.
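
The guard is trivial but worth stating (a sketch; the function name is
illustrative): decrement the event count on a benign dirty->clean
transition, but refuse the step if it would reach zero.

```python
def step_events_back(events):
    """Decrement the event count, but never let it return to zero."""
    return events - 1 if events > 1 else events
```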

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
    When 'repair' finds a block that is different on the various parts of the
    mirror, it is meant to write a chosen good version to the others. However it
    currently writes out the original data to each; the memcpy to make all the
    data the same is missing.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

12 Jan, 2007

1 commit

  • md raidX make_request functions strip off the BIO_RW_SYNC flag, thus
    introducing additional latency.

    Fixing this in raid1 and raid10 seems to be straightforward enough.

    For our particular usage case in DRBD, passing this flag improved some
    initialization time from ~5 minutes to ~5 seconds.

    Acked-by: NeilBrown
    Signed-off-by: Lars Ellenberg
    Acked-by: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lars Ellenberg
     

23 Dec, 2006

1 commit

  • While developing more functionality in mdadm I found some bugs in md...

    - When we remove a device from an inactive array (write 'remove' to
    the 'state' sysfs file - see 'state_store') we should not
    update the superblock information, as we may not have
    read and processed it all properly yet.

    - Initialise all raid_disk entries to '-1', else the 'slot' sysfs file
    will claim '0' for all devices in an array before the array is
    started.

    - Allow '\n' to be present at the end of words written to
    sysfs files.

    - When we use SET_ARRAY_INFO to set the md metadata version,
    set the flag to say that there is persistent metadata.

    - Allow GET_BITMAP_FILE to be called on an array that hasn't
    been started yet.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

14 Dec, 2006

1 commit


11 Dec, 2006

9 commits

  • As CBC is the default chaining method for cryptoloop, we should select
    it from cryptoloop to ease the transition. Spotted by Rene Herman.

    Signed-off-by: Herbert Xu
    Signed-off-by: Linus Torvalds

    Herbert Xu
     
    Fix a few bugs that meant that:
    - superblocks weren't always written at exactly the right time (this
    could show up if the array was not written to - writing to the array
    causes lots of superblock updates and so hides these errors).

    - restarting device recovery after a clean shutdown (version-1 metadata
    only) didn't work as intended (or at all).

    1/ Ensure superblock is updated when a new device is added.
    2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
    The body of this if takes one of two branches depending on whether
    MD_RECOVERY_SYNC is set, so testing it in the clause of the if
    is wrong.
    3/ Flag superblock for updating after a resync/recovery finishes.
    4/ If we find the need to restart a recovery in the middle (version-1
    metadata only) make sure a full recovery (not just as guided by
    bitmaps) does get done.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Currently raid5 depends on clearing the BIO_UPTODATE flag to signal an error
    to higher levels. While this should be sufficient, it is safer to explicitly
    set the error code as well - less room for confusion.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • There are some vestiges of old code that was used for bypassing the stripe
    cache on reads in raid5.c. This was never updated after the change from
    buffer_heads to bios, but was left as a reminder.

    That functionality has now been implemented in a completely different way, so
    the old code can go.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The autorun code is only used if this module is built into the static
    kernel image. Adjust #ifdefs accordingly.

    Signed-off-by: Jeff Garzik
    Acked-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Garzik
     
  • stripe_to_pdidx finds the index of the parity disk for a given stripe. It
    assumes raid5 in that it uses "disks-1" to determine the number of data disks.

    This is incorrect for raid6 but fortunately the two usages cancel each other
    out. The only way that 'data_disks' affects the calculation of pd_idx in
    raid5_compute_sector is when it is divided into the sector number. But as
    that sector number is calculated by multiplying in the wrong value of
    'data_disks' the division produces the right value.

    So it is innocuous but needs to be fixed.

    Also change the calculation of raid_disks in compute_blocknr to make it
    more obviously correct (it seems at first to always use disks-1 too).
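
The cancellation is easy to check numerically. In a toy model of the two
functions (the layout is reduced to its bare arithmetic; the real geometry
is more involved), using the wrong data_disks in both the multiply and the
divide recovers the same stripe number, hence the same parity disk:

```python
disks = 6                 # a raid6 array: 4 data disks + 2 parity
right = disks - 2         # correct data_disks for raid6
wrong = disks - 1         # the raid5-style value stripe_to_pdidx used

def pd_idx(stripe, data_disks):
    sector = stripe * data_disks          # stripe_to_pdidx's multiply
    chunk_number = sector // data_disks   # raid5_compute_sector's divide
    return chunk_number % disks           # toy parity placement

# the two errors cancel: no stripe maps to a different parity disk
mismatches = [s for s in range(1000) if pd_idx(s, wrong) != pd_idx(s, right)]
```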

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
    Call chunk_aligned_read where appropriate.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raz Ben-Jehuda(caro)
     
    If a bypass-the-cache read fails, we simply try again through the cache. If
    it fails again it will trigger normal recovery procedures.

    update 1:

    From: NeilBrown

    1/
    chunk_aligned_read and retry_aligned_read assume that
    data_disks == raid_disks - 1
    which is not true for raid6.
    So when an aligned read request bypasses the cache, we can get the wrong data.

    2/ The cloned bio is being used-after-free in raid5_align_endio
    (to test BIO_UPTODATE).

    3/ We forgot to add rdev->data_offset when submitting
    a bio for aligned-read

    4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
    so we need to invalidate the segment counts.

    5/ We don't de-reference the rdev when the read completes.
    This means we need to record the rdev so it is still
    available in the end_io routine. Fortunately
    bi_next in the original bio is unused at this point so
    we can stuff it in there.

    6/ We leak a cloned bio if the target rdev is not usable.

    From: NeilBrown

    update 2:

    1/ When aligned requests fail (read error) they need to be retried
    via the normal method (stripe cache). As we cannot be sure that
    we can process a single read in one go (we may not be able to
    allocate all the stripes needed) we store a bio-being-retried
    and a list of bios-that-still-need-to-be-retried.
    When we find a bio that needs to be retried, we should add it to
    the list, not to the single bio-being-retried...

    2/ We were never incrementing 'scnt' when resubmitting failed
    aligned requests.

    [akpm@osdl.org: build fix]
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raz Ben-Jehuda(caro)
     
  • Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raz Ben-Jehuda(caro)