Eric Lee / smarc-fsl-linux-kernel

24 Jul, 2007

1 commit

165125e1e [BLOCK] Get rid of request_queue_t typedef ... Browse Code »

Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweet of
the kernel and get rid of this typedef and replace its uses with
the proper type.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-24 15:28:11 +0800

18 Jul, 2007

2 commits

4ad136637 md: change bitmap_unplug and others to void functions ... Browse Code »

bitmap_unplug only ever returns 0, so it may as well be void. Two callers try
to print a message if it returns non-zero, but that message is already printed
by bitmap_file_kick.

write_page returns an error which is not consistently checked. It always
causes BITMAP_WRITE_ERROR to be set on an error, and that can more
conveniently be checked.

When the return of write_page is checked, an error causes bitmap_file_kick to
be called - so move that call into write_page - and protect against recursive
calls into bitmap_file_kick.

bitmap_update_sb returns an error that is never checked.

So make these 'void' and be consistent about checking the bit.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-07-18 01:23:15 +0800
713f6ab18 md: improve the is_mddev_idle test fix ... Browse Code »

Don't use 'unsigned' variable to track sync vs non-sync IO, as the only thing
we want to do with them is a signed comparison, and fix up the comment which
had become quite wrong.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-07-18 01:23:15 +0800

13 Jul, 2007

6 commits

b5e98d65d md: handle_stripe5 - add request/completion logic for async read ops ... Browse Code »

When a read bio is attached to the stripe and the corresponding block is
marked R5_UPTODATE, then a read (biofill) operation is scheduled to copy
the data from the stripe cache to the bio buffer. handle_stripe flags the
blocks to be operated on with the R5_Wantfill flag. If new read requests
arrive while raid5_run_ops is running they will not be handled until
handle_stripe is scheduled to run again.

Changelog:
* cleanup to_read and to_fill accounting
* do not fail reads that have reached the cache

Signed-off-by: Dan Williams
Acked-By: NeilBrown

Dan Williams
2007-07-13 23:06:17 +0800
f38e12199 md: handle_stripe5 - add request/completion logic for async compute ops ... Browse Code »

handle_stripe will compute a block when a backing disk has failed, or when
it determines it can save a disk read by computing the block from all the
other up-to-date blocks.

Previously a block would be computed under the lock and subsequent logic in
handle_stripe could use the newly up-to-date block. With the raid5_run_ops
implementation the compute operation is carried out a later time outside
the lock. To preserve the old functionality we take advantage of the
dependency chain feature of async_tx to flag the block as R5_Wantcompute
and then let other parts of handle_stripe operate on the block as if it
were up-to-date. raid5_run_ops guarantees that the block will be ready
before it is used in another operation.

However, this only works in cases where the compute and the dependent
operation are scheduled at the same time. If a previous call to
handle_stripe sets the R5_Wantcompute flag there is no facility to pass the
async_tx dependency chain across successive calls to raid5_run_ops. The
req_compute variable protects against this case.

Changelog:
* remove the req_compute BUG_ON

Signed-off-by: Dan Williams
Acked-By: NeilBrown

Dan Williams
2007-07-13 23:06:17 +0800
91c009248 md: raid5_run_ops - run stripe operations outside sh->lock ... Browse Code »

When the raid acceleration work was proposed, Neil laid out the following
attack plan:

1/ move the xor and copy operations outside spin_lock(&sh->lock)
2/ find/implement an asynchronous offload api

The raid5_run_ops routine uses the asynchronous offload api (async_tx) and
the stripe_operations member of a stripe_head to carry out xor+copy
operations asynchronously, outside the lock.

To perform operations outside the lock a new set of state flags is needed
to track new requests, in-flight requests, and completed requests. In this
new model handle_stripe is tasked with scanning the stripe_head for work,
updating the stripe_operations structure, and finally dropping the lock and
calling raid5_run_ops for processing. The following flags outline the
requests that handle_stripe can make of raid5_run_ops:

STRIPE_OP_BIOFILL
- copy data into request buffers to satisfy a read request
STRIPE_OP_COMPUTE_BLK
- generate a missing block in the cache from the other blocks
STRIPE_OP_PREXOR
- subtract existing data as part of the read-modify-write process
STRIPE_OP_BIODRAIN
- copy data out of request buffers to satisfy a write request
STRIPE_OP_POSTXOR
- recalculate parity for new data that has entered the cache
STRIPE_OP_CHECK
- verify that the parity is correct
STRIPE_OP_IO
- submit i/o to the member disks (note this was already performed outside
the stripe lock, but it made sense to add it as an operation type

The flow is:
1/ handle_stripe sets STRIPE_OP_* in sh->ops.pending
2/ raid5_run_ops reads sh->ops.pending, sets sh->ops.ack, and submits the
operation to the async_tx api
3/ async_tx triggers the completion callback routine to set
sh->ops.complete and release the stripe
4/ handle_stripe runs again to finish the operation and optionally submit
new operations that were previously blocked

Note this patch just defines raid5_run_ops, subsequent commits (one per
major operation type) modify handle_stripe to take advantage of this
routine.

Changelog:
* removed ops_complete_biodrain in favor of ops_complete_postxor and
ops_complete_write.
* removed the raid5_run_ops workqueue
* call bi_end_io for reads in ops_complete_biofill, saves a call to
handle_stripe
* explicitly handle the 2-disk raid5 case (xor becomes memcpy), Neil Brown
* fix race between async engines and bi_end_io call for reads, Neil Brown
* remove unnecessary spin_lock from ops_complete_biofill
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
* remove explicit interrupt handling for channel switching, this feature
was absorbed (i.e. it is now implicit) by the async_tx api
* use return_io in ops_complete_biofill

Signed-off-by: Dan Williams
Acked-By: NeilBrown

Dan Williams
2007-07-13 23:06:15 +0800
a44568564 raid5: refactor handle_stripe5 and handle_stripe6 (v3) ... Browse Code »

handle_stripe5 and handle_stripe6 have very deep logic paths handling the
various states of a stripe_head. By introducing the 'stripe_head_state'
and 'r6_state' objects, large portions of the logic can be moved to
sub-routines.

'struct stripe_head_state' consumes all of the automatic variables that previously
stood alone in handle_stripe5,6. 'struct r6_state' contains the handle_stripe6
specific variables like p_failed and q_failed.

One of the nice side effects of the 'stripe_head_state' change is that it
allows for further reductions in code duplication between raid5 and raid6.
The following new routines are shared between raid5 and raid6:

handle_completed_write_requests
handle_requests_to_failed_array
handle_stripe_expansion

Changes:
* v2: fixed 'conf->raid_disk-1' for the raid6 'handle_stripe_expansion' path
* v3: removed the unused 'dirty' field from struct stripe_head_state
* v3: coalesced open coded bi_end_io routines into return_io()

Signed-off-by: Dan Williams
Acked-By: NeilBrown

Dan Williams
2007-07-13 23:06:15 +0800
9bc89cd82 async_tx: add the async_tx api ... Browse Code »

The async_tx api provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies. It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations. Code
that is written to the api can optimize for asynchronous operation and the
api will fit the chain of operations to the available offload resources.

I imagine that any piece of ADMA hardware would register with the
'async_*' subsystem, and a call to async_X would be routed as
appropriate, or be run in-line. - Neil Brown

async_tx exploits the capabilities of struct dma_async_tx_descriptor to
provide an api of the following general format:

struct dma_async_tx_descriptor *
async_(..., struct dma_async_tx_descriptor *depend_tx,
dma_async_tx_callback cb_fn, void *cb_param)
{
struct dma_chan *chan = async_tx_find_channel(depend_tx, );
struct dma_device *device = chan ? chan->device : NULL;
int int_en = cb_fn ? 1 : 0;
struct dma_async_tx_descriptor *tx = device ?
device->device_prep_dma_(chan, len, int_en) : NULL;

if (tx) { /* run asynchronously */
...
tx->tx_set_dest(addr, tx, index);
...
tx->tx_set_src(addr, tx, index);
...
async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
} else { /* run synchronously */
...

...
async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
}

return tx;
}

async_tx_find_channel() returns a capable channel from its pool. The
channel pool is organized as a per-cpu array of channel pointers. The
async_tx_rebalance() routine is tasked with managing these arrays. In the
uniprocessor case async_tx_rebalance() tries to spread responsibility
evenly over channels of similar capabilities. For example if there are two
copy+xor channels, one will handle copy operations and the other will
handle xor. In the SMP case async_tx_rebalance() attempts to spread the
operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
channel0 while cpu1 gets copy channel 1 and xor channel 1. When a
dependency is specified async_tx_find_channel defaults to keeping the
operation on the same channel. A xor->copy->xor chain will stay on one
channel if it supports both operation types, otherwise the transaction will
transition between a copy and a xor resource.

Currently the raid5 implementation in the MD raid456 driver has been
converted to the async_tx api. A driver for the offload engines on the
Intel Xscale series of I/O processors, iop-adma, is provided in a later
commit. With the iop-adma driver and async_tx, raid456 is able to offload
copy, xor, and xor-zero-sum operations to hardware engines.

On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
improvement) and sequential reads to a degraded array (40 - 55%
improvement). For the other cases performance was roughly equal, +/- a few
percentage points. On a x86-smp platform the performance of the async_tx
implementation (in synchronous mode) was also +/- a few percentage points
of the original implementation. According to 'top' on iop342 CPU
utilization drops from ~50% to ~15% during a 'resync' while the speed
according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.

The tiobench command line used for testing was: tiobench --size 2048
--block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available

Details:
* if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
async_tx_find_channel a static inline routine that always returns NULL
* when a callback is specified for a given transaction an interrupt will
fire at operation completion time and the callback will occur in a
tasklet. if the the channel does not support interrupts then a live
polling wait will be performed
* the api is written as a dmaengine client that requests all available
channels
* In support of dependencies the api implicitly schedules channel-switch
interrupts. The interrupt triggers the cleanup tasklet which causes
pending operations to be scheduled on the next channel
* Xor engines treat an xor destination address differently than a software
xor routine. To the software routine the destination address is an implied
source, whereas engines treat it as a write-only destination. This patch
modifies the xor_blocks routine to take a an explicit destination address
to mirror the hardware.

Changelog:
* fixed a leftover debug print
* don't allow callbacks in async_interrupt_cond
* fixed xor_block changes
* fixed usage of ASYNC_TX_XOR_DROP_DEST
* drop dma mapping methods, suggested by Chris Leech
* printk warning fixups from Andrew Morton
* don't use inline in C files, Adrian Bunk
* select the API when MD is enabled
* BUG_ON xor source counts
Signed-off-by: Dan Williams
Acked-By: NeilBrown

Dan Williams
2007-07-13 23:06:14 +0800
685784aaf xor: make 'xor_blocks' a library routine for use with async_tx ... Browse Code »

The async_tx api tries to use a dma engine for an operation, but will fall
back to an optimized software routine otherwise. Xor support is
implemented using the raid5 xor routines. For organizational purposes this
routine is moved to a common area.

The following fixes are also made:
* rename xor_block => xor_blocks, suggested by Adrian Bunk
* ensure that xor.o initializes before md.o in the built-in case
* checkpatch.pl fixes
* mark calibrate_xor_blocks __init, Adrian Bunk

Cc: Adrian Bunk
Cc: NeilBrown
Cc: Herbert Xu
Signed-off-by: Dan Williams

Dan Williams
2007-07-13 23:06:14 +0800

24 May, 2007

1 commit

ab6085c79 md: don't write more than is required of the last page of a bitmap ... Browse Code »

It is possible that real data or metadata follows the bitmap without full page
alignment.

So limit the last write to be only the required number of bytes, rounded up to
the hard sector size of the device.

Signed-off-by: Neil Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-05-24 11:14:14 +0800

10 May, 2007

2 commits

44ce6294d Revert "md: improve partition detection in md array" ... Browse Code »

This reverts commit 5b479c91da90eef605f851508744bfe8269591a0.

Quoth Neil Brown:

"It causes an oops when auto-detecting raid arrays, and it doesn't
seem easy to fix.

The array may not be 'open' when do_md_run is called, so
bdev->bd_disk might be NULL, so bd_set_size can oops.

This whole approach of opening an md device before it has been
assembled just seems to get more and more painful. I think I'm going
to have to come up with something clever to provide both backward
comparability with usage expectation, and sane integration into the
rest of the kernel."

Signed-off-by: Linus Torvalds

Linus Torvalds
2007-05-10 09:51:36 +0800
5b479c91d md: improve partition detection in md array ... Browse Code »

md currently uses ->media_changed to make sure rescan_partitions
is call on md array after they are assembled.

However that doesn't happen until the array is opened, which is later
than some people would like.

So use blkdev_ioctl to do the rescan immediately that the
array has been assembled.

This means we can remove all the ->change infrastructure as it was only used
to trigger a partition rescan.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-05-10 03:30:57 +0800

05 Apr, 2007

1 commit

5792a2856 [PATCH] md: avoid a deadlock when removing a device from an md array via sysfs ... Browse Code »

A device can be removed from an md array via e.g.
echo remove > /sys/block/md3/md/dev-sde/state

This will try to remove the 'dev-sde' subtree which will deadlock
since
commit e7b0d26a86943370c04d6833c6edba2a72a6e240

With this patch we run the kobject_del via schedule_work so as to
avoid the deadlock.

Cc: Alan Stern
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-04-05 12:12:47 +0800

10 Feb, 2007

1 commit

da6e1a32f [PATCH] md: avoid possible BUG_ON in md bitmap handling ... Browse Code »

md/bitmap tracks how many active write requests are pending on blocks
associated with each bit in the bitmap, so that it knows when it can clear
the bit (when count hits zero).

The counter has 14 bits of space, so if there are ever more than 16383, we
cannot cope.

Currently the code just calles BUG_ON as "all" drivers have request queue
limits much smaller than this.

However is seems that some don't. Apparently some multipath configurations
can allow more than 16383 concurrent write requests.

So, in this unlikely situation, instead of calling BUG_ON we now wait
for the count to drop down a bit. This requires a new wait_queue_head,
some waiting code, and a wakeup call.

Tested by limiting the counter to 20 instead of 16383 (writes go a lot slower
in that case...).

Signed-off-by: Neil Brown
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Brown
2007-02-10 01:25:47 +0800

27 Jan, 2007

1 commit

2a2275d63 [PATCH] md: fix potential memalloc deadlock in md ... Browse Code »

If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held,
it is possible for a deadlock to eventuate.

This happens if the array was marked 'clean', and the memalloc triggers a
write-out to the md device.

For the writeout to succeed, the array must be marked 'dirty', and that
requires getting the mddev_lock.

So, before attempting a GFP_KERNEL allocation while holding the lock, make
sure the array is marked 'dirty' (unless it is currently read-only).

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2007-01-27 05:51:00 +0800

11 Dec, 2006

1 commit

46031f9a3 [PATCH] md: allow reads that have bypassed the cache to be retried on failure ... Browse Code »

If a bypass-the-cache read fails, we simply try again through the cache. If
it fails again it will trigger normal recovery precedures.

update 1:

From: NeilBrown

1/
chunk_aligned_read and retry_aligned_read assume that
data_disks == raid_disks - 1
which is not true for raid6.
So when an aligned read request bypasses the cache, we can get the wrong data.

2/ The cloned bio is being used-after-free in raid5_align_endio
(to test BIO_UPTODATE).

3/ We forgot to add rdev->data_offset when submitting
a bio for aligned-read

4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
so we need to invalidate the segment counts.

5/ We don't de-reference the rdev when the read completes.
This means we need to record the rdev to so it is still
available in the end_io routine. Fortunately
bi_next in the original bio is unused at this point so
we can stuff it in there.

6/ We leak a cloned bio if the target rdev is not usable.

From: NeilBrown

update 2:

1/ When aligned requests fail (read error) they need to be retried
via the normal method (stripe cache). As we cannot be sure that
we can process a single read in one go (we may not be able to
allocate all the stripes needed) we store a bio-being-retried
and a list of bioes-that-still-need-to-be-retried.
When find a bio that needs to be retried, we should add it to
the list, not to single-bio...

2/ We were never incrementing 'scnt' when resubmitting failed
aligned requests.

[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Raz Ben-Jehuda(caro)
2006-12-11 01:57:20 +0800

08 Dec, 2006

1 commit

e18b890bb [PATCH] slab: remove kmem_cache_t ... Browse Code »

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:25 +0800

22 Oct, 2006

2 commits

4f2e639af [PATCH] md: endian annotations for the bitmap superblock ... Browse Code »

And a couple of bug fixes found by sparse.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-22 04:35:05 +0800
1c05b4bc2 [PATCH] md: endian annotation for v1 superblock access ... Browse Code »

Includes a couple of bugfixes found by sparse.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-22 04:35:05 +0800

03 Oct, 2006

8 commits

e8703fe1f [PATCH] md: remove MAX_MD_DEVS which is an arbitrary limit ... Browse Code »

Once upon a time we needed to fixed limit to the number of md devices,
probably because we preallocated some array. This need no longer exists, but
we still have an arbitrary limit.

So remove MAX_MD_DEVS and allow as many devices as we can fit into the 'minor'
part of a device number.

Also remove some useless noise at init time (which reports MAX_MD_DEVS) and
remove MD_THREAD_NAME_MAX which hasn't been used for a while.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:18 +0800
11ce99e62 [PATCH] md: Remove working_disks from raid1 state data ... Browse Code »

It is equivalent to conf->raid_disks - conf->mddev->degraded.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:17 +0800
9b1d1dac1 [PATCH] md: new sysfs interface for setting bits in the write-intent-bitmap ... Browse Code »

Add a new sysfs interface that allows the bitmap of an array to be dirtied.
The interface is write-only, and is used as follows:

echo "1000" > /sys/block/md2/md/bitmap

(dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
bitmaps of array md2)

echo "1000-2000" > /sys/block/md1/md/bitmap

(dirty the bits for chunks 1000-2000 in md1's bitmap)

This is useful, for example, in cluster environments where you may need to
combine two disjoint bitmaps into one (following a server failure, after a
secondary server has taken over the array). By combining the bitmaps on
the two servers, a full resync can be avoided (This was discussed on the
list back on March 18, 2005, "[PATCH 1/2] md bitmap bug fixes" thread).

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Clements
2006-10-03 23:04:17 +0800
76186dd8b [PATCH] md: remove 'working_disks' from raid10 state ... Browse Code »

It isn't needed as mddev->degraded contains equivalent info.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:17 +0800
02c2de8cc [PATCH] md: remove the working_disks and failed_disks from raid5 state data. ... Browse Code »

They are not needed. conf->failed_disks is the same as mddev->degraded and
conf->working_disks is conf->raid_disks - mddev->degraded.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:17 +0800
850b2b420 [PATCH] md: replace magic numbers in sb_dirty with well defined bit flags ... Browse Code »

Instead of magic numbers (0,1,2,3) in sb_dirty, we have
some flags instead:
MD_CHANGE_DEVS
Some device state has changed requiring superblock update
on all devices.
MD_CHANGE_CLEAN
The array has transitions from 'clean' to 'dirty' or back,
requiring a superblock update on active devices, but possibly
not on spares
MD_CHANGE_PENDING
A superblock update is underway.

We wait for an update to complete by waiting for all flags to be clear. A
flag can be set at any time, even during an update, without risk that the
change will be lost.

Stop exporting md_update_sb - isn't needed.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:17 +0800
b5c124af6 [PATCH] md: fix a comment that is wrong in raid5.h ... Browse Code »

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-10-03 23:04:16 +0800
fbedac04f [PATCH] md: the scheduled removal of the START_ARRAY ioctl for md ... Browse Code »

This patch contains the scheduled removal of the START_ARRAY ioctl for md.

Signed-off-by: Adrian Bunk
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-10-03 23:04:16 +0800

01 Oct, 2006

1 commit

9361401eb [PATCH] BLOCK: Make it possible to disable the block layer [try #6] ... Browse Code »

Make it possible to disable the block layer. Not all embedded devices require
it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
the block layer to be present.

This patch does the following:

(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.

(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:

(*) Block I/O tracing.

(*) Disk partition code.

(*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

(*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
block layer to do scheduling. Some drivers that use SCSI facilities -
such as USB storage - end up disabled indirectly from this.

(*) Various block-based device drivers, such as IDE and the old CDROM
drivers.

(*) MTD blockdev handling and FTL.

(*) JFFS - which uses set_bdev_super(), something it could avoid doing by
taking a leaf out of JFFS2's book.

(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.

(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.

(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.

(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:

(*) Default blockdev file operations (to give error ENODEV on opening).

(*) Makes some /proc changes:

(*) /proc/devices does not list any blockdevs.

(*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given command other than Q_SYNC or if a special device is specified.

(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).

(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe

David Howells
2006-10-01 02:52:31 +0800

19 Sep, 2006

1 commit

fadcfa33b [HEADERS] One line per header in Kbuild files to reduce conflicts ... Browse Code »

Signed-off-by: David Woodhouse

David Woodhouse
2006-09-19 19:43:58 +0800

11 Jul, 2006

1 commit

ff4e8d9a9 [PATCH] md: fix resync speed calculation for restarted resyncs ... Browse Code »

We introduced 'io_sectors' recently so we could count the sectors that causes
io during resync separate from sectors which didn't cause IO - there can be a
difference if a bitmap is being used to accelerate resync.

However when a speed is reported, we find the number of sectors processed
recently by subtracting an oldish io_sectors count from a current
'curr_resync' count. This is wrong because curr_resync counts all sectors,
not just io sectors.

So, add a field to mddev to store the curren io_sectors separately from
curr_resync, and use that in the calculations.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-07-11 04:24:16 +0800

05 Jul, 2006

1 commit

6fa0cb114 Merge git://git.infradead.org/hdrinstall-2.6 ... Browse Code »

* git://git.infradead.org/hdrinstall-2.6:
Remove export of include/linux/isdn/tpam.h
Remove and from userspace export
Restrict headers exported to userspace for SPARC and SPARC64
Add empty Kbuild files for 'make headers_install' in remaining arches.
Add Kbuild file for Alpha 'make headers_install'
Add Kbuild file for SPARC 'make headers_install'
Add Kbuild file for IA64 'make headers_install'
Add Kbuild file for S390 'make headers_install'
Add Kbuild file for i386 'make headers_install'
Add Kbuild file for x86_64 'make headers_install'
Add Kbuild file for PowerPC 'make headers_install'
Add generic Kbuild files for 'make headers_install'
Basic implementation of 'make headers_check'
Basic implementation of 'make headers_install'

Linus Torvalds
2006-07-05 03:55:45 +0800

27 Jun, 2006

9 commits

425437691 [PATCH] md: Don't write dirty/clean update to spares - leave them alone ... Browse Code »

- record the 'event' count on each individual device (they
might sometimes be slightly different now)
- add a new value for 'sb_dirty': '3' means that the super
block only needs to be updated to record a cleandirty
transition.
- Prefer odd event numbers for dirty states and even numbers
for clean states
- Using all the above, don't update the superblock on
a spare device if the update is just doing a clean-dirty
transition. To accomodate this, a transition from
dirty back to clean might now decrement the events counter
if nothing else has changed.

The net effect of this is that spare drives will not see any IO requests
during normal running of the array, so they can go to sleep if that is what
they want to do.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:39 +0800
d785a06a0 [PATCH] md/bitmap: change md/bitmap file handling to use bmap to file blocks ... Browse Code »

If md is asked to store a bitmap in a file, it tries to hold onto the page
cache pages for that file, manipulate them directly, and call a cocktail of
operations to write the file out. I don't believe this is a supportable
approach.

This patch changes the approach to use the same approach as swap files. i.e.
bmap is used to enumerate all the block address of parts of the file and we
write directly to those blocks of the device.

swapfile only uses parts of the file that provide a full pages at contiguous
addresses. We don't have that luxury so we have to cope with pages that are
non-contiguous in storage. To handle this we attach buffers to each page, and
store the addresses in those buffers.

With this approach the pagecache may contain data which is inconsistent with
what is on disk. To alleviate the problems this can cause, md invalidates the
pagecache when releasing the file. If the file is to be examined while the
array is active (a non-critical but occasionally useful function), O_DIRECT io
must be used. And new version of mdadm will have support for this.

This approach simplifies a lot of code:
- we no longer need to keep a list of pages which we need to wait for,
as the b_endio function can keep track of how many outstanding
writes there are. This saves a mempool.
- -EAGAIN returns from write_page are no longer possible (not sure if
they ever were actually).

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:38 +0800
0b79ccf0c [PATCH] md/bitmap: remove bitmap writeback daemon ... Browse Code »

md/bitmap currently has a separate thread to wait for writes to the bitmap
file to complete (as we cannot get a callback on that action).

However this isn't needed as bitmap_unplug is called from process context and
waits for the writeback thread to do it's work. The same result can be
achieved by doing the waiting directly in bitmap_unplug.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:38 +0800
5e56341d0 [PATCH] md: make md_print_devices() static ... Browse Code »

This patch makes the needlessly global md_print_devices() static.

Signed-off-by: Adrian Bunk
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-06-27 00:58:37 +0800
c93983bf5 [PATCH] md: support stripe/offset mode in raid10 ... Browse Code »

The "industry standard" DDF format allows for a stripe/offset layout where
data is duplicated on different stripes. e.g.

A B C D
D A B C
E F G H
H E F G

(columns are drives, rows are stripes, LETTERS are chunks of data).

This is similar to raid10's 'far' mode, but not quite the same. So enhance
'far' mode with a 'far/offset' option which follows the layout of DDFs
stripe/offset.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:37 +0800
7c7546ccf [PATCH] md: allow a linear array to have drives added while active ... Browse Code »

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:37 +0800
5fd6c1dce [PATCH] md: allow checkpoint of recovery with version-1 superblock ... Browse Code »

For a while we have had checkpointing of resync. The version-1 superblock
allows recovery to be checkpointed as well, and this patch implements that.

Due to early carelessness we need to add a feature flag to signal that the
recovery_offset field is in use, otherwise older kernels would assume that a
partially recovered array is in fact fully recovered.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:37 +0800
16a53ecc3 [PATCH] md: merge raid5 and raid6 code ... Browse Code »

There is a lot of commonality between raid5.c and raid6main.c. This patches
merges both into one module called raid456. This saves a lot of code, and
paves the way for online raid5->raid6 migrations.

There is still duplication, e.g. between handle_stripe5 and handle_stripe6.
This will probably be cleaned up later.

Cc: "H. Peter Anvin"
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:37 +0800
8932c2e0d [PATCH] md: remove arbitrary limit on chunk size ... Browse Code »

The largest chunk size the code can support without substantial surgery is
2^30 bytes, so make that the limit instead of an arbitrary 4Meg. Some day,
the 'chunksize' should change to a sector-shift instead of a byte-count. Then
no limit would be needed.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-06-27 00:58:36 +0800