Eric Lee / smarc-fsl-linux-kernel

07 Jan, 2006

40 commits

9ffae0cf3 [PATCH] md: convert md to use kzalloc throughout

Replace multiple kmalloc/memset pairs with kzalloc calls.
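
As a rough userspace analogy (kmalloc and kzalloc exist only inside the kernel, so malloc and calloc stand in for them here, and struct demo_conf and both function names are invented for illustration), the conversion replaces an allocate-then-clear pair with a single zeroing allocation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, standing in for any md object that must start zeroed. */
struct demo_conf {
    int raid_disks;
    void *private_data;
};

/* Before: allocate, then clear by hand (the kmalloc + memset pattern). */
static struct demo_conf *conf_alloc_old(void)
{
    struct demo_conf *conf = malloc(sizeof(*conf));

    if (!conf)
        return NULL;
    memset(conf, 0, sizeof(*conf));
    return conf;
}

/* After: one call that returns zeroed memory (the kzalloc pattern). */
static struct demo_conf *conf_alloc_new(void)
{
    return calloc(1, sizeof(struct demo_conf));
}
```

Both helpers hand back an object whose fields are all zero; the second simply has fewer places to get the size wrong.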

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:05 +0800
2d1f3b5d1 [PATCH] md: clean up 'page' related names in md

Substitute:

page_cache_get -> get_page
page_cache_release -> put_page
PAGE_CACHE_SHIFT -> PAGE_SHIFT
PAGE_CACHE_SIZE -> PAGE_SIZE
PAGE_CACHE_MASK -> PAGE_MASK
__free_page -> put_page

because we aren't using the page cache, we are just using pages.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:05 +0800
d7603b7e3 [PATCH] md: make /proc/mdstat pollable

With this patch it is possible to poll /proc/mdstat to detect arrays appearing
or disappearing, to detect failures, recovery starting, recovery completing,
and devices being added and removed.

It is similar to the poll-ability of /proc/mounts, though different in that:

We always report that the file is readable (because face it, it is, even if
only for EOF).

We report POLLPRI when there is a change so that select() can detect
it as an exceptional event. Not only are these exceptional events, but
that is the mechanism that the current 'mdadm' uses to watch for events
(It also polls after a timeout).
(We also report POLLERR like /proc/mounts).

Finally, we only reset the per-file event counter when the start of the file
is read, rather than when poll() returns an event. This is more robust as it
means that an fd will continue to report activity to poll/select until the
program clearly responds to that activity.
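
A userspace watcher built on this behaviour might look like the sketch below. It assumes only what the text above states (POLLPRI or POLLERR signals a change, and reading from the start of the file resets the event counter); mdstat_wait and mdstat_reread are hypothetical helper names, not part of mdadm or any library.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for an event on an already-open /proc/mdstat fd.
 * Returns 1 if a change was flagged, 0 on timeout, -1 on error.
 * Only POLLPRI is requested, because the file always reports readable.
 */
static int mdstat_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    if (pfd.revents & POLLNVAL)
        return -1;
    return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : 0;
}

/*
 * Re-read from the start of the file. Per the description above, this is
 * the step that acknowledges the event; until it happens, poll() keeps
 * reporting activity.
 */
static ssize_t mdstat_reread(int fd, char *buf, size_t len)
{
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, len);
}
```

A caller would open /proc/mdstat with open(path, O_RDONLY), read it once, then loop on mdstat_wait() followed by mdstat_reread().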

md_new_event takes an 'mddev' which isn't currently used, but it will be soon.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:05 +0800
0eb3ff12a [PATCH] md: raid10 read-error handling - resync and read-only

Add in correct read-error handling for resync and read-only situations.

When read-only, we don't over-write, so we need to mark the failed drive in
the r10_bio so we don't re-try it. During resync, we always read all blocks,
so if there is a read error, we simply over-write it with the good block that
we found (assuming we found one).

Note that the recovery case still isn't handled in an interesting way. There
is nothing useful to do for the 2-copies case. If there are 3 or more copies,
then we could try reading from one of the non-missing copies, but this is a
bit complicated and very rarely would be used, so I'm leaving it for now.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:05 +0800
4443ae10c [PATCH] md: auto-correct correctable read errors in raid10

Largely just a cross-port from raid1.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:05 +0800
220946c90 [PATCH] md: make sure read error on last working drive of raid1 actually returns failure

We are inadvertently setting the R1BIO_Uptodate bit on read errors when we
decide not to try correcting (because there are no other working devices).
This means that the read error is reported to the client as success.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:04 +0800
d11c171e6 [PATCH] md: allow raid1 to check consistency

When performing a user-requested 'check' or 'repair', we read all readable
devices, and compare the contents. We only write to blocks which had read
errors, or blocks with content that differs from the first good device found.
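
The selection rule in this paragraph can be sketched in plain C. This is an illustrative model, not the raid1 code: struct mirror_read, plan_repair and the tiny block size are all invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 8   /* tiny block, for illustration only */

/* One mirror's view of a block: whether the read succeeded, and the data read. */
struct mirror_read {
    bool ok;
    unsigned char data[DEMO_BLOCK_SIZE];
};

/*
 * Mark in needs_write[] every mirror that should be rewritten: those whose
 * read failed, and those whose content differs from the first good mirror.
 * Returns the number of writes planned, or -1 if no mirror was readable.
 */
static int plan_repair(const struct mirror_read *m, size_t n, bool *needs_write)
{
    size_t first_good = n;
    int writes = 0;

    for (size_t i = 0; i < n; i++) {
        needs_write[i] = false;
        if (m[i].ok && first_good == n)
            first_good = i;
    }
    if (first_good == n)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (i == first_good)
            continue;
        if (!m[i].ok ||
            memcmp(m[i].data, m[first_good].data, DEMO_BLOCK_SIZE) != 0) {
            needs_write[i] = true;
            writes++;
        }
    }
    return writes;
}
```

When every mirror reads back the same content the plan contains no writes at all, so the pass touches the disks only for reading.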

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:04 +0800
18f08819f [PATCH] md: support check-without-repair of raid10 arrays

Also keep count on the number of errors found.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:04 +0800
9910f16af [PATCH] md: fix up some rdev rcu locking in raid5/6

There is this "FIXME" comment with a typo in it!! that has been annoying me for
days, so I just had to remove it.

conf->disks[i].rdev should only be accessed if
- we know we hold a reference or
- the mddev->reconfig_sem is down or
- we have a rcu_readlock

handle_stripe was referencing rdev in three places without any of these. For
the first two, get an rcu_readlock. For the last, the same access
(md_sync_acct call) is made a little later after the rdev has been claimed
under an rcu_readlock, if R5_Syncio is set. So just use that access...
However R5_Syncio isn't really needed as the 'syncing' variable contains the
same information. So use that instead.

Issues, comment, and fix are identical in raid5 and raid6.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:04 +0800
cf30a473a [PATCH] md: handle errors when read-only

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:04 +0800
69382e853 [PATCH] md: better handling for read error in raid1 during resync

Handling of read errors during resync is separate from handling of read errors
during normal IO in raid1. A previous patch added support for read errors
during normal IO. This one adds support for read errors during resync or
recovery.

The key differences are that we don't need to freeze the array, because the
normal handling of resync means that this part of the array will be idle
except for resync, and the read/overwrite/re-read is needed in a separate
piece of code.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:04 +0800
3e198f782 [PATCH] md: tidyup some issues with raid1 resync and prepare for catching read errors

We are dereferencing ->rdev without an rcu lock!

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:03 +0800
ddaf22aba [PATCH] md: attempt to auto-correct read errors in raid1

On a read-error we suspend the array, then synchronously read the block from
other devices until we find one where we can read it. Then we try writing the
good data back everywhere and make sure it works. If any write or subsequent
read fails, only then do we fail the device out of the array.

To be able to suspend the array, we need to also keep track of how many
requests are queued for handling by raid1d.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:03 +0800
d69762e98 [PATCH] md: improve handling of read errors with raid6

This is a simple port of matching functionality across from raid5. If we get a
read error, we don't kick the drive straight away, but try to over-write with
good data first.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:03 +0800
ca65b73bd [PATCH] md: fix raid6 resync check/repair code

raid6 currently does not check the P/Q syndromes when doing a resync, it just
calculates the correct value and writes it. Doing the check can reduce writes
(often to 0) for a resync, and it is needed to properly implement the

echo check > sync_action

operation.

This patch implements the appropriate checks and tidies up some related code.

It also allows raid6 user-requested resync to bypass the intent bitmap.
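
The check-before-write idea can be illustrated on a single byte column. The sketch below assumes the conventional RAID-6 syndromes (P as plain XOR, Q as a sum of powers of 2 in GF(2^8) with the polynomial 0x11d); the function names are invented and this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reducing by the polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
}

/* Compute P = D0 ^ D1 ^ ... and Q = D0 ^ 2*D1 ^ 4*D2 ^ ... for one byte column. */
static void compute_pq(const uint8_t *d, size_t ndata, uint8_t *p, uint8_t *q)
{
    uint8_t pv = 0, qv = 0;

    /* Horner's rule: walk from the highest data disk down to disk 0. */
    for (size_t i = ndata; i-- > 0; ) {
        pv ^= d[i];
        qv = (uint8_t)(gf_mul2(qv) ^ d[i]);
    }
    *p = pv;
    *q = qv;
}

/* Resync with checking: a write is needed only if a stored syndrome is wrong. */
static bool syndromes_need_write(const uint8_t *d, size_t ndata,
                                 uint8_t stored_p, uint8_t stored_q)
{
    uint8_t p, q;

    compute_pq(d, ndata, &p, &q);
    return p != stored_p || q != stored_q;
}
```

On an array that is already consistent, syndromes_need_write() is false for every stripe, which is how the write count for a resync can drop to zero.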

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:03 +0800
6cce3b23f [PATCH] md: write intent bitmap support for raid10

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:03 +0800
b15c2e57f [PATCH] md: move bitmap_create to after md array has been initialised

This is important because bitmap_create uses
mddev->resync_max_sectors
and that doesn't have a valid value until after the array
has been initialised (with pers->run()).
[It doesn't make a difference for current personalities that
support bitmaps, but will make a difference for raid10]

This has the added advantage of meaning we can move the thread->timeout
manipulation inside the bitmap.c code instead of sprinkling identical code
throughout all personalities.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:03 +0800
6ff8d8ec0 [PATCH] md: allow dirty raid[456] arrays to be started at boot

See patch to md.txt for more details

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:02 +0800
14f8d26b8 [PATCH] md: small cleanups for raid5

Resync code:
A test that isn't needed,
a 'compute_block' that makes more sense
elsewhere (And then doesn't need a test),
a couple of BUG_ONs to confirm the change makes sense.

Printks:
A few were missing KERN_*

Also fix a typo in a comment.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:02 +0800
0a27ec96b [PATCH] md: improve raid10 "IO Barrier" concept

raid10 needs to put up a barrier to new requests while it does resync or other
background recovery. The code for this is currently open-coded, slightly
obscured by its use of two waitqueues, and not documented.

This patch gathers all the related code into 4 functions, and includes a
comment which (hopefully) explains what is happening.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:02 +0800
17999be4a [PATCH] md: improve raid1 "IO Barrier" concept

raid1 needs to put up a barrier to new requests while it does resync or other
background recovery. The code for this is currently open-coded, slightly
obscured by its use of two waitqueues, and not documented.

This patch gathers all the related code into 4 functions, and includes a
comment which (hopefully) explains what is happening.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

NeilBrown
2006-01-07 00:34:01 +0800
ac81b2ee4 [PATCH] make dm-mirror not issue invalid resync requests

I've been attempting to set up a (Host)RAID mirror with dm_mirror on
2.6.14.3, and I've been having a strange little problem. The configuration
in question is a set of 9GB SCSI disks that have 17942584 sectors. I set
up the dm_mirror table as such:

0 17942528 mirror core 2 2048 nosync 2 8:48 0 8:64 0

If I'm not mistaken, this sets up a 9GB RAID1 mirror with 1MB stripes
across both SCSI disks. The sector count of the dm device is less than the
size of the disks, so we shouldn't fall off the end. However, I always get
the messages like this in dmesg when I set up the dm table:

attempt to access beyond end of device
sdd: rw=0, want=17958656, limit=17942584

Clearly, something is trying to read sectors past the end of the drive. I
traced it down to the __rh_recovery_prepare function in dm-raid1.c, which
gets called when we're putting the mirror set together. This function
calls the dirty region log's get_resync_work function to see if there's any
resync that needs to be done, and queues up any areas that are out of sync.
The log's get_resync_work function is actually a pointer to the
core_get_resync_work function in dm-log.c.

The core_get_resync_work function queries a bitset lc->sync_bits to find
out if there are any regions that are out of date (i.e. the bit is 0),
which is where the problem occurs. If every bit in lc->sync_bits is 1
(which is the case when we've just configured a new RAID1 with the nosync
option), the find_next_zero_bit does NOT return the size parameter
(lc->region_count in this case), it returns the size parameter rounded up
to the nearest multiple of 32! I don't know if this is intentional, but
i386 and x86_64 both exhibit this behavior.

In any case, the statement "if (*region == lc->region_count)" looks like
it's supposed to catch the case where there are no regions to resync and
return 0. Since find_next_zero_bit apparently has a habit of returning
a value that's larger than lc->region_count, the enclosed patch changes
the equality test to a greater-than test so that we don't try to resync
areas outside of the RAID1 region. Seeing as the HostRAID metadata
lives just past the end of the RAID1 data, mucking around in that area
is not a good idea.
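
The failure can be reproduced in a small userspace sketch. Everything in it is an assumption made for illustration: find_zero_wordwise is a hypothetical stand-in that mimics the behaviour reported above for find_next_zero_bit (it is not the kernel routine), and get_resync_work is a simplified analogue of core_get_resync_work. With 17942528 sectors and 2048-sector regions there are 8761 regions, and a bitmap filled entirely with ones makes the search return 8768, the next multiple of 32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_WORD 32u

/*
 * Word-at-a-time search for the first zero bit. If every word is all
 * ones, the result is the bitmap length rounded up to a whole word,
 * which can be larger than 'size'.
 */
static unsigned int find_zero_wordwise(const uint32_t *map, unsigned int size)
{
    unsigned int nwords = (size + BITS_PER_WORD - 1) / BITS_PER_WORD;

    for (unsigned int w = 0; w < nwords; w++) {
        uint32_t inverted = ~map[w];

        if (inverted != 0) {
            unsigned int bit = 0;

            while (!(inverted & 1u)) {
                inverted >>= 1;
                bit++;
            }
            return w * BITS_PER_WORD + bit;
        }
    }
    return nwords * BITS_PER_WORD;
}

/* Returns 1 and stores the region to resync, or 0 if nothing needs resyncing. */
static int get_resync_work(const uint32_t *sync_bits, unsigned int region_count,
                           unsigned int *region)
{
    *region = find_zero_wordwise(sync_bits, region_count);
    if (*region >= region_count)    /* an '==' test here misses 8768 != 8761 */
        return 0;
    return 1;
}
```

The same comparison also guards against a stray zero bit in the padding beyond region_count, which an equality test would treat as real resync work.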

I suppose another way to fix this would be to amend find_next_zero_bit so
that it doesn't return values larger than "size", but I don't know if
there's a reason for the current behavior.

Signed-Off-By: Darrick J. Wong
Acked-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Darrick J. Wong
2006-01-07 00:34:01 +0800
9d3520a33 [PATCH] dm-crypt: zero key before freeing it

Zap the memory before freeing it so we don't leave crypto information
around in memory.
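
A userspace sketch of the same habit follows. The names wipe, struct demo_key and demo_key_free are hypothetical; the kernel patch itself is not reproduced here. The volatile pointer is one common way to keep a compiler from discarding stores to memory that is about to be freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Overwrite a buffer through a volatile pointer so the stores are kept. */
static void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    while (len--)
        *p++ = 0;
}

/* Hypothetical key holder, for illustration only. */
struct demo_key {
    size_t len;
    unsigned char bytes[32];
};

/* Clear the key material first, then release the memory. */
static void demo_key_free(struct demo_key *key)
{
    if (!key)
        return;
    wipe(key, sizeof(*key));
    free(key);
}
```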

Signed-off-by: Stefan Rompf
Acked-by: Clemens Fruhwirth
Acked-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stefan Rompf
2006-01-07 00:34:01 +0800
0b56306e5 [PATCH] drivers/md/kcopyd.c: #if 0 kcopyd_cancel()

This patch #if 0's the not yet implemented global function kcopyd_cancel().

Signed-off-by: Adrian Bunk
Acked-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2006-01-07 00:34:01 +0800
6da487dcc [PATCH] device-mapper ioctl: add skip lock_fs flag

Add ioctl DM_SKIP_LOCKFS_FLAG for userspace to request that lock_fs is
bypassed when suspending a device.

There's no change to the behaviour of existing code that doesn't know about
the new flag.

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alasdair G Kergon
2006-01-07 00:34:01 +0800
aa8d7c2fb [PATCH] device-mapper: make lock_fs optional

Devices only need syncing when creating snapshots, so make this optional when
suspending a device.

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alasdair G Kergon
2006-01-07 00:34:01 +0800
e39e2e95e [PATCH] device-mapper: rename frozen_bdev

Rename frozen_bdev to suspended_bdev and move the bdget outside lockfs. (This
prepares for making lockfs optional.)

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alasdair G Kergon
2006-01-07 00:34:00 +0800
a1a190807 [PATCH] device-mapper raid1: add default mirror

This patch introduces a new field to the mirror_set (default_mirror) to store
the default mirror.

(A subsequent patch will allow us to change the default mirror in the event of
a failure.)

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jonathan E Brassow
2006-01-07 00:34:00 +0800
2d5fe6898 [PATCH] device-mapper: scanf sector format change

Use %llu not %Lu in sscanf/printf format strings.
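
For context, standard C defines the ll length modifier for integer conversions, while L is defined only for floating-point ones, so %llu is the portable way to read or print an unsigned long long. A minimal sketch, with hypothetical helper names and a made-up "start length" input format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "<start> <length>" sector values; returns 0 on success, -1 on bad input. */
static int parse_sectors(const char *s, unsigned long long *start,
                         unsigned long long *len)
{
    return sscanf(s, "%llu %llu", start, len) == 2 ? 0 : -1;
}

/* Format the same pair back into text; returns what snprintf returns. */
static int format_sectors(char *out, size_t n, unsigned long long start,
                          unsigned long long len)
{
    return snprintf(out, n, "%llu %llu", start, len);
}
```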

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alasdair G Kergon
2006-01-07 00:34:00 +0800
e6c276159 [PATCH] device-mapper: remove unused definition

This patch removes an unused #define.

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Stribblehill
2006-01-07 00:34:00 +0800
2d38fe204 [PATCH] device-mapper snapshot: metadata reading separation

Move snapshot metadata reading into separate function, to prepare for changing
the place it gets called from.

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alasdair G Kergon
2006-01-07 00:34:00 +0800
81f1777a5 [PATCH] device-mapper ioctl: event on rename

After changing the name of a mapped device, trigger a dm event. (For
userspace multipath tools.)

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

goggin, edward
2006-01-07 00:34:00 +0800
d229a9589 [PATCH] device-mapper: add dm_get_md

Add dm_get_md() to get a mapped device given its dev_t.

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Teigland
2006-01-07 00:34:00 +0800
637842cfd [PATCH] device-mapper: add dm_find_md

Abstract dm_find_md() from dm_get_mdptr() to allow use elsewhere.

Signed-off-by: Alasdair G Kergon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Teigland
2006-01-07 00:33:59 +0800
9f708e40f [PATCH] knfsd: reduce stack consumption

A typical nfsd call trace is
nfsd -> svc_process -> nfsd_dispatch -> nfsd3_proc_write ->
nfsd_write -> nfsd_vfs_write -> vfs_writev

These add up to over 300 bytes on the stack.
Looking at each of these, I see that nfsd_write (which includes
nfsd_vfs_write) contributes 0x8c to stack usage itself!!

It turns out this is because it puts a 'struct iattr' on the stack so
it can kill suid if needed. The following patch saves about 50 bytes
off the stack in this call path.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Brown
2006-01-07 00:33:59 +0800
a334de286 [PATCH] knfsd: check error status from vfs_getattr and i_op->fsync

Both vfs_getattr and i_op->fsync return error statuses which nfsd was
largely ignoring. This was noticed when exporting directories using fuse.

This patch cleans up most of the offences, which involves moving the call
to vfs_getattr out of the xdr encoding routines (where it is too late to
report an error) into the main NFS procedure handling routines.

There is still a call to vfs_getattr (related to the ACL code) where the
status is ignored, and calls to nfsd_sync_dir don't check return status
either.

Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Shaw
2006-01-07 00:33:59 +0800
93fbf1a5d [PATCH] Keep nfsd from exiting when seeing recv() errors

I submitted this one previously - svc_tcp_recvfrom currently returns
any errors to the caller, including ECONNRESET and the like.

This is something svc_recv isn't able to deal with:

	len = svsk->sk_recvfrom(rqstp);
	[...]
	if (len == 0 || len == -EAGAIN) {
		[...]
		return -EAGAIN;
	}

	[...]
	return len;

The nfsd main loop will exit when it sees an error code other than
EAGAIN.

The following patch fixes this problem.

svc_recv is not equipped to deal with error codes other than EAGAIN,
and will propagate anything else (such as ECONNRESET) up to nfsd,
causing it to exit.

Signed-off-by: Olaf Kirch
Cc: Trond Myklebust
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Olaf Kirch
2006-01-07 00:33:59 +0800
f93ea411b [PATCH] jbd: split checkpoint lists

Split the checkpoint list of the transaction into two lists. In the first
list we keep the buffers that need to be submitted for IO. In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete. This should simplify the handling of checkpoint
lists a bit and may eventually also bring a performance gain.
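
The two-list arrangement can be pictured with a toy model. None of the names below come from jbd; struct demo_transaction, to_submit, in_flight and the helpers are invented, and "submitting" or "completing" IO is reduced to moving or dropping list nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer node, for illustration only. */
struct demo_buf {
    int id;
    struct demo_buf *next;
};

struct demo_transaction {
    struct demo_buf *to_submit;  /* buffers that still need IO submitted */
    struct demo_buf *in_flight;  /* buffers submitted, awaiting completion */
};

static size_t list_len(const struct demo_buf *b)
{
    size_t n = 0;

    for (; b; b = b->next)
        n++;
    return n;
}

/* "Submit" every pending buffer by moving it onto the in-flight list. */
static size_t submit_pending(struct demo_transaction *t)
{
    size_t moved = 0;

    while (t->to_submit) {
        struct demo_buf *b = t->to_submit;

        t->to_submit = b->next;
        b->next = t->in_flight;
        t->in_flight = b;
        moved++;
    }
    return moved;
}

/* "Complete" all outstanding IO by emptying the in-flight list. */
static size_t complete_in_flight(struct demo_transaction *t)
{
    size_t done = list_len(t->in_flight);

    t->in_flight = NULL;
    return done;
}
```

With the lists separated, code that waits for IO only has to walk in_flight and never re-examines buffers that have not been submitted yet.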

Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2006-01-07 00:33:59 +0800
6fe2e70bb [PATCH] kernel/module.c: removed dead code

This patch fixes an issue reported by Coverity in kernel/module.c

Error reported: Cannot reach this line of code "else return ptr;"

Patch description:
This is the error path, so 'err' will be negative, the else case
is not required, this patch removes it.

Signed-off-by: Jayachandran C.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jayachandran C
2006-01-07 00:33:59 +0800
066bb8d03 [PATCH] fix remaining list_for_each_safe_rcu in -mm (take 2)

I missed a use of list_for_each_safe_rcu() in -mm tree. Here is an updated
patch to fix it. This time tested on a machine that actually uses IPMI...
(Thanks to Serge Hallyn for spotting this.)

Signed-off-by: "Paul E. McKenney"
Cc: Corey Minyard
Cc: Matt Domsch
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul E. McKenney
2006-01-07 00:33:58 +0800