Eric Lee / smarc-fsl-linux-kernel

08 Mar, 2010

1 commit

52cf25d0a Driver core: Constify struct sysfs_ops in struct kobj_type ... Browse Code »

Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime

* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime

* potentially better optimized code as the compiler
can assume that the const data cannot be changed

* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing

Signed-off-by: Emese Revfy
Acked-by: David Teigland
Acked-by: Matt Domsch
Acked-by: Maciej Sosnowski
Acked-by: Hans J. Koch
Acked-by: Pekka Enberg
Acked-by: Jens Axboe
Acked-by: Stephen Hemminger
Signed-off-by: Greg Kroah-Hartman

Emese Revfy
2010-03-08 09:04:49 +0800

06 Mar, 2010

14 commits

f07030409 dm raid1: fix deadlock when suspending failed device ... Browse Code »

To prevent deadlock, bios in the hold list should be flushed before
dm_rh_stop_recovery() is called in mirror_suspend().

The recovery can't start because there are pending bios and therefore
dm_rh_stop_recovery deadlocks.

When there are pending bios in the hold list, the recovery waits for
the completion of the bios after recovery_count is acquired.
The recovery_count is released when the recovery finished, however,
the bios in the hold list are processed after dm_rh_stop_recovery() in
mirror_presuspend(). dm_rh_stop_recovery() also acquires recovery_count,
then deadlock occurs.

Signed-off-by: Takahiro Yasui
Signed-off-by: Alasdair G Kergon
Reviewed-by: Mikulas Patocka

Takahiro Yasui
2010-03-06 10:32:35 +0800
924e600d4 dm: eliminate some holes data structures ... Browse Code »

Eliminate a 4-byte hole in 'struct dm_io_memory' by moving 'offset' above the
'ptr' to which it applies (size reduced from 24 to 16 bytes). And by
association, 1-4 byte hole is eliminated in 'struct dm_io_request' (size
reduced from 56 to 48 bytes).

Eliminate all 6 4-byte holes and 1 cache-line in 'struct dm_snapshot' (size
reduced from 392 to 368 bytes).

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-03-06 10:32:33 +0800
3abf85b5b dm ioctl: introduce flag indicating uevent was generated ... Browse Code »

Set a new DM_UEVENT_GENERATED_FLAG when returning from ioctls to
indicate that a uevent was actually generated. This tells the userspace
caller that it may need to wait for the event to be processed.

Signed-off-by: Peter Rajnoha
Signed-off-by: Alasdair G Kergon

Peter Rajnoha
2010-03-06 10:32:31 +0800
a97f925a3 dm: free dm_io before bio_endio not after ... Browse Code »

Free the dm_io structure before calling bio_endio() instead of after it,
to ensure that the io_pool containing it is not referenced after it is
freed.

This partially fixes a problem described here
https://www.redhat.com/archives/dm-devel/2010-February/msg00109.html

thread 1:
bio_endio(bio, io_error);
/* scheduling happens */
thread 2:
close the device
remove the device
thread 1:
free_io(md, io);

Thread 2, when removing the device, sees non-empty md->io_pool (because the
io hasn't been freed by thread 1 yet) and may crash with BUG in mempool_free.
Thread 1 may also crash, when freeing into a nonexisting mempool.

To fix this we must make sure that bio_endio() is the last call and
the md structure is not accessed afterwards.

There is another bio_endio in process_barrier, but it is called from the thread
and the thread is destroyed prior to freeing the mempools, so this call is
not affected by the bug.

A similar bug exists with module unloads - the module may be unloaded
immediately after bio_endio - but that is more difficult to fix.

Signed-off-by: Mikulas Patocka
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2010-03-06 10:32:29 +0800
8215d6ec5 dm table: remove unused dm_get_device range parameters ... Browse Code »

Remove unused parameters(start and len) of dm_get_device()
and fix the callers.

Signed-off-by: Nikanth Karthikesan
Signed-off-by: Alasdair G Kergon

Nikanth Karthikesan
2010-03-06 10:32:27 +0800
0f3649a9e dm ioctl: only issue uevent on resume if state changed ... Browse Code »

Only issue a uevent on a resume if the state of the device changed,
i.e. if it was suspended and/or its table was replaced.

Signed-off-by: Dave Wysochanski
Signed-off-by: Mike Snitzer
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-03-06 10:32:24 +0800
ede5ea0b8 dm raid1: always return error if all legs fail ... Browse Code »

If all mirror legs fail, always return an error instead of holding the
bio, even if the handle_errors option was set. At present it is the
responsibility of the driver underneath us to deal with retries,
multipath etc.

The patch adds the bio to the failures list instead of holding it
directly. do_failures tests first if all legs failed and, if so,
returns the bio with -EIO. If any leg is still alive and handle_errors
is set, do_failures calls hold_bio.

Reviewed-by: Takahiro Yasui
Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2010-03-06 10:32:22 +0800
fb6126429 dm mpath: refactor pg_init ... Browse Code »

This patch pulls the pg_init path activation code out of
process_queued_ios() into a new function.

No functional change.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2010-03-06 10:32:18 +0800
2bded7bd7 dm mpath: wait for pg_init completion when suspending ... Browse Code »

When suspending the device we must wait for all I/O to complete, but
pg-init may be still in progress even after flushing the workqueue
for kmpath_handlerd in multipath_postsuspend.

This patch waits for pg-init completion correctly in
multipath_postsuspend().

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2010-03-06 10:32:13 +0800
d0259bf0e dm mpath: hold io until all pg_inits completed ... Browse Code »

m->queue_io is set to block processing I/Os, and it needs to be kept
while pg-init, which issues multiple path activations, is in progress.
But m->queue is cleared when a path activation completes without error
in pg_init_done(), even while other path activations are in progress.
That may cause undesired -EIO on paths which are not complete activation.

This patch fixes that by not clearing m->queue_io until all path
activations complete.

(Before the hardware handlers were moved into the SCSI layer, pg_init
only used one path.)

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2010-03-06 10:30:02 +0800
fce323dd6 dm mpath: avoid storing private suspended state ... Browse Code »

'suspended' flag in struct multipath was introduced to check whether
the multipath target is in suspended state, but the same check is
done through dm_suspended() now, so remove the flag and related code.

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: Mike Anderson
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2010-03-06 10:29:59 +0800
ecdb2e257 dm table: remove dm_get from dm_table_get_md ... Browse Code »

Remove the dm_get() in dm_table_get_md() because dm_table_get_md() could
be called from presuspend/postsuspend, which are called while
mapped_device is in DMF_FREEING state, where dm_get() is not allowed.

Justification for that is the lifetime of both objects: As far as the
current dm design/implementation, mapped_device is never freed while
targets are doing something, because dm core waits for targets to become
quiet in dm_put() using presuspend/postsuspend. So targets should be
able to touch mapped_device without holding reference count of the
mapped_device, and we should allow targets to touch mapped_device even
if it is in DMF_FREEING state.

Backgrounds:
I'm trying to remove the multipath internal queue, since dm core now has
a generic queue for request-based dm. In the patch-set, the multipath
target wants to request dm core to start/stop queue. One of such
start/stop requests can happen during postsuspend() while the target
waits for pg-init to complete, because the target stops queue when
starting pg-init and tries to restart it when completing pg-init. Since
queue belongs to mapped_device, it involves calling dm_table_get_md()
and dm_put(). On the other hand, postsuspend() is called in dm_put()
for mapped_device which is in DMF_FREEING state, and that triggers
BUG_ON(DMF_FREEING) in the 2nd dm_put().

I had tried to solve this problem by changing only multipath not to
touch mapped_device which is in DMF_FREEING state, but I couldn't and I
came up with a question why we need dm_get() in dm_table_get_md().

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2010-03-06 10:29:52 +0800
f7b934c81 dm mpath: skip activate_path for failed paths ... Browse Code »

This patch adds two minor fixes while processing device mapper path activation.

Skip failed paths while calling activate_path. If the path is already failed
then activate_path will fail for sure. We don't have to call in that case. In
some case this might cause prolonged retries unnecessarily.

Change the misleading message if the path being activated fails with SCSI_DH_NOSYS.

Signed-off-by: Babu Moger
Signed-off-by: Alasdair G Kergon

Moger, Babu
2010-03-06 10:29:49 +0800
83c0d5d53 dm mpath: pass struct pgpath to pg init done ... Browse Code »

This patch removes some unnecessary argument casting. There is no
functional change with this patch.

Passes 'struct pgpath' through to pg_init_done() instead of the enclosed
'struct dm_path'.

Tested the changes with LSI storage..

CC: Chandra Seetharaman
Signed-off-by: Babu Moger
Acked-by: Kiyoshi Ueda
Signed-off-by: Alasdair G Kergon

Moger, Babu
2010-03-06 10:29:45 +0800

03 Mar, 2010

1 commit

0a135ba14 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: add __percpu sparse annotations to what's left
percpu: add __percpu sparse annotations to fs
percpu: add __percpu sparse annotations to core kernel subsystems
local_t: Remove leftover local.h
this_cpu: Remove pageset_notifier
this_cpu: Page allocator conversion
percpu, x86: Generic inc / dec percpu instructions
local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
module: Use this_cpu_xx to dynamically allocate counters
local_t: Remove cpu_local_xx macros
percpu: refactor the code in pcpu_[de]populate_chunk()
percpu: remove compile warnings caused by __verify_pcpu_ptr()
percpu: make accessors check for percpu pointer in sparse
percpu: add __percpu for sparse.
percpu: make access macros universal
percpu: remove per_cpu__ prefix.

Linus Torvalds
2010-03-03 23:34:18 +0800

26 Feb, 2010

2 commits

8a78362c4 block: Consolidate phys_segment and hw_segment limits ... Browse Code »

Except for SCSI no device drivers distinguish between physical and
hardware segment limits. Consolidate the two into a single segment
limit.

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Martin K. Petersen
2010-02-26 20:58:08 +0800
086fa5ff0 block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectors ... Browse Code »

The block layer calling convention is blk_queue_.
blk_queue_max_sectors predates this practice, leading to some confusion.
Rename the function to appropriately reflect that its intended use is to
set max_hw_sectors.

Also introduce a temporary wrapper for backwards compability. This can
be removed after the merge window is closed.

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Martin K. Petersen
2010-02-26 20:58:08 +0800

17 Feb, 2010

8 commits

a29d8b8e2 percpu: add __percpu sparse annotations to what's left ... Browse Code »

Add __percpu sparse annotations to places which didn't make it in one
of the previous patches. All converions are trivial.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo
Acked-by: Borislav Petkov
Cc: Dan Williams
Cc: Huang Ying
Cc: Len Brown
Cc: Neil Brown

Tejun Heo
2010-02-17 10:17:38 +0800
9307f6b19 dm: sysfs revert add empty release function to avoid debug warning ... Browse Code »

Revert commit d2bb7df8cac647b92f51fb84ae735771e7adbfa7 at Greg's request.

Author: Milan Broz
Date: Thu Dec 10 23:51:53 2009 +0000

dm: sysfs add empty release function to avoid debug warning

This patch just removes an unnecessary warning:
kobject: 'dm': does not have a release() function,
it is broken and must be fixed.

The kobject is embedded in mapped device struct, so
code does not need to release memory explicitly here.

Cc: Greg KH
Signed-off-by: Alasdair G Kergon

Alasdair G Kergon
2010-02-17 02:43:04 +0800
9eef87da2 dm mpath: fix stall when requeueing io ... Browse Code »

This patch fixes the problem that system may stall if target's ->map_rq
returns DM_MAPIO_REQUEUE in map_request().
E.g. stall happens on 1 CPU box when a dm-mpath device with queue_if_no_path
bounces between all-paths-down and paths-up on I/O load.

When target's ->map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
the request and returns to dm_request_fn(). Then, dm_request_fn()
doesn't exit the I/O dispatching loop and continues processing
the requeued request again.
This map and requeue loop can be done with interrupt disabled,
so 1 CPU system can be stalled if this situation happens.

For example, commands below can stall my 1 CPU box within 1 minute or so:
# dmsetup table mp
mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
# while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &
# while true; do \
> dmsetup message mp 0 "fail_path 8:144" \
> dmsetup suspend --noflush mp \
> dmsetup resume mp \
> dmsetup message mp 0 "reinstate_path 8:144" \
> done

To fix the problem above, this patch changes dm_request_fn() to exit
the I/O dispatching loop once if a request is requeued in map_request().

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon

Kiyoshi Ueda
2010-02-17 02:43:01 +0800
558569aa9 dm raid1: fix null pointer dereference in suspend ... Browse Code »

When suspending a failed mirror, bios are completed by mirror_end_io() and
__rh_lookup() in dm_rh_dec() returns NULL where a non-NULL return value is
required by design. Fix this by not changing the state of the recovery failed
region from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end().

Issue

On 2.6.33-rc1 kernel, I hit the bug when I suspended the failed
mirror by dmsetup command.

BUG: unable to handle kernel NULL pointer dereference at 00000020
IP: [] dm_rh_dec+0x35/0xa1 [dm_region_hash]
...
EIP: 0060:[] EFLAGS: 00010046 CPU: 0
EIP is at dm_rh_dec+0x35/0xa1 [dm_region_hash]
EAX: 00000286 EBX: 00000000 ECX: 00000286 EDX: 00000000
ESI: eff79eac EDI: eff79e80 EBP: f6915cd4 ESP: f6915cc4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process dmsetup (pid: 2849, ti=f6914000 task=eff03e80 task.ti=f6914000)
...
Call Trace:
[] ? mirror_end_io+0x53/0x1b1 [dm_mirror]
[] ? clone_endio+0x4d/0xa2 [dm_mod]
[] ? mirror_end_io+0x0/0x1b1 [dm_mirror]
[] ? clone_endio+0x0/0xa2 [dm_mod]
[] ? bio_endio+0x28/0x2b
[] ? hold_bio+0x2d/0x62 [dm_mirror]
[] ? mirror_presuspend+0xeb/0xf7 [dm_mirror]
[] ? vmap_page_range+0xb/0xd
[] ? suspend_targets+0x2d/0x3b [dm_mod]
[] ? dm_table_presuspend_targets+0xe/0x10 [dm_mod]
[] ? dm_suspend+0x4d/0x150 [dm_mod]
[] ? dev_suspend+0x55/0x18a [dm_mod]
[] ? _copy_from_user+0x42/0x56
[] ? dm_ctl_ioctl+0x22c/0x281 [dm_mod]
[] ? dev_suspend+0x0/0x18a [dm_mod]
[] ? dm_ctl_ioctl+0x0/0x281 [dm_mod]
[] ? vfs_ioctl+0x22/0x85
[] ? do_vfs_ioctl+0x4cb/0x516
[] ? sys_ioctl+0x40/0x5a
[] ? sysenter_do_call+0x12/0x28

Analysis

When recovery process of a region failed, dm_rh_recovery_end() function
changes the state of the region from RM_RH_RECOVERING to DM_RH_NOSYNC.
When recovery_complete() is executed between dm_rh_update_states() and
dm_writes() in do_mirror(), bios are processed with the region state,
DM_RH_NOSYNC. However, the region data is freed without checking its
pending count when dm_rh_update_states() is called next time.

When bios are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec()
returns NULL even though a valid return value are expected.

Solution

Remove the state change of the recovery failed region from DM_RH_RECOVERING
to DM_RH_NOSYNC in dm_rh_recovery_end(). We can remove the state change
because:

- If the region data has been released by dm_rh_update_states(),
a new region data is created with the state of DM_RH_NOSYNC, and
bios are processed according to the DM_RH_NOSYNC state.

- If the region data has not been released by dm_rh_update_states(),
a state of the region is DM_RH_RECOVERING and bios are put in the
delayed_bio list.

The flag change from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end()
was added in the following commit:
dm raid1: handle resync failures
author Jonathan Brassow
Thu, 12 Jul 2007 16:29:04 +0000 (17:29 +0100)
http://git.kernel.org/linus/f44db678edcc6f4c2779ac43f63f0b9dfa28b724

Signed-off-by: Takahiro Yasui
Reviewed-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Takahiro Yasui
2010-02-17 02:42:58 +0800
5528d17de dm raid1: fail writes if errors are not handled and log fails ... Browse Code »

If the mirror log fails when the handle_errors option was not selected
and there is no remaining valid mirror leg, writes return success even
though they weren't actually written to any device. This patch
completes them with EIO instead.

This code path is taken:
do_writes:
bio_list_merge(&ms->failures, &sync);
do_failures:
if (!get_valid_mirror(ms)) (false)
else if (errors_handled(ms)) (false)
else bio_endio(bio, 0);

The logic in do_failures is based on presuming that the write was already
tried: if it succeeded at least on one leg (without handle_errors) it
is reported as success.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=555197

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon

Mikulas Patocka
2010-02-17 02:42:55 +0800
ebfd32bba dm log: userspace fix overhead_size calcuations ... Browse Code »

This patch fixes two bugs that revolve around the miscalculation and
misuse of the variable 'overhead_size'. 'overhead_size' is the size of
the various header structures used during communication.

The first bug is the use of 'sizeof' with the pointer of a structure
instead of the structure itself - resulting in the wrong size being
computed. This is then used in a check to see if the payload
(data_size) would be to large for the preallocated structure. Since the
bug produces a smaller value for the overhead, it was possible for the
structure to be breached. (Although the current users of the code do
not currently send enough data to trigger this bug.)

The second bug is that the 'overhead_size' value is used to compute how
much of the preallocated space should be cleared before populating it
with fresh data. This should have simply been 'sizeof(struct cn_msg)'
not overhead_size. The fact that 'overhead_size' was computed
incorrectly made this problem "less bad" - leaving only a pointer's
worth of space at the end uncleared. Thus, this bug was never producing
a bad result, but still needs to be fixed - especially now that the
value is computed correctly.

Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow

Jonathan Brassow
2010-02-17 02:42:53 +0800
55f67f2de dm snapshot: persistent annotate work_queue as on stack ... Browse Code »

chunk_io() declares its 'struct mdata_req' on the stack and then
initializes its 'struct work_struct' member. Annotate the
initialization of this workqueue with INIT_WORK_ON_STACK to suppress a
debugobjects warning seen when CONFIG_DEBUG_OBJECTS_WORK is enabled.

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-02-17 02:42:51 +0800
781248c1b dm stripe: avoid divide by zero with invalid stripe count ... Browse Code »

If a table containing zero as stripe count is passed into stripe_ctr
the code attempts to divide by zero.

This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count
is zero.

We now get the following error messages:
device-mapper: table: 253:0: striped: Invalid stripe count
device-mapper: ioctl: error adding target to table

Signed-off-by: Nikanth Karthikesan
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon

Nikanth Karthikesan
2010-02-17 02:42:47 +0800

10 Feb, 2010

1 commit

ef286f6fa md: fix some lockdep issues between md and sysfs. ... Browse Code »

======
This fix is related to
http://bugzilla.kernel.org/show_bug.cgi?id=15142
but does not address that exact issue.
======

sysfs does like attributes being removed while they are being accessed
(i.e. read or written) and waits for the access to complete.

As accessing some md attributes takes the same lock that is held while
removing those attributes a deadlock can occur.

This patch addresses 3 issues in md that could lead to this deadlock.

Two relate to calling flush_scheduled_work while the lock is held.
This is probably a bad idea in general and as we use schedule_work to
delete various sysfs objects it is particularly bad.

In one case flush_scheduled_work is called from md_alloc (called by
md_probe) called from do_md_run which holds the lock. This call is
only present to ensure that ->gendisk is set. However we can be sure
that gendisk is always set (though possibly we couldn't when that code
was originally written. This is because do_md_run is called in three
different contexts:
1/ from md_ioctl. This requires that md_open has succeeded, and it
fails if ->gendisk is not set.
2/ from writing a sysfs attribute. This can only happen if the
mddev has been registered in sysfs which happens in md_alloc
after ->gendisk has been set.
3/ from autorun_array which is only called by autorun_devices, which
checks for ->gendisk to be set before calling autorun_array.
So the call to md_probe in do_md_run can be removed, and the check on
->gendisk can also go.

In the other case flush_scheduled_work is being called in do_md_stop,
purportedly to wait for all md_delayed_delete calls (which delete the
component rdevs) to complete. However there really isn't any need to
wait for them - they have already been disconnected in all important
ways.

The third issue is that raid5->stop() removes some attribute names
while the lock is held. There is already some infrastructure in place
to delay attribute removal until after the lock is released (using
schedule_work). So extend that infrastructure to remove the
raid5_attrs_group.

This does not address all lockdep issues related to the sysfs
"s_active" lock. The rest can be address by splitting that lockdep
context between symlinks and non-symlinks which hopefully will happen.

Signed-off-by: NeilBrown

NeilBrown
2010-02-10 08:26:09 +0800

09 Feb, 2010

1 commit

9eb07c259 md: fix 'degraded' calculation when starting a reshape. ... Browse Code »

This code was written long ago when it was not possible to
reshape a degraded array. Now it is so the current level of
degraded-ness needs to be taken in to account. Also newly addded
devices should only reduce degradedness if they are deemed to be
in-sync.

In particular, if you convert a RAID5 to a RAID6, and increase the
number of devices at the same time, then the 5->6 conversion will
make the array degraded so the current code will produce a wrong
value for 'degraded' - "-1" to be precise.

If the reshape runs to completion end_reshape will calculate a correct
new value for 'degraded', but if a device fails during the reshape an
incorrect decision might be made based on the incorrect value of
"degraded".

This patch is suitable for 2.6.32-stable and if they are still open,
2.6.31-stable and 2.6.30-stable as well.

Cc: stable@kernel.org
Reported-by: Michael Evans
Signed-off-by: NeilBrown

NeilBrown
2010-02-09 13:34:29 +0800

11 Jan, 2010

1 commit

b27d7f16d DM: Fix device mapper topology stacking ... Browse Code »

Make DM use bdev_stack_limits() function so that partition offsets get
taken into account when calculating alignment. Clarify stacking
warnings.

Also remove obsolete clearing of final alignment_offset and misalignment
flag.

Signed-off-by: Martin K. Petersen
Signed-off-by: Mike Snitzer
Cc: Alasdair G. Kergon
Signed-off-by: Jens Axboe

Martin K. Petersen
2010-01-11 21:29:20 +0800

30 Dec, 2009

5 commits

404e4b43f md: allow a resync that is waiting for other resync to complete, to be aborted. ... Browse Code »

If two arrays share a device, then they will not both resync at the
same time. One will wait for the other to complete.
While waiting, the MD_RECOVERY_INTR flag is not checked so a device
failure, which would make the resync pointless, does not cause the
resync to abort, so the failed device cannot be removed (as it cannot
be remove while a resync is happening).

So add a test for MD_RECOVERY_INTR.

Reported-by: Brett Russ
Signed-off-by: NeilBrown

NeilBrown
2009-12-30 12:25:23 +0800
7fb9dadc9 md: remove unnecessary code from do_md_run ... Browse Code »

Since commit dfc7064500061677720fa26352963c772d3ebe6b,
->hot_remove_disks has not removed non-failed devices from
an array until recovery is no longer possible.
So the code in do_md_run to get around the fact that
md_check_recovery (which calls ->hot_remove_disks) would
remove partially-in-sync devices is no longer needed.

So remove it.

Signed-off-by: NeilBrown

NeilBrown
2009-12-30 12:20:43 +0800
a2d79c324 md: make recovery started by do_md_run() visible via sync_action ... Browse Code »

By default md_do_sync() will perform recovery if no other actions are
specified. However, action_show() relies on MD_RECOVERY_RECOVER to be
set otherwise it returns 'idle'. So, add a missing set
MD_RECOVERY_RECOVER when starting recovery.

Signed-off-by: Dan Williams
Signed-off-by: NeilBrown

Dan Williams
2009-12-30 12:20:31 +0800
0f9552b5d md: fix small irregularity with start_ro module parameter ... Browse Code »

The start_ro modules parameter can be used to force arrays to be
started in 'auto-readonly' in which they are read-only until the first
write. This ensures that no resync/recovery happens until something
else writes to the device. This is important for resume-from-disk
off an md array.

However if an array is started 'readonly' (by writing 'readonly' to
the 'array_state' sysfs attribute) we want it to be really 'readonly',
not 'auto-readonly'.

So strengthen the condition to only set auto-readonly if the
array is not already read-only.

Signed-off-by: NeilBrown

NeilBrown
2009-12-30 12:20:12 +0800
cbd199837 md: Fix unfortunate interaction with evms ... Browse Code »

evms configures md arrays by:
open device
send ioctl
close device

for each different ioctl needed.
Since 2.6.29, the device can disappear after the 'close'
unless a significant configuration has happened to the device.
The change made by "SET_ARRAY_INFO" can too minor to stop the device
from disappearing, but important enough that losing the change is bad.

So: make sure SET_ARRAY_INFO sets mddev->ctime, and keep the device
active as long as ctime is non-zero (it gets zeroed with lots of other
things when the array is stopped).

This is suitable for -stable kernels since 2.6.29.

Signed-off-by: NeilBrown
Cc: stable@kernel.org

NeilBrown
2009-12-30 09:08:49 +0800

16 Dec, 2009

3 commits

53365383c Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (80 commits)
dm snapshot: use merge origin if snapshot invalid
dm snapshot: report merge failure in status
dm snapshot: merge consecutive chunks together
dm snapshot: trigger exceptions in remaining snapshots during merge
dm snapshot: delay merging a chunk until writes to it complete
dm snapshot: queue writes to chunks being merged
dm snapshot: add merging
dm snapshot: permit only one merge at once
dm snapshot: support barriers in snapshot merge target
dm snapshot: avoid allocating exceptions in merge
dm snapshot: rework writing to origin
dm snapshot: add merge target
dm exception store: add merge specific methods
dm snapshot: create function for chunk_is_tracked wait
dm snapshot: make bio optional in __origin_write
dm mpath: reject messages when device is suspended
dm: export suspended state to targets
dm: rename dm_suspended to dm_suspended_md
dm: swap target postsuspend call and setting suspended flag
dm crypt: add plain64 iv
...

Linus Torvalds
2009-12-16 01:12:01 +0800
7b75c2f8c drivers/md/md.c: use %pU to print UUIDs ... Browse Code »

Signed-off-by: Joe Perches
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2009-12-16 00:53:33 +0800
e7d2860b6 tree-wide: convert open calls to remove spaces to skip_spaces() lib function ... Browse Code »

Makes use of skip_spaces() defined in lib/string.c for removing leading
spaces from strings all over the tree.

It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
text data bss dec hex filename
64688 584 592 65864 10148 (TOTALS-BEFORE)
64641 584 592 65817 10119 (TOTALS-AFTER)

Also, while at it, if we see (*str && isspace(*str)), we can be sure to
remove the first condition (*str) as the second one (isspace(*str)) also
evaluates to 0 whenever *str == 0, making it redundant. In other words,
"a char equals zero is never a space".

Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
and found occurrences of this pattern on 3 more files:
drivers/leds/led-class.c
drivers/leds/ledtrig-timer.c
drivers/video/output.c

@@
expression str;
@@

( // ignore skip_spaces cases
while (*str && isspace(*str)) { \(str++;\|++str;\) }
|
- *str &&
isspace(*str)
)

Signed-off-by: André Goddard Rosa
Cc: Julia Lawall
Cc: Martin Schwidefsky
Cc: Jeff Dike
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Cc: Richard Purdie
Cc: Neil Brown
Cc: Kyle McMartin
Cc: Henrique de Moraes Holschuh
Cc: David Howells
Cc:
Cc: Samuel Ortiz
Cc: Patrick McHardy
Cc: Takashi Iwai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

André Goddard Rosa
2009-12-16 00:53:32 +0800

14 Dec, 2009

3 commits

06e3c817b md: add 'recovery_start' per-device sysfs attribute ... Browse Code »

Enable external metadata arrays to manage rebuild checkpointing via a
md/dev-XXX/recovery_start attribute which reflects rdev->recovery_offset

Also update resync_start_store to allow 'none' to be written, for
consistency.

Signed-off-by: Dan Williams
Signed-off-by: NeilBrown

Dan Williams
2009-12-14 09:58:57 +0800
4e59ca7da md: rcu_read_lock() walk of mddev->disks in md_do_sync() ... Browse Code »

Other walks of this list are either under rcu_read_lock() or the list
mutation lock (mddev_lock()). This protects against the improbable case of a
disk being removed from the array at the start of md_do_sync().

Signed-off-by: Dan Williams

Dan Williams
2009-12-14 09:57:43 +0800
93be75ffd md: integrate spares into array at earliest opportunity. ... Browse Code »

As v1.x metadata can record that a member of the array is
not completely recovered, it make sense to record that a
spare has become a regular member of the array at the earliest
opportunity.
So remove the tests on "recovery_offset > 0" in super_1_sync
as they really aren't needed, and schedule a metadata update
immediately after adding spares to a degraded array.

This means that if a crash happens immediately after a recovery
starts, the new device will be included in the array and recovery will
continue from wherever it was up to. Previously this didn't happen
unless recovery was at least 1/16 of the way through.

Signed-off-by: NeilBrown

NeilBrown
2009-12-14 09:51:41 +0800