23 Dec, 2011

1 commit


11 Oct, 2011

7 commits


28 Jul, 2011

3 commits

  • When we get a write error (in the data area, not in metadata),
    update the badblock log rather than failing the whole device.

    As the write may well be many blocks, we trying writing each
    block individually and only log the ones which fail.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • If we succeed in writing to a block that was recorded as
    being bad, we clear the bad-block record.

    This requires some delayed handling as the bad-block-list update has
    to happen in process-context.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • This patch just covers the basic read path:
    1/ read_balance needs to check for badblocks, and return not only
    the chosen slot, but also how many good blocks are available
    there.
    2/ read submission must be ready to issue multiple reads to
    different devices as different bad blocks on different devices
    could mean that a single large read cannot be served by any one
    device, but can still be served by the array.
    This requires keeping count of the number of outstanding requests
    per bio. This count is stored in 'bi_phys_segments'

    On read error we currently just fail the request if another target
    cannot handle the whole request. Next patch refines that a bit.

    Signed-off-by: NeilBrown

    NeilBrown
     

27 Jul, 2011

1 commit

  • When we get a read error during recovery, RAID10 previously
    arranged for the recovering device to appear to fail so that
    the recovery stops and doesn't restart. This is misleading and wrong.

    Instead, make use of the new recovery_disabled handling and mark
    the target device and having recovery disabled.

    Add appropriate checks in add_disk and remove_disk so that devices
    are removed and not re-added when recovery is disabled.

    Signed-off-by: NeilBrown

    NeilBrown
     

31 Mar, 2011

1 commit


24 Jun, 2010

1 commit

  • Most array level changes leave the list of devices largely unchanged,
    possibly causing one at the end to become redundant.
    However conversions between RAID0 and RAID10 need to renumber
    all devices (except 0).

    This renumbering is currently being done in the ->run method when the
    new personality takes over. However this is too late as the common
    code in md.c might already have invalidated some of the devices if
    they had a ->raid_disk number that appeared to high.

    Moving it into the ->takeover method is too early as the array is
    still active at that time and wrong ->raid_disk numbers could cause
    confusion.

    So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
    the new raid_disk number.
    Now the common code knows exactly which devices need to be renumbered,
    and which can be invalidated, and can do it all at a convenient time
    when the array is suspend.
    It can also update some symlinks in sysfs which previously were not be
    updated correctly.

    Reported-by: Maciej Trela
    Signed-off-by: NeilBrown

    NeilBrown
     

18 May, 2010

1 commit


16 Jun, 2009

1 commit

  • Having a macro just to cast a void* isn't really helpful.
    I would must rather see that we are simply de-referencing ->private,
    than have to know what the macro does.

    So open code the macro everywhere and remove the pointless cast.

    Signed-off-by: NeilBrown

    NeilBrown
     

31 Mar, 2009

2 commits