23 Dec, 2011

1 commit

  • In RAID1, a replacement is much like a normal device, so we just
    double the size of the relevant arrays and look at all possible
    devices for reads and writes.

    This means that the array looks like it is now double the size in some
    way - we need to be careful about that.
    In particular, when checking whether the array is still degraded while
    creating a recovery request, we need to consider only the first 'half'
    - i.e. the real (non-replacement) devices (see the sketch below).

    Signed-off-by: NeilBrown

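    A minimal user-space sketch of that check - not the kernel code; the
    struct and field names here are invented for illustration - showing
    that only the first raid_disks slots (the real devices) are consulted
    when counting how degraded the array is:

        struct mirror_info {
            int present;        /* non-zero if a working device is here */
        };

        /* mirrors[] holds 2 * raid_disks entries: slots [0, raid_disks)
         * are the real devices, the rest are their replacements. */
        struct r1conf_model {
            int raid_disks;
            struct mirror_info mirrors[2 * 8];
        };

        /* Count missing devices by looking at the first 'half' only; a
         * working replacement must not hide a failed original. */
        int count_degraded(const struct r1conf_model *conf)
        {
            int i, missing = 0;

            for (i = 0; i < conf->raid_disks; i++)  /* not 2 * raid_disks */
                if (!conf->mirrors[i].present)
                    missing++;
            return missing;
        }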

11 Oct, 2011

7 commits


07 Oct, 2011

1 commit


28 Jul, 2011

4 commits

  • When we get a write error (in the data area, not in metadata),
    update the badblock log rather than failing the whole device.

    As the write may well span many blocks, we try writing each block
    individually and log only the ones which fail (see the sketch below).

    Signed-off-by: NeilBrown
    Reviewed-by: Namhyung Kim

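    A rough user-space model of that retry loop - write_one_block() and
    record_badblock() are hypothetical stand-ins for the real driver
    machinery, not md functions:

        #include <stdbool.h>

        bool write_one_block(int disk, long long sector, int block_sectors);
        void record_badblock(int disk, long long sector, int block_sectors);

        /* On a failed multi-block write, retry block by block and record
         * only the blocks that really are bad; the device stays in the
         * array instead of being failed outright. */
        void handle_write_error(int disk, long long start, int nr_blocks,
                                int block_sectors)
        {
            int i;

            for (i = 0; i < nr_blocks; i++) {
                long long sector = start + (long long)i * block_sectors;

                if (!write_one_block(disk, sector, block_sectors))
                    record_badblock(disk, sector, block_sectors);
            }
        }
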
  • When performing write-behind we allocate pages to store the data
    during write.
    Previously we just kept a list of pages. Now we keep a list of
    bi_vecs, each of which includes an offset and size (see the sketch
    below).
    This means that the r1bio has complete information to create a new
    bio which will be needed for retrying after write errors.

    Signed-off-by: NeilBrown
    Reviewed-by: Namhyung Kim

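    A sketch of the data-structure change, using made-up type names rather
    than the kernel's own struct bio_vec and r1_bio:

        struct page;                    /* opaque in this model */

        /* Before: only the pages themselves were remembered. */
        struct behind_pages_old {
            struct page **pages;
            int nr;
        };

        /* After: offset and length are kept as well, mirroring a bio_vec,
         * so a fresh bio can be rebuilt from this information alone when
         * a write has to be retried after an error. */
        struct behind_vec {
            struct page  *bv_page;
            unsigned int bv_len;
            unsigned int bv_offset;
        };

        struct behind_pages_new {
            struct behind_vec *bvecs;
            int nr;
        };
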
  • If we succeed in writing to a block that was recorded as
    being bad, we clear the bad-block record.

    This requires some delayed handling, as the bad-block-list update has
    to happen in process context (see the sketch below).

    Signed-off-by: NeilBrown
    Reviewed-by: Namhyung Kim

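    A simplified model of that deferral - the field and function names are
    invented, and the real code routes this through the r1_bio state and
    the per-array daemon thread:

        #include <stdbool.h>

        struct r1bio_model {
            bool clear_bad_block;       /* set from the end_io path */
        };

        /* Completion (interrupt) context: only note what happened and let
         * the daemon thread be woken; no list manipulation here. */
        void on_write_done(struct r1bio_model *r1_bio, bool success,
                           bool target_was_bad)
        {
            if (success && target_was_bad)
                r1_bio->clear_bad_block = true;
        }

        /* Daemon thread (process context): safe to take locks and resize
         * the bad-block list, so the record is cleared here. */
        void daemon_handle(struct r1bio_model *r1_bio)
        {
            if (r1_bio->clear_bad_block) {
                /* clear the bad-block record for the written range */
                r1_bio->clear_bad_block = false;
            }
        }
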
  • Now that we have a bad block list, we should not read from those
    blocks.
    There are several main parts to this:
    1/ read_balance needs to check for bad blocks, and return not only
    the chosen device, but also how many good blocks are available
    there.
    2/ fix_read_error needs to avoid trying to read from bad blocks.
    3/ read submission must be ready to issue multiple reads to
    different devices as different bad blocks on different devices
    could mean that a single large read cannot be served by any one
    device, but can still be served by the array.
    This requires keeping a count of the number of outstanding requests
    per bio. This count is stored in 'bi_phys_segments' (a sketch of the
    read splitting follows below).
    4/ retrying a read also needs to be ready to submit a smaller read
    and queue another request for the rest.

    This does not yet handle bad blocks when reading to perform resync,
    recovery, or check.

    'md_trim_bio' will also be used for RAID10, so put it in md.c and
    export it.

    Signed-off-by: NeilBrown

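    A user-space sketch of point 3/ above, with invented names (the real
    code works with struct bio, read_balance() and the r1_bio): if the
    chosen device only has good sectors for part of the read, submit that
    much, bump the per-bio count of outstanding sub-requests, and go
    around again for the remainder, possibly picking a different device:

        /* What a read_balance-style chooser reports: which disk to use
         * and how many sectors are good there before a bad block. */
        struct read_choice {
            int disk;
            long long good_sectors;
        };

        struct bio_model {
            long long sector;
            long long sectors;
            int outstanding;        /* plays the role of bi_phys_segments */
        };

        struct read_choice choose_disk(long long sector, long long sectors);
        void submit_read(struct bio_model *bio, int disk,
                         long long sector, long long sectors);

        /* Serve a read that no single device may be able to satisfy
         * whole, but that the array as a whole still can. */
        void raid_read(struct bio_model *bio)
        {
            long long done = 0;

            while (done < bio->sectors) {
                long long sector = bio->sector + done;
                long long left = bio->sectors - done;
                struct read_choice c = choose_disk(sector, left);
                long long chunk = left < c.good_sectors ? left
                                                        : c.good_sectors;

                if (chunk <= 0)
                    break;              /* nothing can serve this part */

                bio->outstanding++;     /* one more sub-read in flight */
                submit_read(bio, c.disk, sector, chunk);
                done += chunk;
            }
        }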

27 Jul, 2011

1 commit

  • If we hit a read error while recovering a mirror, we want to abort the
    recovery without necessarily failing the disk - as having a disk with
    a read error is better than not having an array at all.

    Currently this is managed with a per-array flag "recovery_disabled"
    and is only implemented for RAID1. For RAID10 we will need finer
    grained control as we might want to disable recovery for individual
    devices separately.

    So push more of the decision making into the personality.
    'recovery_disabled' is now a 'cookie' which is copied when the
    personality wants to disable recovery, and which is changed when a
    device is added to the array, as that is the trigger to 'try recovery
    again' (see the sketch below).

    This will allow RAID10 to get the control that it needs.

    Signed-off-by: NeilBrown

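    A small model of the cookie scheme with invented type names - the
    personality snapshots the array-wide value when it gives up, and a
    later mismatch (caused, for example, by adding a device) means
    recovery is worth trying again:

        struct array_model {
            int recovery_disabled;      /* bumped when a device is added */
        };

        struct personality_model {
            int recovery_disabled;      /* snapshot taken when giving up */
        };

        /* Personality hits an unrecoverable error during recovery:
         * remember the current cookie instead of failing the device. */
        void disable_recovery(struct personality_model *conf,
                              const struct array_model *mddev)
        {
            conf->recovery_disabled = mddev->recovery_disabled;
        }

        /* Adding a device changes the cookie ... */
        void device_added(struct array_model *mddev)
        {
            mddev->recovery_disabled++;
        }

        /* ... which makes recovery worth attempting again. */
        int recovery_allowed(const struct personality_model *conf,
                             const struct array_model *mddev)
        {
            return conf->recovery_disabled != mddev->recovery_disabled;
        }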

08 Jun, 2011

1 commit


11 May, 2011

1 commit

  • The current handling and freeing of the write-behind pages is a bit
    fragile. We only keep the list of allocated pages in each bio, so the
    bio has to still be valid when the pages are freed, which is clumsy.

    So simply store the allocated page list in the r1_bio so it can easily
    be found and freed when we are finished with the r1_bio (see the
    sketch below).

    Signed-off-by: NeilBrown

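    An illustrative user-space model of the ownership change (names
    invented; in this model the pages come from malloc, unlike the
    kernel): the r1_bio carries the page list itself, so freeing needs
    nothing else to still exist:

        #include <stdlib.h>

        struct page_model { void *data; };

        /* The r1_bio owns the write-behind pages directly. */
        struct r1bio_pages_model {
            struct page_model **behind_pages;
            int behind_page_count;
        };

        /* Freeing needs only the r1_bio - no bio has to be kept alive
         * just so its page list can be walked. */
        void free_behind_pages(struct r1bio_pages_model *r1_bio)
        {
            int i;

            for (i = 0; i < r1_bio->behind_page_count; i++)
                free(r1_bio->behind_pages[i]);
            free(r1_bio->behind_pages);
            r1_bio->behind_pages = NULL;
            r1_bio->behind_page_count = 0;
        }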

29 Oct, 2010

1 commit


10 Sep, 2010

1 commit

  • This patch converts md to support REQ_FLUSH/FUA instead of the now
    deprecated REQ_HARDBARRIER. In the core part (md.c), the following
    changes are notable.

    * Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
    processing of other requests and thus there is no reason to mark the
    queue congested while FLUSH/FUA is in progress.

    * REQ_FLUSH/FUA failures are final and its users don't need retry
    logic. Retry logic is removed.

    * Preflush needs to be issued to all member devices but FUA writes can
    be handled the same way as other writes - their processing can be
    deferred to request_queue of member devices. md_barrier_request()
    is renamed to md_flush_request() and simplified accordingly.

    For linear, raid0 and multipath, the core changes are enough. raid1,
    5 and 10 need the following conversions.

    * raid1: Handling of FLUSH/FUA bios can simply be deferred to the
    request_queues of member devices. Barrier related logic removed (see
    the sketch below).

    * raid5: Queue draining logic dropped. FUA bit is propagated through
    biodrain and stripe reconstruction such that all the updated parts
    of the stripe are written out with FUA writes if any of the dirtying
    writes was FUA. preread_active_stripes handling in make_request()
    is updated as suggested by Neil Brown.

    * raid10: FUA bit needs to be propagated to write clones.

    linear, raid0, 1, 5 and 10 tested.

    Signed-off-by: Tejun Heo
    Reviewed-by: Neil Brown
    Signed-off-by: Jens Axboe

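    A much-simplified, RAID1-flavoured model of that split - the flag
    values and helpers below are invented stand-ins, not the block layer's
    REQ_FLUSH/REQ_FUA machinery: a preflush is forwarded to every member
    device, while the FUA bit simply stays on the write that is handed
    down to each member's request_queue:

        #define MODEL_FLUSH  (1u << 0)   /* stand-in for REQ_FLUSH */
        #define MODEL_FUA    (1u << 1)   /* stand-in for REQ_FUA */

        struct member { int id; };

        void issue_flush(struct member *dev);
        void issue_write(struct member *dev, unsigned int flags,
                         long long sector, long long sectors);

        /* Preflush goes to every member first; the write itself, FUA bit
         * included, is just passed down - the members' own queues handle
         * it, so md needs no draining or retry logic of its own. */
        void model_make_request(struct member *members, int nr_members,
                                unsigned int flags, long long sector,
                                long long sectors)
        {
            int i;

            if (flags & MODEL_FLUSH)
                for (i = 0; i < nr_members; i++)
                    issue_flush(&members[i]);

            if (sectors)
                for (i = 0; i < nr_members; i++)
                    issue_write(&members[i], flags & MODEL_FUA,
                                sector, sectors);
        }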

14 Dec, 2009

1 commit


16 Jun, 2009

1 commit

  • Having a macro just to cast a void* isn't really helpful.
    I would much rather see that we are simply dereferencing ->private
    than have to know what the macro does.

    So open code the macro everywhere and remove the pointless cast (see
    the sketch below).

    Signed-off-by: NeilBrown

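    The change in miniature, with made-up type names (the macro removed
    from md was along the lines of mddev_to_conf()):

        struct conf_model  { int nr_disks; };
        struct mddev_model { void *private; };

        /* Before: a macro whose only job is to cast away the void pointer. */
        #define mddev_to_conf(mddev) ((struct conf_model *)(mddev)->private)

        void before_and_after(struct mddev_model *mddev)
        {
            /* Old style: the reader has to know what the macro expands to. */
            struct conf_model *a = mddev_to_conf(mddev);

            /* Open-coded: assigning from void * needs no cast in C, and
             * it is obvious that ->private is what is being used. */
            struct conf_model *b = mddev->private;

            (void)a;
            (void)b;
        }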

31 Mar, 2009

2 commits