03 Oct, 2006

40 commits

  • Once upon a time we needed a fixed limit on the number of md devices,
    probably because we preallocated some array. This need no longer exists, but
    we still have an arbitrary limit.

    So remove MAX_MD_DEVS and allow as many devices as we can fit into the 'minor'
    part of a device number.
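
    As a rough sketch, assume the kernel-internal dev_t layout from
    include/linux/kdev_t.h, with a 12-bit major and a 20-bit (MINORBITS) minor.
    The new ceiling is then simply the size of the minor space. MKDEV, MAJOR and
    MINOR mirror the kernel macros. md_max_minors() and md_minor_ok() are
    hypothetical helpers, not code from the patch.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Mirror of the kernel-internal dev_t split (not the userspace one). */
    #define MINORBITS 20
    #define MINORMASK ((1U << MINORBITS) - 1)
    #define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))
    #define MAJOR(dev) ((uint32_t)(dev) >> MINORBITS)
    #define MINOR(dev) ((uint32_t)(dev) & MINORMASK)

    /* Hypothetical: with MAX_MD_DEVS gone, the only cap is the minor field. */
    static uint32_t md_max_minors(void)
    {
        return MINORMASK + 1; /* 1 << 20 = 1048576 */
    }

    /* Hypothetical replacement for a "minor >= MAX_MD_DEVS" style check. */
    static int md_minor_ok(uint32_t minor)
    {
        return minor <= MINORMASK;
    }
    ```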

    Also remove some useless noise at init time (which reports MAX_MD_DEVS) and
    remove MD_THREAD_NAME_MAX which hasn't been used for a while.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • md.txt has two sections describing the 'level' sysfs attribute, and some of
    the text is out-of-date. So make just one section, and make it right.

    Cc: Christian Kujau
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • It is possible to request a 'check' of an md/raid array, in which the whole
    array is read and any inconsistencies are reported.

    This uses the same mechanisms as 'resync' and so reports in the kernel logs
    that a resync is being started. This understandably confuses/worries people.

    Also the text in /proc/mdstat suggests that a 'resync' is happening when it
    is just a check.

    This patch changes those messages to be more specific about what is happening.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • raid4/5/6 is very different from the other raid levels: all requests go
    through a 'stripe cache', which already has its own congestion management.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • raid1, raid10 and multipath don't report their 'congested' status through
    bdi_*_congested, but should.

    This patch adds the appropriate functions which just check the 'congested'
    status of all active members (with appropriate locking).

    raid1 read_balance should be modified to prefer devices where
    bdi_read_congested returns false. Then we could use the '&' branch rather
    than the '|' branch. However, that would need some benchmarking first
    to make sure it is actually a good idea.
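
    The difference between the two branches can be sketched in plain C. This
    is a simplified model, not the patch: struct member and its fields are
    hypothetical stand-ins for an rdev and for bdi_congested() on its queue.
    The locking the real code needs while walking the members is omitted.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical, simplified view of one mirror member. */
    struct member {
        bool active;    /* present and not Faulty */
        bool congested; /* stand-in for bdi_congested() on its queue */
    };

    /* Writes go to every active mirror, so the array is write-congested
     * if ANY active member is congested (the '|' branch). */
    static bool array_write_congested(const struct member *m, size_t n)
    {
        bool ret = false;
        for (size_t i = 0; i < n; i++)
            if (m[i].active)
                ret |= m[i].congested;
        return ret;
    }

    /* If read_balance preferred uncongested mirrors, a read would only
     * stall when EVERY active member is congested (the '&' branch). */
    static bool array_read_congested(const struct member *m, size_t n)
    {
        bool any_active = false, ret = true;
        for (size_t i = 0; i < n; i++)
            if (m[i].active) {
                any_active = true;
                ret &= m[i].congested;
            }
        return any_active && ret;
    }
    ```

    Note that the '&' accumulator has to start out true; starting from false
    would make the array look permanently uncongested for reads.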

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Each backing_dev needs to be able to report whether it is congested, either by
    modulating BDI_*_congested in ->state, or by defining a ->congested_fn.
    md/raid did neither of these. This patch adds a congested_fn which simply
    checks all component devices to see if they are congested.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The error handling routines don't use proper locking, and so two concurrent
    errors could trigger a problem.

    So:
    - use test-and-set and test-and-clear to synchronise
      the In_sync bits with the ->degraded count
    - use the spinlock to protect updates to the
      degraded count (could use an atomic_t but that
      would be a bigger change in code, and isn't
      really justified)
    - remove unnecessary locking in raid5
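
    Here is a userspace sketch of that pattern, with C11 atomics standing in
    for the kernel's bitops and spinlock_t. The In_sync bit number, struct array,
    device_error() and device_recovered() are hypothetical. Only the idea comes
    from the description above: a test-and-clear or test-and-set decides who
    adjusts ->degraded, and the lock serialises the update itself.

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    #define IN_SYNC_BIT 0UL /* hypothetical bit number */

    /* Minimal spinlock built on C11 atomic_flag (stand-in for spinlock_t). */
    typedef struct { atomic_flag f; } spinlock_t;
    static void spin_lock(spinlock_t *l)
    {
        while (atomic_flag_test_and_set(&l->f)) { /* spin */ }
    }
    static void spin_unlock(spinlock_t *l) { atomic_flag_clear(&l->f); }

    /* Atomic bit helpers returning the bit's previous value. */
    static int test_and_clear_bit(unsigned long nr, atomic_ulong *w)
    {
        unsigned long mask = 1UL << nr;
        return (atomic_fetch_and(w, ~mask) & mask) != 0;
    }
    static int test_and_set_bit(unsigned long nr, atomic_ulong *w)
    {
        unsigned long mask = 1UL << nr;
        return (atomic_fetch_or(w, mask) & mask) != 0;
    }

    struct array {
        spinlock_t lock; /* protects degraded */
        int degraded;
    };

    /* Error path: only the caller that actually clears In_sync bumps
     * degraded, so two concurrent errors on one device count it once. */
    static void device_error(struct array *a, atomic_ulong *dev_flags)
    {
        if (test_and_clear_bit(IN_SYNC_BIT, dev_flags)) {
            spin_lock(&a->lock);
            a->degraded++;
            spin_unlock(&a->lock);
        }
    }

    /* Mirror image for a device that becomes In_sync again. */
    static void device_recovered(struct array *a, atomic_ulong *dev_flags)
    {
        if (!test_and_set_bit(IN_SYNC_BIT, dev_flags)) {
            spin_lock(&a->lock);
            a->degraded--;
            spin_unlock(&a->lock);
        }
    }
    ```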

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • conf->working_disks is redundant: it is equivalent to
    conf->raid_disks - conf->mddev->degraded.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • raid1d has too many nested blocks, so move the fix_read_error functionality
    out into a separate function.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Signed-off-by: Coywolf Qi Hunt
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Coywolf Qi Hunt
     
  • Add a new sysfs interface that allows the bitmap of an array to be dirtied.
    The interface is write-only, and is used as follows:

    echo "1000" > /sys/block/md2/md/bitmap

    (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
    bitmaps of array md2)

    echo "1000-2000" > /sys/block/md1/md/bitmap

    (dirty the bits for chunks 1000-2000 in md1's bitmap)

    This is useful, for example, in cluster environments where you may need to
    combine two disjoint bitmaps into one (following a server failure, after a
    secondary server has taken over the array). By combining the bitmaps on
    the two servers, a full resync can be avoided. (This was discussed on the
    list on March 18, 2005, in the "[PATCH 1/2] md bitmap bug fixes" thread.)
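
    Below is a minimal sketch of how a write to such an attribute might be
    parsed and applied. It assumes an in-memory bitmap with one bit per chunk.
    parse_chunk_range() and bitmap_dirty_store() are hypothetical names. The
    kernel's actual parser and its update of the on-disk bitmap are not shown.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdlib.h>

    /* Parse "N" or "N-M" (as written by echo) into [*start, *end]. */
    static int parse_chunk_range(const char *buf, unsigned long *start,
                                 unsigned long *end)
    {
        char *p;

        *start = strtoul(buf, &p, 10);
        if (p == buf)
            return -EINVAL;             /* no leading number */
        *end = *start;
        if (*p == '-') {
            const char *q = p + 1;
            *end = strtoul(q, &p, 10);
            if (p == q || *end < *start)
                return -EINVAL;         /* missing or reversed upper bound */
        }
        if (*p == '\n')
            p++;                        /* tolerate echo's trailing newline */
        return *p == '\0' ? 0 : -EINVAL;
    }

    /* Hypothetical store handler: dirty chunks start..end (inclusive). */
    static int bitmap_dirty_store(unsigned char *bits, unsigned long nchunks,
                                  const char *buf)
    {
        unsigned long start, end;
        int err = parse_chunk_range(buf, &start, &end);

        if (err)
            return err;
        if (end >= nchunks)
            return -EINVAL;             /* range past the end of the array */
        for (unsigned long c = start; c <= end; c++)
            bits[c / 8] |= (unsigned char)(1u << (c % 8));
        return 0;
    }
    ```

    Checking end against nchunks before the loop also guarantees that the
    inclusive loop cannot wrap around.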

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
     
  • It isn't needed as mddev->degraded contains equivalent info.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • They are not needed. conf->failed_disks is the same as mddev->degraded and
    conf->working_disks is conf->raid_disks - mddev->degraded.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Instead of magic numbers (0, 1, 2, 3) in sb_dirty, we now have
    some flags:
    MD_CHANGE_DEVS
        Some device state has changed, requiring a superblock update
        on all devices.
    MD_CHANGE_CLEAN
        The array has transitioned from 'clean' to 'dirty' or back,
        requiring a superblock update on active devices, but possibly
        not on spares.
    MD_CHANGE_PENDING
        A superblock update is underway.

    We wait for an update to complete by waiting for all flags to be clear. A
    flag can be set at any time, even during an update, without risk that the
    change will be lost.

    Stop exporting md_update_sb, as it isn't needed.
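
    The rule that flags can be set during an update without being lost can be
    pictured with a hypothetical single-threaded sketch. The writer consumes the
    change flags, writes, then re-checks and loops if a flag reappeared mid-write.
    update_sb(), quiet_write() and racy_write() are illustrative names. The real
    md_update_sb() may be structured differently.

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    /* Bit numbers for the flags described above. */
    enum { MD_CHANGE_DEVS = 0, MD_CHANGE_CLEAN = 1, MD_CHANGE_PENDING = 2 };
    #define BIT(n) (1UL << (n))
    #define CHANGE_MASK (BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_CLEAN))

    /* Waiters treat the update as finished only when every flag is clear. */
    static int update_done(atomic_ulong *flags)
    {
        return atomic_load(flags) == 0;
    }

    /* write_sb stands in for the disk writes; returns the number of passes. */
    static int update_sb(atomic_ulong *flags, void (*write_sb)(atomic_ulong *))
    {
        int passes = 0;

        atomic_fetch_or(flags, BIT(MD_CHANGE_PENDING));
        do {
            atomic_fetch_and(flags, ~CHANGE_MASK); /* consume requests */
            write_sb(flags);
            passes++;
            /* A flag set mid-write is still visible here: go round again. */
        } while (atomic_load(flags) & CHANGE_MASK);
        atomic_fetch_and(flags, ~BIT(MD_CHANGE_PENDING));
        return passes;
    }

    static void quiet_write(atomic_ulong *flags) { (void)flags; }

    /* Simulates a device state change arriving during the first write. */
    static int racy_calls;
    static void racy_write(atomic_ulong *flags)
    {
        if (racy_calls++ == 0)
            atomic_fetch_or(flags, BIT(MD_CHANGE_DEVS));
    }
    ```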

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • raid10d has too many nested blocks, so move the fix_read_error functionality
    out into a separate function.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This patch contains the scheduled removal of the START_ARRAY ioctl for md.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch adds support for a per-target dm_flush_fn method. This is needed
    to allow dm-loop to invalidate page cache mappings in response to BLKFLSBUF
    ioctl commands.

    Signed-off-by: Bryn Reeves
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryn Reeves
     
  • Separate the setting of device I/O limits from dm_get_device(). dm-loop will
    use this.

    Signed-off-by: Bryn Reeves
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryn Reeves
     
  • I found a problem within device-mapper that occurs in low-mem situations. It
    was found using a mirror target but I think in theory it would hit any setup
    that stacks device-mapper devices (like LVM on top of multipath).

    Since device-mapper core uses the common fs_bioset in clone_bio(), and a
    private, but still global, bio_set in split_bvec(), it is possible that the
    filesystem and the first-level target successfully get bios but the
    lower-level target doesn't, because there is no more memory and the pool
    was drained by upper layers. The remapping will then be stuck forever. To
    solve this, device-mapper core needs to use a private bio_set for each device.
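
    A toy model of the failure assumes that each request needs one bio from the
    upper device's pool and then one from the lower device's. struct pool,
    pool_get() and run_stacked() are hypothetical. A real mempool would block
    rather than fail, and that is how the stall becomes permanent.

    ```c
    #include <assert.h>

    /* Toy fixed-size reserve standing in for a bio_set / mempool. */
    struct pool { int free; };

    static int pool_get(struct pool *p)
    {
        if (p->free == 0)
            return 0; /* a real mempool would wait here for a free element */
        p->free--;
        return 1;
    }

    /* The upper layer grabs bios for all in-flight requests before any of
     * them reach the lower layer. Returns how many requests can complete. */
    static int run_stacked(struct pool *upper, struct pool *lower, int nreq)
    {
        int started = 0, finished = 0;

        for (int i = 0; i < nreq; i++)
            started += pool_get(upper);
        for (int i = 0; i < started; i++)
            finished += pool_get(lower);
        return finished;
    }
    ```

    With one shared pool the upper layer drains it and nothing below can make
    progress. With a pool per device, each level keeps its own reserve.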

    Signed-off-by: Stefan Bader
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stefan Bader
     
  • In low-memory situations dm-crypt needs to use a private mempool of bios to
    avoid blocking.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milan Broz
     
  • This patch is designed to help dm-crypt comply with the
    new constraints imposed by the following patch in -mm:
    md-dm-reduce-stack-usage-with-stacked-block-devices.patch

    Under low memory the existing implementation relies upon waiting for I/O
    submitted recursively to generic_make_request() completing before the original
    generic_make_request() call can return.

    This patch moves the I/O submission to a workqueue so the original
    generic_make_request() can return immediately.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milan Broz
     
  • Restructure the dm-crypt write processing in preparation for workqueue changes
    in the next patches.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milan Broz
     
  • Restructure part of the dm-crypt code in preparation for workqueue changes.

    Use 'base_bio' or 'clone' variable names consistently throughout. No
    functional changes are included in this patch.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milan Broz
     
  • Add the facility to wipe the encryption key from memory (for example while a
    laptop is suspended) and reinstate it later (when the laptop gets resumed).

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milan Broz
     
  • This patch adds a target preresume hook.

    It is called before the targets are resumed and if it returns an error the
    resume gets cancelled.

    The crypt target will use this to indicate that it is unable to process I/O
    because no encryption key has been supplied.
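
    The sketch below shows the hook's semantics with hypothetical types
    (struct target, table_resume(), crypt_preresume()). Every preresume hook
    runs before any target is resumed, and the first error aborts the whole
    resume. The -EAGAIN returned by the crypt-like hook is an assumption chosen
    for illustration.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    /* Hypothetical, cut-down target: only the hooks relevant here. */
    struct target {
        int (*preresume)(struct target *ti); /* optional; may be NULL */
        void (*resume)(struct target *ti);
        int has_key;                         /* crypt-like state */
        int resumed;
    };

    /* Run all preresume hooks first; any error cancels the resume before
     * a single target has been resumed. */
    static int table_resume(struct target *t, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (t[i].preresume) {
                int r = t[i].preresume(&t[i]);
                if (r)
                    return r;
            }
        for (size_t i = 0; i < n; i++)
            if (t[i].resume)
                t[i].resume(&t[i]);
        return 0;
    }

    /* A crypt-like target refuses to resume without a key. */
    static int crypt_preresume(struct target *ti)
    {
        return ti->has_key ? 0 : -EAGAIN;
    }

    static void mark_resumed(struct target *ti)
    {
        ti->resumed = 1;
    }
    ```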

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Milan Broz
     
  • Add CONFIG_DM_DEBUG and DMDEBUG() macro.

    Signed-off-by: Bryn Reeves
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bryn Reeves
     
  • Device-mapper devices are not accessible until a 'resume' ioctl has been
    issued. For userspace to find out when this happens we need to generate an
    uevent for udev to take appropriate action.

    As discussed at OLS we should send 'change' events for 'resume'. We can think
    of no useful purpose served by also having 'suspend' events.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Kay Sievers
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hannes Reinecke
     
  • Use kzalloc() instead of kmalloc() + memset().

    Signed-off-by: Michał Mirosław
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michał Mirosław
     
  • After initialising m->ti, there's no need to pass it in subsequent calls to
    static functions used for parsing parameters.

    Signed-off-by: Michał Mirosław
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michał Mirosław
     
  • Remove trailing space from 'dmsetup table' output.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan Brassow
     
  • If a snapshot becomes invalid while there are outstanding pending_exceptions,
    pending_complete() forgets, as it processes each one, to remove the
    corresponding exception from its exception table before freeing it.

    Fix this by moving the 'out:' label up one statement so that
    remove_exception() is always called. Then __invalidate_exception() no longer
    needs to call it and its 'pe' argument becomes superfluous.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     
  • Rename sibling_count to ref_count and introduce get and put functions.
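
    Here is a minimal get/put pair. It assumes an atomic counter where the last
    put frees the object. struct pending, alloc_pending(), get_pending() and
    put_pending() are hypothetical names, not the dm-snapshot code.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdlib.h>

    /* Hypothetical object shared by several users. */
    struct pending {
        atomic_int ref_count; /* formerly "sibling_count" */
    };

    static struct pending *alloc_pending(void)
    {
        struct pending *pe = malloc(sizeof(*pe));
        if (pe)
            atomic_init(&pe->ref_count, 1); /* creator holds one reference */
        return pe;
    }

    static void get_pending(struct pending *pe)
    {
        atomic_fetch_add(&pe->ref_count, 1);
    }

    /* Drop a reference; returns 1 if this was the last one and pe was freed. */
    static int put_pending(struct pending *pe)
    {
        if (atomic_fetch_sub(&pe->ref_count, 1) != 1)
            return 0;
        free(pe);
        return 1;
    }
    ```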

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     
  • Add a workqueue so that I/O can be queued up to be flushed from a separate
    thread (e.g. if local interrupts are disabled).

    A new per-snapshot spinlock pe_lock is introduced to protect queued_bios.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Mark McLoughlin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     
  • This patch rearranges the pending_complete() code so that the functional
    changes in subsequent patches are clearer.

    By consolidating the error and the non-error paths, we can move
    error_snapshot_bios() and __flush_bios() in line.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Mark McLoughlin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     
  • This patch rearranges the snapshot_map code so that the functional changes in
    subsequent patches are clearer.

    The only functional change is to replace the existing read lock with a write
    lock which the next patch needs.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alasdair G Kergon
     
  • When suspending a device-mapper device, dm_suspend() sleeps until all
    necessary I/O is completed. This state is triggered by a callback from
    persistent_commit(). But some I/O can still be issued *after* the callback
    (to prepare the next metadata area for use if the current one is full). This
    patch delays the callback until after that I/O is complete.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark McLoughlin
     
  • read_exception() and write_exception() only return an error if supplied with
    an out-of-range index. If this ever happens it's the result of a bug in the
    calling code so we handle this with an assertion and remove the error handling
    in the callers.
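
    The sketch below shows the pattern with hypothetical types. A bad index
    can only come from a caller bug, so the accessors assert instead of
    returning an error. The in-kernel equivalent would be BUG_ON().
    EXCEPTIONS_PER_AREA is an arbitrary illustrative value.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical on-disk exception record and metadata area. */
    struct disk_exception {
        uint64_t old_chunk;
        uint64_t new_chunk;
    };

    #define EXCEPTIONS_PER_AREA 4 /* tiny, for illustration */

    struct area {
        struct disk_exception e[EXCEPTIONS_PER_AREA];
    };

    /* No error return: an out-of-range index is a bug, not a runtime case. */
    static void write_exception(struct area *a, unsigned int index,
                                const struct disk_exception *de)
    {
        assert(index < EXCEPTIONS_PER_AREA);
        a->e[index] = *de;
    }

    static void read_exception(const struct area *a, unsigned int index,
                               struct disk_exception *de)
    {
        assert(index < EXCEPTIONS_PER_AREA);
        *de = a->e[index];
    }
    ```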

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark McLoughlin
     
  • Fix the error handling when store.read_metadata is called: the error should be
    returned immediately.

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark McLoughlin
     
  • The chunk size of a snapshot cannot be changed, so it is redundant to require it
    as a parameter when activating an existing snapshot. Allow a value of zero in
    this case and ignore it. For a new snapshot, use a default value if zero is
    specified.
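
    A hypothetical helper can express the rule. A chunk size stored in the
    existing metadata always wins, and a zero for a new snapshot selects a
    default. DEFAULT_CHUNK_SECTORS is an arbitrary stand-in, and the
    power-of-two check is an assumption about which sizes are valid. This
    sketch lets the stored value override any non-zero request.

    ```c
    #include <assert.h>
    #include <errno.h>

    #define DEFAULT_CHUNK_SECTORS 16u /* hypothetical default */

    /* 'stored' is the size from existing metadata, or 0 for a new store. */
    static int choose_chunk_size(unsigned int requested, unsigned int stored,
                                 unsigned int *chunk)
    {
        if (stored) {
            /* Existing snapshot: size is fixed, the argument is ignored. */
            *chunk = stored;
            return 0;
        }
        if (requested == 0)
            requested = DEFAULT_CHUNK_SECTORS; /* new snapshot, no size given */
        if (requested & (requested - 1))
            return -EINVAL;                    /* not a power of two */
        *chunk = requested;
        return 0;
    }
    ```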

    Signed-off-by: Mark McLoughlin
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark McLoughlin