02 May, 2006

5 commits


20 Apr, 2006

1 commit

  • - fix mddev_lock() usage bugs in md_attr_show() and md_attr_store().
    [they did not anticipate the possibility of getting a signal]

    - remove mddev_lock_uninterruptible() [unused]

    Signed-off-by: Ingo Molnar
    Acked-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

15 Apr, 2006

1 commit

  • It works like this:
    Open the file.
    Read all the contents.
    Call poll requesting POLLERR or POLLPRI (so select/exceptfds works).
    When poll returns, either
    close the file and go to the top of the loop,
    or lseek to the start of the file and go back to the 'read'.

    Events are signaled by an object manager calling
    sysfs_notify(kobj, dir, attr);

    If the dir is non-NULL, it is used to find a subdirectory which
    contains the attribute (presumably created by sysfs_create_group).

    This has a cost of one int per attribute, one wait_queue_head per kobject,
    one int per open file.

    The name "sysfs_notify" may be confused with the inotify
    functionality. Maybe it would be nice to support inotify for sysfs
    attributes as well?

    This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
    to be pollable.

    Signed-off-by: Neil Brown
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
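
As a rough illustration of the read/poll loop described in the commit above, here is a minimal user-space sketch in Python. The helper names `read_attr` and `watch_attr`, the `on_change` callback, and the `timeout_ms` and `max_events` parameters are hypothetical and exist only for this example. It follows the "lseek to start of file" variant rather than closing and reopening the file.

```python
import os
import select


def read_attr(fd, size=4096):
    """Rewind to the start of the attribute and read its whole contents."""
    os.lseek(fd, 0, os.SEEK_SET)
    return os.read(fd, size).decode().strip()


def watch_attr(path, on_change, timeout_ms=None, max_events=None):
    """Report the current value, then report it again after each poll wakeup.

    timeout_ms=None blocks indefinitely; max_events bounds the number of
    wakeups handled (both are conveniences added for this sketch).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        poller = select.poll()
        # Ask for POLLERR or POLLPRI, as the commit message describes.
        poller.register(fd, select.POLLPRI | select.POLLERR)
        on_change(read_attr(fd))
        handled = 0
        while max_events is None or handled < max_events:
            if not poller.poll(timeout_ms):
                break  # timed out without a notification
            on_change(read_attr(fd))
            handled += 1
    finally:
        os.close(fd)
```

A call such as `watch_attr("/sys/block/md0/md/sync_action", print)` would print the attribute each time the kernel signals a change, where md0 stands in for whichever array is being watched.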
     

11 Apr, 2006

1 commit

  • reshape_position is a 64bit field that was not 64bit aligned. So swap with
    new_level.

    NOTE: this is a user-visible change. However:
    - The bad code has not appeared in a released kernel
    - This code is still marked 'experimental'
    - This only affects version-1 superblocks, which are not in wide use
    - These fields are only used (rather than simply reported) by user-space
    tools in extremely rare circumstances: after a reshape crashes in the
    first second of the reshape process.

    So I believe that, at this stage, the change is safe. Especially if people
    heed the 'help' message and use mdadm-2.4.1.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
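
To see why swapping two fields can fix a 64-bit alignment problem, consider the small sketch below. The layout is a hypothetical three-field fragment, not the real version-1 superblock: `some_u32` is an invented placeholder for whatever 32-bit field precedes the pair, and the helper names are made up for this example.

```python
def field_offsets(fields):
    """Return {name: byte offset} for fields packed back to back."""
    offsets = {}
    position = 0
    for name, size in fields:
        offsets[name] = position
        position += size
    return offsets


def misaligned_fields(fields):
    """Names of fields whose offset is not a multiple of their own size."""
    offsets = field_offsets(fields)
    return [name for name, size in fields if offsets[name] % size != 0]


# Hypothetical fragment: one 32-bit field ahead of the two fields in question.
BEFORE = [("some_u32", 4), ("reshape_position", 8), ("new_level", 4)]
AFTER = [("some_u32", 4), ("new_level", 4), ("reshape_position", 8)]

print(misaligned_fields(BEFORE))  # the 64-bit field sits at offset 4
print(misaligned_fields(AFTER))   # after the swap it sits at offset 8
```

The total size of the fragment is unchanged by the swap; only the offsets of the two fields move, which is what makes the change visible to user-space tools that read them.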
     

02 Apr, 2006

3 commits


01 Apr, 2006

5 commits


28 Mar, 2006

24 commits

  • ... being careful that mutex_trylock is inverted wrt down_trylock

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
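
The inversion mentioned in the commit above is that down_trylock() returns 0 when the semaphore was acquired, while mutex_trylock() returns 1 when the mutex was acquired. The Python model below only mimics those two return conventions on top of `threading.Lock`; the wrapper functions are illustrative stand-ins, not kernel code.

```python
import threading


def down_trylock(lock):
    """Semaphore convention: 0 means acquired, 1 means contended."""
    return 0 if lock.acquire(blocking=False) else 1


def mutex_trylock(lock):
    """Mutex convention: 1 means acquired, 0 means contended."""
    return 1 if lock.acquire(blocking=False) else 0


def try_work_semaphore_style(lock):
    """Old pattern: a non-zero result means 'could not get the lock'."""
    if down_trylock(lock):
        return "busy"
    lock.release()
    return "done"


def try_work_mutex_style(lock):
    """Converted pattern: the test must be negated to keep the same meaning."""
    if not mutex_trylock(lock):
        return "busy"
    lock.release()
    return "done"
```

A mechanical rename from down_trylock to mutex_trylock without negating the condition would silently swap the "busy" and "done" branches, which is why the conversion needs care.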
     
  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • When retrying a write due to barrier failure, we don't reset 'remaining', so
    it goes negative and never hits 0 again.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
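
The counter problem described in the commit above can be shown with a toy model. This is not the md code: the `WriteBatch` class and its methods are invented for illustration, and the model assumes that completion is detected only when the counter reaches exactly zero.

```python
class WriteBatch:
    """Toy model of a batch of writes tracked by a 'remaining' counter."""

    def __init__(self, writes):
        self.writes = writes
        self.remaining = writes
        self.finished = False

    def complete_all(self):
        """Each completion decrements; the batch finishes only at exactly 0."""
        for _ in range(self.writes):
            self.remaining -= 1
            if self.remaining == 0:
                self.finished = True

    def retry(self, reset_remaining):
        """Reissue every write, optionally resetting the counter first."""
        self.finished = False
        if reset_remaining:
            self.remaining = self.writes
        self.complete_all()


buggy = WriteBatch(2)
buggy.complete_all()               # first attempt fails and must be retried
buggy.retry(reset_remaining=False)
print(buggy.remaining, buggy.finished)

fixed = WriteBatch(2)
fixed.complete_all()
fixed.retry(reset_remaining=True)
print(fixed.remaining, fixed.finished)
```

Without the reset the counter starts the retry at zero and only moves further below it, so the "reached zero" condition can never fire again.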
     
  • An md array can be asked to change the amount of each device that it is using,
    and in particular can be asked to use the maximum available space. This
    currently only works if the first device is not larger than the rest, because
    'size' gets changed and so 'fit' becomes wrong. So check if a 'fit' is
    required early and don't corrupt it.

    Signed-off-by: Doug Ledford
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • raid5 overloads bi_phys_segments to count the number of blocks that the
    request was broken into so that it knows when the bio is completely handled.

    Accessing this must always be done under a spinlock. In one case we also call
    bi_end_io under that spinlock, which probably isn't ideal as bi_end_io could
    be expensive (even though it isn't allowed to sleep).

    So we reduce the range of the spinlock to just accessing bi_phys_segments.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • wait_event_lock_irq puts a ';' after its usage of the 4th arg, so we don't
    need to.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This allows user-space to access data safely. This is needed for raid5
    reshape as user-space needs to take a backup of the first few stripes before
    allowing reshape to commence.

    It will also be useful in cluster-aware raid1 configurations so that all
    cluster members can leave a section of the array untouched while a
    resync/recovery happens.

    A 'start' and 'end' of the suspended range are written to 2 sysfs attributes.
    Note that only one range can be suspended at a time.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
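
A user-space tool might drive the suspended range described above roughly as follows. The commit message does not name the two sysfs attributes, so the defaults `suspend_lo` and `suspend_hi` are an assumption, as is treating the range as half-open; both helper functions are hypothetical.

```python
import os


def write_suspend_range(md_dir, start, end,
                        lo_name="suspend_lo", hi_name="suspend_hi"):
    """Write 'start' and 'end' to two attribute files under md_dir.

    The attribute file names are assumptions made for this sketch.
    """
    for name, value in ((lo_name, start), (hi_name, end)):
        with open(os.path.join(md_dir, name), "w") as attr:
            attr.write(f"{value}\n")


def io_must_wait(sector, count, start, end):
    """True if [sector, sector + count) overlaps the suspended [start, end)."""
    return start < end and sector < end and sector + count > start
```

With `md_dir` pointing at a directory such as /sys/block/md0/md, writing equal values for start and end would describe an empty range, which `io_must_wait` treats as suspending nothing.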
     
  • This allows reshape to be triggered via sysfs (which is the only way to start
    it happening).

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • check_reshape checks validity and does things that can be done instantly -
    like adding devices to raid1. start_reshape initiates a restriping process to
    convert the whole array.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Instead of checkpointing at each stripe, only checkpoint when a new write
    would overwrite uncheckpointed data. Block any write to the uncheckpointed
    area. Arbitrarily checkpoint at least every 3Meg.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
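
The checkpoint rule in the commit above can be modelled in a few lines. This is a simplified sketch, not the raid5 code: it assumes positions are plain byte offsets, that a restart resumes from the last checkpointed position and therefore needs everything at or beyond it to be intact, and that "3Meg" means 3 * 1024 * 1024 bytes. The function and parameter names are invented.

```python
CHECKPOINT_LIMIT = 3 * 1024 * 1024  # "at least every 3Meg", taken here as bytes


def needs_checkpoint(write_end, checkpointed_pos, since_checkpoint,
                     limit=CHECKPOINT_LIMIT):
    """Decide whether progress must be recorded before the next write.

    write_end:        end offset of the region the next write would cover
    checkpointed_pos: progress recorded by the last checkpoint
    since_checkpoint: amount reshaped since that checkpoint
    """
    # Writing past the recorded position would destroy data that a restart
    # from the last checkpoint still expects to read.
    overwrites_unsaved = write_end > checkpointed_pos
    return overwrites_unsaved or since_checkpoint >= limit
```

Under this model most writes proceed without a checkpoint, since they land in an area that the recorded progress already covers.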
     
  • We allow the superblock to record an 'old' and a 'new' geometry, and a
    position where any conversion is up to. The geometry allows for changing
    chunksize, layout and level as well as number of devices.

    When using version-0.90 superblock, we convert the version to 0.91 while the
    conversion is happening so that an old kernel will refuse to assemble the
    array. For version-1, we use a feature bit for the same effect.

    When starting an array we check for an incomplete reshape and restart the
    reshape process if needed. If the reshape stopped at an awkward time (like
    when updating the first stripe) we refuse to assemble the array, and let
    user-space worry about it.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This patch adds raid5_reshape and end_reshape which will start and finish the
    reshape processes.

    raid5_reshape is only enabled if CONFIG_MD_RAID5_RESHAPE is set, to discourage
    accidental use.

    Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry.

    And make sure that you have backups, just in case.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This patch provides the core of the resize/expand process.

    sync_request notices if a 'reshape' is happening and acts accordingly.

    It allocates new stripe_heads for the next chunk-wide-stripe in the target
    geometry, marking them STRIPE_EXPANDING.

    Then it finds which stripe heads in the old geometry can provide data needed
    by these and marks them STRIPE_EXPAND_SOURCE. This causes stripe_handle to
    read all blocks on those stripes.

    Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are
    needed are copied into the corresponding STRIPE_EXPANDING stripe_head. Once a
    STRIPE_EXPANDING stripe_head is full, it is marked STRIPE_EXPAND_READY and then
    is written out and released.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
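
The step of finding which old-geometry stripes feed a new-geometry stripe comes down to simple arithmetic if the parity rotation is ignored. The sketch below assumes RAID-5 with one parity chunk per stripe, so a stripe over n devices holds n - 1 data chunks; `source_stripes` is a hypothetical helper, not a function from the patch.

```python
def source_stripes(target_stripe, old_disks, new_disks):
    """Old-geometry stripes holding the data for one new-geometry stripe.

    Stripes are counted in chunk-sized rows and layout rotation is ignored.
    """
    old_data = old_disks - 1  # data chunks per stripe before the reshape
    new_data = new_disks - 1  # data chunks per stripe after the reshape
    first_chunk = target_stripe * new_data
    last_chunk = first_chunk + new_data - 1
    return range(first_chunk // old_data, last_chunk // old_data + 1)


# Growing from 3 to 4 devices: each new stripe needs 3 data chunks,
# while each old stripe supplies 2.
for stripe in range(3):
    print(stripe, list(source_stripes(stripe, old_disks=3, new_disks=4)))
```

In this example an old stripe can contribute to two different new stripes, which is consistent with the commit's remark that only the blocks "that are needed" are copied from a source stripe_head.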
     
  • We need to allow that different stripes are of different effective sizes, and
    use the appropriate size. Also, when a stripe is being expanded, we must
    block any IO attempts until the stripe is stable again.

    Key elements in this change are:
    - each stripe_head gets a 'disk' field which is part of the key,
    thus there can sometimes be two stripe heads of the same area of
    the array, but covering different numbers of devices. One of these
    will be marked STRIPE_EXPANDING and so won't accept new requests.
    - conf->expand_progress tracks how the expansion is progressing and
    is used to determine whether the target part of the array has been
    expanded yet or not.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache
    data structure.

    This requires allocating new stripes in a new kmem_cache. If this succeeds,
    we copy cache pages over and release the old stripes and kmem_cache.

    We then allocate new pages. If that fails, we leave the stripe cache at its
    new size. It isn't worth the effort to shrink it back again.

    Unfortunately this means we need two kmem_cache names as, for a short
    period of time, we have two kmem_caches. So they are raid5/%s and
    raid5/%s-alt

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The remainder of this batch implements raid5 reshaping. Currently the only
    shape change that is supported is adding a device, but it is envisioned that
    changing the chunksize and layout will also be supported, as well as changing
    the level (e.g. 1->5, 5->6).

    The reshape process naturally has to move all of the data in the array, and so
    should be used with caution. It is believed to work, and some testing does
    support this, but wider testing would be great for increasing my confidence.

    You will need a version of mdadm newer than 2.3.1 to make use of raid5 growth.
    This is because mdadm needs to take a copy of a 'critical section' at the
    start of the array in case there is a crash at an awkward moment. On restart,
    mdadm will restore the critical section and allow reshape to continue.

    I hope to release a 2.4-pre by early next week - it still needs a little more
    polishing.

    This patch:

    Previously the array of disk information was included in the raid5 'conf'
    structure which was allocated to an appropriate size. This makes it awkward
    to change the size of that array. So we split it off into a separate
    kmalloced array which will require a little extra indexing, but is much easier
    to grow.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • status_resync - used by /proc/mdstat to report the status of a resync, assumes
    that device sizes will always fit into an 'unsigned long'. This is no longer
    the case...

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
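
The overflow behind the commit above is easy to reproduce with plain arithmetic. The sketch assumes a platform where 'unsigned long' is 32 bits wide and that sizes are counted in 512-byte sectors; the commit message states neither, so treat both as assumptions, and `as_ulong32` as an invented helper.

```python
ULONG_32_MAX = 2**32 - 1
SECTOR_BYTES = 512
TIB = 2**40


def as_ulong32(value):
    """Truncate a value the way storing it in a 32-bit unsigned long would."""
    return value & ULONG_32_MAX


size_sectors = 3 * TIB // SECTOR_BYTES     # a 3 TiB device
seen_sectors = as_ulong32(size_sectors)    # what a 32-bit variable would hold
print(size_sectors, seen_sectors, seen_sectors * SECTOR_BYTES // TIB)
```

Under these assumptions anything at or above 2 TiB wraps, so a progress calculation based on the truncated size would report figures for a much smaller device.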
     
  • We are counting failed devices twice, once for the device that is failed, and
    once for the hole that has been left in the array. Remove the former so
    'failed' matches 'missing'. Storing these counts in the superblock is a bit
    silly anyway....

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • I really should make this a function of the personality....

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This flag should be set for a virtual device iff it is set for all underlying
    devices.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Signed-off-by: Kevin Corry
    Cc: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Corry
     
  • Use bd_claim_by_disk.

    Following symlinks are created if dm-0 maps to sda:
    /sys/block/dm-0/slaves/sda --> /sys/block/sda
    /sys/block/sda/holders/dm-0 --> /sys/block/dm-0

    Signed-off-by: Jun'ichi Nomura
    Cc: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
     
  • Use bd_claim_by_disk.

    Following symlinks are created if md0 is built from sda and sdb
    /sys/block/md0/slaves/sda --> /sys/block/sda
    /sys/block/md0/slaves/sdb --> /sys/block/sdb
    /sys/block/sda/holders/md0 --> /sys/block/md0
    /sys/block/sdb/holders/md0 --> /sys/block/md0

    Signed-off-by: Jun'ichi Nomura
    Cc: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jun'ichi Nomura
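
The slaves and holders symlinks listed in the two commits above can be read from user space with a directory listing. The sketch below is a minimal example; `block_links` and its `sysfs_block` parameter are hypothetical names, and the parameter exists mainly so the function can be pointed at a test directory.

```python
import os


def block_links(name, kind, sysfs_block="/sys/block"):
    """Return the sorted entries of <sysfs_block>/<name>/<kind>.

    kind is 'slaves' (devices this one is built from) or 'holders'
    (devices built on top of this one). A missing directory gives [].
    """
    if kind not in ("slaves", "holders"):
        raise ValueError("kind must be 'slaves' or 'holders'")
    path = os.path.join(sysfs_block, name, kind)
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []
```

For an array md0 built from sda and sdb, `block_links("md0", "slaves")` would be expected to list both component devices, and `block_links("sda", "holders")` would list md0.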