23 Jun, 2016

1 commit

  • In the current Documentation/md.txt, the lower limit of
    stripe_cache_size is given as 16 and the default as 128. However,
    after updating the kernel to the latest mainline version and
    creating a RAID5 array with mdadm, the following commands show
    an error and a different default value, respectively.

    1) set stripe_cache_size to 16
    [root@localhost ~]# echo 16 > /sys/block/md0/md/stripe_cache_size
    bash: echo: write error: Invalid argument
    2) read the default value of stripe_cache_size
    [root@localhost ~]# cat /sys/block/md0/md/stripe_cache_size
    256

    I read drivers/md/raid5.c and found the following related code:
    1) in function 'raid5_set_cache_size':
    if (size <= 16 || size > 32768)
    return -EINVAL;
    2) #define NR_STRIPES 256

    So the lower limit value of stripe_cache_size should be 17 and
    the default value should be 256.

    Signed-off-by: Tiezhu Yang
    Signed-off-by: Jonathan Corbet

    Tiezhu Yang
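
The range check described in this commit can be mirrored in userspace so that a bad value is caught before the write fails with "Invalid argument". This is only a sketch: the lower limit of 17 comes from the commit message and the upper limit of 32768 from the quoted raid5.c check, while the function names and the optional file argument are hypothetical.

```shell
# Sketch: validate a value before writing it to stripe_cache_size.
# Limits (17..32768) are taken from the commit message above; the
# function names and the optional file argument are illustrative only.

stripe_cache_size_ok() {
    # Accept only plain decimal integers.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 17 ] && [ "$1" -le 32768 ]
}

set_stripe_cache_size() {
    # $1 = new size, $2 = attribute file (defaults to md0's sysfs file)
    size=$1
    file=${2:-/sys/block/md0/md/stripe_cache_size}
    if ! stripe_cache_size_ok "$size"; then
        echo "stripe_cache_size must be between 17 and 32768: $size" >&2
        return 1
    fi
    echo "$size" > "$file"
}
```

For example, `set_stripe_cache_size 16` is rejected locally, while `set_stripe_cache_size 256` writes the value that the commit reports as the default.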
     

23 Jun, 2015

1 commit


02 Dec, 2013

1 commit


03 Jul, 2013

1 commit


24 Apr, 2013

1 commit


23 Dec, 2011

1 commit

  • hot-replace is a feature being added to md which will allow a
    device to be replaced without removing it from the array first.

    With hot-replace a spare can be activated and recovery can start while
    the original device is still in place, thus allowing a transition from
    an unreliable device to a reliable device without leaving the array
    degraded during the transition. It can also be used when the original
    device is still reliable but is not wanted for some reason.

    This will eventually be supported in RAID4/5/6 and RAID10.

    This patch adds a super-block flag to distinguish the replacement
    device. If an old kernel sees this flag it will reject the device.

    It also adds two per-device flags which are viewable and settable via
    sysfs.
    "want_replacement" can be set to request that a device be replaced.
    "replacement" is set to show that this device is replacing another
    device.

    The "rd%d" links in /sys/block/mdXx/md only apply to the original
    device, not the replacement. We currently don't make links for the
    replacement - there doesn't seem to be a need.

    Signed-off-by: NeilBrown

    NeilBrown
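
A minimal sketch of how the two flags might be inspected and requested from a script. The commit only says the flags are viewable and settable via sysfs; that they appear as comma-separated words in a per-device state file, and that writing "want_replacement" to that file sets the flag, are assumptions here, and both function names are hypothetical.

```shell
# Sketch: test for and request the hot-replace flags.
# Assumption: the flags are exposed as comma-separated words in a
# per-device state file (e.g. md/dev-XXX/state). Function names are
# illustrative only.

rdev_has_flag() {
    # $1 = path to a per-device state file, $2 = flag name
    state=$(cat "$1") || return 2
    # Wrap in commas so "replacement" does not match "want_replacement".
    case ",$state," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

request_replacement() {
    # $1 = path to a per-device state file
    echo want_replacement > "$1"
}
```

Matching on whole comma-delimited words matters because one flag name is a suffix of the other.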
     

28 Jul, 2011

2 commits


09 Jun, 2011

1 commit


20 Apr, 2011

1 commit


04 Aug, 2010

1 commit

  • Below is an updated version of the original series, with all patches bunched into one big patch
    that updates broken web addresses located in Documentation/*.
    Some of the addresses date as far back as 1995, so searching became a bit difficult;
    the best way to deal with these is to use web.archive.org to locate the outdated addresses.
    There are also some addresses pointing to .spec files. Some of these were located, but some (after searching
    on the companies' sites) were still nowhere to be found. In those cases I just changed the address
    to the company site, so that users can contact the company, which can locate the files for them.

    Signed-off-by: Justin P. Mattock
    Signed-off-by: Thomas Weber
    Signed-off-by: Mike Frysinger
    Cc: Paulo Marques
    Cc: Randy Dunlap
    Cc: Michael Neuling
    Signed-off-by: Jiri Kosina

    Justin P. Mattock
     

23 Apr, 2010

1 commit


14 Dec, 2009

3 commits

  • Enable external metadata arrays to manage rebuild checkpointing via a
    md/dev-XXX/recovery_start attribute which reflects rdev->recovery_offset

    Also update resync_start_store to allow 'none' to be written, for
    consistency.

    Signed-off-by: Dan Williams
    Signed-off-by: NeilBrown

    Dan Williams
     
  • In this case, the metadata needs to not be in the same
    sector as the bitmap.
    md will not read/write any bitmap metadata. Config must be
    done via sysfs and when a recovery makes the array non-degraded
    again, writing 'true' to 'bitmap/can_clear' will allow bits in
    the bitmap to be cleared again.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • A new attribute directory 'bitmap' in 'md' is created which
    contains files for configuring the bitmap.
    'location' identifies where the bitmap is, either 'none',
    or 'file' or 'sector offset from metadata'.
    Writing 'location' can create or remove a bitmap.
    Adding a 'file' bitmap this way is not yet supported.
    'chunksize' and 'time_base' must be set before 'location'
    can be set.

    'chunksize' can be set before creating a bitmap, but is
    currently always over-ridden by the bitmap superblock.

    'time_base' and 'backlog' can be updated at any time.

    Signed-off-by: NeilBrown
    Reviewed-by: Andre Noll

    NeilBrown
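
The ordering constraint in this commit ('chunksize' and 'time_base' before 'location') can be sketched as a small helper. The function name is hypothetical, the values are passed through unchanged, and the exact syntax of the sector-offset form of 'location' is not given in the commit, so it is left to the caller.

```shell
# Sketch: write the bitmap attributes in the order the commit requires.
# The directory is the 'bitmap' attribute directory inside 'md'.
# The function name is illustrative; values are not validated.

configure_bitmap() {
    # $1 = bitmap directory, $2 = chunksize, $3 = time_base, $4 = location
    dir=$1
    # 'chunksize' and 'time_base' must be set before 'location'.
    echo "$2" > "$dir/chunksize" || return 1
    echo "$3" > "$dir/time_base" || return 1
    # Writing 'location' last is what creates (or removes) the bitmap.
    echo "$4" > "$dir/location"
}
```

Stopping at the first failed write avoids setting 'location' when one of its prerequisites was rejected.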
     

31 Mar, 2009

1 commit


21 Jul, 2008

1 commit

  • - use strict_strtoull in place of simple_strtoull
    - use my_mddev in place of rdev->mddev (they have the same value)
    and more significantly,
    - don't adjust mddev->size to fit, rather reject changes which make
    rdev->size smaller than mddev->size

    Adjusting mddev->size is a hangover from bind_rdev_to_array which
    does a similar thing. But it really is a better design to insist that
    mddev->size is set as required, then the rdev->sizes are set to allow
    for that. The previous way invites confusion.

    Signed-off-by: NeilBrown

    Neil Brown
     

28 Jun, 2008

4 commits

  • The important state change happens during an interrupt
    in md_error. So just set a flag there and call sysfs_notify
    later in process context.

    Signed-off-by: Neil Brown

    Neil Brown
     
  • When a device fails, when a spare is activated, when
    an array is reshaped, or when an array is started,
    the extent to which the array is degraded can change.

    Signed-off-by: Neil Brown

    Neil Brown
     
  • When the 'resync' thread starts or stops, when we explicitly
    set sync_action, or when we determine that there is definitely nothing
    to do, we notify sync_action.

    To stop "sync_action" from occasionally showing the wrong value,
    we introduce a new flag - MD_RECOVERY_RECOVER - to say that a
    recovery is probably needed or happening, and we make sure
    that we set MD_RECOVERY_RUNNING before clearing MD_RECOVERY_NEEDED.

    Signed-off-by: Neil Brown

    Neil Brown
     
  • Changes in md/array_state could be of interest to a monitoring
    program. So make sure all changes trigger a notification.

    Exceptions:
    changing active_idle to active is not reported because it
    is frequent and not interesting.
    changing active to active_idle is only reported on arrays
    with externally managed metadata, as it is not interesting
    otherwise.

    Signed-off-by: Neil Brown

    Neil Brown
     

28 Apr, 2008

1 commit

  • Improve write performance by preventing the delayed_list from dumping all its
    stripes onto the handle_list in one shot. Delayed stripes are now further
    delayed by being held on the 'hold_list'. The 'hold_list' is bypassed when:

    * a STRIPE_IO_STARTED stripe is found at the head of 'handle_list'
    * 'handle_list' is empty and i/o is being done to satisfy full stripe-width
    write requests
    * 'bypass_count' is less than 'bypass_threshold'. By default the threshold
    is 1, i.e. every other stripe handled is a preread stripe provided the
    top two conditions are false.

    Benchmark data:
    System: 2x Xeon 5150, 4x SATA, mem=1GB
    Baseline: 2.6.24-rc7
    Configuration: mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 5 --assume-clean
    Test1: dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
    * patched: +33% (stripe_cache_size = 256), +25% (stripe_cache_size = 512)

    Test2: tiobench --size 2048 --numruns 5 --block 4096 --block 131072 (XFS)
    * patched: +13%
    * patched + preread_bypass_threshold = 0: +37%

    Changes since v1:
    * reduce bypass_threshold from (chunk_size / sectors_per_chunk) to (1) and
    make it configurable. This defaults to fairness and modest performance
    gains out of the box.
    Changes since v2:
    * [neilb@suse.de]: kill STRIPE_PRIO_HI and preread_needed as they are not
    necessary, the important change was clearing STRIPE_DELAYED in
    add_stripe_bio and this has been moved out to make_request for the hang
    fix.
    * [neilb@suse.de]: simplify get_priority_stripe
    * [dan.j.williams@intel.com]: reset the bypass_count when ->hold_list is
    sampled empty (+11%)
    * [dan.j.williams@intel.com]: decrement the bypass_count at the detection
    of stripes being naturally promoted off of hold_list +2%. Note, resetting
    bypass_count instead of decrementing on these events yields +4% but that is
    probably too aggressive.
    Changes since v3:
    * cosmetic fixups

    Tested-by: James W. Laferriere
    Signed-off-by: Dan Williams
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

07 Feb, 2008

1 commit

  • This allows userspace to control resync/reshape progress and synchronise it
    with other activities, such as shared access in a SAN, or backing up critical
    sections during a tricky reshape.

    Writing a number of sectors (which must be a multiple of the chunk size if
    such is meaningful) causes a resync to pause when it gets to that point.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
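
Because the value written must be a multiple of the chunk size, a script may want to round its target down before writing it. The sketch below shows that arithmetic. The commit does not name the attribute, so the file is passed in by the caller (in current kernels it is, to my knowledge, md/sync_max); the function names are hypothetical.

```shell
# Sketch: round a sector count down to a chunk boundary, then write it
# to the attribute that pauses the resync. The attribute file is supplied
# by the caller because the commit message does not name it.

round_down_to_chunk() {
    # $1 = sector count, $2 = chunk size in sectors (must be > 0)
    [ "$2" -gt 0 ] || return 1
    echo $(( $1 / $2 * $2 ))
}

pause_resync_at() {
    # $1 = sector count, $2 = chunk size in sectors, $3 = attribute file
    point=$(round_down_to_chunk "$1" "$2") || return 1
    echo "$point" > "$3"
}
```

With a chunk of 128 sectors, a requested pause point of 1000 sectors is rounded down to 896.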
     

10 May, 2007

1 commit

  • "reshape_position" records how much progress has been made on a "reshape"
    (adding drives, changing layout or chunksize).

    When it is set, the number of drives, layout and chunksize can have
    two possible values, an old and a new.

    So allow these different values to be visible, and allow both old and new to
    be set: Set the old ones first, then the reshape_position, then the new
    values.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

04 Oct, 2006

4 commits


03 Oct, 2006

2 commits

  • md.txt has two sections describing the 'level' sysfs attribute, and some of
    the text is out-of-date. So make just one section, and make it right.

    Cc: Christian Kujau
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Add a new sysfs interface that allows the bitmap of an array to be dirtied.
    The interface is write-only, and is used as follows:

    echo "1000" > /sys/block/md2/md/bitmap

    (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
    bitmaps of array md2)

    echo "1000-2000" > /sys/block/md1/md/bitmap

    (dirty the bits for chunks 1000-2000 in md1's bitmap)

    This is useful, for example, in cluster environments where you may need to
    combine two disjoint bitmaps into one (following a server failure, after a
    secondary server has taken over the array). By combining the bitmaps on
    the two servers, a full resync can be avoided (This was discussed on the
    list back on March 18, 2005, "[PATCH 1/2] md bitmap bug fixes" thread).

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
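
The two forms shown in this commit (a single chunk number, or a "first-last" range) can be wrapped in a helper that builds the string. This is a sketch: the function name is hypothetical, and the check that the first chunk does not come after the last is a local sanity check, not something the commit says the kernel enforces.

```shell
# Sketch: build the "N" or "N-M" string and write it to the bitmap file
# (e.g. /sys/block/md2/md/bitmap in the commit's examples).
# The function name and the ordering check are illustrative only.

dirty_bitmap_chunks() {
    # $1 = bitmap attribute file, $2 = first chunk, $3 = last chunk (optional)
    file=$1
    first=$2
    last=${3:-$2}
    for n in "$first" "$last"; do
        case "$n" in
            ''|*[!0-9]*) echo "not a chunk number: '$n'" >&2; return 1 ;;
        esac
    done
    if [ "$first" -gt "$last" ]; then
        echo "first chunk $first is after last chunk $last" >&2
        return 1
    fi
    if [ "$first" -eq "$last" ]; then
        echo "$first" > "$file"
    else
        echo "$first-$last" > "$file"
    fi
}
```

Calling `dirty_bitmap_chunks /sys/block/md1/md/bitmap 1000 2000` writes the same string as the second example in the commit.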
     

27 Jun, 2006

6 commits

  • It appears in /sys/block/mdX/md/dev-YYY/state
    and can be set or cleared by writing 'writemostly' or '-writemostly'
    respectively.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
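
Setting and clearing the flag as described above might look like the following sketch. The function name and its on/off argument are hypothetical; the strings written are the ones given in the commit. printf is used instead of echo so that the leading '-' is never taken as an option.

```shell
# Sketch: set or clear the write-mostly flag on one member device.
# The strings 'writemostly' and '-writemostly' come from the commit
# message; the function name and on/off argument are illustrative.

set_write_mostly() {
    # $1 = path to md/dev-YYY/state, $2 = on | off
    case "$2" in
        on)  printf '%s\n' 'writemostly' > "$1" ;;
        off) printf '%s\n' '-writemostly' > "$1" ;;
        *)   echo "usage: set_write_mostly STATE_FILE on|off" >&2; return 1 ;;
    esac
}
```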
     
  • Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The md/dev-XXX/state file can now be written:

    "faulty" simulates an error on the device
    "remove" removes the device from the array (if it is not busy)

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This allows the state of an md/array to be directly controlled via sysfs and
    adds the ability to stop an array without tearing it down.

    Array states/settings:

    clear
        No devices, no size, no level
        Equivalent to STOP_ARRAY ioctl
    inactive
        May have some settings, but array is not active
        all IO results in error
        When written, doesn't tear down array, but just stops it
    suspended (not supported yet)
        All IO requests will block. The array can be reconfigured.
        Writing this, if accepted, will block until array is quiescent
    readonly
        no resync can happen. no superblocks get written.
        write requests fail
    read-auto
        like readonly, but behaves like 'clean' on a write request.

    clean
        no pending writes, but otherwise active.
        When written to inactive array, starts without resync
        If a write request arrives then
          if metadata is known, mark 'dirty' and switch to 'active'.
          if not known, block and switch to write-pending
        If written to an active array that has pending writes, then fails.
    active
        fully active: IO and resync can be happening.
        When written to inactive array, starts with resync

    write-pending (not supported yet)
        clean, but writes are blocked waiting for 'active' to be written.

    active-idle
        like active, but no writes have been seen for a while (100msec).

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
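
A script that drives the array state could check the state name against the list in this commit before writing it. The sketch below does only that; the function names are hypothetical, and the kernel may still refuse a listed state (two are marked "not supported yet", and some transitions fail by design).

```shell
# Sketch: accept only the state names listed in the commit message,
# then write the state to the array_state attribute file.
# Function names are illustrative only.

is_array_state() {
    case "$1" in
        clear|inactive|suspended|readonly|read-auto|clean|active|write-pending|active-idle)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

set_array_state() {
    # $1 = state name, $2 = path to md/array_state
    if ! is_array_state "$1"; then
        echo "unknown array state: $1" >&2
        return 1
    fi
    echo "$1" > "$2"
}
```

For example, `set_array_state readonly /sys/block/md0/md/array_state` would ask the kernel to stop resync and fail write requests, as described above.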
     
  • When an md array has been idle (no writes) for 20msecs it is marked as 'clean'.
    This delay turns out to be too short for some real workloads. So increase it
    to 200msec (the time to update the metadata should be a tiny fraction of that)
    and make it sysfs-configurable.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
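
The commit makes the delay configurable through sysfs but does not name the attribute or its format. The sketch below assumes the attribute is md/safe_mode_delay and that it takes a decimal number of seconds (so 200msec is written as 0.200); both are assumptions, and the function names are hypothetical.

```shell
# Sketch: convert a delay in milliseconds to a decimal-seconds string
# and write it to the delay attribute.
# Assumptions: the attribute is md/safe_mode_delay and expects seconds
# with a fractional part. Function names are illustrative only.

msec_to_seconds() {
    # $1 = delay in whole milliseconds
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%d.%03d\n' $(( $1 / 1000 )) $(( $1 % 1000 ))
}

set_safe_mode_delay_msec() {
    # $1 = delay in milliseconds, $2 = attribute file
    value=$(msec_to_seconds "$1") || return 1
    echo "$value" > "$2"
}
```

Under those assumptions, the new default of 200msec corresponds to the string 0.200 and the old 20msec to 0.020.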
     

07 Jan, 2006

4 commits