06 Oct, 2014

2 commits

  • In case of RAID levels 4, 5 and 6 we have to verify each RAID member's
    ability to zero data on discards to avoid stripe data corruption -- if
    discard_zeroes_data is not set for each RAID member, discard support must
    be disabled. But given the uncertainty of whether or not a RAID member
    properly supports zeroing data on discard, we require the user to
    explicitly allow discard support on RAID levels 4, 5, and 6 by setting
    a dm-raid module parameter, e.g.: dm-raid.devices_handle_discard_safely=Y
    Otherwise, discards could cause data corruption on RAID4/5/6.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
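
    The gating rule described in this commit can be sketched as follows. This
    is an illustrative model only, not the dm-raid code: the Member class, the
    supports_discard field and the discard_allowed() helper are hypothetical
    names, and the assumption that every member must support discard at all is
    added here for completeness.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Member:
    """Hypothetical stand-in for one RAID member device."""
    supports_discard: bool
    discard_zeroes_data: bool


def discard_allowed(level: int, members: Iterable[Member],
                    devices_handle_discard_safely: bool) -> bool:
    """Return True if discards may be passed down to the array."""
    members = list(members)
    # Assumption: a member without any discard support rules discard out.
    if not all(m.supports_discard for m in members):
        return False
    if level in (4, 5, 6):
        # Parity levels: a member that does not zero data on discard
        # would leave the stripe inconsistent with its parity.
        if not all(m.discard_zeroes_data for m in members):
            return False
        # Even then, the user has to opt in via the module parameter.
        if not devices_handle_discard_safely:
            return False
    return True
```

    In this model, a RAID5 set of well-behaved members still reports no
    discard support until devices_handle_discard_safely is set.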
     
  • Discard support is not enabled for RAID levels 4, 5, and 6 at this time
    due to concerns about unreliable discard_zeroes_data support on some
    hardware. Otherwise, discards could cause stripe data corruption
    (classic example of bad apples spoiling the bunch).

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     

26 Jun, 2013

1 commit

  • MD: Remember the last sync operation that was performed

    This patch adds a field to the mddev structure to track the last
    sync operation that was performed. This is especially useful when
    it comes to what is recorded in mismatch_cnt in sysfs. If the
    last operation was "data-check", then it reports the number of
    discrepancies found by the user-initiated check. If it was a
    "repair" operation, then it reports the number of
    discrepancies repaired, etc.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

14 Jun, 2013

5 commits

  • strict_strtoul() is obsolete, so its use is discouraged;
    kstrtoul() should be used instead.

    Signed-off-by: Jingoo Han
    Signed-off-by: NeilBrown

    Jingoo Han
     
  • This doesn't really need to be initialised, but it doesn't hurt,
    silences the compiler, and as it is a counter it makes sense for it to
    start at zero.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • DM RAID: Fix raid_resume not reviving failed devices in all cases

    When a device fails in a RAID array, it is marked as Faulty. Later,
    md_check_recovery is called which (through the call chain) calls
    'hot_remove_disk' in order to have the personalities remove the device
    from use in the array.

    Sometimes, it is possible for the array to be suspended before the
    personalities get their chance to perform 'hot_remove_disk'. This is
    normally not an issue. If the array is deactivated, then the failed
    device will be noticed when the array is reinstantiated. If the
    array is resumed and the disk is still missing, md_check_recovery will
    be called upon resume and 'hot_remove_disk' will be called at that
    time. However, (for dm-raid) if the device has been restored,
    a resume on the array would cause it to attempt to revive the device
    by calling 'hot_add_disk'. If 'hot_remove_disk' had not been called,
    a situation is then created where the device is thought to concurrently
    be the replacement and the device to be replaced. Thus, the device
    is first sync'ed with the rest of the array (because it is the replacement
    device) and then marked Faulty and removed from the array (because
    it is also the device being replaced).

    The solution is to check and see if the device had properly been removed
    before the array was suspended. This is done by seeing whether the
    device's 'raid_disk' field is -1 - a condition that implies that
    'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
    -> hot_remove_disk' has been called. If 'raid_disk' is not -1, then
    'hot_remove_disk' must be called to complete the removal of the previously
    faulty device before it can be revived via 'hot_add_disk'.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
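
    The ordering that this fix enforces can be modelled with a short sketch.
    It is a simplified illustration rather than the kernel code: the Dev class,
    its fields and revive_failed_devices() are hypothetical, and the two
    personality callbacks are represented only by entries appended to a list.

```python
from dataclasses import dataclass


@dataclass
class Dev:
    """Hypothetical model of one array member."""
    faulty: bool = False
    superblock_readable: bool = True
    raid_disk: int = -1  # -1 means "already removed from the array"


def revive_failed_devices(devs, calls):
    """On resume, revive failed devices whose superblock is readable again."""
    for slot, dev in enumerate(devs):
        if not (dev.faulty and dev.superblock_readable):
            continue
        if dev.raid_disk != -1:
            # The removal never completed before the suspend, so finish
            # it first; otherwise the device would be both the
            # replacement and the device being replaced.
            calls.append(("hot_remove_disk", slot))
            dev.raid_disk = -1
        dev.faulty = False
        calls.append(("hot_add_disk", slot))
        dev.raid_disk = slot
```

    A device whose raid_disk is already -1 goes straight to the add step; any
    other failed device gets the remove step first.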
     
  • DM RAID: Break-up untidy function

    Clean-up excessive indentation by moving some code in raid_resume()
    into its own function.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • DM RAID: Add ability to restore transiently failed devices on resume

    This patch adds code to the resume function to check over the devices
    in the RAID array. If any are found to be marked as failed and their
    superblocks can be read, an attempt is made to reintegrate them into
    the array. This allows the user to refresh the array with a simple
    suspend and resume of the array - rather than having to load a
    completely new table, allocate and initialize all the structures and
    throw away the old instantiation.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

24 Apr, 2013

1 commit

  • DM RAID: Add message/status support for changing sync action

    This patch adds a message interface to dm-raid to allow the user to more
    finely control the sync actions being performed by the MD driver. This
    gives the user the ability to initiate "check" and "repair" (i.e. scrubbing).
    Two additional fields have been appended to the status output to provide more
    information about the type of sync action occurring and the results of those
    actions, specifically: 'sync_action' and 'mismatch_cnt'. These new fields
    will always be populated. This is essentially the device-mapper way of doing
    what MD controls through the 'sync_action' sysfs file and shows through the
    'mismatch_cnt' sysfs file.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

06 Mar, 2013

1 commit

  • Pull md updates from NeilBrown:
    "Mostly little bugfixes.

    Only "feature" is a new RAID10 layout which slightly improves the
    number of sets of devices that can concurrently fail, without data
    loss."

    * tag 'md-3.9' of git://neil.brown.name/md:
    md: expedite metadata update when switching read-auto -> active
    md: remove CONFIG_MULTICORE_RAID456
    md/raid1,raid10: fix deadlock with freeze_array()
    md/raid0: improve error message when converting RAID4-with-spares to RAID0
    md: raid0: fix error return from create_stripe_zones.
    md: fix two bugs when attempting to resize RAID0 array.
    DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
    MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2)
    MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
    MD RAID10: Minor non-functional code changes
    md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
    md: protect against crash upon fsync on ro array

    Linus Torvalds
     

02 Mar, 2013

2 commits

  • Use 'bio' in the name of variables and functions that deal with
    bios rather than 'request' to avoid confusion with the normal
    block layer use of 'request'.

    No functional changes.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Avoid returning a truncated table or status string instead of setting
    the DM_BUFFER_FULL_FLAG when the last target of a table fills the
    buffer.

    When processing a table or status request, the function retrieve_status
    calls ti->type->status. If ti->type->status returns non-zero,
    retrieve_status assumes that the buffer overflowed and sets
    DM_BUFFER_FULL_FLAG.

    However, targets don't return non-zero values from their status method
    on overflow. Most targets always return zero.

    If a buffer overflow happens in a target that is not the last in the
    table, it gets noticed during the next iteration of the loop in
    retrieve_status; but if a buffer overflow happens in the last target, it
    goes unnoticed and erroneously truncated data is returned.

    In the current code, the targets behave in the following way:
    * dm-crypt returns -ENOMEM if there is not enough space to store the
    key, but it returns 0 on all other overflows.
    * dm-thin returns errors from the status method if a disk error happened.
    This is incorrect because retrieve_status doesn't check the error
    code; it assumes that all non-zero values mean buffer overflow.
    * all the other targets always return 0.

    This patch changes the ti->type->status function to return void (because
    most targets don't use the return code). Overflow is detected in
    retrieve_status: if the status method fills up the remaining space
    completely, it is assumed that buffer overflow happened.

    Cc: stable@vger.kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
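
    The overflow heuristic described above can be illustrated with a small
    model. This is a sketch under stated assumptions, not the dm-ioctl code:
    bounded_write() is a hypothetical stand-in for a bounded string formatter
    that always leaves room for a terminating NUL, and retrieve_status() here
    only mirrors the name of the kernel function.

```python
def bounded_write(text: str, remaining: int) -> str:
    """Keep at most remaining - 1 characters, leaving room for a NUL."""
    return text[:max(remaining - 1, 0)]


def retrieve_status(statuses, buf_size):
    """Return (collected strings, buffer_full flag)."""
    out = []
    used = 0
    for text in statuses:
        remaining = buf_size - used
        if remaining <= 0:
            return out, True
        written = bounded_write(text, remaining)
        length = len(written) + 1  # count the terminating NUL
        if length == remaining:
            # The status filled all the space that was left, so assume
            # it was truncated and report the buffer as full.
            return out, True
        out.append(written)
        used += length
    return out, False
```

    Note the trade-off in this approach: a status string that fits exactly is
    also reported as an overflow, which errs on the side of asking for a
    larger buffer rather than returning truncated data.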
     

26 Feb, 2013

1 commit

  • DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms

    Until now, dm-raid.c only supported the "near" algorithm of MD's RAID10
    implementation. This patch adds support for the "far" and "offset"
    algorithms, but only with the improved redundancy that is brought with
    the introduction of the 'use_far_sets' bit, which shifts copied stripes
    according to smaller sets vs the entire array. That is, the 17th bit
    of the 'layout' variable that defines the RAID10 implementation will
    always be set. (More information on how the 'layout' variable selects
    the RAID10 algorithm can be found in the opening comments of
    drivers/md/raid10.c.)

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
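
    As a rough illustration of how such a 'layout' value could be composed,
    the sketch below follows the encoding described in the opening comments of
    drivers/md/raid10.c as understood here: near copies in the low byte, far
    copies in the second byte, the offset flag in bit 16 and use_far_sets in
    bit 17. Reading "the 17th bit" as bit 17 counted from zero is an
    assumption, and raid10_layout() is a hypothetical helper, not the
    function used in dm-raid.c.

```python
RAID10_OFFSET = 1 << 16        # "offset" variant of the far algorithm
RAID10_USE_FAR_SETS = 1 << 17  # improved redundancy for far/offset


def raid10_layout(fmt: str, copies: int) -> int:
    """Build a layout word for the given algorithm and number of copies."""
    if not 1 <= copies <= 0xFF:
        raise ValueError("copies must fit in one byte")
    if fmt == "near":
        # near copies in the low byte, a single far copy in the next byte
        return (1 << 8) | copies
    flags = RAID10_USE_FAR_SETS  # always set for far/offset, per the commit
    if fmt == "offset":
        flags |= RAID10_OFFSET
    elif fmt != "far":
        raise ValueError("unknown RAID10 format: " + fmt)
    # far copies in the second byte, a single near copy in the low byte
    return flags | (copies << 8) | 1
```

    With two copies, this sketch yields 0x102 for "near", 0x20201 for "far"
    and 0x30201 for "offset".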
     

24 Jan, 2013

1 commit

  • Before attempting to activate a RAID array, it is checked for sufficient
    redundancy. That is, we make sure that there are not too many failed
    devices - or devices specified for rebuild - to undermine our ability to
    activate the array. The current code performs this check twice - once to
    ensure there were not too many devices specified for rebuild by the user
    ('validate_rebuild_devices') and again after possibly experiencing a failure
    to read the superblock ('analyse_superblocks'). Neither of these checks is
    sufficient. The first check is done properly but with insufficient
    information about the possible failure state of the devices to make a good
    determination of whether the array can be activated. The second check is simply
    done wrong in the case of RAID10 because it doesn't account for the
    independence of the stripes (i.e. mirror sets). The solution is to use the
    properly written check ('validate_rebuild_devices'), but perform the check
    after the superblocks have been read and we know which devices have failed.
    This gives us one check instead of two and performs it in a location where
    it can be done right.

    Only RAID10 was affected and it was affected in the following ways:
    - the code did not properly catch the condition where a user specified
    a device for rebuild that already had a failed device in the same mirror
    set. (This condition would, however, be caught at a deeper level in MD.)
    - the code triggers a false positive and denies activation when devices in
    independent mirror sets have failed - counting the failures as though they
    were all in the same set.

    The most likely place this error was introduced (or this patch should have
    been included) is in commit 4ec1e369 - first introduced in v3.7-rc1.
    Consequently this fix should also go in v3.7.y, however there is a
    small conflict on the .version in raid_target, so I'll submit a
    separate patch to -stable.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
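
    The false positive described above can be reproduced with a toy model.
    The sketch assumes the simplest case only: a "near" layout in which the
    device count is a multiple of the number of copies, so that each run of
    'copies' consecutive devices forms one mirror set. Both function names are
    hypothetical, and the "global" variant is a reading of the description
    ("counting the failures as though they were all in the same set"), not a
    copy of the old code.

```python
from collections import Counter


def too_many_failures_global(failed, copies):
    """Old-style check: treat all failures as if in one mirror set."""
    return len(set(failed)) >= copies


def too_many_failures_per_set(failed, copies):
    """Count failures per mirror set; a set is lost only if all copies fail."""
    per_set = Counter(dev // copies for dev in set(failed))
    return any(count >= copies for count in per_set.values())
```

    For a four-device array with two copies, failures of devices 0 and 2 hit
    different mirror sets: the global count refuses activation, while the
    per-set count correctly allows it.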
     

22 Dec, 2012

2 commits

  • This patch removes map_info from bio-based device mapper targets.
    map_info is still used for request-based targets.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • If the user does not supply a bitmap region_size to the dm raid target,
    a reasonable size is computed automatically. If this is not a power of 2,
    the md code will report an error later.

    This patch catches the problem early and rounds the region_size to the
    next power of two.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
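
    Rounding up to the next power of two can be sketched as below. The helper
    names are illustrative and the code is a plain model of the arithmetic,
    not the kernel implementation.

```python
def is_power_of_2(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def round_up_to_power_of_2(n: int) -> int:
    """Return n if it is a power of two, else the next larger one."""
    if n < 1:
        raise ValueError("size must be positive")
    if is_power_of_2(n):
        return n
    # bit_length() is the position of the highest set bit plus one,
    # so shifting 1 by it gives the next power of two above n.
    return 1 << n.bit_length()
```

    For example, a computed size of 5000 sectors would become 8192, while
    4096 is left unchanged.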
     

11 Oct, 2012

4 commits

  • There are two table arguments that can be given to a DM RAID target
    that control whether the array is forced to (re)synchronize or skip
    initialization: "sync" and "nosync". When "sync" is given, we set
    mddev->recovery_cp to 0 in order to cause the device to resynchronize.
    This is insufficient if there is a bitmap in use, because the array
    will simply look at the bitmap and see that there is no recovery
    necessary.

    The fix is to skip over the loading of the superblocks when "sync" is
    given, causing new superblocks to be written that will force the array
    to go through initialization (i.e. synchronization).

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • DM RAID: Fix comparison of index and quantity for "rebuild" parameter

    The "rebuild" parameter takes an index argument that starts counting from
    zero. The conditional used to validate the index was using '>' rather than
    '>=', leaving the door open for an index value that would be 1 too large.

    Reported-by: Neil Brown
    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
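
    The off-by-one can be seen in a two-line comparison. Both helpers are
    hypothetical and only model the bound check described in the message.

```python
def rebuild_index_ok_old(index: int, raid_disks: int) -> bool:
    """Pre-fix check: only rejects index > raid_disks."""
    return 0 <= index and not index > raid_disks


def rebuild_index_ok(index: int, raid_disks: int) -> bool:
    """Fixed check: rejects index >= raid_disks (indices start at zero)."""
    return 0 <= index and not index >= raid_disks
```

    With four devices the valid indices are 0 to 3; the old comparison also
    accepted 4.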
     
  • DM RAID: Add code to validate replacement slots for RAID10 arrays

    RAID10 can handle 'copies - 1' failures for each mirror group. This code
    ensures the user has provided a valid array - one whose devices specified for
    rebuild do not exceed the amount of redundancy available.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • DM RAID: Move chunk of code to its own function

    The code that checks whether device replacements/rebuilds are possible given
    a specific RAID type is moved to its own function. It will further expand
    when the code to check RAID10 is added. A separate function makes it easier
    to read.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

02 Aug, 2012

1 commit

  • Pull md updates from NeilBrown.

    * 'for-next' of git://neil.brown.name/md:
    DM RAID: Add support for MD RAID10
    md/RAID1: Add missing case for attempting to repair known bad blocks.
    md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
    md/raid1: don't abort a resync on the first badblock.
    md: remove duplicated test on ->openers when calling do_md_stop()
    raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
    md/raid1: prevent merging too large request
    md/raid1: read balance chooses idlest disk for SSD
    md/raid1: make sequential read detection per disk based
    MD RAID10: Export md_raid10_congested
    MD: Move macros from raid1*.h to raid1*.c
    MD RAID1: rename mirror_info structure
    MD RAID10: rename mirror_info structure
    MD RAID10: Fix compiler warning.
    raid5: add a per-stripe lock
    raid5: remove unnecessary bitmap write optimization
    raid5: lockless access raid5 overrided bi_phys_segments
    raid5: reduce chance release_stripe() taking device_lock

    Linus Torvalds
     

01 Aug, 2012

1 commit


27 Jul, 2012

4 commits

  • Commit outstanding metadata before returning the status for a dm thin
    pool so that the numbers reported are as up-to-date as possible.

    The commit is not performed if the device is suspended or if
    the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
    through a new 'status_flags' parameter in the target's dm_status_fn.

    The userspace dmsetup tool will support the --noflush flag with the
    'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
    onwards.

    Tested-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • In preparation for RAID10 inclusion in dm-raid, we move the sectors_per_dev
    calculation later in the device creation process. This is because we won't
    know up-front how many stripes vs how many mirrors there are which will
    change the calculation.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan E Brassow
     
  • In preparation for RAID10 addition to dm-raid, we change an 'if' conditional
    to a 'switch' conditional to make it easier to see what is being checked for
    each RAID type.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan E Brassow
     
  • Remove the restriction that limits a target's specified maximum incoming
    I/O size to be a power of 2.

    Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'.
    Change it from sector_t to uint32_t, which is plenty big enough, and
    introduce a wrapper function dm_set_target_max_io_len() to set it.
    Use sector_div() to process it now that it is not necessarily a power of 2.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
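
    Why the power-of-2 restriction mattered can be shown with the boundary
    arithmetic. This is a sketch of the idea only: both helpers are
    hypothetical, offsets and lengths are plain integers in sectors, and the
    modulo operator stands in for what the message says sector_div() is used
    for.

```python
def sectors_to_boundary(offset: int, max_io_len: int) -> int:
    """Sectors from offset to the next max_io_len boundary (any length)."""
    return max_io_len - (offset % max_io_len)


def sectors_to_boundary_mask(offset: int, max_io_len: int) -> int:
    """Mask-based variant; only correct when max_io_len is a power of 2."""
    return max_io_len - (offset & (max_io_len - 1))
```

    Both agree for a length of 8, but for a length of 6 and an offset of 10
    the mask variant gives 6 instead of the correct 2, which is why a real
    division is needed once arbitrary lengths are allowed.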
     

22 May, 2012

4 commits

  • When encountering an error while reading the superblock, call md_error.

    We are currently setting the 'Faulty' bit on one of the array devices when an
    error is encountered while reading the superblock of a dm-raid array. We should
    be calling md_error(), as it handles the error more completely.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • Missing dm-raid devices should be recorded in the superblock

    When specifying the devices that compose a DM RAID array, it is possible to denote
    failed or missing devices with '-'s. When this occurs, we must record this in the
    superblock. We do this by checking if the array position's data device is missing
    and then forcing MD to record the superblock by setting 'MD_CHANGE_DEVS' in
    'raid_resume'. If we do not cause the superblock to be rewritten by the resume
    function, it is possible for a stale superblock to be written by an out-going
    in-active table (during 'raid_dtr').

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • Properly initialize MD recovery flags when resuming device-mapper devices.

    When a device-mapper device is suspended, all I/O must stop. This is done by
    calling 'md_stop_writes' and 'mddev_suspend'. These calls in turn manipulate
    the recovery flags - including setting 'MD_RECOVERY_FROZEN'. The DM device
    may have been suspended while recovery was not yet complete, so the process
    needs to pick up where it left off. Since 'mddev_resume' does not unset
    'MD_RECOVERY_FROZEN' and set 'MD_RECOVERY_NEEDED', we must do it ourselves.
    'MD_RECOVERY_NEEDED' can safely be set in 'mddev_resume', but 'MD_RECOVERY_FROZEN'
    must be cleared outside of 'mddev_resume' due to how MD handles RAID reshaping.
    (e.g. it is possible for a user to delay reshaping a RAID5->RAID6 by purposefully
    setting 'MD_RECOVERY_FROZEN'. Clearing it in 'mddev_resume' would override the
    desired behavior.)

    Because 'mddev_resume' already unconditionally calls 'md_wakeup_thread(mddev->thread)'
    there is no need to make this call from 'raid_resume' since it calls 'mddev_resume'.

    Also clean up where level_store calls mddev_resume() - it currently
    duplicates some of the functions of that call. - NB

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     
  • dm-raid currently open-codes the freeing of some members of
    an rdev. It is more maintainable to have it call common code
    from md.c which does this for all call-sites.

    So rename free_disk_sb to md_rdev_clear, export it, and use it in
    dm-raid.c

    Signed-off-by: NeilBrown

    NeilBrown
     

24 Apr, 2012

1 commit


29 Mar, 2012

1 commit

  • The dm-raid code currently fails to create a RAID array if any of the
    superblocks cannot be read. This was an oversight as there is already
    code to handle this case if the values ('- -') were provided for the
    failed array position.

    With this patch, if a superblock cannot be read, the array position's
    fields are initialized as though '- -' was set in the table. That is,
    the device is failed and the position should not be used, but if there
    is sufficient redundancy, the array should still be activated.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan E Brassow
     

19 Mar, 2012

1 commit

  • md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
    mddev. However it uses the 'safe' version of list_for_each_entry,
    and so requires the extra variable, but doesn't include 'safe' in the
    name, which is useful documentation.

    Consequently some places use this safe version without needing it, and
    many use an explicit list_for_each_entry.

    So:
    - rename rdev_for_each to rdev_for_each_safe
    - create a new rdev_for_each which uses the plain
    list_for_each_entry,
    - use the 'safe' version only where needed, and convert all other
    list_for_each_entry calls to use rdev_for_each.

    Signed-off-by: NeilBrown

    NeilBrown
     

08 Mar, 2012

2 commits

  • Fix dm-raid flush support.

    Both md and dm have support for flush, but the dm-raid target
    forgot to set the flag to indicate that flushes should be
    passed on. (Important for data integrity e.g. with writeback cache
    enabled.)

    Signed-off-by: Jonathan Brassow
    Acked-by: Mike Snitzer
    Cc: stable@kernel.org
    Signed-off-by: Alasdair G Kergon

    Jonathan E Brassow
     
  • The 'rebuild' parameter is used to rebuild individual devices in an
    array (e.g. resynchronize a RAID1 device or recalculate a parity device
    in higher RAID). The MD_CHANGE_DEVS flag must be set when this
    parameter is given in order to write out the superblocks and make the
    change take immediate effect. The code that handles new devices in
    super_load already sets MD_CHANGE_DEVS and 'FirstUse'. (The 'FirstUse'
    flag was being set as a special case for rebuilds in
    super_init_validation.)

    Add a condition for rebuilds in super_load to take care of both flags
    without the special case in 'super_init_validation'.

    Signed-off-by: Jonathan Brassow
    Cc: stable@kernel.org
    Signed-off-by: Alasdair G Kergon

    Jonathan E Brassow
     

31 Jan, 2012

1 commit

  • The life cycle of a device-mapper target is:
    1) create
    2) resume
    3) suspend
    *) possibly repeat from 2
    4) destroy

    The dm-raid target is unconditionally calling MD's bitmap_load function upon
    every resume. If steps 2 & 3 above are repeated, bitmap_load is called
    multiple times. It is only written to be called once; otherwise, it allocates
    new memory for the bitmap (without freeing the old) and increments the number
    of pages it thinks it has without zeroing first. This ultimately leads to
    access beyond allocated memory and lost memory.

    Simply avoiding the bitmap_load call upon resume is not sufficient. If the
    target was suspended while the initial recovery was only partially complete,
    it needs to be restarted when the target is resumed. This is why
    'md_wakeup_thread' is called before issuing the 'mddev_resume'.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
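
    The load-once behaviour across repeated resume/suspend cycles can be
    modelled as below. This is a simplified sketch, not the dm-raid code: the
    RaidTarget class and its fields are hypothetical, the MD functions are
    represented only by strings in a call log, and waking the thread on every
    resume is a simplification of what the message describes.

```python
class RaidTarget:
    """Toy model of a target that may be resumed several times."""

    def __init__(self):
        self.bitmap_loaded = False
        self.calls = []

    def resume(self):
        if not self.bitmap_loaded:
            # Load the bitmap on the first resume only; loading again
            # would allocate a second bitmap without freeing the first.
            self.calls.append("bitmap_load")
            self.bitmap_loaded = True
        # Restart any partially completed recovery, then resume.
        self.calls.append("md_wakeup_thread")
        self.calls.append("mddev_resume")

    def suspend(self):
        self.calls.append("mddev_suspend")
```

    Running resume, suspend, resume on this model records a single
    bitmap_load, however many times steps 2 and 3 are repeated.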
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

03 Nov, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm:
    dm: raid fix device status indicator when array initializing
    dm log userspace: add log device dependency
    dm log userspace: fix comment hyphens
    dm: add thin provisioning target
    dm: add persistent data library
    dm: add bufio
    dm: export dm get md
    dm table: add immutable feature
    dm table: add always writeable feature
    dm table: add singleton feature
    dm kcopyd: add dm_kcopyd_zero to zero an area
    dm: remove superfluous smp_mb
    dm: use local printk ratelimit
    dm table: propagate non rotational flag

    Linus Torvalds
     

01 Nov, 2011

2 commits