12 Jul, 2013

1 commit

  • Pull device-mapper changes from Alasdair G Kergon:
    "Add a device-mapper target called dm-switch to provide a multipath
    framework for storage arrays that dynamically reconfigure their
    preferred paths for different device regions.

    Fix a bug in the verity target that prevented its use with some
    specific sizes of devices.

    Improve some locking mechanisms in the device-mapper core and bufio.

    Add Mike Snitzer as a device-mapper maintainer.

    A few more clean-ups and fixes"

    * tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
    dm: add switch target
    dm: update maintainers
    dm: optimize reorder structure
    dm: optimize use SRCU and RCU
    dm bufio: submit writes outside lock
    dm cache: fix arm link errors with inline
    dm verity: use __ffs and __fls
    dm flakey: correct ctr alloc failure mesg
    dm verity: remove pointless comparison
    dm: use __GFP_HIGHMEM in __vmalloc
    dm verity: fix inability to use a few specific devices sizes
    dm ioctl: set noio flag to avoid __vmalloc deadlock
    dm mpath: fix ioctl deadlock when no paths

    Linus Torvalds
     

11 Jul, 2013

1 commit

  • dm-switch is a new target that maps IO to underlying block devices
    efficiently when there is a large number of fixed-sized address regions
    but there is no simple pattern to allow for a compact mapping
    representation such as dm-stripe.

    Though we have developed this target for a specific storage device, Dell
    EqualLogic, we have made an effort to keep it as general purpose as
    possible in the hope that others may benefit.

    Originally developed by Jim Ramsay. Simplified by Mikulas Patocka.

    Signed-off-by: Jim Ramsay
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Jim Ramsay
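
The mapping idea in the dm-switch entry above (many fixed-size regions, no regular pattern, so a per-region table is needed) can be pictured with a small sketch. This is not the dm-switch implementation: the function name, region size and table contents are invented for illustration.

```python
def path_for_sector(sector, region_size, region_table):
    """Return the path index serving `sector`.

    region_table[i] holds the path chosen for region i. The table is
    hypothetical; a real array would supply its own region-to-path data.
    """
    region = sector // region_size
    return region_table[region]


# Eight regions of 1024 sectors each, spread over three paths with no
# regular pattern (which is why a stripe-style formula would not do).
region_table = [0, 2, 1, 1, 0, 2, 2, 0]

for sector in (0, 1023, 1024, 5000):
    print(sector, "->", path_for_sector(sector, 1024, region_table))
```

A lookup table costs memory proportional to the number of regions, which is the trade-off the entry alludes to when it contrasts this with the compact representation of dm-stripe.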
     

05 Jul, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "The usual stuff from trivial tree"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    treewide: relase -> release
    Documentation/cgroups/memory.txt: fix stat file documentation
    sysctl/net.txt: delete reference to obsolete 2.4.x kernel
    spinlock_api_smp.h: fix preprocessor comments
    treewide: Fix typo in printk
    doc: device tree: clarify stuff in usage-model.txt.
    open firmware: "/aliasas" -> "/aliases"
    md: bcache: Fixed a typo with the word 'arithmetic'
    irq/generic-chip: fix a few kernel-doc entries
    frv: Convert use of typedef ctl_table to struct ctl_table
    sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
    doc: clk: Fix incorrect wording
    Documentation/arm/IXP4xx fix a typo
    Documentation/networking/ieee802154 fix a typo
    Documentation/DocBook/media/v4l fix a typo
    Documentation/video4linux/si476x.txt fix a typo
    Documentation/virtual/kvm/api.txt fix a typo
    Documentation/early-userspace/README fix a typo
    Documentation/video4linux/soc-camera.txt fix a typo
    lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
    ...

    Linus Torvalds
     

26 Jun, 2013

1 commit

  • MD: Remember the last sync operation that was performed

    This patch adds a field to the mddev structure to track the last
    sync operation that was performed. This is especially useful when
    it comes to what is recorded in mismatch_cnt in sysfs. If the
    last operation was "data-check", then it reports the number of
    discrepancies found by the user-initiated check. If it was a
    "repair" operation, then it is reporting the number of
    discrepancies repaired, etc.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

14 Jun, 2013

1 commit

  • DM RAID: Add ability to restore transiently failed devices on resume

    This patch adds code to the resume function to check over the devices
    in the RAID array. If any are found to be marked as failed and their
    superblocks can be read, an attempt is made to reintegrate them into
    the array. This allows the user to refresh the array with a simple
    suspend and resume of the array - rather than having to load a
    completely new table, allocate and initialize all the structures and
    throw away the old instantiation.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

28 May, 2013

1 commit


24 Apr, 2013

1 commit

  • DM RAID: Add message/status support for changing sync action

    This patch adds a message interface to dm-raid to allow the user to more
    finely control the sync actions being performed by the MD driver. This
    gives the user the ability to initiate "check" and "repair" (i.e. scrubbing).
    Two additional fields have been appended to the status output to provide more
    information about the type of sync action occurring and the results of those
    actions, specifically: <sync_action> and <mismatch_cnt>. These new fields
    will always be populated. This is essentially the device-mapper way of doing
    what MD controls through the 'sync_action' sysfs file and shows through the
    'mismatch_cnt' sysfs file.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

06 Mar, 2013

1 commit

  • Pull md updates from NeilBrown:
    "Mostly little bugfixes.

    Only "feature" is a new RAID10 layout which slightly improves the
    number of sets of devices that can concurrently fail, without data
    loss."

    * tag 'md-3.9' of git://neil.brown.name/md:
    md: expedite metadata update when switching read-auto -> active
    md: remove CONFIG_MULTICORE_RAID456
    md/raid1,raid10: fix deadlock with freeze_array()
    md/raid0: improve error message when converting RAID4-with-spares to RAID0
    md: raid0: fix error return from create_stripe_zones.
    md: fix two bugs when attempting to resize RAID0 array.
    DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
    MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2)
    MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
    MD RAID10: Minor non-functional code changes
    md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
    md: protect against crash upon fsync on ro array

    Linus Torvalds
     

02 Mar, 2013

3 commits

  • A simple cache policy that writes back all data to the origin.

    This is used to decommission a dm cache by emptying it.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Joe Thornber
    Signed-off-by: Alasdair G Kergon

    Heinz Mauelshagen
     
  • A cache policy that uses a multiqueue ordered by recent hit
    count to select which blocks should be promoted and demoted.
    This is meant to be a general purpose policy. It prioritises
    reads over writes.

    Signed-off-by: Joe Thornber
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Add a target that allows a fast device such as an SSD to be used as a
    cache for a slower device such as a disk.

    A plug-in architecture was chosen so that the decisions about which data
    to migrate and when are delegated to interchangeable tunable policy
    modules. The first general purpose module we have developed, called
    "mq" (multiqueue), follows in the next patch. Other modules are
    under development.

    Signed-off-by: Joe Thornber
    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     

26 Feb, 2013

1 commit

  • DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms

    Until now, dm-raid.c only supported the "near" algorithm of MD's RAID10
    implementation. This patch adds support for the "far" and "offset"
    algorithms, but only with the improved redundancy that is brought with
    the introduction of the 'use_far_sets' bit, which shifts copied stripes
    according to smaller sets vs the entire array. That is, the 17th bit
    of the 'layout' variable that defines the RAID10 implementation will
    always be set. (More information on how the 'layout' variable selects
    the RAID10 algorithm can be found in the opening comments of
    drivers/md/raid10.c.)

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
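
To make the 'layout' bit mentioned above more concrete, here is a minimal decoding sketch. The field positions (low byte for near copies, next byte for far copies, bit 16 for "offset", bit 17 for 'use_far_sets') are an assumption taken from the usual description of the RAID10 layout word; check the opening comments of drivers/md/raid10.c before relying on them.

```python
def decode_raid10_layout(layout):
    """Split a RAID10 'layout' word into its fields (assumed bit positions)."""
    return {
        "near_copies": layout & 0xFF,
        "far_copies": (layout >> 8) & 0xFF,
        "far_offset": bool(layout & (1 << 16)),
        "use_far_sets": bool(layout & (1 << 17)),
    }


# A "far" layout with two far copies and the far-sets bit set.
example = (1 << 17) | (2 << 8) | 1
print(decode_raid10_layout(example))
```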
     

24 Jan, 2013

1 commit

  • Before attempting to activate a RAID array, it is checked for sufficient
    redundancy. That is, we make sure that there are not too many failed
    devices - or devices specified for rebuild - to undermine our ability to
    activate the array. The current code performs this check twice - once to
    ensure there were not too many devices specified for rebuild by the user
    ('validate_rebuild_devices') and again after possibly experiencing a failure
    to read the superblock ('analyse_superblocks'). Neither of these checks is
    sufficient. The first check is done properly but with insufficient
    information about the possible failure state of the devices to make a good
    determination if the array can be activated. The second check is simply
    done wrong in the case of RAID10 because it doesn't account for the
    independence of the stripes (i.e. mirror sets). The solution is to use the
    properly written check ('validate_rebuild_devices'), but perform the check
    after the superblocks have been read and we know which devices have failed.
    This gives us one check instead of two and performs it in a location where
    it can be done right.

    Only RAID10 was affected and it was affected in the following ways:
    - the code did not properly catch the condition where a user specified
    a device for rebuild that already had a failed device in the same mirror
    set. (This condition would, however, be caught at a deeper level in MD.)
    - the code triggers a false positive and denies activation when devices in
    independent mirror sets have failed - counting the failures as though they
    were all in the same set.

    The most likely place this error was introduced (or this patch should have
    been included) is in commit 4ec1e369 - first introduced in v3.7-rc1.
    Consequently this fix should also go in v3.7.y, however there is a
    small conflict on the .version in raid_target, so I'll submit a
    separate patch to -stable.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
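
The false positive described above can be shown with a small sketch. It assumes a simplified "near"-style arrangement in which consecutive groups of `copies` devices form independent mirror sets; the function names are hypothetical and this is not the dm-raid code.

```python
def per_set_check(failed, num_devices, copies):
    """Array is usable if every mirror set keeps at least one good device."""
    if num_devices % copies:
        raise ValueError("sketch only handles evenly divided mirror sets")
    for first in range(0, num_devices, copies):
        if all(dev in failed for dev in range(first, first + copies)):
            return False
    return True


def whole_array_check(failed, copies):
    """Flawed variant: counts failures as if they were all in one set."""
    return len(failed) < copies


# Four devices, two copies: sets are {0, 1} and {2, 3}.
failed = {0, 2}  # one failure in each independent set
print(per_set_check(failed, 4, 2))   # each set still has a good device
print(whole_array_check(failed, 2))  # wrongly refuses activation
```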
     

11 Oct, 2012

1 commit

  • DM RAID: Add code to validate replacement slots for RAID10 arrays

    RAID10 can handle 'copies - 1' failures for each mirror group. This code
    ensures the user has provided a valid array - one whose devices specified for
    rebuild do not exceed the amount of redundancy available.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: NeilBrown

    Jonathan Brassow
     

02 Aug, 2012

1 commit

  • Pull md updates from NeilBrown.

    * 'for-next' of git://neil.brown.name/md:
    DM RAID: Add support for MD RAID10
    md/RAID1: Add missing case for attempting to repair known bad blocks.
    md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
    md/raid1: don't abort a resync on the first badblock.
    md: remove duplicated test on ->openers when calling do_md_stop()
    raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
    md/raid1: prevent merging too large request
    md/raid1: read balance chooses idlest disk for SSD
    md/raid1: make sequential read detection per disk based
    MD RAID10: Export md_raid10_congested
    MD: Move macros from raid1*.h to raid1*.c
    MD RAID1: rename mirror_info structure
    MD RAID10: rename mirror_info structure
    MD RAID10: Fix compiler warning.
    raid5: add a per-stripe lock
    raid5: remove unnecessary bitmap write optimization
    raid5: lockless access raid5 overrided bi_phys_segments
    raid5: reduce chance release_stripe() taking device_lock

    Linus Torvalds
     

01 Aug, 2012

1 commit


27 Jul, 2012

3 commits

  • Add read-only and fail-io modes to thin provisioning.

    If a transaction commit fails, the pool's metadata device will transition
    to "read-only" mode. If a commit fails while the pool is already in
    read-only mode, it transitions to "fail-io" mode.

    Once in fail-io mode the pool and all associated thin devices will
    report a status of "Fail".

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
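
The mode transitions described above amount to a small state machine. The sketch below is only an illustration: the name of the normal mode ("write") and the function name are assumptions, not taken from the patch.

```python
WRITE, READ_ONLY, FAIL_IO = "write", "read-only", "fail-io"


def next_mode(mode, commit_ok):
    """Return the pool mode after a metadata commit attempt."""
    if commit_ok:
        return mode          # a successful commit changes nothing
    if mode == WRITE:
        return READ_ONLY     # first failure: stop modifying metadata
    return FAIL_IO           # failure while read-only (or later): fail all I/O


mode = WRITE
for ok in (True, False, False):
    mode = next_mode(mode, ok)
    print(mode)
```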
     
  • Support non-power-of-2 chunk sizes with dm striping for proper alignment
    of stripe IO on storage that has non-power-of-2 optimal IO sizes (e.g.
    RAID6 10+2).

    Signed-off-by: Mike Snitzer
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
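
With a power-of-2 chunk size, a striping map can use shifts and masks; an arbitrary chunk size needs division and remainder instead. The following sketch shows the general arithmetic only. The function name is hypothetical and the kernel code is organised differently.

```python
def map_sector(sector, chunk_size, stripes):
    """Map a logical sector to (stripe index, sector offset on that stripe)."""
    chunk, offset = divmod(sector, chunk_size)  # which chunk, and where in it
    row, stripe = divmod(chunk, stripes)        # chunks rotate across stripes
    return stripe, row * chunk_size + offset


# chunk_size = 24 sectors is not a power of two.
print(map_sector(50, 24, 3))
print(map_sector(80, 24, 3))
```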
     
  • dm-stripe is supposed to ensure that all the space allocated to the
    stripes is fully used and that all stripes are the same size. This
    patch fixes the test. It checks that device length is divisible by the
    chunk size and checks that the resulting quotient is divisible by the
    number of stripes (which is equivalent to testing if device length is
    divisible by chunk_size * stripes).

    Previously, the code only tested that the number of sectors in the target
    was divisible by each of the chunk size and the number of stripes
    separately, which could leave entire stripes unused.

    (A setup that genuinely needs some stripes to be shorter than others
    can be created by concatenating striped targets.)

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
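
The difference between the old and the corrected test can be seen with a few lines of arithmetic. The function names below are invented for the illustration; only the two conditions come from the description above.

```python
def old_check(length, chunk_size, stripes):
    """Previous test: length divisible by each value separately."""
    return length % chunk_size == 0 and length % stripes == 0


def new_check(length, chunk_size, stripes):
    """Corrected test: whole chunks, and a whole number of chunks per stripe."""
    if length % chunk_size:
        return False
    return (length // chunk_size) % stripes == 0


# 24 sectors, 8-sector chunks, 4 stripes: only 3 chunks for 4 stripes,
# so one stripe would receive nothing, yet the old test accepts it.
print(old_check(24, 8, 4), new_check(24, 8, 4))
print(old_check(32, 8, 4), new_check(32, 8, 4))
```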
     

03 Jul, 2012

1 commit

  • Veritysetup is now part of the cryptsetup package.
    Remove on-disk header description (which is not parsed in kernel)
    and point users to cryptsetup where the format is documented.
    Mention units for block size parameters.
    Fix target line specification and dmsetup parameters.

    Signed-off-by: Milan Broz
    Cc: stable@kernel.org
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     

03 Jun, 2012

1 commit

  • This patch implements two new messages that can be sent to the thin
    pool target allowing it to take a snapshot of the _metadata_. This
    read-only snapshot can be accessed by userland, concurrently with the
    live target.

    Only one metadata snapshot can be held at a time. The pool's status
    line will give the block location for the current msnap.

    Since version 0.1.5 of the userland thin provisioning tools, the
    thin_dump program displays the msnap as follows:

    thin_dump -m

    Available here: https://github.com/jthornber/thin-provisioning-tools

    Now that userland can access the metadata we can do various things
    that have traditionally been kernel side tasks:

    i) Incremental backups.

    By using metadata snapshots we can work out what blocks have
    changed over time. Combined with data snapshots we can ensure
    the data doesn't change while we back it up.

    A short proof of concept script can be found here:

    https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb

    ii) Migration of thin devices from one pool to another.

    iii) Merging snapshots back into an external origin.

    iv) Asynchronous replication.

    Signed-off-by: Joe Thornber
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
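
The incremental backup idea in item i) above can be sketched as a comparison of two mapping tables. The sketch assumes each metadata snapshot has already been turned into a dictionary from virtual block to data block (for instance by parsing thin_dump output, which is not shown here), and that a rewritten block shows up under a different data block. The names are hypothetical.

```python
def changed_blocks(old_map, new_map):
    """Return virtual blocks whose mapping is new or differs from old_map."""
    return sorted(
        vblock
        for vblock, dblock in new_map.items()
        if old_map.get(vblock) != dblock
    )


old_map = {0: 10, 1: 11, 2: 12}
new_map = {0: 10, 1: 20, 2: 12, 3: 21}  # block 1 rewritten, block 3 new
print(changed_blocks(old_map, new_map))
```

Only the blocks returned here would need to be copied in the next backup pass.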
     

29 Mar, 2012

5 commits

  • This device-mapper target creates a read-only device that transparently
    validates the data on one underlying device against a pre-generated tree
    of cryptographic checksums stored on a second device.

    Two checksum device formats are supported: version 0 which is already
    shipping in Chromium OS and version 1 which incorporates some
    improvements.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mandeep Singh Baines
    Signed-off-by: Will Drewry
    Signed-off-by: Elly Jones
    Cc: Milan Broz
    Cc: Olof Johansson
    Cc: Steffen Klassert
    Cc: Andrew Morton
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
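
The notion of validating data against a tree of checksums can be illustrated with a generic hash tree. This sketch does not reproduce the verity on-disk format or its verification order: the binary fan-out, the use of SHA-256 and every name below are assumptions made for the example.

```python
import hashlib


def leaf_hashes(blocks):
    """Hash each data block individually."""
    return [hashlib.sha256(block).digest() for block in blocks]


def root_hash(hashes):
    """Fold a non-empty list of hashes pairwise until one root remains."""
    level = list(hashes)
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [hashlib.sha256(b"".join(pair)).digest() for pair in pairs]
    return level[0]


def verify(blocks, trusted_root):
    """True if the blocks reproduce the pre-generated root hash."""
    return root_hash(leaf_hashes(blocks)) == trusted_root


blocks = [b"a" * 16, b"b" * 16, b"c" * 16]
trusted = root_hash(leaf_hashes(blocks))
print(verify(blocks, trusted))
print(verify([blocks[0], b"x" * 16, blocks[2]], trusted))
```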
     
  • Add dm thin target arguments to control discard support.

    ignore_discard: Disables discard support

    no_discard_passdown: Don't pass discards down to the underlying data
    device, but just remove the mapping within the thin provisioning target.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • Support the use of an external _read only_ device as an origin for a thin
    device.

    Any read to an unprovisioned area of the thin device will be passed
    through to the origin. Writes trigger allocation of new blocks as
    usual.

    One possible use case for this would be VM hosts that want to run
    guests on thinly-provisioned volumes but have the base image on another
    device (possibly shared between many VMs).

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
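
The read and write behaviour described above can be modelled in a few lines. This is a toy model with invented names, using an in-memory tuple as the read-only origin; it says nothing about how the target stores its mappings.

```python
class ThinWithOrigin:
    """Toy thin device backed by a read-only external origin."""

    def __init__(self, origin):
        self.origin = tuple(origin)  # never modified
        self.provisioned = {}        # block index -> data written so far

    def read(self, block):
        # Unprovisioned areas fall through to the origin.
        if block in self.provisioned:
            return self.provisioned[block]
        return self.origin[block]

    def write(self, block, data):
        # A write allocates (here: records) a block private to this device.
        self.provisioned[block] = data


base_image = ["boot", "root", "home"]
guest = ThinWithOrigin(base_image)
guest.write(1, "root-modified")
print(guest.read(0), guest.read(1), base_image[1])
```

Several such devices could share one base image, which is the VM host use case mentioned in the entry.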
     
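  • The thin metadata format can only make use of a device that is
    <= THIN_METADATA_MAX_SECTORS (currently 15.9375 GB). Therefore, there is
    no practical benefit to using a larger device.

    However, it may be that other factors impose a certain granularity for
    the space that is allocated to a device (e.g. lvm2 can impose a coarse
    granularity through the use of large, >= 1 GB, physical extents).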

    Rather than reject a larger metadata device, during thin-pool device
    construction, switch to allowing it but issue a warning if a device
    larger than THIN_METADATA_MAX_SECTORS_WARNING (16 GB) is
    provided. Any space over 15.9375 GB will not be used.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
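
The policy described above (accept an oversized metadata device, warn past 16 GB, ignore anything over 15.9375 GB) can be sketched as follows. The sketch treats "GB" as 2^30 bytes, which is an assumption, and the function name is hypothetical.

```python
GIB = 1024 ** 3
MAX_USABLE_BYTES = 255 * GIB // 16   # 15.9375 GB
WARN_ABOVE_BYTES = 16 * GIB


def usable_metadata_bytes(device_bytes):
    """Return how much of a metadata device would actually be used."""
    if device_bytes > WARN_ABOVE_BYTES:
        print("warning: metadata device larger than 16 GB; excess is unused")
    return min(device_bytes, MAX_USABLE_BYTES)


print(usable_metadata_bytes(20 * GIB) / GIB)
```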
     
  • Remove documentation for unimplemented 'trim' message.

    I'd planned a 'trim' target message for shrinking thin devices, but
    this is better handled via the discard ioctl.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     

07 Mar, 2012

1 commit


21 Feb, 2012

1 commit


01 Nov, 2011

3 commits

  • Fix comments: clustered-disk needs a hyphen not an underscore.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • Initial EXPERIMENTAL implementation of device-mapper thin provisioning
    with snapshot support. The 'thin' target is used to create instances of
    the virtual devices that are hosted in the 'thin-pool' target. The
    thin-pool target provides data sharing among devices. This sharing is
    made possible using the persistent-data library in the previous patch.

    The main highlight of this implementation, compared to the previous
    implementation of snapshots, is that it allows many virtual devices to
    be stored on the same data volume, simplifying administration and
    allowing sharing of data between volumes (thus reducing disk usage).

    Another big feature is support for arbitrary depth of recursive
    snapshots (snapshots of snapshots of snapshots ...). The previous
    implementation of snapshots did this by chaining together lookup tables,
    and so performance was O(depth). This new implementation uses a single
    data structure so we don't get this degradation with depth.

    For further information and examples of how to use this, please read
    Documentation/device-mapper/thin-provisioning.txt

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • The persistent-data library offers a re-usable framework for the storage
    and management of on-disk metadata in device-mapper targets.

    It's used by the thin-provisioning target in the next patch and in an
    upcoming hierarchical storage target.

    For further information, please read
    Documentation/device-mapper/persistent-data.txt

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     

02 Aug, 2011

7 commits

  • Add optional parameter field to dmcrypt table and support
    "allow_discards" option.

    Discard requests bypass crypt queue processing. The bio is simply
    remapped to the underlying device.

    Note that discard will never be enabled by default because of security
    consequences. It is up to the administrator to enable it for encrypted
    devices.

    (Note that userspace cryptsetup does not understand new optional
    parameters yet. Support for this will come later. Until then, you
    should use 'dmsetup' to enable and disable this.)

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • Add the ability to parse and use metadata devices to dm-raid. Although
    not strictly required, without the metadata devices, many features of
    RAID are unavailable. They are used to store a superblock and bitmap.

    The role, or position in the array, of each device must be recorded in
    its superblock. This is to help with fault handling, array reshaping,
    and sanity checks. RAID 4/5/6 devices must be loaded in a specific order:
    in this way, the 'array_position' field helps validate the correctness
    of the mapping when it is loaded. It can be used during reshaping to
    identify which devices are added/removed. Fault handling is impossible
    without this field. For example, when a device fails it is recorded in
    the superblock. If this is a RAID1 device and the offending device is
    removed from the array, there must be a way during subsequent array
    assembly to determine that the failed device was the one removed. This
    is done by correlating the 'array_position' field and the bit-field
    variable 'failed_devices'.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
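
Correlating 'array_position' with the 'failed_devices' bit-field, as described above, comes down to a bit test. The sketch assumes that bit N of the bit-field stands for the device at position N; the function name is invented.

```python
def device_failed(array_position, failed_devices):
    """True if the bit for this device's position is set in the bit-field."""
    return bool(failed_devices & (1 << array_position))


failed_devices = 0b0100  # assumed meaning: the device at position 2 failed
for position in range(4):
    print(position, device_failed(position, failed_devices))
```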
     
  • Add the write_mostly parameter to RAID1 dm-raid tables.

    This allows the user to set the WriteMostly flag on a RAID1 device that
    should normally be avoided for read I/O.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • Allow the user to specify the region_size.

    Ensures that the supplied value meets md's constraints, viz. the number of
    regions does not exceed 2^21.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
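
The constraint mentioned above (no more than 2^21 regions) can be checked with simple arithmetic. The sketch assumes both sizes are expressed in sectors and that a partial final region counts as a whole one; the names are hypothetical.

```python
MAX_REGIONS = 1 << 21


def region_size_ok(array_sectors, region_size):
    """True if the array splits into at most 2^21 regions of this size."""
    regions = -(-array_sectors // region_size)  # ceiling division
    return regions <= MAX_REGIONS


array_sectors = 1 << 31  # a 1 TiB array with 512-byte sectors
print(region_size_ok(array_sectors, 1024))  # exactly 2^21 regions
print(region_size_ok(array_sectors, 512))   # 2^22 regions, too many
```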
     
  • Add more information about some dm-raid table parameters and clarify how
    parameters are printed when 'dmsetup table' is issued.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • Add corrupt_bio_byte feature to simulate corruption by overwriting a byte at a
    specified position with a specified value during intervals when the device is
    "down".

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
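
The behaviour described above can be simulated outside the kernel. In this sketch the alternating up/down interval model, the 1-based byte position and all names are assumptions for illustration, not the target's actual parameters.

```python
def is_down(now, up_interval, down_interval):
    """Device alternates: up for up_interval seconds, then down."""
    return now % (up_interval + down_interval) >= up_interval


def maybe_corrupt(data, now, up_interval, down_interval, nth_byte, value):
    """Overwrite byte `nth_byte` (1-based) with `value` while the device is down."""
    buf = bytearray(data)
    if is_down(now, up_interval, down_interval) and 0 < nth_byte <= len(buf):
        buf[nth_byte - 1] = value
    return bytes(buf)


print(maybe_corrupt(b"abcd", 1, 3, 3, 2, ord("X")))  # device up
print(maybe_corrupt(b"abcd", 5, 3, 3, 2, ord("X")))  # device down
```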
     
  • Add 'drop_writes' option to drop writes silently while the
    device is 'down'. Reads are not touched.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     

31 Mar, 2011

1 commit


24 Mar, 2011

1 commit

  • This target is the same as the linear target except that it returns I/O
    errors periodically. It's been found useful in simulating failing
    devices for testing purposes.

    I needed a dm target to do some failure testing on btrfs's raid code, and
    Mike pointed me at this.

    Signed-off-by: Josef Bacik
    Signed-off-by: Alasdair G Kergon

    Josef Bacik