01 Nov, 2011

2 commits

  • Initial EXPERIMENTAL implementation of device-mapper thin provisioning
    with snapshot support. The 'thin' target is used to create instances of
    the virtual devices that are hosted in the 'thin-pool' target. The
    thin-pool target provides data sharing among devices. This sharing is
    made possible by the persistent-data library introduced in the previous patch.

    The main highlight of this implementation, compared to the previous
    implementation of snapshots, is that it allows many virtual devices to
    be stored on the same data volume, simplifying administration and
    allowing sharing of data between volumes (thus reducing disk usage).

    Another big feature is support for arbitrary depth of recursive
    snapshots (snapshots of snapshots of snapshots ...). The previous
    implementation of snapshots did this by chaining together lookup tables,
    and so performance was O(depth). This new implementation uses a single
    data structure so we don't get this degradation with depth.

    For further information and examples of how to use this, please read
    Documentation/device-mapper/thin-provisioning.txt
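
    Below is a minimal sketch of creating a pool, a thin volume, and a
    snapshot with dmsetup, following the table formats described in that
    document. The sizes, device ids, and device names here are illustrative
    assumptions:

    # Pool table: 0 <pool_size> thin-pool <metadata_dev> <data_dev> \
    #             <data_block_size> <low_water_mark>
    dmsetup create pool --table \
      "0 20971520 thin-pool /dev/mapper/meta /dev/mapper/data 128 32768"

    # Provision thin device id 0 inside the pool, then map it.
    dmsetup message /dev/mapper/pool 0 "create_thin 0"
    dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"

    # Snapshot device 0 as device id 1 (suspend the origin while the
    # message is sent), then map the snapshot as its own device.
    dmsetup suspend /dev/mapper/thin
    dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
    dmsetup resume /dev/mapper/thin
    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"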

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Joe Thornber
     
  • The dm-bufio interface allows you to do cached I/O on devices,
    holding recently-read blocks in memory and performing delayed writes.

    We don't use the buffer cache or page cache already present in the kernel, because:
    * we need to handle block sizes larger than a page
    * we can't allocate memory to perform reads or we'd have deadlocks

    Currently, when a cache is required, we limit its size to a fraction of
    available memory. Usage can be viewed and changed in
    /sys/module/dm_bufio/parameters/.

    The first user is thin provisioning, but more dm users are planned.
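
    For example, assuming the module parameter names provided by this patch
    (values are illustrative):

    # View and raise the maximum cache size, in bytes.
    cat /sys/module/dm_bufio/parameters/max_cache_size_bytes
    echo 268435456 > /sys/module/dm_bufio/parameters/max_cache_size_bytes

    # Read-only counters report current cache memory usage.
    cat /sys/module/dm_bufio/parameters/current_allocated_bytes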

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

24 Mar, 2011

1 commit

  • This target is the same as the linear target except that it returns I/O
    errors periodically. It's been found useful in simulating failing
    devices for testing purposes.

    I needed a dm target to do some failure testing on btrfs's raid code, and
    Mike pointed me at this.
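
    This is the target merged as "flakey". A minimal usage sketch, assuming
    its table format <dev> <offset> <up_interval> <down_interval> (intervals
    in seconds) and an illustrative device:

    # Behave like linear for 9 seconds, then error all I/O for 1 second,
    # repeating forever.
    dmsetup create flaky --table "0 2097152 flakey /dev/sdb1 0 9 1"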

    Signed-off-by: Josef Bacik
    Signed-off-by: Alasdair G Kergon

    Josef Bacik
     

14 Jan, 2011

1 commit

  • This patch is the skeleton for the DM target that will be
    the bridge from DM to MD (initially RAID456 and later RAID1). It
    provides a way to use device-mapper interfaces to the MD RAID456
    drivers.

    As with all device-mapper targets, the nominal public interfaces are the
    constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
    and STATUSTYPE_TABLE). The CTR table looks like the following:

    1: <s> <l> raid \
    2:     <raid_type> <#raid_params> <raid_params> \
    3:     <#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>

    Line 1 contains the standard first three arguments to any device-mapper
    target - the start, length, and target type fields. The target type in
    this case is "raid".

    Line 2 contains the arguments that define the particular raid
    type/personality/level, the required arguments for that raid type, and
    any optional arguments. Possible raid types include: raid4, raid5_la,
    raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc. (again, raid1 is
    planned for the future.) The list of required and optional parameters
    is the same for all the current raid types. The required parameters are
    positional, while the optional parameters are given as key/value pairs.
    The possible parameters are as follows:
    <chunk_size>                       Chunk size in sectors.
    [[no]sync]                         Force/Prevent RAID initialization
    [rebuild <idx>]                    Rebuild the drive indicated by the index
    [daemon_sleep <ms>]                Time between bitmap daemon work to clear bits
    [min_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
    [max_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
    [max_write_behind <value>]         See '--write-behind=' (man mdadm)
    [stripe_cache <sectors>]           Stripe cache size for higher RAIDs

    Line 3 contains the list of devices that compose the array in
    metadata/data device pairs. If the metadata is stored separately, a '-'
    is given for the metadata device position. If a drive has failed or is
    missing at creation time, a '-' can be given for both the metadata and
    data drives for a given position.

    Examples:
    # RAID4 - 4 data drives, 1 parity
    # No metadata devices specified to hold superblock/bitmap info
    # Chunk size of 1MiB
    # (Lines separated for easy reading)
    0 1960893648 raid \
    raid4 1 2048 \
    5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

    # RAID4 - 4 data drives, 1 parity (no metadata devices)
    # Chunk size of 1MiB, force RAID initialization,
    # min recovery rate at 20 kiB/sec/disk
    0 1960893648 raid \
    raid4 4 2048 min_recovery_rate 20 sync \
    5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

    Performing a 'dmsetup table' should display the CTR table used to
    construct the mapping (with possible reordering of optional
    parameters).

    Performing a 'dmsetup status' will yield information on the state and
    health of the array. The output is as follows:
    1: <s> <l> raid \
    2:     <raid_type> <#devices> <1 health char for each dev> <resync_ratio>

    Line 1 is standard DM output. Line 2 is best shown by example:
    0 1960893648 raid raid4 5 AAAAA 2/490221568
    Here we can see that the RAID type is raid4, there are 5 devices - all
    of which are 'A'live - and the array's recovery is 2/490221568 complete.

    Cc: linux-raid@vger.kernel.org
    Signed-off-by: NeilBrown
    Signed-off-by: Jonathan Brassow
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    NeilBrown
     

09 Aug, 2010

1 commit


29 Oct, 2009

1 commit


16 Oct, 2009

1 commit


22 Jun, 2009

3 commits

  • This patch contains a device-mapper mirror log module that forwards
    requests to userspace for processing.

    The structures used for communication between kernel and userspace are
    located in include/linux/dm-log-userspace.h. Due to the frequency,
    diversity, and 2-way communication nature of the exchanges between
    kernel and userspace, 'connector' was chosen as the interface for
    communication.

    The first log implementations written in userspace - "clustered-disk"
    and "clustered-core" - support clustered shared storage. A userspace
    daemon (in the LVM2 source code repository) uses openAIS/corosync to
    process requests in an ordered fashion with the rest of the nodes in the
    cluster so as to prevent log state corruption. Other implementations
    with no association to LVM or openAIS/corosync are certainly possible.

    (Imagine if two machines are writing to the same region of a mirror.
    They would both mark the region dirty, but you need a cluster-aware
    entity that can handle properly marking the region clean when they are
    done. Otherwise, you might clear the region when the first machine is
    done, not the second.)

    Signed-off-by: Jonathan Brassow
    Cc: Evgeniy Polyakov
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • This patch adds a service time oriented dynamic load balancer,
    dm-service-time, which selects the path with the shortest estimated
    service time for the incoming I/O.
    The service time is estimated by dividing the size of the in-flight I/O
    on a path by that path's performance value.

    The performance value can be given as a table argument at the table
    loading time. If no performance value is given, all paths are
    considered equal.
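
    A worked example of the estimate: if path A has 512 KiB of I/O in
    flight with performance value 1, and path B has 768 KiB in flight with
    performance value 2, then:

    est(A) = 512 / 1 = 512
    est(B) = 768 / 2 = 384   -> path B gets the next I/O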

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch adds a dynamic load balancer, dm-queue-length, which
    balances the number of in-flight I/Os across the paths.

    The code is based on the patch posted by Stefan Bader:
    https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html

    Signed-off-by: Stefan Bader
    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     

31 Mar, 2009

2 commits

  • Move the raid6 data processing routines into a standalone module
    (raid6_pq) to prepare them to be called from async_tx wrappers and other
    non-md drivers/modules. This precludes a circular dependency of raid456
    needing the async modules for data processing while those modules in
    turn depend on raid456 for the base level synchronous raid6 routines.

    To support this move:
    1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
    2/ The raid6_call, recovery calls, and table symbols are exported
    3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to
    compile

    Signed-off-by: Dan Williams
    Signed-off-by: NeilBrown

    Dan Williams
     
  • Use the -y variables instead of the old -objs so we can easily add
    conditional objects to the modules. Also always use += to add
    subobjects to avoid problems when placing additional objects in
    some place in the file.
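
    A sketch of the pattern in kbuild syntax (CONFIG_SOME_EXTRA and extra.o
    are hypothetical placeholders):

    # Old style: raid456-objs := raid5.o
    # New style composes cleanly with conditional objects:
    raid456-y                    += raid5.o
    raid456-$(CONFIG_SOME_EXTRA) += extra.o
    obj-$(CONFIG_MD_RAID456)     += raid456.o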

    Signed-off-by: Christoph Hellwig
    Signed-off-by: NeilBrown

    Christoph Hellwig
     

06 Jan, 2009

2 commits

  • Move the existing snapshot exception store implementations out into
    separate files. Later patches will place these behind a new
    interface in preparation for alternative implementations.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Implement simple read-only sysfs entry for device-mapper block device.

    This patch adds a simple sysfs directory named "dm" under block device
    properties and implements
    - name attribute (string containing mapped device name)
    - uuid attribute (string containing UUID, or empty string if not set)

    The kobject is embedded in mapped_device struct, so no additional
    memory allocation is needed for initializing sysfs entry.

    During the processing of sysfs attribute we need to lock mapped device
    which is done by a new function dm_get_from_kobj, which returns the md
    associated with kobject and increases the usage count.

    Each 'show attribute' function is responsible for its own locking.
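
    For example, for the mapped device registered as dm-0:

    cat /sys/block/dm-0/dm/name
    cat /sys/block/dm-0/dm/uuid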

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     

22 Oct, 2008

1 commit


05 Jun, 2008

2 commits


25 Apr, 2008

2 commits


20 Oct, 2007

2 commits


14 Jul, 2007

1 commit

  • * 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop: (28 commits)
    ioatdma: add the unisys "i/oat" pci vendor/device id
    ARM: Add drivers/dma to arch/arm/Kconfig
    iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver
    iop13xx: surface the iop13xx adma units to the iop-adma driver
    dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
    md: remove raid5 compute_block and compute_parity5
    md: handle_stripe5 - request io processing in raid5_run_ops
    md: handle_stripe5 - add request/completion logic for async expand ops
    md: handle_stripe5 - add request/completion logic for async read ops
    md: handle_stripe5 - add request/completion logic for async check ops
    md: handle_stripe5 - add request/completion logic for async compute ops
    md: handle_stripe5 - add request/completion logic for async write ops
    md: common infrastructure for running operations with raid5_run_ops
    md: raid5_run_ops - run stripe operations outside sh->lock
    raid5: replace custom debug PRINTKs with standard pr_debug
    raid5: refactor handle_stripe5 and handle_stripe6 (v3)
    async_tx: add the async_tx api
    xor: make 'xor_blocks' a library routine for use with async_tx
    dmaengine: make clients responsible for managing channels
    dmaengine: refactor dmaengine around dma_async_tx_descriptor
    ...

    Linus Torvalds
     

13 Jul, 2007

2 commits

  • The async_tx api tries to use a dma engine for an operation, but will fall
    back to an optimized software routine otherwise. Xor support is
    implemented using the raid5 xor routines. For organizational purposes this
    routine is moved to a common area.

    The following fixes are also made:
    * rename xor_block => xor_blocks, suggested by Adrian Bunk
    * ensure that xor.o initializes before md.o in the built-in case
    * checkpatch.pl fixes
    * mark calibrate_xor_blocks __init, Adrian Bunk

    Cc: Adrian Bunk
    Cc: NeilBrown
    Cc: Herbert Xu
    Signed-off-by: Dan Williams

    Dan Williams
     
  • This patch supports LSI/Engenio devices in RDAC mode. Like dm-emc
    it requires userspace support. In your multipath.conf file you must have:

    path_checker rdac
    hardware_handler "1 rdac"
    prio_callout "/sbin/mpath_prio_tpc /dev/%n"

    You must also have an updated multipath-tools release that includes
    rdac support.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Mike Christie
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     

10 May, 2007

1 commit

  • New device-mapper target that can delay I/O (for testing). Reads can be
    separated from writes, redirected to different underlying devices and delayed
    by differing amounts of time.
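
    A minimal sketch, assuming the table format
    <device> <offset> <delay_ms> [<write_device> <write_offset> <write_delay_ms>]
    and illustrative devices:

    # Delay reads on /dev/sda6 by 50 ms; route writes to /dev/sdb6,
    # delayed by 200 ms.
    dmsetup create delayed --table \
      "0 2097152 delay /dev/sda6 0 50 /dev/sdb6 0 200"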

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinz Mauelshagen
     

27 Jun, 2006

1 commit

    There is a lot of commonality between raid5.c and raid6main.c. This patch
    merges both into one module called raid456. This saves a lot of code, and
    paves the way for online raid5->raid6 migrations.

    There is still duplication, e.g. between handle_stripe5 and handle_stripe6.
    This will probably be cleaned up later.

    Cc: "H. Peter Anvin"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

22 Jun, 2005

1 commit

  • With this patch, the intent to write to some block in the array can be logged
    to a bitmap file. Each bit represents some number of sectors and is set
    before any update happens, and only cleared when all writes relating to all
    sectors are complete.
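
    For example, at one bit per 1024 sectors (512 KiB of data per bit), a
    1 TiB array is covered by 2097152 bits, i.e. a 256 KiB bitmap.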

    After an unclean shutdown, information in this bitmap can be used to optimise
    resync - only sectors which could be out-of-sync need to be updated.

    Also if a drive is removed and then added back into an array, the recovery can
    make use of the bitmap to optimise reconstruction. This is not implemented in
    this patch.

    Currently the bitmap is stored in a file which must (obviously) be stored on a
    separate device.

    This patch only provides infrastructure. It does not update any
    personalities to use bitmap intent logging.

    Md arrays can still be used with no bitmap file. This patch has minimal
    impact on such arrays.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds