02 Aug, 2011

3 commits

  • If we write a full chunk in the snapshot, skip reading the origin device
    because the whole chunk will be overwritten anyway.

    This patch changes the snapshot write logic when a full chunk is written.
    In this case:
    1. allocate the exception
    2. dispatch the bio (but don't report the bio completion to device mapper)
    3. write the exception record
    4. report bio completed

    Callbacks must be done through the kcopyd thread, because callbacks must not
    race with each other. So we create two new functions:

    dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
    (This function must not be called from interrupt context.)

    dm_kcopyd_do_callback: submit callback.
    (This function may be called from interrupt context.)
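
    For example, a minimal usage sketch (dm_kcopyd_prepare_callback()
    and dm_kcopyd_do_callback() are the new API; s->kcopyd_client,
    complete_exception and pe are illustrative snapshot-side names):

        /* Process context: allocate the job and attach the callback. */
        void *job = dm_kcopyd_prepare_callback(s->kcopyd_client,
                                               complete_exception, pe);

        /* Later, possibly from interrupt context (e.g. bio endio):
         * submit the job so the kcopyd thread runs complete_exception()
         * without racing other callbacks. */
        dm_kcopyd_do_callback(job, 0, 0);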

    Performance test (on snapshots with 4k chunk size):
    without the patch:
    non-direct-io sequential write (dd): 17.7MB/s
    direct-io sequential write (dd): 20.9MB/s
    non-direct-io random write (mkfs.ext2): 0.44s

    with the patch:
    non-direct-io sequential write (dd): 26.5MB/s
    direct-io sequential write (dd): 33.2MB/s
    non-direct-io random write (mkfs.ext2): 0.27s

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Coding style cleanups.

    Signed-off-by: Alasdair G Kergon
    Signed-off-by: Jonathan Brassow

    Jonathan Brassow
     
  • Remove a couple of unused #defines.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

24 Mar, 2011

1 commit

  • If a table is read-only, also open any log and cow devices it uses read-only.

    Previously, even read-only devices were opened read-write internally.
    After commit 75f1dc0d076d1c1168f2115f1941ea627d38bd5a
    ("block: check bdev_read_only() from blkdev_get()")
    was applied, loading such tables began to fail. That commit
    was reverted by e51900f7d38cbcfb481d84567fd92540e7e1d23a
    ("block: revert block_dev read-only check"),
    but this patch fixes this part of dm to work with the original check.
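
    A sketch of the idea (cow_path and s->cow are illustrative;
    dm_get_device() and dm_table_get_mode() are the dm core API):

        /* Open the cow device with the table's mode instead of a
         * hard-coded FMODE_READ | FMODE_WRITE, so a read-only table
         * opens its cow (and log) devices read-only too. */
        r = dm_get_device(ti, cow_path, dm_table_get_mode(ti->table),
                          &s->cow);
        if (r) {
                ti->error = "Cannot get COW device";
                return r;
        }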

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     

14 Jan, 2011

2 commits

  • Use dm_suspended() rather than having each snapshot target maintain a
    private 'suspended' flag in struct dm_snapshot.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • dm_snapshot->queued_bios_work isn't used. Remove ->queued_bios[_work]
    from the dm_snapshot structure, the flush_queued_bios work function
    and the ksnapd workqueue.

    The DM snapshot changes that were going to use the ksnapd workqueue were
    either superseded (fix for origin write races) or never completed
    (deallocation of invalid snapshot's memory via workqueue).

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
     

23 Oct, 2010

1 commit

  • Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    blk-lib.c added and blk-barrier.c renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

10 Sep, 2010

1 commit

  • This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
    the now deprecated REQ_HARDBARRIER.

    * -EOPNOTSUPP handling logic dropped.

    * Preflush is handled as before but postflush is dropped and replaced
    with passing down REQ_FUA to member request_queues. This replaces
    one array wide cache flush w/ member specific FUA writes.

    * __split_and_process_bio() now calls __clone_and_map_flush() directly
    for flushes and guarantees all FLUSH bio's going to targets are zero
    length.

    * It's now guaranteed that all FLUSH bio's which are passed onto dm
    targets are zero length. bio_empty_barrier() tests are replaced
    with REQ_FLUSH tests.

    * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

    * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
    enough to be marked with unlikely().

    * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
    doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
    capability.

    * Request-based dm isn't converted yet. dm_init_request_based_queue()
    resets flush support to 0 for now. To avoid disturbing request-based
    dm code, dm->flush_error is added for bio-based dm while
    request-based dm continues to use dm->barrier_error.

    Lightly tested linear, stripe, raid1, snap and crypt targets. Please
    proceed with caution as I'm not familiar with the code base.
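
    A sketch of the resulting conventions (md->queue and process_flush()
    are illustrative stand-ins; blk_queue_flush(), REQ_FLUSH and REQ_FUA
    are the 2.6.37-era block API):

        /* Advertise cache-flush capability; the block layer filters
         * REQ_FLUSH/FUA out for queues that don't declare it. */
        blk_queue_flush(md->queue, REQ_FLUSH | REQ_FUA);

        /* In the map path, bio_empty_barrier() tests become: */
        if (bio->bi_rw & REQ_FLUSH) {
                /* Guaranteed zero-length flush bio: nothing to clone. */
                process_flush(md, bio);
                return;
        }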

    Signed-off-by: Tejun Heo
    Cc: dm-devel@redhat.com
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

12 Aug, 2010

4 commits

  • 'target_request_nr' is a more generic name that reflects the fact that
    it will be used for both flush and discard support.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Implement merge method for the snapshot origin to improve read
    performance.

    Without a merge method, dm asks the upper layers to submit the smallest
    possible bios --- one page. Submitting such small bios impacts
    performance negatively when reading or writing the origin device.

    Without this patch, CPU consumption when reading the origin on lvm on
    md-raid0 was 6 to 12%; with this patch, it drops to 1 to 4%.

    Note: in my testing it actually degraded performance in some settings;
    I traced this to Maxtor disks having problems with requests larger
    than 512 sectors. Setting /sys/block/sd*/queue/max_sectors_kb to 256
    restored read performance. I think we don't have to care about weird
    disks that degrade performance when large requests are sent to them.
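
    A sketch of such a merge method, modeled on dm-linear's (assuming
    the origin target keeps its single struct dm_dev in ti->private):

        static int origin_merge(struct dm_target *ti,
                                struct bvec_merge_data *bvm,
                                struct bio_vec *biovec, int max_size)
        {
                struct dm_dev *dev = ti->private;
                struct request_queue *q = bdev_get_queue(dev->bdev);

                if (!q->merge_bvec_fn)
                        return max_size;

                /* The origin maps 1:1, so bvm->bi_sector needs no
                 * adjustment; just point bvm at the underlying device. */
                bvm->bi_bdev = dev->bdev;

                return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
        }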

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Validate chunk size against both origin and snapshot sector size

    Don't allow chunk size smaller than either origin or snapshot logical
    sector size. Reading or writing data not aligned to sector size is not
    allowed and causes immediate errors.

    This requires us to open the origin before initialising the
    exception store and to export dm_snap_origin.
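
    A sketch of the check (origin_bdev and cow_bdev stand in for the
    opened devices; chunk_size is counted in 512-byte sectors):

        /* The chunk must cover a whole number of logical blocks on
         * both underlying devices, or misaligned I/O would error out. */
        if (chunk_size % (bdev_logical_block_size(origin_bdev) >> 9) ||
            chunk_size % (bdev_logical_block_size(cow_bdev) >> 9)) {
                *error = "Chunk size is not a multiple of device blocksize";
                return -EINVAL;
        }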

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Iterate both origin and snapshot devices

    The iterate_devices method should call the callback for every device
    to which a bio may be remapped. Thus snapshot_iterate_devices should
    call the callback for both the snapshot and origin underlying devices,
    because it remaps some bios to the snapshot and some to the origin.

    Previously, snapshot_iterate_devices called the callback only for the
    origin device. This led to badly calculated device limits if the
    snapshot and origin were placed on different types of disks.
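
    Roughly the shape of the fix (field and helper names follow
    dm-snap.c of this era; get_dev_size() is a local helper there):

        static int snapshot_iterate_devices(struct dm_target *ti,
                                            iterate_devices_callout_fn fn,
                                            void *data)
        {
                struct dm_snapshot *snap = ti->private;
                int r;

                /* Report the origin first, then the cow device, so
                 * limits are stacked from both underlying devices. */
                r = fn(ti, snap->origin, 0, ti->len, data);
                if (!r)
                        r = fn(ti, snap->cow, 0,
                               get_dev_size(snap->cow->bdev), data);

                return r;
        }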

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

11 Dec, 2009

23 commits

  • If the snapshot we are merging became invalid (e.g. it ran out of
    space), redirect all I/O directly to the origin device.

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Set 'merge_failed' flag if a snapshot fails to merge. Update
    snapshot_status() to report "Merge failed" if 'merge_failed' is set.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • s->store->type->prepare_merge returns the number of chunks that can be
    copied linearly, working backwards from the returned chunk number.

    For example, if it returns 3 chunks with old_chunk == 10 and new_chunk
    == 20, then chunk 20 can be copied to 10, chunk 19 to 9 and chunk 18
    to 8.

    Until now, kcopyd copied only one chunk at a time; this patch copies
    the full set at once.
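
    Illustrative arithmetic for setting up the single copy (names are a
    sketch, not the exact kernel code): with old_chunk == 10, new_chunk
    == 20 and linear_chunks == 3, one kcopyd job copies cow chunks
    18..20 onto origin chunks 8..10:

        sector_t io_size = linear_chunks * s->store->chunk_size;

        src.sector  = chunk_to_sector(s->store,
                                      new_chunk - linear_chunks + 1);
        dest.sector = chunk_to_sector(s->store,
                                      old_chunk - linear_chunks + 1);
        src.count = dest.count = io_size;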

    Consequently, snapshot_merge_process() needs to delay the merging of all
    chunks if any have writes in progress, not just the first chunk in the
    region that is to be merged.

    snapshot-merge's performance is now comparable to the original
    snapshot-origin target.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • When there is one merging snapshot and other non-merging snapshots,
    snapshot_merge_process() must make exceptions in the non-merging
    snapshots.

    Use a sequence count to resolve the race between I/O to chunks that are
    about to be merged. The count increases each time an exception
    reallocation finishes. Use wait_event() to wait until the count
    changes.
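
    A sketch of the scheme (field names are hypothetical; the counter
    only ever increases, so comparing against a stale value is safe):

        /* Merge path: a chunk still has conflicting I/O, so wait for
         * at least one more exception reallocation to finish, then
         * re-check whether the chunk can be merged. */
        u64 seen = atomic64_read(&s->exceptions_completed);
        wait_event(s->exceptions_done_wait,
                   atomic64_read(&s->exceptions_completed) != seen);

        /* Completion path, run after each exception reallocation: */
        atomic64_inc(&s->exceptions_completed);
        wake_up_all(&s->exceptions_done_wait);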

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Track writes to chunks that are currently being merged and delay merging
    a chunk until all writes to that chunk finish.

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • While a set of chunks is being merged, any overlapping writes need to be
    queued.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Merging is started when the origin is resumed and stopped when the
    origin is suspended, when the merging snapshot is destroyed, or when
    errors are detected.

    Merging is not yet interlocked with writes: this will be handled in
    subsequent patches.

    The code relies on callbacks from a private kcopyd thread.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Merging more than one snapshot is not supported, so prevent
    this happening.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Sets num_flush_requests=2 to support flushing both the origin and cow
    devices used by the snapshot-merge target.

    Also, snapshot_ctr() now gets the origin device using FMODE_WRITE if the
    target is snapshot-merge (which writes to the origin device).
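
    A sketch of the ctr changes (dm_target_is_snapshot_merge() tests
    whether ti is a snapshot-merge target; other steps abbreviated):

        fmode_t origin_mode = FMODE_READ;

        if (dm_target_is_snapshot_merge(ti))
                origin_mode = FMODE_WRITE; /* merging writes the origin */

        /* ... the origin device is then opened with origin_mode ... */

        ti->num_flush_requests = 2;     /* one flush for the origin,
                                           one for the cow device */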

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • The snapshot-merge target should not allocate new exceptions because the
    intent is to merge all of its exceptions as quickly and safely as
    possible.

    This patch introduces the snapshot-merge mapping function and updates
    __origin_write() so that it doesn't allocate exceptions on any snapshots
    that are being merged.

    If a write request to a merging snapshot device is to be dispatched
    directly to the origin (because the chunk is not remapped or was
    already merged), snapshot_merge_map() must make exceptions in the
    other snapshots, so it calls do_origin().

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • To track the completion of exceptions relating to the same location on
    the device, the current code selects one exception as primary_pe, links
    the other exceptions to it and uses reference counting to wait until all
    the reallocations are complete.

    It is considered too complicated to extend this code to handle the new
    snapshot-merge target, where sets of non-overlapping chunks would also
    need to become linked.

    Instead, a simpler (but less efficient) approach is taken. Bios are
    linked to one exception. When it completes, bios are simply retried,
    and if other related exceptions are still outstanding, they'll get
    queued again to wait for another one.
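
    A sketch of the retry path (retry_origin_bio() is a hypothetical
    helper; origin_bios is the exception's queued-bio list):

        static void exception_completed(struct dm_snap_pending_exception *pe)
        {
                struct bio *bio;

                /* Resubmit every bio linked to this exception; any that
                 * still overlap an outstanding exception simply get
                 * queued on that one and wait again. */
                while ((bio = bio_list_pop(&pe->origin_bios)))
                        retry_origin_bio(pe->snap, bio);
        }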

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • The snapshot-merge target allows a snapshot to be merged back into the
    snapshot's origin device.

    One anticipated use of snapshot merging is the rollback of filesystems
    to back out problematic system upgrades.

    This patch adds snapshot-merge target management to both
    dm_snapshot_init() and dm_snapshot_exit(). As an initial place-holder,
    snapshot-merge is identical to the snapshot target. Documentation is
    provided.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Move the __chunk_is_tracked() loop into a separate function as we will
    also need to call it from the write path in the rare case of conflicting
    writes to the same chunk.

    Originally introduced in commit a8d41b59f3f5a7ac19452ef442a7fc1b5fa17366
    ("dm snapshot: fix race during exception creation").

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • To support the merging of snapshots back into their origin we need
    to trigger exceptions in other snapshots not being merged without
    any incoming bio on the origin device. The bio parameter to
    __origin_write() becomes optional and the sector needs supplying
    separately.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Permit in-use snapshot exception data to be 'handed over' from one
    snapshot instance to another. This is a pre-requisite for patches
    that allow the changes made in a snapshot device to be merged back into
    its origin device and also allows device resizing.

    The basic call sequence is:

    dmsetup load new_snapshot (referencing the existing in-use cow device)
        - the ctr code detects that the cow is already in use and allows
          the two snapshot target instances to be linked together
    dmsetup suspend original_snapshot
    dmsetup resume new_snapshot
        - the new_snapshot becomes live, and if anything now tries to
          access the original one it will receive -EIO
    dmsetup remove original_snapshot

    (There can only be two snapshot targets referencing the same cow device
    simultaneously.)

    Signed-off-by: Mike Snitzer
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Keep track of whether or not the device is suspended within the snapshot
    target module, the same as we do in dm-raid1.

    We will use this later to enforce the correct sequence of ioctls to
    transfer the in-core exceptions from a snapshot target instance in
    one table to a replacement one capable of merging them back
    into the origin.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Store the reference to the snapshot cow device in the core snapshot
    code instead of each exception store. It can be accessed through the
    new function dm_snap_cow(). Exception stores should each now maintain a
    reference to their parent snapshot struct.

    This is cleaner and makes part of the forthcoming snapshot merge code simpler.
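
    The accessor itself is trivial; a sketch of its shape:

        struct dm_dev *dm_snap_cow(struct dm_snapshot *s)
        {
                return s->cow;
        }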

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon
    Reviewed-by: Jonathan Brassow
    Cc: Mikulas Patocka

    Mike Snitzer
     
  • Add the number of sectors used by metadata to the end of the
    snapshot's status line, which now reads
    <sectors_allocated>/<total_sectors> <metadata_sectors>.

    Renamed dm_exception_store_type's 'fraction_full' to 'usage'. Renamed
    arguments to be clearer about what is being returned. Also added
    'metadata_sectors'.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     
  • Rename exception functions. Preparing to pull them out of
    dm-snap.c for broader use.

    Signed-off-by: Jonathan Brassow
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Jon Brassow
     
  • Rename exception_table for broader use outside dm-snap.c

    Signed-off-by: Jonathan Brassow
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Jon Brassow
     
  • The exception structure is not necessarily just a snapshot element
    (especially after we pull it out of dm-snap.c).

    Rename it appropriately.

    Signed-off-by: Jonathan Brassow
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Jon Brassow
     
  • Consolidate the insert_*exception functions. 'insert_completed_exception'
    already contains all the logic to handle 'insert_exception' (via a
    check for a hash_shift of 0), so remove the redundant function.

    Signed-off-by: Jonathan Brassow
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Jon Brassow
     
  • The origin needs to find the minimum chunk size of all snapshots.
    Move this logic to a separate function because it will also be used
    elsewhere in the snapshot merge patches.
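
    Roughly the extracted helper (min_not_zero() skips snapshots that
    report a zero chunk size; names follow dm-snap.c of this era):

        static unsigned __minimum_chunk_size(struct origin *o)
        {
                struct dm_snapshot *snap;
                unsigned chunk_size = 0;

                if (o)
                        list_for_each_entry(snap, &o->snapshots, list)
                                chunk_size = min_not_zero(chunk_size,
                                                snap->store->chunk_size);

                return chunk_size;
        }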

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Mike Snitzer
    Reviewed-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka