11 Feb, 2020

1 commit

  • commit 44d8ebf436399a40fcd10dd31b29d37823d62fcc upstream.

    Ensure that the pool is locked during calls to __commit_transaction and
    __destroy_persistent_data_objects. This is just being consistent with
    locking; in reality dm_pool_metadata_close is called once the pool is
    being destroyed, so access to the pool shouldn't be contended.

    Also, rename __pmd_write_lock to pmd_write_lock_in_core, dropping the
    alias of the same name (there was no need for it), and use
    pmd_write_lock_in_core directly in dm_pool_commit_metadata.

    In addition, verify that the pool is locked in __commit_transaction().
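
    The locking discipline described above can be sketched as a toy Python
    model (names and structure are illustrative; the kernel uses the
    rw-semaphore pmd->root_lock, not a Python Lock):

```python
import threading

class PoolMetadata:
    """Toy model of the dm_pool_metadata locking discipline (not kernel code)."""
    def __init__(self):
        self.root_lock = threading.Lock()
        self.committed = False

    def __commit_transaction(self):
        # Mirrors the new check: the caller must already hold the lock.
        assert self.root_lock.locked(), "pool must be locked during commit"
        self.committed = True
        return 0

    def __destroy_persistent_data_objects(self):
        assert self.root_lock.locked(), "pool must be locked during destroy"

    def close(self):
        # dm_pool_metadata_close now takes the lock around both calls.
        with self.root_lock:
            self.__commit_transaction()
            self.__destroy_persistent_data_objects()

pmd = PoolMetadata()
pmd.close()
```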

    Fixes: 873f258becca ("dm thin metadata: do not write metadata if no changes occurred")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mike Snitzer
     

21 Dec, 2019

1 commit

  • commit ecda7c0280e6b3398459dc589b9a41c1adb45529 upstream.

    Add support for one pre-commit callback which is run right before the
    metadata are committed.

    This allows the thin provisioning target to run a callback before the
    metadata are committed and is required by the next commit.
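
    A minimal sketch of the single pre-commit callback, as a toy Python
    model (method names are illustrative, not the kernel API):

```python
class PoolMetadata:
    """Toy model: one pre-commit callback, run right before commit."""
    def __init__(self):
        self.pre_commit_fn = None
        self.log = []

    def register_pre_commit_callback(self, fn):
        self.pre_commit_fn = fn     # only a single callback is supported

    def commit(self):
        if self.pre_commit_fn:
            self.pre_commit_fn()    # runs right before metadata are committed
        self.log.append("commit")

pmd = PoolMetadata()
# e.g. the thin-pool target registering its hook:
pmd.register_pre_commit_callback(lambda: pmd.log.append("pre-commit"))
pmd.commit()
```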

    Cc: stable@vger.kernel.org
    Signed-off-by: Nikos Tsironis
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Nikos Tsironis
     

03 Jul, 2019

1 commit

  • Check if in fail_io mode at start of dm_pool_metadata_set_needs_check().
    Otherwise dm_pool_metadata_set_needs_check()'s superblock_lock() can
    crash in dm_bm_write_lock() while accessing the block manager object
    that was previously destroyed as part of a failed
    dm_pool_abort_metadata() that ultimately set fail_io to begin with.

    Also, update DMERR() message to more accurately describe
    superblock_lock() failure.
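
    The guard can be modeled as follows (a toy sketch with illustrative
    names; -22 stands in for an -EINVAL-style error code):

```python
class PoolMetadata:
    """Toy model of the fail_io early-return guard (not kernel code)."""
    def __init__(self):
        self.fail_io = False
        self.bm = object()            # stands in for the block manager
        self.needs_check = False

    def abort_metadata_failed(self):
        # A failed dm_pool_abort_metadata() destroys the block manager
        # and is what sets fail_io to begin with.
        self.bm = None
        self.fail_io = True

    def set_needs_check(self):
        if self.fail_io:              # the new guard: bail out first
            return -22
        # Without the guard we would dereference the destroyed block
        # manager (the dm_bm_write_lock() crash described above).
        assert self.bm is not None
        self.needs_check = True
        return 0

pmd = PoolMetadata()
pmd.abort_metadata_failed()
rc = pmd.set_needs_check()
```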

    Cc: stable@vger.kernel.org
    Reported-by: Zdenek Kabelac
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

19 Apr, 2019

3 commits

  • Otherwise, just activating a thin-pool and thin device and then
    deactivating them will cause the thin-pool metadata to be written
    (e.g. superblock updated) -- even though no metadata was actually
    changed.

    Add 'in_service' flag to struct dm_pool_metadata and set it in
    pmd_write_lock() because all on-disk metadata changes must take a write
    lock of pmd->root_lock. Once 'in_service' is set it is never cleared.
    __commit_transaction() will return 0 if 'in_service' is not set.
    dm_pool_commit_metadata() is updated to use __pmd_write_lock() so that
    it isn't the sole reason for putting a thin-pool in service.

    Also fix dm_pool_commit_metadata() to open the next transaction if the
    return from __commit_transaction() is 0. It is unclear why the early
    return for a return of 0 ever made sense, given that dm-io's
    async_io(), as used by bufio, always returns 0.
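
    The 'in_service' scheme can be sketched as a toy Python model (names
    are illustrative; the actual locking of pmd->root_lock is elided):

```python
class PoolMetadata:
    """Toy model of the 'in_service' flag; real locking is elided."""
    def __init__(self):
        self.in_service = False
        self.on_disk_writes = 0

    def pmd_write_lock(self):
        # All on-disk metadata changes take this lock, so it is the one
        # place that marks the pool as in service. Never cleared.
        self.in_service = True

    def pmd_write_lock_in_core(self):
        # In-core-only variant: does not put the pool in service.
        pass

    def commit_transaction(self):
        if not self.in_service:
            return 0              # nothing changed on disk: skip the write
        self.on_disk_writes += 1
        return 0

pmd = PoolMetadata()
pmd.pmd_write_lock_in_core()      # e.g. a bare dm_pool_commit_metadata()
assert pmd.commit_transaction() == 0 and pmd.on_disk_writes == 0
pmd.pmd_write_lock()              # a real metadata change
```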

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     
  • No functional change, but this prepares to hook off of pmd_write_lock()
    with additional functionality (as provided in next commit).

    Suggested-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     
  • Fix __reserve_metadata_snap() to return early if __commit_transaction()
    fails.

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

16 Jan, 2019

1 commit

  • Commit 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next
    stage processing") changed process_prepared_discard_passdown_pt1() to
    increment all the blocks being discarded until after the passdown had
    completed to avoid them being prematurely reused.

    IO issued to a thin device that breaks sharing with a snapshot, followed
    by a discard issued to snapshot(s) that previously shared the block(s),
    results in passdown_double_checking_shared_status() being called to
    iterate through the blocks, double checking that their reference count
    is zero and issuing the passdown if so. A side effect of commit
    00a0ea33b495 is that passdown_double_checking_shared_status() was
    broken.

    Fix this by checking if the block reference count is greater than 1.
    Also, rename dm_pool_block_is_used() to dm_pool_block_is_shared().
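
    The renamed predicate amounts to a reference-count comparison (a toy
    sketch; the kernel reads the count from the pool's space map):

```python
# Hypothetical per-block reference counts after the passdown increments
# described above (block -> count).
ref_counts = {0: 0, 1: 1, 2: 2, 3: 5}

def block_is_shared(block):
    # dm_pool_block_is_shared() stand-in: shared means count > 1,
    # not merely "used" (count > 0).
    return ref_counts[block] > 1

# Passdown is only allowed for blocks that are not shared:
passdown_ok = [b for b in ref_counts if not block_is_shared(b)]
```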

    Fixes: 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next stage processing")
    Cc: stable@vger.kernel.org # 4.9+
    Reported-by: ryan.p.norwood@gmail.com
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

11 Sep, 2018

1 commit

  • Committing a transaction can consume some metadata of its own, so we
    now reserve a small amount of metadata to cover this. Free metadata
    reported by the kernel will not include this reserve.

    If any of the reserve has been used after a commit we enter a new
    internal state PM_OUT_OF_METADATA_SPACE. This is reported as
    PM_READ_ONLY, so no userland changes are needed. If the metadata
    device is resized the pool will move back to PM_WRITE.

    These changes mean we never need to abort and roll back a transaction
    due to running out of metadata space. This is particularly important
    because there have been a handful of reports of data corruption against
    DM thin-provisioning that can all be attributed to the thin-pool having
    run out of metadata space.
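
    The reserve and state handling can be sketched as a toy model (state
    names match the description; the reserve size and threshold logic are
    illustrative):

```python
# Toy model of the metadata reserve scheme (not kernel code).
PM_WRITE, PM_OUT_OF_METADATA_SPACE, PM_READ_ONLY = range(3)
RESERVE = 4                 # blocks held back from the reported free count

class Pool:
    def __init__(self, nr_free):
        self.nr_free = nr_free
        self.state = PM_WRITE

    def reported_free(self):
        return max(0, self.nr_free - RESERVE)   # userland never sees the reserve

    def commit(self, used):
        self.nr_free -= used
        if self.nr_free < RESERVE:              # commit dipped into the reserve
            self.state = PM_OUT_OF_METADATA_SPACE

    def status_mode(self):
        # The new internal state is reported as read-only, so no userland
        # changes are needed.
        if self.state == PM_OUT_OF_METADATA_SPACE:
            return PM_READ_ONLY
        return self.state

    def resize_metadata_dev(self, extra):
        self.nr_free += extra
        if self.nr_free >= RESERVE:             # resized: back to write mode
            self.state = PM_WRITE

pool = Pool(nr_free=6)
pool.commit(used=3)         # only 3 blocks left: inside the reserve
```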

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

23 Jun, 2018

1 commit

  • Commit 5a32083d03fb5 ("dm: take care to copy the space map roots before
    locking the superblock") properly removed the calls to dm_sm_root_size()
    from __write_initial_superblock(). But the dm_sm_root_size() calls were
    left dangling in __commit_transaction().

    Fixes: 5a32083d03fb5 ("dm: take care to copy the space map roots before locking the superblock")
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

17 Jan, 2018

1 commit

  • For btree removal, there is a corner case in which a single thread
    could take 6 locks, which is more than THIN_MAX_CONCURRENT_LOCKS (5)
    and leads to deadlock.

    A btree removal might eventually call
    rebalance_children()->rebalance3() to rebalance entries of three
    neighbor child nodes when shadow_spine has already acquired two
    write locks. In rebalance3(), it tries to shadow and acquire the
    write locks of all three child nodes. However, shadowing a child
    node requires acquiring a read lock of the original child node and
    a write lock of the new block. Although the read lock will be
    released after block shadowing, shadowing the third child node
    in rebalance3() could still take the sixth lock.
    (2 write locks for the shadow_spine +
    2 write locks for the first two child nodes' shadows +
    1 write lock for the last child node's shadow +
    1 read lock for the last child node)
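
    The parenthesized tally above adds up to six, one more than the limit:

```python
# Worst-case concurrent locks during rebalance3(), per the breakdown above.
THIN_MAX_CONCURRENT_LOCKS = 5

locks_held = (
    2 +  # write locks held by the shadow_spine
    2 +  # write locks for the first two child nodes' shadows
    1 +  # write lock for the last child node's shadow
    1    # read lock on the last child node while shadowing it
)
over_limit = locks_held > THIN_MAX_CONCURRENT_LOCKS
```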

    Cc: stable@vger.kernel.org
    Signed-off-by: Dennis Yang
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Dennis Yang
     

21 Jul, 2016

1 commit

  • The discard passdown was being issued after the block was unmapped,
    which meant the block could be reprovisioned whilst the passdown discard
    was still in flight.

    We can only identify unshared blocks (safe to pass a discard down to)
    once they're unmapped and their ref count hits zero. Block ref counts
    are now used to guard against concurrent allocation of these blocks
    while they are being discarded. So now we unmap the block, issue the
    passdown discard, and immediately increment the ref counts for the
    regions that have been discarded via passdown (this is safe because
    allocation occurs within the same thread). We then decrement the ref
    counts once the passdown discard IO is complete -- signaling that
    these blocks may now be allocated.
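
    The new ordering can be sketched as a toy model (illustrative; the
    real code uses the pool's space map and completes the passdown
    asynchronously):

```python
# Toy model of the fixed discard-passdown ordering (not kernel code).
class Block:
    def __init__(self):
        self.ref = 1              # mapped by one thin device

def discard(block, issue_passdown):
    block.ref -= 1                # unmap
    if block.ref == 0:            # unshared: safe to pass the discard down
        block.ref += 1            # guard against reallocation while in flight
        issue_passdown()
        block.ref -= 1            # passdown complete: may be allocated again

events = []
b = Block()
discard(b, lambda: events.append(("passdown", b.ref)))
```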

    This fixes the potential for corruption that was reported here:
    https://www.redhat.com/archives/dm-devel/2016-June/msg00311.html

    Reported-by: Dennis Yang
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

10 Dec, 2015

3 commits

  • Refactor dm_thin_find_mapped_range() so that it takes the metadata
    read lock itself, rather than relying on the finer-grained locking
    that is pushed down inside dm_thin_find_next_mapped_block() and
    dm_thin_find_block().

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     
  • Use dm_btree_lookup_next() to more quickly discard partially mapped
    volumes.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     
  • When you take a metadata snapshot the btree roots for the mapping and
    details tree need to have their reference counts incremented so they
    persist for the lifetime of the metadata snap.

    The roots being incremented were those currently written in the
    superblock, which could possibly be out of date if concurrent IO is
    triggering new mappings, breaking of sharing, etc.

    Fix this by performing a commit with the metadata lock held while taking
    a metadata snapshot.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Joe Thornber
     

03 Dec, 2015

1 commit

  • dm_btree_remove_leaves() only unmaps a contiguous region, so we need a
    loop in __remove_range() to handle ranges that contain multiple
    regions.

    A new btree function, dm_btree_lookup_next(), is introduced which can
    more efficiently skip over regions of the thin device that aren't
    mapped. __remove_range() uses dm_btree_lookup_next() on each iteration
    of its loop.

    Also, improve description of dm_btree_remove_leaves().
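
    The loop can be sketched as a toy model, with a sorted dict standing
    in for the mapping btree (names are illustrative):

```python
# Toy model of the __remove_range() loop (not kernel code).
mappings = {5: "a", 6: "b", 7: "c", 100: "d", 101: "e"}   # block -> mapping

def lookup_next(begin):
    # dm_btree_lookup_next() stand-in: first mapped key >= begin,
    # skipping unmapped holes in one step.
    keys = sorted(k for k in mappings if k >= begin)
    return keys[0] if keys else None

def remove_contiguous(begin):
    # dm_btree_remove_leaves() stand-in: unmaps one contiguous run only.
    while begin in mappings:
        del mappings[begin]
        begin += 1
    return begin

def remove_range(begin, end):
    while True:
        begin = lookup_next(begin)        # cheaply skip unmapped regions
        if begin is None or begin >= end:
            break
        begin = remove_contiguous(begin)  # one region per iteration

remove_range(0, 1000)                     # range spanning two regions
```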

    Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()")
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # 4.1+

    Joe Thornber
     

16 Jul, 2014

1 commit

  • The block size for the thin-pool's data device must remain fixed for
    the life of the thin-pool. Disallow any attempt to change the
    thin-pool's data block size.

    It should be noted that attempting to change the data block size via
    thin-pool table reload will be ignored as a side-effect of the thin-pool
    handover that the thin-pool target does during thin-pool table reload.

    Here is an example outcome of attempting to load a thin-pool table that
    reduced the thin-pool's data block size from 1024K to 512K.

    Before:
    kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks

    After:
    kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported
    kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object
    kernel: device-mapper: ioctl: error adding target to table

    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber
    Cc: stable@vger.kernel.org

    Mike Snitzer
     

06 Mar, 2014

1 commit

  • If a thin metadata operation fails, the current transaction will
    abort, which can cause data loss in IO layers up the stack
    (e.g. filesystems). As such, set THIN_METADATA_NEEDS_CHECK_FLAG in the
    thin metadata's superblock, which:
    1) requires the user verify the thin metadata is consistent (e.g. use
    thin_check, etc)
    2) suggests the user verify the thin data is consistent (e.g. use fsck)

    The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is
    to run thin_repair.

    On metadata operation failure: abort current metadata transaction, set
    pool in read-only mode, and now set the needs_check flag.

    As part of this change, constraints are introduced or relaxed:
    * don't allow a pool to transition to write mode if needs_check is set
    * don't allow data or metadata space to be resized if needs_check is set
    * if a thin pool's metadata space is exhausted: the kernel will now
    force the user to take the pool offline for repair before the kernel
    will allow the metadata space to be extended.

    Also, update Documentation to include information about when the thin
    provisioning target commits metadata, how it handles metadata failures
    and running out of space.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Joe Thornber

    Mike Snitzer
     

28 Feb, 2014

1 commit

  • It was always intended that a user could provide a thin metadata device
    that is larger than the max supported by the on-disk format. The extra
    space would just go unused.

    Unfortunately that never worked. If the user attempted to use a larger
    metadata device on creation they would get an error like the following:

    device-mapper: space map common: space map too large
    device-mapper: transaction manager: couldn't create metadata space map
    device-mapper: thin metadata: tm_create_with_sm failed
    device-mapper: table: 252:17: thin-pool: Error creating metadata object
    device-mapper: ioctl: error adding target to table

    Fix this by allowing the initial metadata space map creation to cap its
    size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS).
    get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via
    THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at
    THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported).

    Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for
    the size of the disk_bitmap_header. So the supported maximum metadata
    size is a bit smaller (reduced from 33423360 to 33292800 sectors).
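
    The two figures can be reproduced from the on-disk format constants
    (a sketch; assumes 4KiB metadata blocks, a 16-byte
    disk_bitmap_header, 2 bits per entry (4 entries per byte), and at
    most 255 bitmap index entries -- constant names are illustrative):

```python
# Reproduce the old and corrected THIN_METADATA_MAX_SECTORS values.
BLOCK_SIZE = 4096            # metadata block size in bytes
BITMAP_HEADER = 16           # bytes consumed by disk_bitmap_header
ENTRIES_PER_BYTE = 4         # 2 bits of refcount per block entry
MAX_BITMAPS = 255            # max bitmap index entries in the space map
SECTORS_PER_BLOCK = BLOCK_SIZE // 512

old_entries = BLOCK_SIZE * ENTRIES_PER_BYTE                    # header ignored
new_entries = (BLOCK_SIZE - BITMAP_HEADER) * ENTRIES_PER_BYTE  # header counted

old_max_sectors = MAX_BITMAPS * old_entries * SECTORS_PER_BLOCK
new_max_sectors = MAX_BITMAPS * new_entries * SECTORS_PER_BLOCK
```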

    Lastly, remove the "excess space will not be used" warning message from
    get_metadata_dev_size(); it resulted in printing the warning multiple
    times. Factor out warn_if_metadata_device_too_big(), call it from
    pool_ctr() and maybe_resize_metadata_dev().

    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber

    Mike Snitzer
     

18 Feb, 2014

1 commit

  • Commit 905e51b ("dm thin: commit outstanding data every second")
    introduced a periodic commit. This commit occurs regardless of whether
    any thin devices have made changes.

    Fix the periodic commit to check if any of a pool's thin devices have
    changed using dm_pool_changed_this_transaction().
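
    The fix can be sketched as a toy model (illustrative names; the real
    check is dm_pool_changed_this_transaction() on the pool's metadata):

```python
class Pool:
    """Toy model of the fixed periodic commit (not kernel code)."""
    def __init__(self):
        self.changed_this_transaction = False
        self.commits = 0

    def thin_write(self):
        self.changed_this_transaction = True

    def periodic_commit(self):
        # Runs every second; now a no-op unless something changed.
        if not self.changed_this_transaction:
            return
        self.commits += 1
        self.changed_this_transaction = False

pool = Pool()
pool.periodic_commit()      # idle second: no commit issued
pool.thin_write()
pool.periodic_commit()      # dirty second: exactly one commit
```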

    Reported-by: Alexander Larsson
    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber
    Cc: stable@vger.kernel.org

    Mike Snitzer
     

07 Jan, 2014

1 commit

  • If a snapshot is created and later deleted the origin dm_thin_device's
    snapshotted_time will have been updated to reflect the snapshot's
    creation time. The 'shared' flag in the dm_thin_lookup_result struct
    returned from dm_thin_find_block() is an approximation based on
    snapshotted_time -- this is done to avoid O(n), or worse, time
    complexity. In this case, the shared flag would be true.

    But because the 'shared' flag reflects an approximation a block can be
    incorrectly assumed to be shared (e.g. false positive for 'shared'
    because the snapshot no longer exists). This could result in discards
    issued to a thin device not being passed down to the pool's underlying
    data device.

    To fix this we double check that a thin block is really still in-use
    after a mapping is removed using dm_pool_block_is_used(). If the
    reference count for a block is now zero the discard is allowed to be
    passed down.

    Also add a 'definitely_not_shared' member to the dm_thin_new_mapping
    structure -- this reflects that the 'shared' flag in the response from
    dm_thin_find_block() can only be taken as definitive when false is
    returned.
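
    The asymmetry of the 'shared' flag and the double check can be
    sketched as a toy model (field and function names are illustrative):

```python
# Toy model: 'shared' is an approximation that is only definitive when
# False, so re-verify via the reference count before withholding passdown.
class Thin:
    def __init__(self):
        self.snapshotted_time = 0

def find_block(thin, mapping_time, ref_count):
    # Blocks mapped before the last snapshot look shared -- even if that
    # snapshot has since been deleted (a possible false positive).
    shared = mapping_time <= thin.snapshotted_time
    return {"shared": shared, "ref_count": ref_count}

def may_pass_down_discard(result):
    if not result["shared"]:
        return True                     # definitely not shared
    return result["ref_count"] == 0     # double check after unmapping

thin = Thin()
thin.snapshotted_time = 5               # a snapshot was taken at time 5
# The snapshot was later deleted, so this block's count dropped to 0:
stale = find_block(thin, mapping_time=3, ref_count=0)
```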

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Joe Thornber
     

11 Dec, 2013

1 commit

  • A thin-pool may be in read-only mode because the pool's data or metadata
    space was exhausted. To allow for recovery, by adding more space to the
    pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE
    mode. Otherwise, running out of space will render the pool permanently
    read-only.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Joe Thornber
     

02 Mar, 2013

1 commit