31 May, 2017

1 commit

  • Commit b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as
    synchronous") removed the REQ_SYNC flag from the WRITE_{FUA|PREFLUSH|...}
    definitions. However, generic_make_request_checks() strips the REQ_FUA and
    REQ_PREFLUSH flags from a bio when the storage doesn't report a volatile
    write cache, so the write effectively becomes asynchronous, which can
    lead to performance regressions.

    Fix the problem by making sure all bios which are synchronous are
    properly marked with REQ_SYNC.

    Fixes: b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Mike Snitzer

    Jan Kara
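
    A minimal sketch of the idea, using the post-4.8 bio flag names (the
    helper below is illustrative, not the actual patch):

        #include <linux/bio.h>

        /* Keep a flush/FUA write synchronous even if the block layer later
         * strips REQ_PREFLUSH/REQ_FUA because the device reports no
         * volatile write cache. */
        static void submit_sync_flush_bio(struct bio *bio)
        {
                bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA | REQ_SYNC;
                submit_bio(bio);
        }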
     

09 Jan, 2016

1 commit

  • When there is an error copying a chunk dm-snapshot can incorrectly hold
    associated bios indefinitely, resulting in hung IO.

    The function copy_callback sets pe->error if there was error copying the
    chunk, and then calls complete_exception. complete_exception calls
    pending_complete on error, otherwise it calls commit_exception with
    commit_callback (and commit_callback calls pending_complete).

    The persistent exception store (dm-snap-persistent.c) assumes that calls
    to prepare_exception and commit_exception are paired.
    persistent_prepare_exception increases ps->pending_count and
    persistent_commit_exception decreases it.

    If there is a copy error, persistent_prepare_exception is called but
    persistent_commit_exception is not. This results in the variable
    ps->pending_count never returning to zero and that causes some pending
    exceptions (and their associated bios) to be held forever.

    Fix this by unconditionally calling commit_exception regardless of
    whether the copy was successful. A new "valid" parameter is added to
    commit_exception -- when the copy fails this parameter is set to zero so
    that the chunk that failed to copy (and all following chunks) is not
    recorded in the snapshot store. Also, remove commit_callback now that
    it is merely a wrapper around pending_complete.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mikulas Patocka
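
    A sketch of the resulting pairing (the callback and field names are
    illustrative, not a quote of the patch):

        /* Always pair persistent_prepare_exception() with a commit; a failed
         * copy is committed with valid == 0 so the chunk is never recorded,
         * but ps->pending_count still drops back to zero. */
        static void complete_exception(struct dm_snap_pending_exception *pe)
        {
                struct dm_snapshot *s = pe->snap;
                int valid = !pe->error;

                s->store->type->commit_exception(s->store, &pe->e, valid,
                                                 pending_complete, pe);
        }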
     

05 Nov, 2015

1 commit

  • Pull device mapper updates from Mike Snitzer:
    "Smaller set of DM changes for this merge. I've based these changes on
    Jens' for-4.4/reservations branch because the associated DM changes
    required it.

    - Revert a dm-multipath change that caused a regression for
    unprivileged users (e.g. kvm guests) that issued ioctls when a
    multipath device had no available paths.

    - Include Christoph's refactoring of DM's ioctl handling and add
    support for passing through persistent reservations with DM
    multipath.

    - All other changes are very simple cleanups"

    * tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm switch: simplify conditional in alloc_region_table()
    dm delay: document that offsets are specified in sectors
    dm delay: capitalize the start of an delay_ctr() error message
    dm delay: Use DM_MAPIO macros instead of open-coded equivalents
    dm linear: remove redundant target name from error messages
    dm persistent data: eliminate unnecessary return values
    dm: eliminate unused "bioset" process for each bio-based DM device
    dm: convert ffs to __ffs
    dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
    dm: add support for passing through persistent reservations
    dm: refactor ioctl handling
    Revert "dm mpath: fix stalls when handling invalid ioctls"
    dm: initialize non-blk-mq queue data before queue is used

    Linus Torvalds
     

01 Nov, 2015

1 commit

  • ffs counts bits starting with 1 (for the least significant bit), __ffs
    counts bits starting with 0. This patch changes various occurrences of ffs
    to __ffs and removes subtraction of 1 from the result.

    Note that __ffs (unlike ffs) is not defined when called with zero
    argument, but it is not called with zero argument in any of these cases.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
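
    The conversion pattern, for illustration (chunk_size is known to be
    non-zero at every converted call site; the helper name is hypothetical):

        #include <linux/bitops.h>

        /* ffs() numbers the least significant set bit as 1, __ffs() as 0,
         * so the "- 1" disappears. */
        static unsigned chunk_shift(unsigned long chunk_size)
        {
                return __ffs(chunk_size);       /* was: ffs(chunk_size) - 1 */
        }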
     

10 Oct, 2015

1 commit

  • Commit 76c44f6d80 introduced the possibility for "Overflow" to be reported
    by the snapshot device's status. Older userspace (e.g. lvm2) does not
    handle the "Overflow" status response.

    Fix this incompatibility by requiring newer userspace code that can
    cope with "Overflow" to explicitly request the persistent store with
    overflow support, using "PO" (Persistent with Overflow) as the snapshot
    store type.

    Reported-by: Zdenek Kabelac
    Fixes: 76c44f6d80 ("dm snapshot: don't invalidate on-disk image on snapshot write overflow")
    Reviewed-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mike Snitzer
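
    A sketch of the opt-in at table load time (the flag name is an
    assumption used for illustration):

        /* "P"  -> persistent store, legacy status strings only;
         * "PO" -> persistent store, userspace copes with "Overflow". */
        if (!strcasecmp(argv[0], "P"))
                s->userspace_supports_overflow = false;
        else if (!strcasecmp(argv[0], "PO"))
                s->userspace_supports_overflow = true;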
     

04 Mar, 2014

1 commit

  • Commit 55494bf2947dccdf2 ("dm snapshot: use dm-bufio") broke snapshots.
    Before that 3.14-rc1 commit, loading a snapshot's list of exceptions
    involved reading exception areas one by one into ps->area and inserting
    those exceptions into the hash table. Commit 55494bf2947dccdf2 changed
    it so that dm-bufio with prefetch is used to load exceptions in batches.
    Exceptions are loaded correctly, but ps->area is left uninitialized.
    When a new exception is allocated, it is stored in this uninitialized
    ps->area which will be written to the disk. This causes metadata
    corruption.

    Fix this corruption by copying the last area that was read via dm-bufio
    into ps->area.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
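
    A sketch of the fix (the bufio accessor is shown for illustration):
    after the batched load, the last area read through dm-bufio is copied
    into ps->area so later allocations extend valid on-disk data.

        /* Keep a private copy of the final exception area so ps->area holds
         * real on-disk contents rather than uninitialized memory. */
        memcpy(ps->area, dm_bufio_get_block_data(bp),
               ps->store->chunk_size << SECTOR_SHIFT);
        dm_bufio_release(bp);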
     

15 Jan, 2014

3 commits

  • This patch modifies dm-snapshot so that it prefetches the buffers when
    loading the exceptions.

    The number of buffers read ahead is specified in the DM_PREFETCH_CHUNKS
    macro. The current value for DM_PREFETCH_CHUNKS (12) was found to
    provide the best performance on a single 15k SCSI spindle. In the
    future we may modify this default or make it configurable.

    Also, introduce the function dm_bufio_set_minimum_buffers to set up
    bufio's number of internal buffers before freeing happens. dm-bufio may
    hold more buffers if enough memory is available. There is no guarantee
    that the specified number of buffers will be available - if you need a
    guarantee, use the argument reserved_buffers for
    dm_bufio_client_create.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
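
    A sketch of the prefetch call (the surrounding loop is omitted; names
    follow the dm-bufio API):

        #define DM_PREFETCH_CHUNKS 12   /* best observed on one 15k spindle */

        /* While processing exception area 'chunk', ask dm-bufio to read the
         * next window of areas ahead of time so the metadata load is not
         * serialized on individual synchronous reads. */
        dm_bufio_prefetch(client, chunk + 1, DM_PREFETCH_CHUNKS);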
     
  • Use dm-bufio for initial loading of the exceptions.
    Introduce a new function dm_bufio_forget that frees the given buffer.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     
  • Change the functions get_exception, read_exception and insert_exceptions
    so that ps->area is passed as an argument.

    This patch doesn't change any functionality, but it refactors the code
    to allow for a cleaner switch over to using dm-bufio.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

16 Oct, 2013

1 commit

  • This patch fixes a particular type of data corruption that has been
    encountered when loading a snapshot's metadata from disk.

    When we allocate a new chunk in persistent_prepare, we increment
    ps->next_free and we make sure that it doesn't point to a metadata area
    by further incrementing it if necessary.

    When we load metadata from disk on device activation, ps->next_free is
    positioned after the last used data chunk. However, if this last used
    data chunk is followed by a metadata area, ps->next_free is positioned
    erroneously to the metadata area. A newly-allocated chunk is placed at
    the same location as the metadata area, resulting in data or metadata
    corruption.

    This patch changes the code so that ps->next_free skips the metadata
    area when metadata are loaded in function read_exceptions.

    The patch also moves a piece of code from persistent_prepare_exception
    to a separate function skip_metadata to avoid code duplication.

    CVE-2013-4299

    Signed-off-by: Mikulas Patocka
    Cc: stable@vger.kernel.org
    Cc: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
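
    A sketch of the helper described above (close to, though not
    necessarily identical to, the merged code):

        /* Every stride of (exceptions_per_area + 1) chunks contains exactly
         * one metadata area; if next_free points at it, step past it. */
        static void skip_metadata(struct pstore *ps)
        {
                uint32_t stride = ps->exceptions_per_area + 1;
                chunk_t next_free = ps->next_free;

                if (sector_div(next_free, stride) == NUM_SNAPSHOT_HDR_CHUNKS)
                        ps->next_free++;
        }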
     

20 Sep, 2013

1 commit

  • The kernel reports a lockdep warning if a snapshot is invalidated because
    it runs out of space.

    The lockdep warning was triggered by commit 0976dfc1d0cd80a4e9dfaf87bd87
    ("workqueue: Catch more locking problems with flush_work()") in v3.5.

    The warning is a false positive. The real cause for the warning is that
    the lockdep engine treats different instances of md->lock as a single
    lock.

    This patch is a workaround - we use flush_workqueue instead of flush_work.
    This code path is not performance sensitive (it is called only on
    initialization or invalidation), thus it doesn't matter that we flush the
    whole workqueue.

    The real fix for the problem would be to teach the lockdep engine to treat
    different instances of md->lock as separate locks.

    Signed-off-by: Mikulas Patocka
    Acked-by: Alasdair G Kergon
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # 3.5+

    Mikulas Patocka
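
    In the metadata I/O path the workaround amounts to something like
    (illustrative diff, not the exact hunk):

        - flush_work(&req.work);
        + flush_workqueue(ps->metadata_wq);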
     

29 May, 2011

1 commit

  • Replace the arbitrary calculation of an initial io struct mempool size
    with a constant.

    The code calculated the number of reserved structures based on the request
    size and used a "magic" multiplication constant of 4. This patch changes
    it to reserve a fixed number - itself still chosen quite arbitrarily.
    Further testing might show if there is a better number to choose.

    Note that if there is no memory pressure, we can still allocate an
    arbitrary number of "struct io" structures. One structure is enough to
    process the whole request.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

14 Jan, 2011

2 commits

  • metadata_wq serves on-stack work items from chunk_io(). Even if
    multiple chunk_io() are simultaneously in progress, each is
    independent and queued only once, so multithreaded workqueue can be
    safely used.

    Switch metadata_wq to multithread and flush the work item instead of
    the workqueue in chunk_io().

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
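
    Roughly, the conversion looks like this (workqueue name shown for
    illustration):

        - ps->metadata_wq = create_singlethread_workqueue("ksnaphd");
        + ps->metadata_wq = alloc_workqueue("ksnaphd", WQ_MEM_RECLAIM, 0);
          ...
          queue_work(ps->metadata_wq, &req.work);
        - flush_workqueue(ps->metadata_wq);
        + flush_work(&req.work);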
     
  • Convert all create[_singlethread]_work() users to the new
    alloc[_ordered]_workqueue(). This conversion is mechanical and
    doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Tejun Heo
     

27 Oct, 2010

1 commit

  • Silly though it is, completions and wait_queue_heads use foo_ONSTACK
    (COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK,
    __WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I
    guess workqueues should do the same thing.

    s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/
    s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/

    Cc: Peter Zijlstra
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

10 Sep, 2010

1 commit

  • This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
    now deprecated REQ_HARDBARRIER.

    * -EOPNOTSUPP handling logic dropped.

    * Preflush is handled as before but postflush is dropped and replaced
    with passing down REQ_FUA to member request_queues. This replaces
    one array wide cache flush w/ member specific FUA writes.

    * __split_and_process_bio() now calls __clone_and_map_flush() directly
    for flushes and guarantees all FLUSH bio's going to targets are zero
    length.

    * It's now guaranteed that all FLUSH bio's which are passed onto dm
    targets are zero length. bio_empty_barrier() tests are replaced
    with REQ_FLUSH tests.

    * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

    * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
    enough to be marked with unlikely().

    * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
    doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
    capability.

    * Request based dm isn't converted yet. dm_init_request_based_queue()
    resets flush support to 0 for now. To avoid disturbing request
    based dm code, dm->flush_error is added for bio based dm while
    request based dm continues to use dm->barrier_error.

    Lightly tested linear, stripe, raid1, snap and crypt targets. Please
    proceed with caution as I'm not familiar with the code base.

    Signed-off-by: Tejun Heo
    Cc: dm-devel@redhat.com
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
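
    The capability advertisement mentioned in the entry above boils down to
    something like this (block-layer API of that era):

        /* Tell the block layer bio-based dm accepts REQ_FLUSH and REQ_FUA;
         * it filters them out for queues without a write cache. */
        blk_queue_flush(md->queue, REQ_FLUSH | REQ_FUA);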
     

17 Feb, 2010

1 commit

  • chunk_io() declares its 'struct mdata_req' on the stack and then
    initializes its 'struct work_struct' member. Annotate the
    initialization of this workqueue with INIT_WORK_ON_STACK to suppress a
    debugobjects warning seen when CONFIG_DEBUG_OBJECTS_WORK is enabled.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
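
    A sketch of the annotated on-stack work item (the handler name is shown
    for illustration):

        struct mdata_req req;

        /* Tell CONFIG_DEBUG_OBJECTS_WORK that this work item legitimately
         * lives on the stack. */
        INIT_WORK_ON_STACK(&req.work, do_metadata);
        queue_work(ps->metadata_wq, &req.work);
        flush_workqueue(ps->metadata_wq);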
     

11 Dec, 2009

5 commits

  • Add functions that decide how many consecutive chunks of snapshot to
    merge back into the origin next and to update the metadata afterwards.

    prepare_merge provides a pointer to the most recent still-to-be-merged
    chunk and returns how many previous ones are consecutive and can be
    processed together.

    commit_merge removes the nr_merged most-recent chunks permanently from
    the exception store. The number must not exceed that returned by
    prepare_merge.

    Introduce NUM_SNAPSHOT_HDR_CHUNKS to show where the snapshot header
    chunk is accounted for.

    Signed-off-by: Mikulas Patocka
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
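
    As an interface sketch, the two hooks described above look roughly like:

        struct dm_exception_store_type {
                /* ... */

                /*
                 * Points *last_old_chunk / *last_new_chunk at the most
                 * recent still-to-be-merged chunk and returns how many
                 * previous ones are consecutive and can be processed
                 * together.
                 */
                int (*prepare_merge)(struct dm_exception_store *store,
                                     chunk_t *last_old_chunk,
                                     chunk_t *last_new_chunk);

                /*
                 * Permanently removes the nr_merged most recent chunks from
                 * the store; must not exceed what prepare_merge() returned.
                 */
                int (*commit_merge)(struct dm_exception_store *store,
                                    int nr_merged);

                /* ... */
        };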
     
  • Store the reference to the snapshot cow device in the core snapshot
    code instead of each exception store. It can be accessed through the
    new function dm_snap_cow(). Exception stores should each now maintain a
    reference to their parent snapshot struct.

    This is cleaner and makes part of the forthcoming snapshot merge code simpler.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon
    Reviewed-by: Jonathan Brassow
    Cc: Mikulas Patocka

    Mike Snitzer
     
  • Add number of sectors used by metadata to the end of the snapshot's status
    line.

    Renamed dm_exception_store_type's 'fraction_full' to 'usage'. Renamed
    arguments to be clearer about what is being returned. Also added
    'metadata_sectors'.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
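
    The renamed hook, sketched (argument names for illustration):

        /* Formerly 'fraction_full'; the extra out-parameter feeds the new
         * metadata-sectors field at the end of the status line. */
        void (*usage)(struct dm_exception_store *store,
                      sector_t *total_sectors,
                      sector_t *sectors_allocated,
                      sector_t *metadata_sectors);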
     
  • The exception structure is not necessarily just a snapshot
    element (especially after we pull it out of dm-snap.c).

    Renaming appropriately.

    Signed-off-by: Jonathan Brassow
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Jon Brassow
     
  • Minor code touch-up. We don't need the 'else'.

    Signed-off-by: Jonathan Brassow
    Reviewed-by: Mikulas Patocka
    Reviewed-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Jon Brassow
     

17 Oct, 2009

1 commit

  • Use unsigned integer chunk size.

    Maximum chunk size is 512kB; there won't ever be a need to use a 4GB chunk
    size, so the number can be 32-bit. This fixes a compiler failure on 32-bit systems
    with large block devices.

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Reviewed-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

05 Sep, 2009

3 commits

  • Fix some problems seen in the chunk size processing when activating a
    pre-existing snapshot.

    For a new snapshot, the chunk size can either be supplied by the creator
    or a default value can be used. For an existing snapshot, the
    chunk size in the snapshot header on disk should always be used.

    If someone attempts to load an existing snapshot and has the 'default
    chunk size' option set, the kernel uses its default value even when it
    is incorrect for the snapshot being loaded. This patch ensures the
    correct on-disk value is always used.

    Secondly, when the code does use the chunk size stored on the disk it is
    prudent to revalidate it, so the code can exit cleanly if it got
    corrupted as happened in
    https://bugzilla.redhat.com/show_bug.cgi?id=461506 .

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
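
    A sketch of the activation path after the fix (the helper is an assumed
    name and error handling is trimmed): the chunk size stored in the
    on-disk header always wins, and is re-validated so a corrupted value
    fails cleanly instead of being used.

        if (ps->store->chunk_size != chunk_size) {
                r = dm_exception_store_set_chunk_size(ps->store, chunk_size,
                                                      &chunk_err);
                if (r)
                        DMERR("invalid on-disk chunk size %u: %s",
                              chunk_size, chunk_err);
        }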
     
  • If a persistent snapshot fills up, a race can corrupt the on-disk header
    which causes a crash on any future attempt to activate the snapshot
    (typically while booting). This patch fixes the race.

    When the snapshot overflows, __invalidate_snapshot is called, which calls
    snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
    calls write_header. write_header constructs the new header in the "area"
    location.

    Concurrently, an existing kcopyd job may finish, call copy_callback
    and commit_exception method, that goes to persistent_commit_exception.
    persistent_commit_exception doesn't do locking, relying on the fact that
    callbacks are single-threaded, but it can race with snapshot invalidation and
    overwrite the header that is just being written while the snapshot is being
    invalidated.

    The result of this race is a corrupted header being written that can
    lead to a crash on further reactivation (if chunk_size is zero in the
    corrupted header).

    The fix is to use separate memory areas for each.

    See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
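
    A sketch of the fix (the field name is for illustration): the header
    writer gets its own buffer, so it can no longer race with
    persistent_commit_exception() filling ps->area for a completing kcopyd
    job.

        size_t len = ps->store->chunk_size << SECTOR_SHIFT;

        ps->header_area = vmalloc(len);         /* separate from ps->area */
        memset(ps->header_area, 0, len);
        /* build the header fields in ps->header_area, then write it: */
        r = chunk_io(ps, ps->header_area, 0, WRITE, 1);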
     
  • Refactor chunk_io to prepare for the fix in the following patch.

    Pass an area pointer to chunk_io and simplify zero_disk_area to use
    chunk_io. No functional change.

    Cc: stable@kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

23 May, 2009

1 commit

  • Until now we have had a 1:1 mapping between storage device physical
    block size and the logical block size used when addressing the device.
    With SATA 4KB drives coming out that will no longer be the case. The
    sector size will be 4KB but the logical block size will remain
    512-bytes. Hence we need to distinguish between the physical block size
    and the logical ditto.

    This patch renames hardsect_size to logical_block_size.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
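
    The mechanical part of the rename, for illustration:

        - blk_queue_hardsect_size(q, 512);
        + blk_queue_logical_block_size(q, 512);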