09 Jul, 2020

1 commit

  • Only triggering reclaim based on the percentage of unmapped cache
    zones can fail to detect cases where reclaim is needed, e.g. if the
    target has only 2 or 3 cache zones and only one unmapped cache zone,
    the percentage of free cache zones is higher than
    DMZ_RECLAIM_LOW_UNMAP_ZONES (30%) and reclaim does not trigger.

    This problem, combined with the fact that dmz_schedule_reclaim() is
    called from dmz_handle_bio() without the map lock held, leads to a
    race between zone allocation and the dmz_should_reclaim() result.
    Depending on the workload applied, this race can lead to the write
    path waiting forever for a free zone without reclaim being triggered.

    Fix this by moving dmz_schedule_reclaim() inside dmz_alloc_zone()
    under the map lock. This results in checking the need for zone reclaim
    whenever a new data or buffer zone needs to be allocated.

    Also fix dmz_reclaim_percentage() to always return 0 if the number of
    unmapped cache (or random) zones is less than or equal to 1.

    Suggested-by: Shin'ichiro Kawasaki
    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Mike Snitzer

    Damien Le Moal
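
The corner case in this commit message can be checked with plain integer arithmetic. The following is a user-space sketch, not the kernel code: `reclaim_percentage()` and `should_reclaim()` are hypothetical helpers, the `fixed` flag exists only to compare old and new behaviour, and the "at or below the threshold" comparison is an assumption. Only the 30% threshold name is taken from the message.

```c
#include <stdbool.h>

/* Threshold named in the commit message (30%). */
#define DMZ_RECLAIM_LOW_UNMAP_ZONES 30

/*
 * Hypothetical helper: percentage of unmapped cache (or random) zones.
 * With fixed == true, one or zero unmapped zones always reports 0.
 */
static unsigned int reclaim_percentage(unsigned int nr_unmap,
				       unsigned int nr_zones, bool fixed)
{
	if (nr_zones == 0)
		return 0;
	if (fixed && nr_unmap <= 1)
		return 0;
	return nr_unmap * 100 / nr_zones;
}

/*
 * Hypothetical helper: reclaim is considered needed when the percentage
 * of unmapped zones is at or below the threshold (assumed comparison).
 */
static bool should_reclaim(unsigned int nr_unmap, unsigned int nr_zones,
			   bool fixed)
{
	return reclaim_percentage(nr_unmap, nr_zones, fixed) <=
	       DMZ_RECLAIM_LOW_UNMAP_ZONES;
}
```

With 3 cache zones and 1 unmapped, the unfixed calculation gives 33%, which is above the threshold, so reclaim is not requested even though a single free zone remains. With the "less than or equal to 1" rule the result is 0 and reclaim is requested.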
     

20 Jun, 2020

3 commits

  • When dm zoned has multiple devices, random zones are never selected for
    reclaim if all reserved sequential write zones are in use and no
    sequential write required zones can be selected for reclaim. This can
    lead to deadlocks as selecting a cache zone allows reclaiming a
    sequential zone, ensuring forward progress.

    Fix this by always defaulting to selecting a random zone when no
    sequential write required zone can be selected.

    [Damien: fix commit message]

    Signed-off-by: Shin'ichiro Kawasaki
    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Mike Snitzer

    Shin'ichiro Kawasaki
     
  • Commit 2094045fe5b5 ("dm zoned: prefer full zones for reclaim")
    modified dmz_get_rnd_zone_for_reclaim() to add a search for the buffer
    zone with the heaviest weight as an optimal candidate for reclaim. This
    modification uses the zone pointer variable "last" which is set only once
    and never modified as zones are scanned, resulting in the search being
    ineffective. Furthermore, if the selected buffer zone at the end of the
    search loop is active or already locked for reclaim,
    dmz_get_rnd_zone_for_reclaim() returns NULL even if other random zones
    with a lesser weight can be reclaimed.

    To fix the search and to guarantee that reclaim can make forward
    progress, fix dmz_get_rnd_zone_for_reclaim() loop to correctly find
    the buffer zone with the heaviest weight using the variable maxw_z.
    Also make sure to fall back to finding the first random zone that can
    be reclaimed if this best candidate zone cannot be reclaimed.

    While at it, also fix the device index check to consider only random
    zones, ignoring cache zones belonging to the cache device if one is
    used as that device does not have a reclaim process.

    Fixes: 2094045fe5b5 ("dm zoned: prefer full zones for reclaim")
    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Mike Snitzer

    Damien Le Moal
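
The selection logic described in this commit message (find the heaviest zone, then fall back to the first reclaimable one) might look roughly like the sketch below. It is a simplified user-space illustration over an array: `struct zone`, `can_reclaim()` and `pick_zone_for_reclaim()` are hypothetical names, and only the variable name `maxw_z` comes from the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a zone descriptor. */
struct zone {
	unsigned int weight;	/* number of valid blocks in the zone */
	bool active;		/* zone is currently in use by I/O */
	bool locked;		/* zone is already locked for reclaim */
};

static bool can_reclaim(const struct zone *z)
{
	return !z->active && !z->locked;
}

static struct zone *pick_zone_for_reclaim(struct zone *zones, size_t n)
{
	struct zone *maxw_z = NULL;
	size_t i;

	/* Pass 1: update the candidate on every heavier zone seen. */
	for (i = 0; i < n; i++) {
		if (!maxw_z || zones[i].weight > maxw_z->weight)
			maxw_z = &zones[i];
	}
	if (maxw_z && can_reclaim(maxw_z))
		return maxw_z;

	/* Pass 2: the best candidate is busy, take the first usable zone. */
	for (i = 0; i < n; i++) {
		if (can_reclaim(&zones[i]))
			return &zones[i];
	}

	return NULL;
}
```

The second pass is what guarantees forward progress: a busy best candidate no longer causes a NULL return while lighter zones remain reclaimable.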
     
  • When dm zoned has multiple devices, metadata is on the cache device, not
    in random zones of the zoned devices. The number of metadata zones must
    therefore be checked against the number of cache zones, not random zones.

    Fixes: 34f5affd04c4 ("dm zoned: separate random and cache zones")
    Signed-off-by: Shin'ichiro Kawasaki
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Shin'ichiro Kawasaki
     

06 Jun, 2020

14 commits

  • …device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - The largest change for this cycle is the DM zoned target's metadata
    version 2 feature that adds support for pairing regular block devices
    with a zoned device to ease the performance impact associated with
    finite random zones of a zoned device.

    The changes came in three batches: the first prepared for and then
    added the ability to pair a single regular block device, the second
    was a batch of fixes to improve zoned's reclaim heuristic, and the
    third removed the limitation of only adding a single additional
    regular block device to allow many devices.

    Testing has shown linear scaling as more devices are added.

    - Add new emulated block size (ebs) target that emulates a smaller
    logical_block_size than a block device supports

    The primary use-case is to emulate "512e" devices that have 512 byte
    logical_block_size and 4KB physical_block_size. This is useful to
    some legacy applications that otherwise wouldn't be able to be used
    on 4K devices because they depend on issuing IO in 512 byte
    granularity.

    - Add discard interfaces to DM bufio. First consumer of the interface
    is the dm-ebs target that makes heavy use of dm-bufio.

    - Fix DM crypt's block queue_limits stacking to not truncate
    logical_block_size.

    - Add Documentation for DM integrity's status line.

    - Switch DMDEBUG from a compile time config option to instead use
    dynamic debug via pr_debug.

    - Fix DM multipath target's heuristic for how it manages
    "queue_if_no_path" state internally.

    DM multipath now avoids disabling "queue_if_no_path" unless it is
    actually needed (e.g. in response to configure timeout or explicit
    "fail_if_no_path" message).

    This fixes reports of spurious -EIO being reported back to userspace
    applications during fault tolerance testing with an NVMe backend.
    Added various dynamic DMDEBUG messages to assist with debugging
    queue_if_no_path in the future.

    - Add a new DM multipath "Historical Service Time" Path Selector.

    - Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.

    - Improve DM writecache target performance by using explicit cache
    flushing for target's single-threaded usecase and a small cleanup to
    remove unnecessary test in persistent_memory_claim.

    - Other small cleanups in DM core, dm-persistent-data, and DM
    integrity.

    * tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
    dm crypt: avoid truncating the logical block size
    dm mpath: add DM device name to Failing/Reinstating path log messages
    dm mpath: enhance queue_if_no_path debugging
    dm mpath: restrict queue_if_no_path state machine
    dm mpath: simplify __must_push_back
    dm zoned: check superblock location
    dm zoned: prefer full zones for reclaim
    dm zoned: select reclaim zone based on device index
    dm zoned: allocate zone by device index
    dm zoned: support arbitrary number of devices
    dm zoned: move random and sequential zones into struct dmz_dev
    dm zoned: per-device reclaim
    dm zoned: add metadata pointer to struct dmz_dev
    dm zoned: add device pointer to struct dm_zone
    dm zoned: allocate temporary superblock for tertiary devices
    dm zoned: convert to xarray
    dm zoned: add a 'reserved' zone flag
    dm zoned: improve logging messages for reclaim
    dm zoned: avoid unnecessary device recalulation for secondary superblock
    dm zoned: add debugging message for reading superblocks
    ...

    Linus Torvalds
     
  • When specifying several devices the superblock location must be
    checked to ensure the devices are specified in the correct order.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Prefer full zones when selecting the next zone for reclaim.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Per-device reclaim should select zones on that device only.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • When allocating a zone, pass in an indicator on which device the zone
    should be allocated; this increases performance for a multi-device
    setup because reclaim will now allocate zones on the device for which
    reclaim is running.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Remove the hard-coded limit of two devices and support an unlimited
    number of additional zoned devices.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Random and sequential zones should be part of the respective
    device structure to make arbitration between devices possible.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Add a metadata pointer within struct dmz_dev and use it as argument
    for blkdev_report_zones() instead of the metadata itself.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Add a pointer, to the containing device, within struct dm_zone and
    kill dmz_zone_to_dev().

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Checking the tertiary superblock just consists of validating UUIDs,
    crcs, and the generation number; it doesn't have contents which would
    be required during the actual operation.

    So allocate a temporary superblock when checking tertiary devices to
    avoid having to store it together with the 'real' superblocks.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • The zones array is getting really large, and large arrays tend to
    wreak havoc with the CPU caches. So convert it to xarray to become
    more cache friendly.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Colin Ian King # fix leak in dmz_insert
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Instead of counting the number of reserved zones in dmz_free_zone(),
    mark the zone as 'reserved' during allocation and simplify
    dmz_free_zone().

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • The secondary superblock must reside on the same device as the primary
    superblock, so there is no need to re-calculate the device.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     

23 May, 2020

1 commit


22 May, 2020

1 commit


21 May, 2020

8 commits

  • When dmz_get_chunk_mapping() selects a zone which is under reclaim
    we should terminate the reclaim copy process. Since we're changing
    the zone itself, reclaim needs to run afterwards again anyway.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • When the system is idle we should start reclaiming random zones,
    too.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Instead of lumping emulated zones together with random zones we
    should be handling them as separate 'cache' zones. This improves
    code readability and allows an easier implementation of different
    cache policies.

    Also add additional allocation flags, to separate the type (cache,
    random, or sequential) from the purpose (e.g. reclaim).

    Also switch the allocation policy to not use random zones as buffer
    zones if cache zones are present. This avoids a performance drop when
    all cache zones are used.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • The only case where dmz_get_zone_for_reclaim() cannot return a zone is
    if the respective lists are empty. So we should just return a simple
    NULL value here as we really don't have an error code which would make
    sense.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Implement handling for metadata version 2. The new metadata adds a
    label and UUID for the device mapper device, and an additional UUID for
    the underlying block devices.

    It also allows for an additional regular drive to be used for
    emulating random access zones. The emulated zones will be placed
    logically in front of the zones from the zoned block device, causing
    the superblocks and metadata to be stored on that device.

    The first zone of the original zoned device will be used to hold
    another, tertiary copy of the metadata; this copy carries a generation
    number of 0 and is never updated; it's just used for identification.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Bob Liu
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • When looking up zones in dmz_alloc_zone() we need to ignore
    metadata zones so as not to accidentally overwrite metadata.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Reviewed-by: Bob Liu
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • dm-zoned is becoming quite chatty during startup; reduce the noise
    by moving some information to 'debug' level.

    Suggested-by: Mike Snitzer
    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     
  • Use the metadata label for logging and not the underlying
    device.

    Signed-off-by: Hannes Reinecke
    Reviewed-by: Damien Le Moal
    Reviewed-by: Bob Liu
    Signed-off-by: Mike Snitzer

    Hannes Reinecke
     

20 May, 2020

1 commit


15 May, 2020

7 commits


25 Mar, 2020

1 commit

  • zmd->nr_rnd_zones was increased twice by mistake. The other place it
    is increased in dmz_init_zone() is the only one needed:

    zmd->nr_useable_zones++;
    if (dmz_is_rnd(zone)) {
            zmd->nr_rnd_zones++;
            ^^^

    Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
    Cc: stable@vger.kernel.org
    Signed-off-by: Bob Liu
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Bob Liu
     

08 Jan, 2020

1 commit

  • dm-zoned is observed to log failed kernel assertions and not work
    correctly when operating against a device with a zone size smaller
    than 128MiB (one 4K bitmap block holds 32768 bits, one per 4K data
    block, and so covers exactly 128MiB). The reason is that the bitmap
    size per zone, counted in blocks, is calculated as zero with such a
    small zone size. Fix this problem and also make the code related to
    zone bitmap management able to handle per-zone bitmaps smaller than a
    single block.

    A dm-zoned-tools patch is required to properly format dm-zoned devices
    with zone sizes smaller than 128MiB.

    Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Fomichev
    Reviewed-by: Damien Le Moal
    Signed-off-by: Mike Snitzer

    Dmitry Fomichev
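
The zero-sized bitmap described in this commit message follows from truncating integer division. The sketch below is a user-space illustration under stated assumptions: the macro and function names are hypothetical, and the round-up variant only shows one way to avoid a zero result, not necessarily how the kernel fix is written.

```c
#include <stdint.h>

/* 4K blocks, as in the commit message; names are hypothetical. */
#define DEMO_BLOCK_SIZE		4096u
#define DEMO_BITS_PER_BLOCK	(DEMO_BLOCK_SIZE * 8u)	/* 32768 bits */

/* Truncating division: yields 0 for zones smaller than 128 MiB. */
static uint64_t bitmap_blocks_truncating(uint64_t zone_bytes)
{
	uint64_t zone_blocks = zone_bytes / DEMO_BLOCK_SIZE;

	return zone_blocks / DEMO_BITS_PER_BLOCK;
}

/* Round-up division: any non-empty zone gets at least one bitmap block. */
static uint64_t bitmap_blocks_round_up(uint64_t zone_bytes)
{
	uint64_t zone_blocks = zone_bytes / DEMO_BLOCK_SIZE;

	return (zone_blocks + DEMO_BITS_PER_BLOCK - 1) / DEMO_BITS_PER_BLOCK;
}
```

For a 64 MiB zone there are 16384 blocks to track, so the truncating form returns 0 while the round-up form returns 1. At 128 MiB and above (in multiples of 128 MiB) both forms agree.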
     

26 Nov, 2019

1 commit

  • …device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Fix DM core to disallow stacking request-based DM on partitions.

    - Fix DM raid target to properly resync raidset even if bitmap needed
    additional pages.

    - Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
    IO and crypt workqueues.

    - Fix DM integrity metadata layout that was aligned on 128K boundary
    rather than the intended 4K boundary (removes 124K of wasted space
    for each metadata block).

    - Improve the DM thin, cache and clone targets to use spin_lock_irq
    rather than spin_lock_irqsave where possible.

    - Fix DM thin single thread performance that was lost due to needless
    workqueue wakeups.

    - Fix DM zoned target performance that was lost due to excessive
    backing device checks.

    - Add ability to trigger write failure with the DM dust test target.

    - Fix whitespace indentation in drivers/md/Kconfig.

    - Various small fixes and cleanups (e.g. use struct_size, fix
    uninitialized variable, variable renames, etc).

    * tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
    Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
    dm: Fix Kconfig indentation
    dm thin: wakeup worker only when deferred bios exist
    dm integrity: fix excessive alignment of metadata runs
    dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout
    dm zoned: reduce overhead of backing device checks
    dm dust: add limited write failure mode
    dm dust: change ret to r in dust_map_read and dust_map
    dm dust: change result vars to r
    dm cache: replace spin_lock_irqsave with spin_lock_irq
    dm bio prison: replace spin_lock_irqsave with spin_lock_irq
    dm thin: replace spin_lock_irqsave with spin_lock_irq
    dm clone: add bucket_lock_irq/bucket_unlock_irq helpers
    dm clone: replace spin_lock_irqsave with spin_lock_irq
    dm writecache: handle REQ_FUA
    dm writecache: fix uninitialized variable warning
    dm stripe: use struct_size() in kmalloc()
    dm raid: streamline rs_get_progress() and its raid_status() caller side
    dm raid: simplify rs_setup_recovery call chain
    dm raid: to ensure resynchronization, perform raid set grow in preresume
    ...

    Linus Torvalds
     

13 Nov, 2019

1 commit

  • Avoid the need to allocate a potentially large array of struct blk_zone
    in the block layer by switching the ->report_zones method interface to
    a callback model. Now the caller simply supplies a callback that is
    executed on each reported zone, and private data for it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Shin'ichiro Kawasaki
    Signed-off-by: Damien Le Moal
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
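
The callback model described in this commit message can be illustrated with a small user-space sketch. All names here (`struct zone_info`, `report_zones_cb`, `report_zones()`, `count_seq_cb()`) are hypothetical and do not reproduce the kernel interface; the input array merely stands in for a device that would produce zone descriptors one at a time.

```c
#include <stddef.h>

/* Hypothetical, simplified zone descriptor. */
struct zone_info {
	unsigned long long start;	/* first sector of the zone */
	unsigned long long len;		/* zone length in sectors */
	int seq;			/* non-zero for a sequential zone */
};

/* Callback invoked once per reported zone, with caller-private data. */
typedef int (*report_zones_cb)(const struct zone_info *zone,
			       unsigned int idx, void *data);

/*
 * Walk the zones and hand each one to the callback, so the caller never
 * needs its own result array. Returns the number of zones reported, or
 * the first non-zero value returned by the callback.
 */
static int report_zones(const struct zone_info *zones, unsigned int nr_zones,
			report_zones_cb cb, void *data)
{
	unsigned int i;
	int ret;

	for (i = 0; i < nr_zones; i++) {
		ret = cb(&zones[i], i, data);
		if (ret)
			return ret;
	}

	return (int)nr_zones;
}

/* Example callback: count sequential zones into the private counter. */
static int count_seq_cb(const struct zone_info *zone, unsigned int idx,
			void *data)
{
	unsigned int *nr_seq = data;

	(void)idx;
	if (zone->seq)
		(*nr_seq)++;

	return 0;
}
```

A caller would pass `count_seq_cb` together with a pointer to an `unsigned int` counter as the private data. The state the caller keeps is then only as large as what the callback accumulates, not proportional to the number of zones.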