09 Jul, 2020
1 commit
-
Only triggering reclaim based on the percentage of unmapped cache
zones can fail to detect cases where reclaim is needed, e.g. if the
target has only 2 or 3 cache zones and only one unmapped cache zone,
the percentage of free cache zones is higher than
DMZ_RECLAIM_LOW_UNMAP_ZONES (30%) and reclaim does not trigger.

This problem, combined with the fact that dmz_schedule_reclaim() is
called from dmz_handle_bio() without the map lock held, leads to a
race between zone allocation and the result of dmz_should_reclaim().
Depending on the workload applied, this race can lead to the write
path waiting forever for a free zone without reclaim being triggered.

Fix this by moving dmz_schedule_reclaim() inside dmz_alloc_zone()
under the map lock. This results in checking the need for zone reclaim
whenever a new data or buffer zone needs to be allocated.

Also fix dmz_reclaim_percentage() to always return 0 if the number of
unmapped cache (or random) zones is less than or equal to 1.

Suggested-by: Shin'ichiro Kawasaki
Signed-off-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Signed-off-by: Mike Snitzer
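With 3 cache zones of which one is unmapped, a plain integer ratio gives 1 * 100 / 3 = 33, which is above the 30% threshold, so reclaim never starts. A minimal sketch of the corrected percentage check, assuming the caller passes the total and unmapped zone counts directly; reclaim_percentage_sketch() and should_reclaim_sketch() are hypothetical names, and the real dmz_should_reclaim() weighs more conditions than this.

```c
#include <assert.h>

/* Threshold from the commit message: DMZ_RECLAIM_LOW_UNMAP_ZONES (30%). */
#define LOW_UNMAP_ZONES_PCT 30u

/*
 * Hypothetical model of the fixed dmz_reclaim_percentage(): the
 * percentage of unmapped zones, forced to 0 when one or fewer zones
 * are unmapped so that a small target still triggers reclaim.
 */
static unsigned int reclaim_percentage_sketch(unsigned int nr_zones,
					      unsigned int nr_unmap)
{
	assert(nr_zones > 0 && nr_unmap <= nr_zones);
	if (nr_unmap <= 1)
		return 0;
	return nr_unmap * 100 / nr_zones;
}

/* Simplified: reclaim is assumed to trigger at or below the threshold. */
static int should_reclaim_sketch(unsigned int nr_zones, unsigned int nr_unmap)
{
	return reclaim_percentage_sketch(nr_zones, nr_unmap) <= LOW_UNMAP_ZONES_PCT;
}
```

With the fix, the 3-zone/1-unmapped case above yields 0 and reclaim is scheduled, while a target with 5 of 10 zones unmapped (50%) still does not trigger it.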
20 Jun, 2020
3 commits
-
When dm zoned has multiple devices, random zones are never selected for
reclaim if all reserved sequential write zones are in use and no
sequential write required zones can be selected for reclaim. This can
lead to deadlocks as selecting a cache zone allows reclaiming a
sequential zone, ensuring forward progress.

Fix this by always defaulting to selecting a random zone when no
sequential write required zone can be selected.

[Damien: fix commit message]
Signed-off-by: Shin'ichiro Kawasaki
Signed-off-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Signed-off-by: Mike Snitzer
-
Commit 2094045fe5b5 ("dm zoned: prefer full zones for reclaim")
modified dmz_get_rnd_zone_for_reclaim() to add a search for the buffer
zone with the heaviest weight as an optimal candidate for reclaim. This
modification uses the zone pointer variable "last" which is set only once
and never modified as zones are scanned, resulting in the search being
ineffective. Furthermore, if the selected buffer zone at the end of the
search loop is active or already locked for reclaim,
dmz_get_rnd_zone_for_reclaim() returns NULL even if other random zones
with a lesser weight can be reclaimed.

To fix the search and to guarantee that reclaim can make forward
progress, fix the dmz_get_rnd_zone_for_reclaim() loop to correctly find
the buffer zone with the heaviest weight using the variable maxw_z.
Also make sure to fall back to finding the first random zone that can
be reclaimed if this best candidate zone cannot be reclaimed.

While at it, also fix the device index check to consider only random
zones, ignoring cache zones belonging to the cache device if one is
used, as that device does not have a reclaim process.

Fixes: 2094045fe5b5 ("dm zoned: prefer full zones for reclaim")
Signed-off-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Signed-off-by: Mike Snitzer
-
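The corrected search in dmz_get_rnd_zone_for_reclaim() described in the commit above can be sketched over a flat array. This is a minimal sketch, assuming a hypothetical struct zone_sketch with weight, buffer, active and locked fields in place of dm-zoned's zone lists and helpers: it tracks the heaviest buffer zone in maxw_z and, if that candidate cannot be reclaimed, falls back to the first reclaimable zone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified zone descriptor (not struct dm_zone). */
struct zone_sketch {
	unsigned int weight;	/* number of valid blocks in the zone */
	int is_buffer;		/* zone currently used as a buffer zone */
	int is_active;		/* zone has I/O in flight */
	int is_locked;		/* zone already locked for reclaim */
};

static int can_reclaim(const struct zone_sketch *z)
{
	return !z->is_active && !z->is_locked;
}

static struct zone_sketch *pick_reclaim_zone(struct zone_sketch *zones, size_t n)
{
	struct zone_sketch *maxw_z = NULL;
	size_t i;

	assert(zones != NULL || n == 0);

	/* Update the best candidate on every iteration (the bug never did). */
	for (i = 0; i < n; i++) {
		if (!zones[i].is_buffer)
			continue;
		if (!maxw_z || zones[i].weight > maxw_z->weight)
			maxw_z = &zones[i];
	}
	if (maxw_z && can_reclaim(maxw_z))
		return maxw_z;

	/* Fallback: first reclaimable zone, so reclaim keeps making progress. */
	for (i = 0; i < n; i++) {
		if (can_reclaim(&zones[i]))
			return &zones[i];
	}
	return NULL;
}
```

Returning a lighter zone instead of NULL trades some reclaim efficiency for the forward-progress guarantee the commit asks for.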
When dm zoned has multiple devices, metadata is on the cache device, not
in random zones of the zoned devices. Therefore, the number of metadata
zones must be checked against the number of cache zones, not random zones.

Fixes: 34f5affd04c4 ("dm zoned: separate random and cache zones")
Signed-off-by: Shin'ichiro Kawasaki
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
06 Jun, 2020
14 commits
-
…device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- The largest change for this cycle is the DM zoned target's metadata
version 2 feature that adds support for pairing regular block devices
with a zoned device to ease the performance impact associated with
finite random zones of the zoned device.
The changes came in three batches: the first prepared for and then
added the ability to pair a single regular block device, the second
was a batch of fixes to improve zoned's reclaim heuristic, and the
third removed the limitation of only adding a single additional
regular block device to allow many devices.
Testing has shown linear scaling as more devices are added.
- Add new emulated block size (ebs) target that emulates a smaller
logical_block_size than a block device supports.
The primary use case is to emulate "512e" devices that have 512 byte
logical_block_size and 4KB physical_block_size. This is useful to
some legacy applications that otherwise wouldn't be able to be used
on 4K devices because they depend on issuing IO in 512 byte
granularity.
- Add discard interfaces to DM bufio. The first consumer of the interface
is the dm-ebs target that makes heavy use of dm-bufio.
- Fix DM crypt's block queue_limits stacking to not truncate
logical_block_size.
- Add documentation for DM integrity's status line.
- Switch DMDEBUG from a compile time config option to instead use
dynamic debug via pr_debug.
- Fix DM multipath target's heuristic for how it manages
"queue_if_no_path" state internally.
DM multipath now avoids disabling "queue_if_no_path" unless it is
actually needed (e.g. in response to a configured timeout or an explicit
"fail_if_no_path" message).
This fixes reports of spurious -EIO being returned to userspace
applications during fault tolerance testing with an NVMe backend.
Added various dynamic DMDEBUG messages to assist with debugging
queue_if_no_path in the future.
- Add a new DM multipath "Historical Service Time" Path Selector.
- Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.
- Improve DM writecache target performance by using explicit cache
flushing for the target's single-threaded use case, and make a small
cleanup to remove an unnecessary test in persistent_memory_claim.
- Other small cleanups in DM core, dm-persistent-data, and DM
integrity.
* tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
dm crypt: avoid truncating the logical block size
dm mpath: add DM device name to Failing/Reinstating path log messages
dm mpath: enhance queue_if_no_path debugging
dm mpath: restrict queue_if_no_path state machine
dm mpath: simplify __must_push_back
dm zoned: check superblock location
dm zoned: prefer full zones for reclaim
dm zoned: select reclaim zone based on device index
dm zoned: allocate zone by device index
dm zoned: support arbitrary number of devices
dm zoned: move random and sequential zones into struct dmz_dev
dm zoned: per-device reclaim
dm zoned: add metadata pointer to struct dmz_dev
dm zoned: add device pointer to struct dm_zone
dm zoned: allocate temporary superblock for tertiary devices
dm zoned: convert to xarray
dm zoned: add a 'reserved' zone flag
dm zoned: improve logging messages for reclaim
dm zoned: avoid unnecessary device recalulation for secondary superblock
dm zoned: add debugging message for reading superblocks
...
-
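The ebs target in the pull message above emulates 512-byte logical blocks on a device with 4KiB blocks, so every 512-byte sector lives inside exactly one 4KiB block. A minimal sketch of that arithmetic, assuming exactly those two block sizes; the helper names are hypothetical and this is not the dm-ebs source. A write that does not both start and end on a 4KiB boundary has to be handled as a read-modify-write of the enclosing block(s).

```c
#include <assert.h>
#include <stdint.h>

#define LOGICAL_BS  512u	/* emulated logical_block_size (assumed) */
#define PHYSICAL_BS 4096u	/* underlying device block size (assumed) */
#define SECTORS_PER_BLOCK (PHYSICAL_BS / LOGICAL_BS)	/* 8 */

/* Physical block containing a given 512-byte logical sector. */
static uint64_t ebs_block_of(uint64_t sector)
{
	return sector / SECTORS_PER_BLOCK;
}

/* Byte offset of that sector inside its physical block. */
static unsigned int ebs_offset_in_block(uint64_t sector)
{
	return (unsigned int)(sector % SECTORS_PER_BLOCK) * LOGICAL_BS;
}

/* Partial-block writes need read-modify-write of the physical block. */
static int ebs_needs_rmw(uint64_t sector, uint64_t nr_sectors)
{
	assert(nr_sectors > 0);
	return (sector % SECTORS_PER_BLOCK) != 0 ||
	       ((sector + nr_sectors) % SECTORS_PER_BLOCK) != 0;
}
```

For example, logical sector 9 sits 512 bytes into physical block 1, and an 8-sector write starting at sector 8 covers block 1 exactly, so no read is needed.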
When several devices are specified, the superblock location must be
checked to ensure the devices are specified in the correct order.

Signed-off-by: Hannes Reinecke
Signed-off-by: Mike Snitzer
-
Prefer full zones when selecting the next zone for reclaim.
Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Per-device reclaim should select zones on that device only.
Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
When allocating a zone, pass in an indicator of the device on which
the zone should be allocated; this increases performance for a
multi-device setup because reclaim will now allocate zones on the
device for which reclaim is running.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Remove the hard-coded limit of two devices and support an unlimited
number of additional zoned devices.

Signed-off-by: Hannes Reinecke
Signed-off-by: Mike Snitzer
-
Random and sequential zones should be part of the respective
device structure to make arbitration between devices possible.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Add a metadata pointer within struct dmz_dev and use it as argument
for blkdev_report_zones() instead of the metadata itself.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Add a pointer to the containing device within struct dm_zone and
kill dmz_zone_to_dev().

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Checking the tertiary superblock just consists of validating UUIDs,
crcs, and the generation number; it doesn't have contents which would
be required during the actual operation.

So allocate a temporary superblock when checking tertiary devices to
avoid having to store it together with the 'real' superblocks.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
The zones array is getting really large, and large arrays tend to
wreak havoc with the CPU caches. So convert it to xarray to become
more cache friendly.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Colin Ian King # fix leak in dmz_insert
Signed-off-by: Mike Snitzer
-
Instead of counting the number of reserved zones in dmz_free_zone(),
mark the zone as 'reserved' during allocation and simplify
dmz_free_zone().

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
The secondary superblock must reside on the same device as the primary
superblock, so there is no need to re-calculate the device.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
23 May, 2020
1 commit
-
Remove a leftover hunk to switch from random zones to sequential
zones when selecting a reclaim zone; the logic has moved into the
caller and this hunk is now pointless.

Fixes: 34f5affd04c4 ("dm zoned: separate random and cache zones")
Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
22 May, 2020
1 commit
-
The argument isn't used by any caller, and drivers don't fill out
bi_sector for flush requests either.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
21 May, 2020
8 commits
-
When dmz_get_chunk_mapping() selects a zone which is under reclaim
we should terminate the reclaim copy process. Since we're changing
the zone itself, reclaim needs to run again afterwards anyway.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
When the system is idle, we should start reclaiming
random zones, too.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Instead of lumping emulated zones together with random zones we
should be handling them as separate 'cache' zones. This improves
code readability and allows an easier implementation of different
cache policies.

Also add additional allocation flags, to separate the type (cache,
random, or sequential) from the purpose (e.g. reclaim).

Also switch the allocation policy to not use random zones as buffer
zones if cache zones are present. This avoids a performance drop when
all cache zones are used.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
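The separation of zone type from allocation purpose described in the "separate 'cache' zones" commit above can be sketched with bit flags. The flag names and helpers below are hypothetical, chosen only for illustration; they model the stated policy (no random zones as buffer zones when cache zones exist) rather than reproduce the dm-zoned code.

```c
#include <assert.h>

/* Hypothetical type flags: which kind of zone to allocate. */
#define ALLOC_CACHE	0x01u
#define ALLOC_RND	0x02u
#define ALLOC_SEQ	0x04u
#define ALLOC_TYPE_MASK	(ALLOC_CACHE | ALLOC_RND | ALLOC_SEQ)

/* Hypothetical purpose flag, kept in separate bits from the type. */
#define ALLOC_RECLAIM	0x10u

/*
 * Zone type for a new buffer zone: cache zones whenever the target
 * has any, random zones only when it has none.
 */
static unsigned int buffer_zone_alloc_flags(unsigned int nr_cache_zones)
{
	return nr_cache_zones ? ALLOC_CACHE : ALLOC_RND;
}

static unsigned int alloc_type(unsigned int flags)
{
	return flags & ALLOC_TYPE_MASK;
}

static int alloc_is_for_reclaim(unsigned int flags)
{
	return (flags & ALLOC_RECLAIM) != 0;
}
```

Keeping type and purpose in disjoint bits lets one flags argument say both "a sequential zone" and "for reclaim" without a combinatorial set of names.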
The only case where dmz_get_zone_for_reclaim() cannot return a zone is
if the respective lists are empty. So we should just return a simple
NULL value here as we really don't have an error code which would make
sense.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Implement handling for metadata version 2. The new metadata adds a
label and UUID for the device mapper device, and an additional UUID for
the underlying block devices.

It also allows for an additional regular drive to be used for
emulating random access zones. The emulated zones will be placed
logically in front of the zones from the zoned block device, causing
the superblocks and metadata to be stored on that device.

The first zone of the original zoned device will be used to hold
another, tertiary copy of the metadata; this copy carries a generation
number of 0 and is never updated. It is only used for identification.

Signed-off-by: Hannes Reinecke
Reviewed-by: Bob Liu
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
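The version 2 layout in the metadata commit above places the emulated zones of the regular drive logically in front of the zoned device's zones. A minimal sketch of the resulting zone numbering, assuming a single regular device followed by a single zoned device; struct zone_loc and zone_id_to_loc() are hypothetical names, not dm-zoned code.

```c
#include <assert.h>

/* Hypothetical location of a zone: which device, and which zone on it. */
struct zone_loc {
	int dev;		/* 0 = regular (cache) device, 1 = zoned device */
	unsigned int local;	/* zone number within that device */
};

/*
 * Map a target-wide zone id to a device: the nr_cache emulated zones
 * come first, followed by the zones of the zoned block device.
 */
static struct zone_loc zone_id_to_loc(unsigned int id, unsigned int nr_cache)
{
	struct zone_loc loc;

	if (id < nr_cache) {
		loc.dev = 0;
		loc.local = id;
	} else {
		loc.dev = 1;
		loc.local = id - nr_cache;
	}
	return loc;
}
```

Under this numbering, zone id nr_cache is the first zone of the zoned device, the one the commit reserves for the tertiary metadata copy.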
When looking up zones in dmz_alloc_zone() we need to ignore
metadata zones so as not to accidentally overwrite metadata.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Reviewed-by: Bob Liu
Signed-off-by: Mike Snitzer
-
dm-zoned is becoming quite chatty during startup; reduce the noise
by moving some information to 'debug' level.

Suggested-by: Mike Snitzer
Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Use the metadata label for logging and not the underlying
device.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Reviewed-by: Bob Liu
Signed-off-by: Mike Snitzer
20 May, 2020
1 commit
-
Use accessors to retrieve the device pointer in preparation
for adding an additional block device.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Reviewed-by: Bob Liu
Signed-off-by: Mike Snitzer
15 May, 2020
7 commits
-
Introduce accessors dmz_dev_is_dying() and dmz_check_dev() to
avoid having to reference the devices directly.

Signed-off-by: Hannes Reinecke
Reviewed-by: Bob Liu
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
Introduce dmz_metadata_label() to format the device-mapper device
name and use it instead of the device name of the underlying device.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Reviewed-by: Bob Liu
Signed-off-by: Mike Snitzer
-
Move fields from the device structure into the metadata structure
and provide accessor functions.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Reviewed-by: Bob Liu
Signed-off-by: Mike Snitzer
-
Store the device together with the superblock so that
we don't have to resort to the metadata to find it.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Reviewed-by: Bob Liu
Signed-off-by: Mike Snitzer
-
Instead of storing just the first superblock zone and calculating
the secondary relative to that, we should be using an array for
holding the superblock zones.

Signed-off-by: Hannes Reinecke
Reviewed-by: Damien Le Moal
Reviewed-by: Bob Liu
Signed-off-by: Mike Snitzer
-
Instead of calculating the zone index from the offset within the
zone array, store the index within the structure itself. With that,
the helper dmz_id() is pointless and can be replaced with accessing
the ->id value directly.

Signed-off-by: Hannes Reinecke
Reviewed-by: Bob Liu
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
-
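The two ways of obtaining a zone index in the commit above can be contrasted in a few lines. This is a minimal sketch using a hypothetical struct zone_sketch rather than struct dm_zone: deriving the index by pointer arithmetic only works while every zone lives in one contiguous array, whereas a stored id keeps working once zones move into another container, such as the xarray adopted by the later "convert to xarray" commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone structure that records its own index. */
struct zone_sketch {
	unsigned int id;
};

/* Before: index derived from the zone's position in one big array. */
static unsigned int zone_id_from_offset(const struct zone_sketch *zones,
					const struct zone_sketch *zone)
{
	return (unsigned int)(zone - zones);
}

/* After: index read back from the structure, filled in at init time. */
static unsigned int zone_id_stored(const struct zone_sketch *zone)
{
	return zone->id;
}

static void init_zones(struct zone_sketch *zones, size_t n)
{
	for (size_t i = 0; i < n; i++)
		zones[i].id = (unsigned int)i;
}
```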
Add callback to supply information for 'dmsetup status'
and 'dmsetup table'. The output for 'dmsetup status' is

0 zoned zones / random / sequential

where is the number of unmapped (ie free) random zones,
the total number of random zones, the number
of unmapped sequential zones, and the total number of
sequential zones.

Signed-off-by: Hannes Reinecke
Reviewed-by: Bob Liu
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
25 Mar, 2020
1 commit
-
zmd->nr_rnd_zones was increased twice by mistake. The other place it
is increased in dmz_init_zone() is the only one needed:

    zmd->nr_useable_zones++;
    if (dmz_is_rnd(zone)) {
            zmd->nr_rnd_zones++;
            ^^^

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Bob Liu
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
08 Jan, 2020
1 commit
-
dm-zoned is observed to log failed kernel assertions and not work
correctly when operating against a device with a zone size smaller
than 128MiB (a 4KiB bitmap block holds 32768 bits, i.e. one bit per
4KiB block of a 128MiB zone). The reason is that the bitmap size per
zone is calculated as zero with such a small zone size. Fix this
problem and also make the code related to zone bitmap management
able to handle per-zone bitmaps smaller than a single block.

A dm-zoned-tools patch is required to properly format dm-zoned devices
with zone sizes smaller than 128MiB.

Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev
Reviewed-by: Damien Le Moal
Signed-off-by: Mike Snitzer
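The zero-sized bitmap in the commit above comes from integer division: a zone of N 4KiB blocks needs N bits, and N / 32768 truncates to 0 whenever the zone is smaller than 128MiB. A minimal sketch of the arithmetic, assuming 4KiB blocks; the helper names are hypothetical, and the actual patch may express the rounding differently.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE	4096u			/* 4KiB metadata block */
#define BITS_PER_BLOCK	(BLOCK_SIZE * 8u)	/* 32768 bits */

/* Truncating form: yields 0 bitmap blocks for zones under 128MiB. */
static uint64_t bitmap_blocks_truncated(uint64_t zone_nr_blocks)
{
	return zone_nr_blocks / BITS_PER_BLOCK;
}

/* Rounded-up form: any non-empty zone gets at least one bitmap block. */
static uint64_t bitmap_blocks_rounded(uint64_t zone_nr_blocks)
{
	return (zone_nr_blocks + BITS_PER_BLOCK - 1) / BITS_PER_BLOCK;
}

/* Bits actually used in a bitmap block when the zone is small. */
static uint64_t bits_used_per_block(uint64_t zone_nr_blocks)
{
	return zone_nr_blocks < BITS_PER_BLOCK ? zone_nr_blocks : BITS_PER_BLOCK;
}
```

A 64MiB zone holds 16384 blocks: the truncating form returns 0, while the rounded form returns 1 block of which only 16384 bits are used, which is why the bitmap code also has to cope with per-zone bitmaps smaller than a block.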
26 Nov, 2019
1 commit
-
…device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Fix DM core to disallow stacking request-based DM on partitions.
- Fix DM raid target to properly resync raidset even if bitmap needed
additional pages.
- Fix DM crypt performance regression due to use of WQ_HIGHPRI for the
IO and crypt workqueues.
- Fix DM integrity metadata layout that was aligned on a 128K boundary
rather than the intended 4K boundary (removes 124K of wasted space
for each metadata block).
- Improve the DM thin, cache and clone targets to use spin_lock_irq
rather than spin_lock_irqsave where possible.
- Fix DM thin single thread performance that was lost due to needless
workqueue wakeups.
- Fix DM zoned target performance that was lost due to excessive
backing device checks.
- Add ability to trigger write failure with the DM dust test target.
- Fix whitespace indentation in drivers/md/Kconfig.
- Various small fixes and cleanups (e.g. use struct_size, fix
uninitialized variable, variable renames, etc).
* tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"
dm: Fix Kconfig indentation
dm thin: wakeup worker only when deferred bios exist
dm integrity: fix excessive alignment of metadata runs
dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout
dm zoned: reduce overhead of backing device checks
dm dust: add limited write failure mode
dm dust: change ret to r in dust_map_read and dust_map
dm dust: change result vars to r
dm cache: replace spin_lock_irqsave with spin_lock_irq
dm bio prison: replace spin_lock_irqsave with spin_lock_irq
dm thin: replace spin_lock_irqsave with spin_lock_irq
dm clone: add bucket_lock_irq/bucket_unlock_irq helpers
dm clone: replace spin_lock_irqsave with spin_lock_irq
dm writecache: handle REQ_FUA
dm writecache: fix uninitialized variable warning
dm stripe: use struct_size() in kmalloc()
dm raid: streamline rs_get_progress() and its raid_status() caller side
dm raid: simplify rs_setup_recovery call chain
dm raid: to ensure resynchronization, perform raid set grow in preresume
...
13 Nov, 2019
1 commit
-
Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Shin'ichiro Kawasaki
Signed-off-by: Damien Le Moal
Reviewed-by: Hannes Reinecke
Reviewed-by: Mike Snitzer
Signed-off-by: Jens Axboe
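A userspace sketch of the callback model described in the commit above: the reporting function invokes a caller-supplied callback once per zone and passes through an opaque private data pointer, so no array of zone descriptors has to be allocated. struct zone_info, report_zones_sketch() and sum_len_cb() are hypothetical stand-ins, not the block layer's struct blk_zone or ->report_zones interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down zone descriptor. */
struct zone_info {
	uint64_t start;	/* first sector of the zone */
	uint64_t len;	/* zone length in sectors */
};

/* Called once per reported zone; a non-zero return stops the walk. */
typedef int (*report_zones_cb_t)(const struct zone_info *zone,
				 unsigned int idx, void *data);

/*
 * Hypothetical device walk reporting nr_zones equal-sized zones.
 * Returns the number of zones reported, or the callback's error.
 */
static int report_zones_sketch(unsigned int nr_zones, uint64_t zone_len,
			       report_zones_cb_t cb, void *data)
{
	for (unsigned int i = 0; i < nr_zones; i++) {
		struct zone_info z = { .start = (uint64_t)i * zone_len,
				       .len = zone_len };
		int ret = cb(&z, i, data);

		if (ret)
			return ret;
	}
	return (int)nr_zones;
}

/* Example consumer: accumulate zone lengths via the private pointer. */
static int sum_len_cb(const struct zone_info *zone, unsigned int idx, void *data)
{
	(void)idx;
	*(uint64_t *)data += zone->len;
	return 0;
}
```

The caller keeps its own state behind the data pointer, so the reporting side needs no allocation proportional to the number of zones.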