19 Aug, 2018

1 commit

  • Pull input updates from Dmitry Torokhov:

    - a new driver for Rohm BU21029 touch controller

    - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free

    - updates to Atmel, eeti, pxrc and iforce drivers

    - assorted driver cleanups and fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits)
    MAINTAINERS: Add PhoenixRC Flight Controller Adapter
    Input: do not use WARN() in input_alloc_absinfo()
    Input: mark expected switch fall-throughs
    Input: raydium_i2c_ts - use true and false for boolean values
    Input: evdev - switch to bitmap API
    Input: gpio-keys - switch to bitmap_zalloc()
    Input: elan_i2c_smbus - cast sizeof to int for comparison
    bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()
    md: Avoid namespace collision with bitmap API
    dm: Avoid namespace collision with bitmap API
    Input: pm8941-pwrkey - add resin entry
    Input: pm8941-pwrkey - abstract register offsets and event code
    Input: iforce - reorganize joystick configuration lists
    Input: atmel_mxt_ts - move completion to after config crc is updated
    Input: atmel_mxt_ts - don't report zero pressure from T9
    Input: atmel_mxt_ts - zero terminate config firmware file
    Input: atmel_mxt_ts - refactor config update code to add context struct
    Input: atmel_mxt_ts - config CRC may start at T71
    Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM
    Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE
    ...

    Linus Torvalds
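
The bitmap_alloc(), bitmap_zalloc() and bitmap_free() helpers listed above size an allocation in whole unsigned long words. What follows is only a rough userspace analogue, assuming calloc()/free() in place of the kernel allocator and omitting the gfp flags argument. Every sketch_/SKETCH_ name is hypothetical and is not the kernel API.

```c
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold nbits bits, rounded up. */
#define SKETCH_BITS_TO_LONGS(nbits) \
	(((nbits) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical stand-in for bitmap_zalloc(): zeroed storage for nbits bits. */
static unsigned long *sketch_bitmap_zalloc(size_t nbits)
{
	return calloc(SKETCH_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Hypothetical stand-in for bitmap_free(). */
static void sketch_bitmap_free(unsigned long *bitmap)
{
	free(bitmap);
}

/* Set one bit; the caller must keep bit below the allocated nbits. */
static void sketch_set_bit(unsigned long *bitmap, size_t bit)
{
	bitmap[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
}

/* Return 1 if the bit is set, 0 otherwise. */
static int sketch_test_bit(const unsigned long *bitmap, size_t bit)
{
	return (int)((bitmap[bit / SKETCH_BITS_PER_LONG] >>
		      (bit % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

A caller would pair sketch_bitmap_zalloc(nbits) with sketch_bitmap_free() instead of open-coding the word count at each allocation site.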
     

18 Aug, 2018

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - A couple stable fixes for the DM writecache target.

    - A stable fix for the DM cache target that fixes the potential for
    data corruption after an unclean shutdown of a cache device using
    writeback mode.

    - Update DM integrity target to allow the metadata to be stored on a
    separate device from data.

    - Fix DM kcopyd and the snapshot target to cond_resched() where
    appropriate and be more efficient with processing completed work.

    - A few fixes and improvements for DM crypt.

    - Add DM delay target feature to configure delay of flushes independent
    of writes.

    - Update DM thin-provisioning target to include metadata_low_watermark
    threshold in pool status.

    - Fix stale DM thin-provisioning Documentation.

    * tag 'for-4.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
    dm writecache: fix a crash due to reading past end of dirty_bitmap
    dm crypt: don't decrease device limits
    dm cache metadata: set dirty on all cache blocks after a crash
    dm snapshot: remove stale FIXME in snapshot_map()
    dm snapshot: improve performance by switching out_of_order_list to rbtree
    dm kcopyd: avoid softlockup in run_complete_job
    dm cache metadata: save in-core policy_hint_size to on-disk superblock
    dm thin: stop no_space_timeout worker when switching to write-mode
    dm kcopyd: return void from dm_kcopyd_copy()
    dm thin: include metadata_low_watermark threshold in pool status
    dm writecache: report start_sector in status line
    dm crypt: convert essiv from ahash to shash
    dm crypt: use wake_up_process() instead of a wait queue
    dm integrity: recalculate checksums on creation
    dm integrity: flush journal on suspend when using separate metadata device
    dm integrity: use version 2 for separate metadata
    dm integrity: allow separate metadata device
    dm integrity: add ic->start in get_data_sector()
    dm integrity: report provided data sectors in the status
    dm integrity: implement fair range locks
    ...

    Linus Torvalds
     

17 Aug, 2018

1 commit

  • wc->dirty_bitmap_size is in bytes, so it must be multiplied by 8, not by
    BITS_PER_LONG, to get the number of bitmap_bits.

    Fixes crash in find_next_bit() that was reported:
    https://bugzilla.kernel.org/show_bug.cgi?id=200819

    Reported-by: edo.rus@gmail.com
    Fixes: 48debafe4f2f ("dm: add writecache target")
    Cc: stable@vger.kernel.org # 4.18
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
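
The unit mix-up described in this commit can be illustrated with a small standalone C sketch. The function names are hypothetical; only the arithmetic (bytes times 8 versus bytes times the width of a long) comes from the commit message.

```c
#include <limits.h>
#include <stddef.h>

/* Bits addressable in a bitmap whose size is given in bytes. */
static size_t bitmap_bits_from_bytes(size_t size_in_bytes)
{
	return size_in_bytes * CHAR_BIT;	/* 8 bits per byte */
}

/* The mistaken conversion: multiplies a byte count by the bits in a long. */
static size_t bitmap_bits_overcounted(size_t size_in_bytes)
{
	return size_in_bytes * (sizeof(unsigned long) * CHAR_BIT);
}
```

Where unsigned long is 64 bits wide, the second function reports eight times as many bits as the buffer holds, so a bit search bounded by that count can read past the end of the bitmap.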
     

15 Aug, 2018

1 commit

  • Pull MD updates from Shaohua Li:
    "A few MD fixes for 4.19-rc1:

    - several md-cluster fixes from Guoqing

    - a data corruption fix from BingJing

    - other cleanups"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
    md/raid5: fix data corruption of replacements after originals dropped
    drivers/md/raid5: Do not disable irq on release_inactive_stripe_list() call
    drivers/md/raid5: Use irqsave variant of atomic_dec_and_lock()
    md/r5cache: remove redundant pointer bio
    md-cluster: don't send msg if array is closing
    md-cluster: show array's status more accurate
    md-cluster: clear another node's suspend_area after the copy is finished

    Linus Torvalds
     

14 Aug, 2018

1 commit

  • dm-crypt should only increase device limits, it should not decrease them.

    This fixes a bug where the user could create a crypt device with a 1024
    sector size on top of a SCSI device that had a 4096 logical block size.
    The 4096 limit would be lost and the user could incorrectly send
    1024-byte I/Os to the crypt device.

    Cc: stable@vger.kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
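
The "only increase, never decrease" rule can be sketched in plain C. The struct below is a hypothetical, cut-down stand-in for the block layer's queue limits, and sketch_crypt_io_hints() is an illustrative name, not the driver's function.

```c
/* Hypothetical, cut-down stand-in for the block layer's queue limits. */
struct sketch_limits {
	unsigned int logical_block_size;
	unsigned int physical_block_size;
	unsigned int io_min;
};

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Raise each limit to at least sector_size; never lower an existing limit. */
static void sketch_crypt_io_hints(struct sketch_limits *limits,
				  unsigned int sector_size)
{
	limits->logical_block_size =
		max_uint(limits->logical_block_size, sector_size);
	limits->physical_block_size =
		max_uint(limits->physical_block_size, sector_size);
	limits->io_min = max_uint(limits->io_min, sector_size);
}
```

With an underlying limit of 4096 and a crypt sector size of 1024, the limits stay at 4096. With an underlying limit of 512 and a crypt sector size of 4096, they are raised to 4096.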
     

11 Aug, 2018

1 commit

  • Commit ea8c5356d390 ("bcache: set max writeback rate when I/O request
    is idle") changes struct bch_ratelimit member rate from uint32_t to
    atomic_long_t and uses atomic_long_set() in drivers/md/bcache/sysfs.c
    to set new writeback rate, after the input is converted from memory
    buf to long int by sysfs_strtoul_clamp().

    The above change has a problem because there is an implicit return
    inside sysfs_strtoul_clamp(), so the following atomic_long_set()
    won't be called. This error was detected by the 0day system, with the
    following (snipped) smatch warning:

    drivers/md/bcache/sysfs.c:271 __cached_dev_store() error: uninitialized
    symbol 'v'.
    270 sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    @271 atomic_long_set(&dc->writeback_rate.rate, v);

    This patch fixes the above error by using strtoul_safe_clamp() to
    convert the input buffer into a long int type result.

    Fixes: ea8c5356d390 ("bcache: set max writeback rate when I/O request is idle")
    Cc: Kai Krakow
    Cc: Stefan Priebe
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
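
The hazard of a macro that returns from its caller can be reproduced in a few lines of userspace C. This is a sketch under stated assumptions: PARSE_AND_RETURN, stored_rate and both store functions are hypothetical stand-ins, not the bcache macros or sysfs handlers.

```c
#include <limits.h>
#include <stdlib.h>

static long stored_rate;	/* stands in for dc->writeback_rate.rate */

/* Hypothetical macro with a hidden return, in the style the message describes. */
#define PARSE_AND_RETURN(buf, var)					\
	do {								\
		(var) = strtol((buf), NULL, 10);			\
		return 0; /* hidden: control leaves the caller here */	\
	} while (0)

static int store_with_hidden_return(const char *buf)
{
	long v;

	PARSE_AND_RETURN(buf, v);
	stored_rate = v;	/* never reached: the macro already returned */
	return 0;
}

/* Parse, clamp to [1, INT_MAX], then store; no hidden control flow. */
static int store_with_explicit_flow(const char *buf)
{
	long v = strtol(buf, NULL, 10);

	if (v < 1)
		v = 1;
	if (v > INT_MAX)
		v = INT_MAX;
	stored_rate = v;
	return 0;
}
```

Calling store_with_hidden_return("42") leaves stored_rate untouched, while store_with_explicit_flow("42") stores 42, which mirrors why the patch switches to a helper that hands back a value instead of returning on the caller's behalf.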
     

10 Aug, 2018

1 commit

  • Quoting Documentation/device-mapper/cache.txt:

    The 'dirty' state for a cache block changes far too frequently for us
    to keep updating it on the fly. So we treat it as a hint. In normal
    operation it will be written when the dm device is suspended. If the
    system crashes all cache blocks will be assumed dirty when restarted.

    This got broken in commit f177940a8091 ("dm cache metadata: switch to
    using the new cursor api for loading metadata") in 4.9, which removed
    the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
    flag) when loading cache blocks. This results in data corruption on an
    unclean shutdown with dirty cache blocks on the fast device. After the
    crash those blocks are considered clean and may get evicted from the
    cache at any time. This can be demonstrated by doing a lot of reads
    to trigger individual evictions, but uncache is more predictable:

    ### Disable auto-activation in lvm.conf to be able to do uncache in
    ### time (i.e. see uncache doing flushing) when the fix is applied.

    # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
    # vgcreate vg_cache /dev/vdb /dev/vdc
    # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
    # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
    # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
    # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
    # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
    # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
    # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
    0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
    0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
    # dmsetup status vg_cache-lv_slowdev
    0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
    ^^^^
    7065 * 64k = 441M yet to be written to the slow device
    # echo b >/proc/sysrq-trigger

    # vgchange -ay vg_cache
    # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
    0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
    0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
    # lvconvert --uncache vg_cache/lv_slowdev
    Flushing 0 blocks for cache vg_cache/lv_slowdev.
    Logical volume "lv_cachedev" successfully removed
    Logical volume vg_cache/lv_slowdev is not cached.
    # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
    0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
    0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................

    This is the case with both v1 and v2 cache pool metadata formats.

    After applying this patch:

    # vgchange -ay vg_cache
    # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
    0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
    0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
    # lvconvert --uncache vg_cache/lv_slowdev
    Flushing 3724 blocks for cache vg_cache/lv_slowdev.
    ...
    Flushing 71 blocks for cache vg_cache/lv_slowdev.
    Logical volume "lv_cachedev" successfully removed
    Logical volume vg_cache/lv_slowdev is not cached.
    # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
    0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
    0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................

    Cc: stable@vger.kernel.org
    Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata")
    Signed-off-by: Ilya Dryomov
    Signed-off-by: Mike Snitzer

    Ilya Dryomov
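
The rule quoted from the documentation ("if the system crashes all cache blocks will be assumed dirty") reduces to a small decision when each block's state is loaded. The C sketch below is illustrative only: sketch_block_is_dirty() and its parameters are hypothetical names, with clean_shutdown standing in for the on-disk CLEAN_SHUTDOWN flag.

```c
#include <stdbool.h>

/*
 * Decide the in-core dirty state for one cache block while loading metadata.
 * clean_shutdown stands in for the on-disk CLEAN_SHUTDOWN flag; on_disk_dirty
 * is the dirty hint recorded for the block.
 */
static bool sketch_block_is_dirty(bool clean_shutdown, bool on_disk_dirty)
{
	if (!clean_shutdown)
		return true;	/* hints are stale after a crash: assume dirty */
	return on_disk_dirty;	/* hints were written at suspend: trust them */
}
```

With this rule, the blocks in the transcript above are flushed by --uncache after the forced reboot instead of being discarded as clean.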
     

09 Aug, 2018

11 commits

  • Remove the trailing backslash in macro BTREE_FLAG in btree.h

    Signed-off-by: Shenghui Wang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Shenghui Wang
     
  • The pr_err statement in the code for the sysfs_attach section would run
    for various error codes, which may be confusing.

    E.g.,

    Run the command twice:
    echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
    /sys/block/bcache0/bcache/attach
    [the backing dev got attached on the first run]
    echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
    /sys/block/bcache0/bcache/attach

    In dmesg, after the command is run twice, we get:
    bcache: bch_cached_dev_attach() Can't attach sda6: already attached
    bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
    a8df5e8be891
    : cache set not found
    The first statement in the message was right, but the second was
    confusing.

    bch_cached_dev_attach has various pr_ statements for various error
    codes, except ENOENT.

    After the change, rerunning the above command twice:
    echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
    /sys/block/bcache0/bcache/attach
    echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
    /sys/block/bcache0/bcache/attach

    In dmesg we only got:
    bcache: bch_cached_dev_attach() Can't attach sda6: already attached
    No confusing "cache set not found" message anymore.

    And for a non-existent SET-UUID:
    echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be898 > \
    /sys/block/bcache0/bcache/attach
    In dmesg we can get:
    bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
    a8df5e8be898
    : cache set not found

    Signed-off-by: Shenghui Wang
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Shenghui Wang
     
  • Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
    allows the writeback rate to be faster if there is no I/O request on a
    bcache device. It works well if there is only one bcache device attached
    to the cache set. If many bcache devices are attached to a cache set, it
    may introduce a performance regression, because the faster writeback
    threads of the idle bcache devices compete for the btree-level locks
    with the bcache devices that have I/O requests coming in.

    This patch fixes the above issue by only permitting fast writeback when
    all bcache devices attached to the cache set are idle. If any of the
    bcache devices has a new I/O request coming in, all writeback throughput
    is minimized immediately and the PI controller __update_writeback_rate()
    decides the upcoming writeback rate for each bcache device.

    Also, when all bcache devices are idle, limiting the writeback rate to a
    small number is a waste of throughput, especially when the backing
    devices are slower non-rotational devices (e.g. SATA SSD). This patch
    sets a max writeback rate for each backing device if the whole cache set
    is idle. A faster writeback rate in idle time means new I/Os may have
    more available space for dirty data, and people may observe better write
    performance then.

    Please note bcache may change its cache mode at run time, and this patch
    still works if the cache mode is switched away from writeback mode while
    there is still dirty data on the cache.

    Fixes: Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
    Cc: stable@vger.kernel.org #4.16+
    Signed-off-by: Coly Li
    Tested-by: Kai Krakow
    Tested-by: Stefan Priebe
    Cc: Michael Lyle
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch adds code comments in bset.c to make some tricky code and
    design decisions more comprehensible. Most of the information in this
    patch comes from the discussion between Kent and me; he offered very
    informative details. If there is any mistake in the idea behind the
    code, no doubt it comes from my misrepresentation.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch updates the code comment in bch_keylist_realloc() by fixing
    incorrect function names, to make the code more comprehensible.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch updates the code comment in struct cache with correct array
    names, to make the code more comprehensible.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch adds a one-line code comment in super.c:register_bdev(), to
    make the code more comprehensible.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • In bch_btree_node_get() the read-in btree node is partially
    prefetched into L1 cache for the following bset iteration (if there is
    one). But if the btree node read fails, the prefetch operations
    waste L1 cache space. This patch checks whether the read operation
    succeeded and only does the cache prefetch when the read I/O succeeded.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • When writeback is not running, the writeback rate should be 0; any other
    value is misleading. The following dynamic writeback rate debug
    parameters should be 0 too,
    rate, proportional, integral, change
    otherwise they are misleading when writeback is not running.

    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Greg KH suggests that normal code should not care about debugfs.
    Therefore, whether debugfs_create_dir() succeeds or fails, it is
    unnecessary to check its return value.

    Two functions call debugfs_create_dir() and check the return value:
    bch_debug_init() and closure_debug_init(). This patch changes these two
    functions from int to void type, and ignores the return values of
    debugfs_create_dir().

    This patch does not fix a specific bug; it just makes things work as
    they should.

    Signed-off-by: Coly Li
    Suggested-by: Greg Kroah-Hartman
    Cc: stable@vger.kernel.org
    Cc: Kai Krakow
    Cc: Kent Overstreet
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Commit ae1093be ("dm snapshot: use mutex instead of rw_semaphore")
    eliminated the need to worry about read vs write locking. So remove a
    FIXME in snapshot_map() that is concerned about selectively taking a
    write lock.

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

08 Aug, 2018

4 commits

  • copy_complete()'s processing of out_of_order_list can result in
    quadratic complexity in the worst case. As such it was the source of
    consuming too much cpu and the source of significant loss in
    performance.

    Fix this by converting out_of_order_list to an rbtree. This improved
    a dm-snapshot test copy workload from 32 seconds to 4 seconds.

    Signed-off-by: David Jeffery
    Signed-off-by: Mikulas Patocka
    Tested-by: Brett Hull
    Signed-off-by: Mike Snitzer

    David Jeffery
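
The complexity argument in this commit can be made concrete by counting work for a sorted list versus a search tree. This is a sketch under assumptions: it uses POSIX tsearch() from <search.h> as the tree (balanced in glibc, which POSIX does not guarantee) rather than the kernel rbtree, and the node layout, function names and ascending insertion order are hypothetical choices for the demonstration.

```c
#include <search.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	unsigned long seq;
	struct node *next;
};

/* Insert into an ascending singly linked list; returns nodes walked. */
static size_t list_insert_sorted(struct node **head, struct node *n)
{
	size_t steps = 0;

	while (*head && (*head)->seq < n->seq) {
		head = &(*head)->next;
		steps++;
	}
	n->next = *head;
	*head = n;
	return steps;
}

static size_t tree_compares;	/* comparisons made on behalf of the tree */

static int cmp_seq(const void *a, const void *b)
{
	const struct node *x = a, *y = b;

	tree_compares++;
	return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Insert count ascending keys into both structures and report the work. */
static void compare_costs(size_t count, size_t *list_steps, size_t *tree_steps)
{
	struct node *list_nodes = calloc(count, sizeof(*list_nodes));
	struct node *tree_nodes = calloc(count, sizeof(*tree_nodes));
	struct node *head = NULL;
	void *root = NULL;
	size_t i;

	if (!list_nodes || !tree_nodes)
		abort();
	*list_steps = 0;
	tree_compares = 0;
	for (i = 0; i < count; i++) {
		list_nodes[i].seq = tree_nodes[i].seq = i;
		*list_steps += list_insert_sorted(&head, &list_nodes[i]);
		tsearch(&tree_nodes[i], &root, cmp_seq);
	}
	*tree_steps = tree_compares;	/* record before cleanup adds more */
	for (i = 0; i < count; i++)
		tdelete(&tree_nodes[i], &root, cmp_seq);
	free(list_nodes);
	free(tree_nodes);
}
```

For count keys inserted in this order the list walks count * (count - 1) / 2 nodes in total, while a balanced tree needs on the order of count * log2(count) comparisons.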
     
  • It was reported that softlockups occur when using dm-snapshot on top of
    slow (rbd) storage. E.g.:

    [ 4047.990647] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [kworker/10:23:26177]
    ...
    [ 4048.034151] Workqueue: kcopyd do_work [dm_mod]
    [ 4048.034156] RIP: 0010:copy_callback+0x41/0x160 [dm_snapshot]
    ...
    [ 4048.034190] Call Trace:
    [ 4048.034196] ? __chunk_is_tracked+0x70/0x70 [dm_snapshot]
    [ 4048.034200] run_complete_job+0x5f/0xb0 [dm_mod]
    [ 4048.034205] process_jobs+0x91/0x220 [dm_mod]
    [ 4048.034210] ? kcopyd_put_pages+0x40/0x40 [dm_mod]
    [ 4048.034214] do_work+0x46/0xa0 [dm_mod]
    [ 4048.034219] process_one_work+0x171/0x370
    [ 4048.034221] worker_thread+0x1fc/0x3f0
    [ 4048.034224] kthread+0xf8/0x130
    [ 4048.034226] ? max_active_store+0x80/0x80
    [ 4048.034227] ? kthread_bind+0x10/0x10
    [ 4048.034231] ret_from_fork+0x35/0x40
    [ 4048.034233] Kernel panic - not syncing: softlockup: hung tasks

    Fix this by calling cond_resched() after run_complete_job()'s callout to
    the dm_kcopyd_notify_fn (which is dm-snap.c:copy_callback in the above
    trace).

    Signed-off-by: John Pittman
    Signed-off-by: Mike Snitzer

    John Pittman
     
  • policy_hint_size starts as 0 during __write_initial_superblock(). It
    isn't until the policy is loaded that policy_hint_size is set in-core
    (cmd->policy_hint_size). But it never got recorded in the on-disk
    superblock because __commit_transaction() didn't deal with transferring
    the in-core cmd->policy_hint_size to the on-disk superblock.

    The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
    __begin_transaction_flags(), which re-reads all superblock fields.
    Because the superblock's policy_hint_size was never properly stored when
    the cache was created, hints_array_available() would always return false
    when re-activating a previously created cache. This means
    __load_mappings() always considered the hints invalid and never made use
    of them (the hints exist as an optimization).

    Another detrimental side-effect of this oversight is that the
    cache_check utility would fail with: "invalid hint width: 0"

    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer

    Mike Snitzer
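
The round trip described in this commit (set in-core, commit to disk, re-read on open) can be modelled with two small structs. This is a simplified sketch: the sketch_ types and functions are hypothetical, endianness conversion is ignored, and the availability check here compares only the hint size.

```c
#include <stdint.h>

/* Hypothetical, cut-down on-disk superblock and in-core metadata state. */
struct sketch_disk_super {
	uint32_t policy_hint_size;
};

struct sketch_cmd {
	uint32_t policy_hint_size;
};

/* Commit: copy in-core fields into the on-disk image. */
static void sketch_commit(const struct sketch_cmd *cmd,
			  struct sketch_disk_super *sb)
{
	sb->policy_hint_size = cmd->policy_hint_size;	/* the missing step */
}

/* Open: the in-core state is re-read from the on-disk image. */
static void sketch_open(struct sketch_cmd *cmd,
			const struct sketch_disk_super *sb)
{
	cmd->policy_hint_size = sb->policy_hint_size;
}

/* Hints are usable only if the stored size matches what the policy expects. */
static int sketch_hints_available(const struct sketch_cmd *cmd,
				  uint32_t expected_hint_size)
{
	return cmd->policy_hint_size == expected_hint_size;
}
```

If sketch_commit() never copies the field, the superblock keeps its initial 0, so a reopened cache sees a hint size of 0 and rejects its hints.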
     
  • Now both check_for_space() and do_no_space_timeout() will read & write
    pool->pf.error_if_no_space. If these functions run concurrently, as
    shown in the following case, the default setting of "queue_if_no_space"
    can get lost.

    precondition:
    * error_if_no_space = false (aka "queue_if_no_space")
    * pool is in Out-of-Data-Space (OODS) mode
    * no_space_timeout worker has been queued

    CPU 0:                                  CPU 1:
    // delete a thin device
    process_delete_mesg()
    // check_for_space() invoked by commit()
    set_pool_mode(pool, PM_WRITE)
        pool->pf.error_if_no_space = \
         pt->requested_pf.error_if_no_space

                                            // timeout, pool is still in OODS mode
                                            do_no_space_timeout
                                                // "queue_if_no_space" config is lost
                                                pool->pf.error_if_no_space = true
        pool->pf.mode = new_mode

    Fix it by stopping no_space_timeout worker when switching to write mode.

    Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hou Tao
    Signed-off-by: Mike Snitzer

    Hou Tao
     

06 Aug, 2018

1 commit


03 Aug, 2018

1 commit

  • During raid5 replacement, the stripes can be marked with R5_NeedReplace
    flag. Data can be read from being-replaced devices and written to
    replacing spares without reading all other devices. (It's 'replace'
    mode. s.replacing = 1) If a being-replaced device is dropped, the
    replacement progress will be interrupted and resumed with pure recovery
    mode. However, existing stripes before being interrupted cannot read
    from the dropped device anymore. It prints lots of WARN_ON messages.
    And it results in data corruption because existing stripes write
    problematic data into its replacement device and update the progress.

    # Erase disks (1MB + 2GB)
    dd if=/dev/zero of=/dev/sda bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdb bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdc bs=1MB count=2049
    dd if=/dev/zero of=/dev/sdd bs=1MB count=2049
    mdadm -C /dev/md0 -amd -R -l5 -n3 -x0 /dev/sd[abc] -z 2097152
    # Ensure array stores non-zero data
    dd if=/root/data_4GB.iso of=/dev/md0 bs=1MB
    # Start replacement
    mdadm /dev/md0 -a /dev/sdd
    mdadm /dev/md0 --replace /dev/sda

    Then hot-plug out /dev/sda during recovery, and wait for the recovery to
    finish.
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt # it will be greater than 0.

    Soon after you hot-plug out /dev/sda, you will see many WARN_ON
    messages. The replacement recovery will be interrupted shortly. After
    the recovery finishes, it will result in data corruption.

    Actually, it's just an unhandled case of replacement. In commit
    (md/raid5: fix interaction of 'replace' and 'recovery'.),
    if a NeedReplace device is not UPTODATE then that is an error. The
    commit simply prints a WARN_ON but still marks these corrupted stripes
    with R5_WantReplace (meaning they are ready for writes).

    To fix this case, we can leverage the 'sync and replace' mode mentioned
    in commit (md/raid5: detect and handle replacements during
    recovery.). We can add logic to detect and use 'sync and replace' mode
    for these stripes.

    Reported-by: Alex Chen
    Reviewed-by: Alex Wu
    Reviewed-by: Chung-Chiang Cheng
    Signed-off-by: BingJing Chang
    Signed-off-by: Shaohua Li

    BingJing Chang
     

02 Aug, 2018

2 commits

  • bitmap API (include/linux/bitmap.h) has 'bitmap' prefix for its methods.

    On the other hand, the MD bitmap API is a special case.
    Add the 'md' prefix to it to avoid name space collision.

    No functional changes intended.

    Signed-off-by: Andy Shevchenko
    Acked-by: Shaohua Li
    Signed-off-by: Dmitry Torokhov

    Andy Shevchenko
     
  • bitmap API (include/linux/bitmap.h) has 'bitmap' prefix for its methods.

    On the other hand, the DM bitmap API is a special case.
    Add the 'dm' prefix to it to avoid potential name space collision.

    No functional changes intended.

    Suggested-by: Mike Snitzer
    Signed-off-by: Andy Shevchenko
    Acked-by: Mike Snitzer
    Signed-off-by: Dmitry Torokhov

    Andy Shevchenko
     

01 Aug, 2018

1 commit


30 Jul, 2018

1 commit

  • The metadata low watermark threshold is set by the kernel. But the
    kernel depends on userspace to extend the thinpool metadata device when
    the threshold is crossed.

    Since the metadata low watermark threshold is not visible to userspace,
    upon receiving an event, userspace cannot tell that the kernel wants the
    metadata device extended, instead of some other eventing condition.
    Making it visible (but not settable) enables userspace to affirmatively
    know the kernel is asking for a metadata device extension, by comparing
    metadata_low_watermark against nr_free_blocks_metadata, also reported in
    status.

    Current solutions like dmeventd have their own thresholds for extending
    the data and metadata devices, and both devices are checked against
    their thresholds on each event. This lessens the value of the kernel-set
    threshold, since userspace will either extend the metadata device sooner,
    when receiving another event; or will receive the metadata lowater event
    and do nothing, if dmeventd's threshold is less than the kernel's.
    (This second case is dangerous. The metadata lowater event will not be
    re-sent, so no further event will be generated before the metadata
    device is out of space, unless some other event causes userspace to
    recheck its thresholds.)

    Signed-off-by: Andy Grover
    Signed-off-by: Mike Snitzer

    Andy Grover
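
The comparison that userspace can make once both values appear in the pool status can be sketched as a single predicate. The function name is hypothetical, and the use of <= (rather than <) is an assumption about where the threshold is considered crossed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Given the two values reported in the pool status, decide whether an event
 * most likely means the kernel wants the metadata device extended.
 */
static bool sketch_wants_metadata_extension(uint64_t nr_free_blocks_metadata,
					    uint64_t metadata_low_watermark)
{
	return nr_free_blocks_metadata <= metadata_low_watermark;
}
```

A monitoring daemon could run this check on every event and extend the metadata device when it returns true, independently of its own configured thresholds.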
     

28 Jul, 2018

12 commits