11 Feb, 2020

1 commit

  • commit 4feaef830de7ffdd8352e1fe14ad3bf13c9688f8 upstream.

    The space-maps track the reference counts for disk blocks allocated by
    both the thin-provisioning and cache targets. There are variants for
    tracking metadata blocks and data blocks.

    Transactionality is implemented by never touching blocks from the
    previous transaction, so we can rollback in the event of a crash.

    When allocating a new block we need to ensure the block is free (has
    reference count of 0) in both the current and previous transaction.
    Prior to this fix we were doing this by searching for a free block in
    the previous transaction, and relying on a 'begin' counter to track
    where the last allocation in the current transaction was. This
    'begin' field was not being updated in all code paths (e.g. an
    increment of a data block's reference count due to breaking sharing of
    a neighbouring block in the same btree leaf).

    This fix keeps the 'begin' field, but now it's just a hint to speed up
    the search. Instead the current transaction is searched for a free
    block, and then the old transaction is double checked to ensure it's
    free. Much simpler.
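    The new search order can be sketched as follows; names and types here
    are illustrative stand-ins, not the kernel's actual space-map
    interfaces:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal sketch (hypothetical names, simplified types) of the fixed
 * allocation search: scan the current transaction's reference counts
 * for a free block, using 'begin' only as a starting hint, then double
 * check that the block was also free in the previous transaction
 * before handing it out.
 */
static int find_free_block(const unsigned *current_counts,
			   const unsigned *old_counts,
			   size_t nr_blocks, size_t begin_hint,
			   size_t *result)
{
	size_t i, b;

	for (i = 0; i < nr_blocks; i++) {
		b = (begin_hint + i) % nr_blocks;

		if (current_counts[b] != 0)	/* in use now */
			continue;
		if (old_counts[b] != 0)		/* in use last transaction */
			continue;

		*result = b;
		return 0;
	}
	return -1;	/* stands in for -ENOSPC */
}
```

    Because both transactions are checked for every candidate, a stale
    'begin' hint can only slow the search down, never hand out a block
    that the previous transaction still references.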

    This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when
    DM thin-provisioning's snapshots are heavily used.

    Reported-by: Eric Wheeler
    Cc: stable@vger.kernel.org
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     

21 Dec, 2019

1 commit

  • commit 474e559567fa631dea8fb8407ab1b6090c903755 upstream.

    We got the following warnings from thin_check during thin-pool setup:

    $ thin_check /dev/vdb
    examining superblock
    examining devices tree
    missing devices: [1, 84]
    too few entries in btree_node: 41, expected at least 42 (block 138, max_entries = 126)
    examining mapping tree

    The symptom is that the number of entries in one node of the
    details_info tree is less than (max_entries / 3). It can be easily
    reproduced by the following procedure:

    $ create a new thin pool
    $ assume the max entries of the details_info tree is 126
    $ create 127 thin devices (e.g. 1~127) so the root node fills up
    and then splits
    $ remove the first 43 (e.g. 1~43) thin devices to make the children
    rebalance repeatedly
    $ stop the thin pool
    $ thin_check

    The root cause is that the B-tree removal procedure in __rebalance2()
    doesn't guarantee the invariance: the minimal number of entries in
    non-root node should be >= (max_entries / 3).

    Simply fix the problem by increasing the rebalance threshold to
    make sure the number of entries in each child will be greater
    than or equal to (max_entries / 3 + 1), so no matter which
    child is used for removal, the number will still be valid.
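    The merge decision described above can be sketched like this,
    assuming the commit's arithmetic (the function name and signature
    are illustrative):

```c
#include <assert.h>

/*
 * Merge two sibling nodes only when redistribution could not leave
 * each with at least (max_entries / 3 + 1) entries.
 */
static int should_merge(unsigned nr_left, unsigned nr_right,
			unsigned max_entries)
{
	unsigned merge_threshold = max_entries / 3;
	/*
	 * The old threshold was just max_entries / 3, so two siblings
	 * slightly above that in total were redistributed instead of
	 * merged, which could leave a child below the required
	 * minimum of max_entries / 3 entries.
	 */
	unsigned threshold = 2 * (merge_threshold + 1);

	return nr_left + nr_right < threshold;
}
```

    With max_entries = 126 the threshold becomes 86: any pair totalling
    85 or fewer entries is merged, while 86 or more can be safely
    redistributed into two valid children.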

    Cc: stable@vger.kernel.org
    Signed-off-by: Hou Tao
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Hou Tao
     

27 Aug, 2019

1 commit


23 Aug, 2019

2 commits

    In commit 6096d91af0b6 ("dm space map metadata: fix occasional leak
    of a metadata block on resize"), the commit logic was refactored into
    a new function, 'apply_bops'. But when that logic was replaced in
    out(), the return value was not stored, so out() may return the wrong
    value to its caller.
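    A minimal model of the fix (signatures simplified; the real out()
    takes the space-map structure, not an int):

```c
#include <assert.h>

static int apply_bops_result;	/* stand-in for the real commit outcome */

static int apply_bops(void)
{
	return apply_bops_result;
}

/*
 * The fix: when no earlier error occurred, the result of apply_bops()
 * is stored rather than discarded, so commit failures propagate to
 * the caller.
 */
static int out(int r)
{
	if (!r)
		r = apply_bops();	/* keep the return value */
	return r;
}
```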

    Fixes: 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize")
    Cc: stable@vger.kernel.org
    Signed-off-by: ZhangXiaoxu
    Signed-off-by: Mike Snitzer

    ZhangXiaoxu
     
    When btree_split_beneath() splits a node into two new children, it
    allocates two blocks: left and right. If the right block's allocation
    fails, the left block is unlocked and marked dirty. When this happens
    the left block's content is zero, because it wasn't initialized with
    the btree struct before the attempt to allocate the right block. Upon
    return, when flushing the left block to disk, the validator fails
    while checking this block and a BUG_ON is raised.

    Fix this by completely initializing the left block before allocating and
    initializing the right block.
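    A toy model of the ordering fix; the real allocation and node setup
    are far more involved, and all names here are illustrative:

```c
#include <assert.h>

struct toy_node {
	int initialized;
};

static int alloc_calls, fail_on_call;

static int alloc_node(struct toy_node *n)
{
	n->initialized = 0;	/* fresh blocks start zeroed */
	if (++alloc_calls == fail_on_call)
		return -1;	/* stands in for an allocation failure */
	return 0;
}

/*
 * Fixed ordering: the left child is completely initialized before the
 * right allocation is attempted, so a failure there can never leave an
 * all-zero left block for the validator to reject.
 */
static int split_beneath(struct toy_node *left, struct toy_node *right)
{
	int r = alloc_node(left);
	if (r)
		return r;
	left->initialized = 1;	/* the fix: fully set up left first */

	r = alloc_node(right);
	if (r)
		return r;	/* left is already valid here */
	right->initialized = 1;
	return 0;
}
```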

    Fixes: 4dcb8b57df359 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
    Cc: stable@vger.kernel.org
    Signed-off-by: ZhangXiaoxu
    Signed-off-by: Mike Snitzer

    ZhangXiaoxu
     

21 May, 2019

1 commit


17 May, 2019

1 commit

  • …it/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Improve DM snapshot target's scalability by using finer grained
    locking. Requires some list_bl interface improvements.

    - Add ability for DM integrity to use a bitmap mode, that tracks
    regions where data and metadata are out of sync, instead of using a
    journal.

    - Improve DM thin provisioning target to not write metadata changes to
    disk if the thin-pool and associated thin devices are merely
    activated but not used. This avoids metadata corruption due to
    concurrent activation of thin devices across different OS instances
    (e.g. split brain scenarios, which ultimately would be avoided if
    proper device filters were used -- but not having proper filtering
    has proven a very common configuration mistake)

    - Fix missing call to path selector type->end_io in DM multipath. This
    fixes reported performance problems due to inaccurate path selector
    IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to
    particular path due to it seemingly being heavily used).

    - Fix bug in DM cache metadata's loading of its discard bitset that
    could lead to all cache blocks being discarded if the very first
    cache block was discarded (thankfully in practice the first cache
    block is generally in use; be it FS superblock, partition table, disk
    label, etc).

    - Add testing-only DM dust target which simulates a device that has
    failing sectors and/or read failures.

    - Fix a DM init error path reference count hang that caused boot hangs
    if user supplied malformed input on kernel commandline.

    - Fix a couple issues with DM crypt target's logging being overly
    verbose or lacking context.

    - Various other small fixes to DM init, DM multipath, DM zoned, and DM
    crypt.

    * tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits)
    dm: fix a couple brace coding style issues
    dm crypt: print device name in integrity error message
    dm crypt: move detailed message into debug level
    dm ioctl: fix hang in early create error condition
    dm integrity: whitespace, coding style and dead code cleanup
    dm integrity: implement synchronous mode for reboot handling
    dm integrity: handle machine reboot in bitmap mode
    dm integrity: add a bitmap mode
    dm integrity: introduce a function add_new_range_and_wait()
    dm integrity: allow large ranges to be described
    dm ingerity: pass size to dm_integrity_alloc_page_list()
    dm integrity: introduce rw_journal_sectors()
    dm integrity: update documentation
    dm integrity: don't report unused options
    dm integrity: don't check null pointer before kvfree and vfree
    dm integrity: correctly calculate the size of metadata area
    dm dust: Make dm_dust_init and dm_dust_exit static
    dm dust: remove redundant unsigned comparison to less than zero
    dm mpath: always free attached_handler_name in parse_path()
    dm init: fix max devices/targets checks
    ...

    Linus Torvalds
     

29 Apr, 2019

1 commit

  • Replace the indirection through struct stack_trace with an invocation of
    the storage array based interface. This results in less storage space and
    indirection.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: dm-devel@redhat.com
    Cc: Mike Snitzer
    Cc: Alasdair Kergon
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: linux-mm@kvack.org
    Cc: David Rientjes
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: kasan-dev@googlegroups.com
    Cc: Mike Rapoport
    Cc: Akinobu Mita
    Cc: Christoph Hellwig
    Cc: iommu@lists.linux-foundation.org
    Cc: Robin Murphy
    Cc: Marek Szyprowski
    Cc: Johannes Thumshirn
    Cc: David Sterba
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Cc: Daniel Vetter
    Cc: intel-gfx@lists.freedesktop.org
    Cc: Joonas Lahtinen
    Cc: Maarten Lankhorst
    Cc: dri-devel@lists.freedesktop.org
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Tom Zanussi
    Cc: Miroslav Benes
    Cc: linux-arch@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190425094802.533968922@linutronix.de

    Thomas Gleixner
     

19 Apr, 2019

1 commit


06 Mar, 2019

1 commit


02 Aug, 2018

1 commit

    The bitmap API (include/linux/bitmap.h) uses a 'bitmap' prefix for
    its methods. The DM bitmap API, on the other hand, is a special case.
    Add a 'dm' prefix to it to avoid a potential namespace collision.

    No functional changes intended.

    Suggested-by: Mike Snitzer
    Signed-off-by: Andy Shevchenko
    Acked-by: Mike Snitzer
    Signed-off-by: Dmitry Torokhov

    Andy Shevchenko
     

04 Apr, 2018

1 commit

  • Move dm-bufio.h to include/linux/ so that external GPL'd DM target
    modules can use it.

    It is better to allow the use of dm-bufio than force external modules
    to implement the equivalent buffered IO mechanism in some new way. The
    hope is this will encourage the use of dm-bufio; which will then make it
    easier for a GPL'd external DM target module to be included upstream.

    A couple of dm-bufio EXPORT_SYMBOL exports have also been updated to
    use EXPORT_SYMBOL_GPL.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

17 Jan, 2018

1 commit

    When inserting a new key/value pair into a btree we walk down the spine of
    btree nodes performing the following 2 operations:

    i) making space for a new entry
    ii) adjusting the first key entry if the new key is lower than any in the node.

    If the _root_ node is full, the function btree_split_beneath() allocates 2 new
    nodes, and redistributes the root node's entries between them. The root node is
    left with 2 entries corresponding to the 2 new nodes.

    btree_split_beneath() then adjusts the spine to point to one of the two new
    children. This means the first key is never adjusted if the new key was lower,
    ie. operation (ii) gets missed out. This can result in the new key being
    'lost' for a period; until another low valued key is inserted that will uncover
    it.

    This is a serious bug, and quite hard to make trigger in normal use. A
    reproducing test case ("thin create devices-in-reverse-order") is
    available as part of the thin-provision-tools project:
    https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

    Fix the issue by changing btree_split_beneath() so it no longer adjusts
    the spine. Instead it unlocks both the new nodes, and lets the main
    loop in btree_insert_raw() relock the appropriate one and make any
    necessary adjustments.
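    Operation (ii), the step the buggy path skipped, can be sketched as
    follows (names illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * When the key being inserted is lower than every key already in the
 * node, the node's first key must be lowered to match; otherwise later
 * lookups descend past the new key and it appears lost.
 */
static void adjust_first_key(uint64_t *keys, unsigned nr_entries,
			     uint64_t new_key)
{
	if (nr_entries && new_key < keys[0])
		keys[0] = new_key;
}
```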

    Cc: stable@vger.kernel.org
    Reported-by: Monty Pavel
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

15 Nov, 2017

1 commit

  • Pull device mapper updates from Mike Snitzer:

    - a few conversions from atomic_t to refcount_t

    - a DM core fix for a race during device destruction that could result
    in a BUG_ON

    - a stable@ fix for a DM cache race condition that could lead to data
    corruption when operating in writeback mode (writethrough is default)

    - various DM cache cleanups and improvements

    - add DAX support to the DM log-writes target

    - a fix for the DM zoned target's ability to deal with the last zone of
    the drive being smaller than all others

    - a stable@ DM crypt and DM integrity fix for a negative check that
    was too restrictive (it prevented slab debug with XFS on top of DM
    crypt from working)

    - a DM raid target fix for a panic that can occur when forcing a raid
    to sync

    * tag 'for-4.15/dm' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (25 commits)
    dm cache: lift common migration preparation code to alloc_migration()
    dm cache: remove usused deferred_cells member from struct cache
    dm cache policy smq: allocate cache blocks in order
    dm cache policy smq: change max background work from 10240 to 4096 blocks
    dm cache background tracker: limit amount of background work that may be issued at once
    dm cache policy smq: take origin idle status into account when queuing writebacks
    dm cache policy smq: handle races with queuing background_work
    dm raid: fix panic when attempting to force a raid to sync
    dm integrity: allow unaligned bv_offset
    dm crypt: allow unaligned bv_offset
    dm: small cleanup in dm_get_md()
    dm: fix race between dm_get_from_kobject() and __dm_destroy()
    dm: allocate struct mapped_device with kvzalloc
    dm zoned: ignore last smaller runt zone
    dm space map metadata: use ARRAY_SIZE
    dm log writes: add support for DAX
    dm log writes: add support for inline data buffers
    dm cache: simplify get_per_bio_data() by removing data_size argument
    dm cache: remove all obsolete writethrough-specific code
    dm cache: submit writethrough writes in parallel to origin and cache
    ...

    Linus Torvalds
     

11 Nov, 2017

1 commit

  • Using the ARRAY_SIZE macro improves the readability of the code.

    Found with Coccinelle with the following semantic patch:
    @r depends on (org || report)@
    type T;
    T[] E;
    position p;
    @@
    (
    (sizeof(E)@p /sizeof(*E))
    |
    (sizeof(E)@p /sizeof(E[...]))
    |
    (sizeof(E)@p /sizeof(T))
    )
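    For reference, the essence of the change is replacing hand-written
    sizeof divisions with the macro (shown here without the kernel's
    compile-time array type check):

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int example_table[] = { 10, 20, 30, 40, 50 };

static size_t table_entries(void)
{
	/* clearer than sizeof(example_table) / sizeof(int) */
	return ARRAY_SIZE(example_table);
}
```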

    Signed-off-by: Jérémy Lefaure
    Signed-off-by: Mike Snitzer

    Jérémy Lefaure
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if <5
    lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

16 May, 2017

1 commit


28 Apr, 2017

1 commit


25 Apr, 2017

1 commit

  • dm_btree_find_lowest_key() is giving incorrect results. find_key()
    traverses the btree correctly for finding the highest key, but there is
    an error in the way it traverses the btree for retrieving the lowest
    key. dm_btree_find_lowest_key() fetches the first key of the rightmost
    block of the btree instead of fetching the first key from the leftmost
    block.

    Fix this by conditionally passing the correct parameter to value64()
    based on the @find_highest flag.
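    A simplified model of the fix, with the node's key array flattened
    into a plain C array (names illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * For the highest key, read the node's last entry; for the lowest key,
 * read the FIRST entry. Before the fix, the lowest-key path also read
 * the last entry of the node it reached.
 */
static uint64_t pick_key(const uint64_t *keys, unsigned nr_entries,
			 int find_highest)
{
	return keys[find_highest ? nr_entries - 1 : 0];
}
```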

    Cc: stable@vger.kernel.org
    Signed-off-by: Erez Zadok
    Signed-off-by: Vinothkumar Raja
    Signed-off-by: Nidhi Panpalia
    Signed-off-by: Mike Snitzer

    Vinothkumar Raja
     

02 Mar, 2017

1 commit


22 Feb, 2017

1 commit

  • Pull device mapper updates from Mike Snitzer:

    - Fix dm-raid transient device failure processing and other smaller
    tweaks.

    - Add journal support to the DM raid target to close the 'write hole'
    on raid 4/5/6.

    - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB.

    - Add 'metadata2' feature to dm-cache to separate the dirty bitset out
    from other cache metadata. This improves speed of shutting down a
    large cache device (which implies writing out dirty bits).

    - Fix a memory leak during dm-stats data structure destruction.

    - Fix a DM multipath round-robin path selector performance regression
    that was caused by less precise balancing across all paths.

    - Lastly, introduce a DM core fix for a long-standing DM snapshot
    deadlock that is rooted in the complexity of the device stack used in
    conjunction with block core maintaining bios on current->bio_list to
    manage recursion in generic_make_request(). A more comprehensive fix
    to block core (and its hook in the cpu scheduler) would be wonderful
    but this DM-specific fix is pragmatic considering how difficult it
    has been to make progress on a generic fix.

    * tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
    dm: flush queued bios when process blocks to avoid deadlock
    dm round robin: revert "use percpu 'repeat_count' and 'current_path'"
    dm stats: fix a leaked s->histogram_boundaries array
    dm space map metadata: constify dm_space_map structures
    dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()
    dm persistent data: add cursor skip functions to the cursor APIs
    dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2
    dm bitset: add dm_bitset_new()
    dm cache metadata: name the cache block that couldn't be loaded
    dm cache metadata: add "metadata2" feature
    dm cache metadata: use bitset cursor api to load discard bitset
    dm bitset: introduce cursor api
    dm btree: use GFP_NOFS in dm_btree_del()
    dm space map common: memcpy the disk root to ensure it's arch aligned
    dm block manager: add unlikely() annotations on dm_bufio error paths
    dm cache: fix corruption seen when using cache > 2TB
    dm raid: cleanup awkward branching in raid_message() option processing
    dm raid: use mddev rather than rdev->mddev
    dm raid: use read_disk_sb() throughout
    dm raid: add raid4/5/6 journaling support
    ...

    Linus Torvalds
     

17 Feb, 2017

7 commits


14 Jan, 2017

1 commit

  • This is a nasty interface and setting the state of a foreign task must
    not be done. As of the following commit:

    be628be0956 ("bcache: Make gc wakeup sane, remove set_task_state()")

    ... everyone in the kernel calls set_task_state() with current, allowing
    the helper to be removed.

    However, as the comment indicates, it is still around for those archs
    where computing current is more expensive than using a pointer, at
    least in theory. An important arch that is affected is arm64; however,
    this has now been addressed [1] and performance is on par, making no
    difference between the two calls.

    Of all the callers, if any, it's the locking bits that would care most
    about this -- ie: we end up passing a tsk pointer to a lot of the lock
    slowpath, and setting ->state on that. The following numbers are based
    on two tests: a custom ad-hoc microbenchmark that just measures
    latencies (for ~65 million calls) between get_task_state() vs
    get_current_state().

    Secondly for a higher overview, an unlink microbenchmark was used,
    which pounds on a single file with open, close,unlink combos with
    increasing thread counts (up to 4x ncpus). While the workload is quite
    unrealistic, it does contend a lot on the inode mutex or now rwsem.

    [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com

    == 1. x86-64 ==

    Avg runtime set_task_state(): 601 msecs
    Avg runtime set_current_state(): 552 msecs

    vanilla dirty
    Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%)
    Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%)
    Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%)
    Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%)
    Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%)
    Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%)
    Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%)
    Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%)
    Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%)
    Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%)
    Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%)
    Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%)
    Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%)

    == 2. ppc64le ==

    Avg runtime set_task_state(): 938 msecs
    Avg runtime set_current_state(): 940 msecs

    vanilla dirty
    Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%)
    Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%)
    Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%)
    Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%)
    Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%)
    Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%)
    Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%)
    Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%)
    Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%)
    Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%)
    Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%)
    Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%)
    Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%)
    Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%)
    Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%)
    Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%)

    x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
    and ppc64 (with paca) show similar improvements in the unlink microbenches.
    The small delta for ppc64 (2ms) does not represent the gains on the
    unlink runs. In the case of x86, there was a decent amount of variation
    in the latency runs, but always within a 20 to 50ms increase; ppc was
    more constant.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: mark.rutland@arm.com
    Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

09 Dec, 2016

4 commits

    If no block was allocated or freed, sm_ll_mutate() wasn't setting
    *ev, leaving the variable uninitialized. sm_ll_insert(),
    sm_disk_inc_block(), and sm_disk_new_block() all check ev to see
    if there was an allocation event in sm_ll_mutate(), possibly
    reading uninitialized data.

    If no allocation event occurred, sm_ll_mutate() should set *ev
    to SM_NONE.
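    The shape of the fix, using the source's enum names but with the
    mutate logic itself reduced to a reference-count comparison:

```c
#include <assert.h>

enum allocation_event { SM_NONE, SM_ALLOC, SM_FREE };

/*
 * *ev gets a definite SM_NONE default up front, so callers never read
 * an uninitialized value when no allocation or free happened.
 */
static void set_event(unsigned old_ref, unsigned new_ref,
		      enum allocation_event *ev)
{
	*ev = SM_NONE;		/* the fix: always initialize */

	if (old_ref == 0 && new_ref > 0)
		*ev = SM_ALLOC;
	else if (old_ref > 0 && new_ref == 0)
		*ev = SM_FREE;
}
```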

    Signed-off-by: Benjamin Marzinski
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Benjamin Marzinski
     
  • When metadata_ll_init_index() is called by sm_ll_new_metadata(),
    ll->mi_le hasn't been initialized yet. So, when
    metadata_ll_init_index() copies the contents of ll->mi_le into the
    newly allocated bitmap_root, it is just copying garbage. ll->mi_le
    will be allocated later in sm_ll_extend() and copied into the
    bitmap_root, in sm_ll_commit().

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Mike Snitzer

    Benjamin Marzinski
     
  • In dm_sm_metadata_create() we temporarily change the dm_space_map
    operations from 'ops' (whose .destroy function deallocates the
    sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).

    If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
    sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
    dm_sm_destroy() with the intention of freeing the sm_metadata, but it
    doesn't (because the dm_space_map operations are still set to
    'bootstrap_ops').

    Fix this by setting the dm_space_map operations back to 'ops' if
    dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
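    A toy model of the fix; the real types and bootstrap sequence are
    far more involved, and everything here is a simplified stand-in:

```c
#include <assert.h>

struct space_map;

struct sm_ops {
	void (*destroy)(struct space_map *sm);
};

struct space_map {
	const struct sm_ops *ops;
};

static int destroyed;

static void real_destroy(struct space_map *sm)
{
	destroyed = 1;	/* stands in for freeing the sm_metadata */
}

static void bootstrap_destroy(struct space_map *sm)
{
	/* deliberately does nothing during bootstrap */
}

static const struct sm_ops ops = { real_destroy };
static const struct sm_ops bootstrap_ops = { bootstrap_destroy };

/*
 * While 'bootstrap_ops' is installed, destroy is a no-op, so on
 * failure the create path must restore 'ops' before returning the
 * error, or the caller's destroy call leaks the structure.
 */
static int sm_metadata_create(struct space_map *sm, int fail)
{
	sm->ops = &bootstrap_ops;

	if (fail) {
		sm->ops = &ops;	/* the fix: restore before erroring out */
		return -1;
	}

	sm->ops = &ops;
	return 0;
}
```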

    Signed-off-by: Benjamin Marzinski
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Benjamin Marzinski
     
  • A value is assigned to 'nr_entries' but is never used, remove it.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Mike Snitzer

    Bart Van Assche
     

15 Nov, 2016

1 commit

  • The block manager's locking is useful for catching cycles that may
    result from certain btree metadata corruption. But in general it serves
    as a developer tool to catch bugs in code. Unless you're finding that
    DM thin provisioning is hanging due to infinite loops within the block
    manager's access to btree nodes you can safely disable this feature.

    Signed-off-by: Joe Thornber
    Signed-off-by: Arnd Bergmann # do/while(0) macro fix
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

22 Sep, 2016

3 commits


21 Jul, 2016

1 commit

  • dm_btree_find_next_single() can short-circuit the search for a block
    with a return of -ENODATA if all entries are higher than the search key
    passed to lower_bound().

    This hasn't been a problem because of the way the btree has been used by
    DM thinp. But it must be fixed now in preparation for fixing the race
    in DM thinp's handling of simultaneous block discard vs allocation.
    Otherwise, once that fix is in place, some of the blocks in a discard
    would not be unmapped as expected.
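    The behaviour being fixed can be modelled like this (names
    illustrative; the real code walks shadowed btree nodes):

```c
#include <assert.h>
#include <stdint.h>

/*
 * lower_bound() returns -1 when every key in the node is above the
 * search key. In an internal node the search must then descend into
 * the first child rather than short-circuit with an error, or the
 * "next" mapping in the leftmost subtree is missed.
 */
static int lower_bound(const uint64_t *keys, int nr_entries, uint64_t key)
{
	int i;

	for (i = nr_entries - 1; i >= 0; i--)
		if (keys[i] <= key)
			return i;
	return -1;
}

static int pick_child(const uint64_t *keys, int nr_entries, uint64_t key)
{
	int i = lower_bound(keys, nr_entries, key);

	if (i < 0)
		i = 0;	/* fall back to the first child */
	return i;
}
```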

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

14 Dec, 2015

1 commit

    Remove the unused struct block_op pointer that was inadvertently
    introduced, via cut-and-paste of previous brb_op() code, as part of
    commit 50dd842ad.

    (Cc'ing stable@ because commit 50dd842ad did)

    Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map")
    Reported-by: David Binderman
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mike Snitzer
     

10 Dec, 2015

1 commit