11 Feb, 2020

1 commit

  • commit 4feaef830de7ffdd8352e1fe14ad3bf13c9688f8 upstream.

    The space-maps track the reference counts for disk blocks allocated by
    both the thin-provisioning and cache targets. There are variants for
    tracking metadata blocks and data blocks.

    Transactionality is implemented by never touching blocks from the
    previous transaction, so we can rollback in the event of a crash.

    When allocating a new block we need to ensure the block is free (has
    reference count of 0) in both the current and previous transaction.
    Prior to this fix we were doing this by searching for a free block in
    the previous transaction, and relying on a 'begin' counter to track
    where the last allocation in the current transaction was. This
    'begin' field was not being updated in all code paths (e.g. an
    increment of a data block's reference count due to breaking sharing of
    a neighbouring block in the same btree leaf).

    This fix keeps the 'begin' field, but now it's just a hint to speed up
    the search. Instead the current transaction is searched for a free
    block, and then the old transaction is double checked to ensure it's
    free. Much simpler.
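    The new search order can be sketched as follows; names and types here
    are illustrative stand-ins, not the kernel's actual space-map
    interfaces:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal sketch (hypothetical names, simplified types) of the fixed
 * allocation search: scan the current transaction's reference counts
 * for a free block, using 'begin' only as a starting hint, then double
 * check that the block was also free in the previous transaction
 * before handing it out.
 */
static int find_free_block(const unsigned *current_counts,
			   const unsigned *old_counts,
			   size_t nr_blocks, size_t begin_hint,
			   size_t *result)
{
	size_t i, b;

	for (i = 0; i < nr_blocks; i++) {
		b = (begin_hint + i) % nr_blocks;

		if (current_counts[b] != 0)	/* in use now */
			continue;
		if (old_counts[b] != 0)		/* in use last transaction */
			continue;

		*result = b;
		return 0;
	}
	return -1;	/* stands in for -ENOSPC */
}
```

    Because both transactions are checked for every candidate, a stale
    'begin' hint can only slow the search down, never hand out a block
    that the previous transaction still references.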

    This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when
    DM thin-provisioning's snapshots are heavily used.

    Reported-by: Eric Wheeler
    Cc: stable@vger.kernel.org
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     

21 Dec, 2019

1 commit

  • commit 474e559567fa631dea8fb8407ab1b6090c903755 upstream.

    We got the following warnings from thin_check during thin-pool setup:

    $ thin_check /dev/vdb
    examining superblock
    examining devices tree
    missing devices: [1, 84]
    too few entries in btree_node: 41, expected at least 42 (block 138, max_entries = 126)
    examining mapping tree

    The symptom is that the number of entries in one node of the
    details_info tree is less than (max_entries / 3). It can be easily
    reproduced by the following procedure:

    $ create a new thin pool
    $ assume the max entries of the details_info tree is 126
    $ create 127 thin devices (e.g. 1~127) so the root node fills up
    and then splits
    $ remove the first 43 (e.g. 1~43) thin devices to make the children
    rebalance repeatedly
    $ stop the thin pool
    $ thin_check

    The root cause is that the B-tree removal procedure in __rebalance2()
    doesn't guarantee the invariance: the minimal number of entries in
    non-root node should be >= (max_entries / 3).

    Simply fix the problem by increasing the rebalance threshold to
    make sure the number of entries in each child will be greater
    than or equal to (max_entries / 3 + 1), so no matter which
    child is used for removal, the number will still be valid.
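    The merge decision described above can be sketched like this,
    assuming the commit's arithmetic (the function name and signature
    are illustrative):

```c
#include <assert.h>

/*
 * Merge two sibling nodes only when redistribution could not leave
 * each with at least (max_entries / 3 + 1) entries.
 */
static int should_merge(unsigned nr_left, unsigned nr_right,
			unsigned max_entries)
{
	unsigned merge_threshold = max_entries / 3;
	/*
	 * The old threshold was just max_entries / 3, so two siblings
	 * slightly above that in total were redistributed instead of
	 * merged, which could leave a child below the required
	 * minimum of max_entries / 3 entries.
	 */
	unsigned threshold = 2 * (merge_threshold + 1);

	return nr_left + nr_right < threshold;
}
```

    With max_entries = 126 the threshold becomes 86: any pair totalling
    85 or fewer entries is merged, while 86 or more can be safely
    redistributed into two valid children.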

    Cc: stable@vger.kernel.org
    Signed-off-by: Hou Tao
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Hou Tao
     

27 Aug, 2019

1 commit


23 Aug, 2019

2 commits

    In commit 6096d91af0b6 ("dm space map metadata: fix occasional leak
    of a metadata block on resize"), the commit logic was refactored into
    a new function, 'apply_bops'. But when that logic was replaced in
    out(), the return value was not stored, so out() may return the wrong
    value to its caller.
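    A minimal model of the fix (signatures simplified; the real out()
    takes the space-map structure, not an int):

```c
#include <assert.h>

static int apply_bops_result;	/* stand-in for the real commit outcome */

static int apply_bops(void)
{
	return apply_bops_result;
}

/*
 * The fix: when no earlier error occurred, the result of apply_bops()
 * is stored rather than discarded, so commit failures propagate to
 * the caller.
 */
static int out(int r)
{
	if (!r)
		r = apply_bops();	/* keep the return value */
	return r;
}
```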

    Fixes: 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize")
    Cc: stable@vger.kernel.org
    Signed-off-by: ZhangXiaoxu
    Signed-off-by: Mike Snitzer

    ZhangXiaoxu
     
    When btree_split_beneath() splits a node into two new children, it
    allocates two blocks: left and right. If the right block's allocation
    fails, the left block is unlocked and marked dirty. When this happens
    the left block's content is zero, because it wasn't initialized with
    the btree struct before the attempt to allocate the right block. Upon
    return, when flushing the left block to disk, the validator fails
    while checking this block and a BUG_ON is raised.

    Fix this by completely initializing the left block before allocating and
    initializing the right block.
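    A toy model of the ordering fix; the real allocation and node setup
    are far more involved, and all names here are illustrative:

```c
#include <assert.h>

struct toy_node {
	int initialized;
};

static int alloc_calls, fail_on_call;

static int alloc_node(struct toy_node *n)
{
	n->initialized = 0;	/* fresh blocks start zeroed */
	if (++alloc_calls == fail_on_call)
		return -1;	/* stands in for an allocation failure */
	return 0;
}

/*
 * Fixed ordering: the left child is completely initialized before the
 * right allocation is attempted, so a failure there can never leave an
 * all-zero left block for the validator to reject.
 */
static int split_beneath(struct toy_node *left, struct toy_node *right)
{
	int r = alloc_node(left);
	if (r)
		return r;
	left->initialized = 1;	/* the fix: fully set up left first */

	r = alloc_node(right);
	if (r)
		return r;	/* left is already valid here */
	right->initialized = 1;
	return 0;
}
```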

    Fixes: 4dcb8b57df359 ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
    Cc: stable@vger.kernel.org
    Signed-off-by: ZhangXiaoxu
    Signed-off-by: Mike Snitzer

    ZhangXiaoxu
     

21 May, 2019

1 commit


17 May, 2019

1 commit

  • …it/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Improve DM snapshot target's scalability by using finer grained
    locking. Requires some list_bl interface improvements.

    - Add ability for DM integrity to use a bitmap mode, that tracks
    regions where data and metadata are out of sync, instead of using a
    journal.

    - Improve DM thin provisioning target to not write metadata changes to
    disk if the thin-pool and associated thin devices are merely
    activated but not used. This avoids metadata corruption due to
    concurrent activation of thin devices across different OS instances
    (e.g. split brain scenarios, which ultimately would be avoided if
    proper device filters were used -- but not having proper filtering
    has proven a very common configuration mistake)

    - Fix missing call to path selector type->end_io in DM multipath. This
    fixes reported performance problems due to inaccurate path selector
    IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to
    particular path due to it seemingly being heavily used).

    - Fix bug in DM cache metadata's loading of its discard bitset that
    could lead to all cache blocks being discarded if the very first
    cache block was discarded (thankfully in practice the first cache
    block is generally in use; be it FS superblock, partition table, disk
    label, etc).

    - Add testing-only DM dust target which simulates a device that has
    failing sectors and/or read failures.

    - Fix a DM init error path reference count hang that caused boot hangs
    if user supplied malformed input on kernel commandline.

    - Fix a couple issues with DM crypt target's logging being overly
    verbose or lacking context.

    - Various other small fixes to DM init, DM multipath, DM zoned, and DM
    crypt.

    * tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits)
    dm: fix a couple brace coding style issues
    dm crypt: print device name in integrity error message
    dm crypt: move detailed message into debug level
    dm ioctl: fix hang in early create error condition
    dm integrity: whitespace, coding style and dead code cleanup
    dm integrity: implement synchronous mode for reboot handling
    dm integrity: handle machine reboot in bitmap mode
    dm integrity: add a bitmap mode
    dm integrity: introduce a function add_new_range_and_wait()
    dm integrity: allow large ranges to be described
    dm ingerity: pass size to dm_integrity_alloc_page_list()
    dm integrity: introduce rw_journal_sectors()
    dm integrity: update documentation
    dm integrity: don't report unused options
    dm integrity: don't check null pointer before kvfree and vfree
    dm integrity: correctly calculate the size of metadata area
    dm dust: Make dm_dust_init and dm_dust_exit static
    dm dust: remove redundant unsigned comparison to less than zero
    dm mpath: always free attached_handler_name in parse_path()
    dm init: fix max devices/targets checks
    ...

    Linus Torvalds
     

29 Apr, 2019

1 commit

  • Replace the indirection through struct stack_trace with an invocation of
    the storage array based interface. This results in less storage space and
    indirection.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: dm-devel@redhat.com
    Cc: Mike Snitzer
    Cc: Alasdair Kergon
    Cc: Steven Rostedt
    Cc: Alexander Potapenko
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: linux-mm@kvack.org
    Cc: David Rientjes
    Cc: Catalin Marinas
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: kasan-dev@googlegroups.com
    Cc: Mike Rapoport
    Cc: Akinobu Mita
    Cc: Christoph Hellwig
    Cc: iommu@lists.linux-foundation.org
    Cc: Robin Murphy
    Cc: Marek Szyprowski
    Cc: Johannes Thumshirn
    Cc: David Sterba
    Cc: Chris Mason
    Cc: Josef Bacik
    Cc: linux-btrfs@vger.kernel.org
    Cc: Daniel Vetter
    Cc: intel-gfx@lists.freedesktop.org
    Cc: Joonas Lahtinen
    Cc: Maarten Lankhorst
    Cc: dri-devel@lists.freedesktop.org
    Cc: David Airlie
    Cc: Jani Nikula
    Cc: Rodrigo Vivi
    Cc: Tom Zanussi
    Cc: Miroslav Benes
    Cc: linux-arch@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190425094802.533968922@linutronix.de

    Thomas Gleixner
     

19 Apr, 2019

1 commit


06 Mar, 2019

1 commit


02 Aug, 2018

1 commit

    The bitmap API (include/linux/bitmap.h) uses a 'bitmap' prefix for
    its methods. The DM bitmap API, on the other hand, is a special case.
    Add a 'dm' prefix to it to avoid a potential namespace collision.

    No functional changes intended.

    Suggested-by: Mike Snitzer
    Signed-off-by: Andy Shevchenko
    Acked-by: Mike Snitzer
    Signed-off-by: Dmitry Torokhov

    Andy Shevchenko
     

04 Apr, 2018

1 commit

  • Move dm-bufio.h to include/linux/ so that external GPL'd DM target
    modules can use it.

    It is better to allow the use of dm-bufio than force external modules
    to implement the equivalent buffered IO mechanism in some new way. The
    hope is this will encourage the use of dm-bufio; which will then make it
    easier for a GPL'd external DM target module to be included upstream.

    A couple of dm-bufio EXPORT_SYMBOL exports have also been updated to
    use EXPORT_SYMBOL_GPL.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

17 Jan, 2018

1 commit

    When inserting a new key/value pair into a btree we walk down the spine of
    btree nodes performing the following 2 operations:

    i) making space for a new entry
    ii) adjusting the first key entry if the new key is lower than any in the node.

    If the _root_ node is full, the function btree_split_beneath() allocates 2 new
    nodes, and redistributes the root node's entries between them. The root node is
    left with 2 entries corresponding to the 2 new nodes.

    btree_split_beneath() then adjusts the spine to point to one of the two new
    children. This means the first key is never adjusted if the new key was lower,
    ie. operation (ii) gets missed out. This can result in the new key being
    'lost' for a period; until another low valued key is inserted that will uncover
    it.

    This is a serious bug, and quite hard to make trigger in normal use. A
    reproducing test case ("thin create devices-in-reverse-order") is
    available as part of the thin-provision-tools project:
    https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593

    Fix the issue by changing btree_split_beneath() so it no longer adjusts
    the spine. Instead it unlocks both the new nodes, and lets the main
    loop in btree_insert_raw() relock the appropriate one and make any
    necessary adjustments.
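    Operation (ii), the step the buggy path skipped, can be sketched as
    follows (names illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * When the key being inserted is lower than every key already in the
 * node, the node's first key must be lowered to match; otherwise later
 * lookups descend past the new key and it appears lost.
 */
static void adjust_first_key(uint64_t *keys, unsigned nr_entries,
			     uint64_t new_key)
{
	if (nr_entries && new_key < keys[0])
		keys[0] = new_key;
}
```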

    Cc: stable@vger.kernel.org
    Reported-by: Monty Pavel
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

15 Nov, 2017

1 commit

  • Pull device mapper updates from Mike Snitzer:

    - a few conversions from atomic_t to refcount_t

    - a DM core fix for a race during device destruction that could result
    in a BUG_ON

    - a stable@ fix for a DM cache race condition that could lead to data
    corruption when operating in writeback mode (writethrough is default)

    - various DM cache cleanups and improvements

    - add DAX support to the DM log-writes target

    - a fix for the DM zoned target's ability to deal with the last zone of
    the drive being smaller than all others

    - a stable@ DM crypt and DM integrity fix for a negative check that
    was too restrictive (it prevented slab debug with XFS on top of DM
    crypt from working)

    - a DM raid target fix for a panic that can occur when forcing a raid
    to sync

    * tag 'for-4.15/dm' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (25 commits)
    dm cache: lift common migration preparation code to alloc_migration()
    dm cache: remove usused deferred_cells member from struct cache
    dm cache policy smq: allocate cache blocks in order
    dm cache policy smq: change max background work from 10240 to 4096 blocks
    dm cache background tracker: limit amount of background work that may be issued at once
    dm cache policy smq: take origin idle status into account when queuing writebacks
    dm cache policy smq: handle races with queuing background_work
    dm raid: fix panic when attempting to force a raid to sync
    dm integrity: allow unaligned bv_offset
    dm crypt: allow unaligned bv_offset
    dm: small cleanup in dm_get_md()
    dm: fix race between dm_get_from_kobject() and __dm_destroy()
    dm: allocate struct mapped_device with kvzalloc
    dm zoned: ignore last smaller runt zone
    dm space map metadata: use ARRAY_SIZE
    dm log writes: add support for DAX
    dm log writes: add support for inline data buffers
    dm cache: simplify get_per_bio_data() by removing data_size argument
    dm cache: remove all obsolete writethrough-specific code
    dm cache: submit writethrough writes in parallel to origin and cache
    ...

    Linus Torvalds
     

11 Nov, 2017

1 commit

  • Using the ARRAY_SIZE macro improves the readability of the code.

    Found with Coccinelle with the following semantic patch:
    @r depends on (org || report)@
    type T;
    T[] E;
    position p;
    @@
    (
    (sizeof(E)@p /sizeof(*E))
    |
    (sizeof(E)@p /sizeof(E[...]))
    |
    (sizeof(E)@p /sizeof(T))
    )
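    For reference, the essence of the change is replacing hand-written
    sizeof divisions with the macro (shown here without the kernel's
    compile-time array type check):

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int example_table[] = { 10, 20, 30, 40, 50 };

static size_t table_entries(void)
{
	/* clearer than sizeof(example_table) / sizeof(int) */
	return ARRAY_SIZE(example_table);
}
```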

    Signed-off-by: Jérémy Lefaure
    Signed-off-by: Mike Snitzer

    Jérémy Lefaure
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if <5
    lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

16 May, 2017

1 commit


28 Apr, 2017

1 commit


25 Apr, 2017

1 commit

  • dm_btree_find_lowest_key() is giving incorrect results. find_key()
    traverses the btree correctly for finding the highest key, but there is
    an error in the way it traverses the btree for retrieving the lowest
    key. dm_btree_find_lowest_key() fetches the first key of the rightmost
    block of the btree instead of fetching the first key from the leftmost
    block.

    Fix this by conditionally passing the correct parameter to value64()
    based on the @find_highest flag.
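    A simplified model of the fix, with the node's key array flattened
    into a plain C array (names illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * For the highest key, read the node's last entry; for the lowest key,
 * read the FIRST entry. Before the fix, the lowest-key path also read
 * the last entry of the node it reached.
 */
static uint64_t pick_key(const uint64_t *keys, unsigned nr_entries,
			 int find_highest)
{
	return keys[find_highest ? nr_entries - 1 : 0];
}
```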

    Cc: stable@vger.kernel.org
    Signed-off-by: Erez Zadok
    Signed-off-by: Vinothkumar Raja
    Signed-off-by: Nidhi Panpalia
    Signed-off-by: Mike Snitzer

    Vinothkumar Raja
     

02 Mar, 2017

1 commit


22 Feb, 2017

1 commit

  • Pull device mapper updates from Mike Snitzer:

    - Fix dm-raid transient device failure processing and other smaller
    tweaks.

    - Add journal support to the DM raid target to close the 'write hole'
    on raid 4/5/6.

    - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB.

    - Add 'metadata2' feature to dm-cache to separate the dirty bitset out
    from other cache metadata. This improves speed of shutting down a
    large cache device (which implies writing out dirty bits).

    - Fix a memory leak during dm-stats data structure destruction.

    - Fix a DM multipath round-robin path selector performance regression
    that was caused by less precise balancing across all paths.

    - Lastly, introduce a DM core fix for a long-standing DM snapshot
    deadlock that is rooted in the complexity of the device stack used in
    conjunction with block core maintaining bios on current->bio_list to
    manage recursion in generic_make_request(). A more comprehensive fix
    to block core (and its hook in the cpu scheduler) would be wonderful
    but this DM-specific fix is pragmatic considering how difficult it
    has been to make progress on a generic fix.

    * tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
    dm: flush queued bios when process blocks to avoid deadlock
    dm round robin: revert "use percpu 'repeat_count' and 'current_path'"
    dm stats: fix a leaked s->histogram_boundaries array
    dm space map metadata: constify dm_space_map structures
    dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()
    dm persistent data: add cursor skip functions to the cursor APIs
    dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2
    dm bitset: add dm_bitset_new()
    dm cache metadata: name the cache block that couldn't be loaded
    dm cache metadata: add "metadata2" feature
    dm cache metadata: use bitset cursor api to load discard bitset
    dm bitset: introduce cursor api
    dm btree: use GFP_NOFS in dm_btree_del()
    dm space map common: memcpy the disk root to ensure it's arch aligned
    dm block manager: add unlikely() annotations on dm_bufio error paths
    dm cache: fix corruption seen when using cache > 2TB
    dm raid: cleanup awkward branching in raid_message() option processing
    dm raid: use mddev rather than rdev->mddev
    dm raid: use read_disk_sb() throughout
    dm raid: add raid4/5/6 journaling support
    ...

    Linus Torvalds
     

17 Feb, 2017

7 commits


14 Jan, 2017

1 commit

  • This is a nasty interface and setting the state of a foreign task must
    not be done. As of the following commit:

    be628be0956 ("bcache: Make gc wakeup sane, remove set_task_state()")

    ... everyone in the kernel calls set_task_state() with current, allowing
    the helper to be removed.

    However, as the comment indicates, it is still around for those archs
    where computing current is more expensive than using a pointer, at
    least in theory. An important arch that is affected is arm64; however,
    this has now been addressed [1] and performance is on par, making no
    difference between the two calls.

    Of all the callers, if any, it's the locking bits that would care most
    about this -- ie: we end up passing a tsk pointer to a lot of the lock
    slowpath, and setting ->state on that. The following numbers are based
    on two tests: a custom ad-hoc microbenchmark that just measures
    latencies (for ~65 million calls) between get_task_state() vs
    get_current_state().

    Secondly for a higher overview, an unlink microbenchmark was used,
    which pounds on a single file with open, close,unlink combos with
    increasing thread counts (up to 4x ncpus). While the workload is quite
    unrealistic, it does contend a lot on the inode mutex or now rwsem.

    [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com

    == 1. x86-64 ==

    Avg runtime set_task_state(): 601 msecs
    Avg runtime set_current_state(): 552 msecs

    vanilla dirty
    Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%)
    Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%)
    Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%)
    Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%)
    Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%)
    Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%)
    Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%)
    Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%)
    Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%)
    Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%)
    Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%)
    Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%)
    Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%)

    == 2. ppc64le ==

    Avg runtime set_task_state(): 938 msecs
    Avg runtime set_current_state(): 940 msecs

    vanilla dirty
    Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%)
    Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%)
    Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%)
    Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%)
    Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%)
    Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%)
    Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%)
    Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%)
    Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%)
    Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%)
    Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%)
    Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%)
    Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%)
    Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%)
    Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%)
    Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%)

    x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
    and ppc64 (with paca) show similar improvements in the unlink microbenches.
    The small delta for ppc64 (2ms) does not represent the gains on the
    unlink runs. In the case of x86, there was a decent amount of variation
    in the latency runs, but always within a 20 to 50ms increase; ppc was
    more constant.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: dave@stgolabs.net
    Cc: mark.rutland@arm.com
    Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
    Signed-off-by: Ingo Molnar

    Davidlohr Bueso
     

09 Dec, 2016

4 commits

    If no block was allocated or freed, sm_ll_mutate() wasn't setting
    *ev, leaving the variable uninitialized. sm_ll_insert(),
    sm_disk_inc_block(), and sm_disk_new_block() all check ev to see
    if there was an allocation event in sm_ll_mutate(), possibly
    reading uninitialized data.

    If no allocation event occurred, sm_ll_mutate() should set *ev
    to SM_NONE.
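    The shape of the fix, using the source's enum names but with the
    mutate logic itself reduced to a reference-count comparison:

```c
#include <assert.h>

enum allocation_event { SM_NONE, SM_ALLOC, SM_FREE };

/*
 * *ev gets a definite SM_NONE default up front, so callers never read
 * an uninitialized value when no allocation or free happened.
 */
static void set_event(unsigned old_ref, unsigned new_ref,
		      enum allocation_event *ev)
{
	*ev = SM_NONE;		/* the fix: always initialize */

	if (old_ref == 0 && new_ref > 0)
		*ev = SM_ALLOC;
	else if (old_ref > 0 && new_ref == 0)
		*ev = SM_FREE;
}
```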

    Signed-off-by: Benjamin Marzinski
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Benjamin Marzinski
     
  • When metadata_ll_init_index() is called by sm_ll_new_metadata(),
    ll->mi_le hasn't been initialized yet. So, when
    metadata_ll_init_index() copies the contents of ll->mi_le into the
    newly allocated bitmap_root, it is just copying garbage. ll->mi_le
    will be allocated later in sm_ll_extend() and copied into the
    bitmap_root, in sm_ll_commit().

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Mike Snitzer

    Benjamin Marzinski
     
  • In dm_sm_metadata_create() we temporarily change the dm_space_map
    operations from 'ops' (whose .destroy function deallocates the
    sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).

    If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
    sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
    dm_sm_destroy() with the intention of freeing the sm_metadata, but it
    doesn't (because the dm_space_map operations are still set to
    'bootstrap_ops').

    Fix this by setting the dm_space_map operations back to 'ops' if
    dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
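    A toy model of the fix; the real types and bootstrap sequence are
    far more involved, and everything here is a simplified stand-in:

```c
#include <assert.h>

struct space_map;

struct sm_ops {
	void (*destroy)(struct space_map *sm);
};

struct space_map {
	const struct sm_ops *ops;
};

static int destroyed;

static void real_destroy(struct space_map *sm)
{
	destroyed = 1;	/* stands in for freeing the sm_metadata */
}

static void bootstrap_destroy(struct space_map *sm)
{
	/* deliberately does nothing during bootstrap */
}

static const struct sm_ops ops = { real_destroy };
static const struct sm_ops bootstrap_ops = { bootstrap_destroy };

/*
 * While 'bootstrap_ops' is installed, destroy is a no-op, so on
 * failure the create path must restore 'ops' before returning the
 * error, or the caller's destroy call leaks the structure.
 */
static int sm_metadata_create(struct space_map *sm, int fail)
{
	sm->ops = &bootstrap_ops;

	if (fail) {
		sm->ops = &ops;	/* the fix: restore before erroring out */
		return -1;
	}

	sm->ops = &ops;
	return 0;
}
```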

    Signed-off-by: Benjamin Marzinski
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Benjamin Marzinski
     
  • A value is assigned to 'nr_entries' but is never used, remove it.

    Signed-off-by: Bart Van Assche
    Signed-off-by: Mike Snitzer

    Bart Van Assche
     

15 Nov, 2016

1 commit

  • The block manager's locking is useful for catching cycles that may
    result from certain btree metadata corruption. But in general it serves
    as a developer tool to catch bugs in code. Unless you're finding that
    DM thin provisioning is hanging due to infinite loops within the block
    manager's access to btree nodes you can safely disable this feature.

    Signed-off-by: Joe Thornber
    Signed-off-by: Arnd Bergmann # do/while(0) macro fix
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

22 Sep, 2016

3 commits


21 Jul, 2016

1 commit

  • dm_btree_find_next_single() can short-circuit the search for a block
    with a return of -ENODATA if all entries are higher than the search key
    passed to lower_bound().

    This hasn't been a problem because of the way the btree has been used by
    DM thinp. But it must be fixed now in preparation for fixing the race
    in DM thinp's handling of simultaneous block discard vs allocation.
    Otherwise, once that fix is in place, some of the blocks in a discard
    would not be unmapped as expected.
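    The behaviour being fixed can be modelled like this (names
    illustrative; the real code walks shadowed btree nodes):

```c
#include <assert.h>
#include <stdint.h>

/*
 * lower_bound() returns -1 when every key in the node is above the
 * search key. In an internal node the search must then descend into
 * the first child rather than short-circuit with an error, or the
 * "next" mapping in the leftmost subtree is missed.
 */
static int lower_bound(const uint64_t *keys, int nr_entries, uint64_t key)
{
	int i;

	for (i = nr_entries - 1; i >= 0; i--)
		if (keys[i] <= key)
			return i;
	return -1;
}

static int pick_child(const uint64_t *keys, int nr_entries, uint64_t key)
{
	int i = lower_bound(keys, nr_entries, key);

	if (i < 0)
		i = 0;	/* fall back to the first child */
	return i;
}
```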

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

14 Dec, 2015

1 commit

    Remove the unused struct block_op pointer that was inadvertently
    introduced, via cut-and-paste of previous brb_op() code, as part of
    commit 50dd842ad.

    (Cc'ing stable@ because commit 50dd842ad did)

    Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map")
    Reported-by: David Binderman
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mike Snitzer
     

10 Dec, 2015

1 commit