31 Jan, 2019

1 commit

  • commit d445bd9cec1a850c2100fcf53684c13b3fd934f2 upstream.

    Commit 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next
    stage processing") changed process_prepared_discard_passdown_pt1() to
    increment all the blocks being discarded until after the passdown had
    completed to avoid them being prematurely reused.

    IO issued to a thin device that breaks sharing with a snapshot, followed
    by a discard issued to snapshot(s) that previously shared the block(s),
    results in passdown_double_checking_shared_status() being called to
    iterate through the blocks double checking their reference count is zero
    and issuing the passdown if so. So a side effect of commit 00a0ea33b495
    is that passdown_double_checking_shared_status() was broken.

    Fix this by checking if the block reference count is greater than 1.
    Also, rename dm_pool_block_is_used() to dm_pool_block_is_shared().

    Fixes: 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next stage processing")
    Cc: stable@vger.kernel.org # 4.9+
    Reported-by: ryan.p.norwood@gmail.com
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     

21 Dec, 2018

1 commit

  • commit f6c367585d0d851349d3a9e607c43e5bea993fa1 upstream.

    Sending a DM event before a thin-pool state change is about to happen is
    a bug. It wasn't realized until it became clear that userspace response
    to the event raced with the actual state change that the event was
    meant to notify about.

    Fix this by first updating internal thin-pool state to reflect what the
    DM event is being issued about. This fixes a long-standing racy/buggy
    userspace device-mapper-test-suite 'resize_io' test that would get an
    event but not find the state it was looking for -- so it would just go
    on to hang because no other events caused the test to reevaluate the
    thin-pool's state.

    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mike Snitzer
     

10 Oct, 2018

1 commit

  • [ Upstream commit 3ab91828166895600efd9cdc3a0eb32001f7204a ]

    Committing a transaction can consume some metadata of its own, so we now
    reserve a small amount of metadata to cover this. Free metadata
    reported by the kernel will not include this reserve.

    If any of the reserve has been used after a commit we enter a new
    internal state PM_OUT_OF_METADATA_SPACE. This is reported as
    PM_READ_ONLY, so no userland changes are needed. If the metadata
    device is resized the pool will move back to PM_WRITE.

    These changes mean we never need to abort and rollback a transaction due
    to running out of metadata space. This is particularly important
    because there have been a handful of reports of data corruption against
    DM thin-provisioning that can all be attributed to the thin-pool having
    run out of metadata space.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     

10 Sep, 2018

1 commit

  • commit 75294442d896f2767be34f75aca7cc2b0d01301f upstream.

    Now both check_for_space() and do_no_space_timeout() will read & write
    pool->pf.error_if_no_space. If these functions run concurrently, as
    shown in the following case, the default setting of "queue_if_no_space"
    can get lost.

    precondition:
    * error_if_no_space = false (aka "queue_if_no_space")
    * pool is in Out-of-Data-Space (OODS) mode
    * no_space_timeout worker has been queued

    CPU 0:                                    CPU 1:
    // delete a thin device
    process_delete_mesg()
    // check_for_space() invoked by commit()
    set_pool_mode(pool, PM_WRITE)
        pool->pf.error_if_no_space = \
            pt->requested_pf.error_if_no_space

                                              // timeout, pool is still in OODS mode
                                              do_no_space_timeout
                                                  // "queue_if_no_space" config is lost
                                                  pool->pf.error_if_no_space = true
    pool->pf.mode = new_mode

    Fix it by stopping no_space_timeout worker when switching to write mode.

    Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hou Tao
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Hou Tao
     

03 Jul, 2018

1 commit

  • commit a685557fbbc3122ed11e8ad3fa63a11ebc5de8c3 upstream.

    Discards issued to a DM thin device can complete to userspace (via
    fstrim) _before_ the metadata changes associated with the discards are
    reflected in the thinp superblock (e.g. free blocks). As such, if a
    user constructs a test that loops repeatedly over these steps, block
    allocation can fail due to discards not having completed yet:
    1) fill thin device via filesystem file
    2) remove file
    3) fstrim

    From initial report, here:
    https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html

    "The root cause of this issue is that dm-thin will first remove
    mapping and increase corresponding blocks' reference count to prevent
    them from being reused before DISCARD bios get processed by the
    underlying layers. However, increasing blocks' reference count could
    also increase the nr_allocated_this_transaction in struct sm_disk
    which makes smd->old_ll.nr_allocated +
    smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
    In this case, alloc_data_block() will never commit metadata to reset
    the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
    always return an underflow value."

    While there is room for improvement to the space-map accounting that
    thinp is making use of: the reality is this test is inherently racy and
    will result in the previous iteration's fstrim's discard(s) completing
    vs concurrent block allocation, via dd, in the next iteration of the
    loop.

    No amount of space map accounting improvements will be able to allow
    users to use a block before a discard of that block has completed.

    So the best we can really do is allow DM thinp to gracefully handle such
    aggressive use of all the pool's data by degrading the pool into
    out-of-data-space (OODS) mode. We _should_ get that behaviour already
    (if space map accounting didn't falsely cause alloc_data_block() to
    believe free space was available)... but short of that we handle the
    current reality that dm_pool_alloc_data_block() can return -ENOSPC.

    Reported-by: Dennis Yang
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mike Snitzer
     

20 Dec, 2017

1 commit

  • commit 7e6358d244e4706fe612a77b9c36519a33600ac0 upstream.

    A NULL pointer is seen if two concurrent "vgchange -ay -K "
    processes race to load the dm-thin-pool module:

    PID: 25992  TASK: ffff883cd7d23500  CPU: 4  COMMAND: "vgchange"
     #0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9
     #1 [ffff883cd743d660] crash_kexec at ffffffff810c5992
     #2 [ffff883cd743d730] oops_end at ffffffff81515c90
     #3 [ffff883cd743d760] no_context at ffffffff81049f1b
     #4 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5
     #5 [ffff883cd743d800] bad_area at ffffffff8104a2ce
     #6 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f
     #7 [ffff883cd743d950] do_page_fault at ffffffff81517bae
     #8 [ffff883cd743d980] page_fault at ffffffff81514f95
        [exception RIP: kmem_cache_alloc+108]
        RIP: ffffffff8116ef3c  RSP: ffff883cd743da38  RFLAGS: 00010046
        RAX: 0000000000000004  RBX: ffffffff81121b90  RCX: ffff881bf1e78cc0
        RDX: 0000000000000000  RSI: 00000000000000d0  RDI: 0000000000000000
        RBP: ffff883cd743da68  R8: ffff881bf1a4eb00  R9: 0000000080042000
        R10: 0000000000002000  R11: 0000000000000000  R12: 00000000000000d0
        R13: 0000000000000000  R14: 00000000000000d0  R15: 0000000000000246
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #9 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5
    #10 [ffff883cd743da80] mempool_create_node at ffffffff81122083
    #11 [ffff883cd743dad0] mempool_create at ffffffff811220f4
    #12 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool]
    #13 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod]
    #14 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod]
    #15 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod]

    The race results in a NULL pointer because:

    Process A (vgchange -ay -K):
    a. send DM_LIST_VERSIONS_CMD ioctl;
    b. pool_target not registered;
    c. modprobe dm_thin_pool and wait until end.

    Process B (vgchange -ay -K):
    a. send DM_LIST_VERSIONS_CMD ioctl;
    b. pool_target registered;
    c. table_load->dm_table_add_target->pool_ctr;
    d. _new_mapping_cache is NULL and panic.
    Note:
    1. process A and process B are two concurrent processes.
    2. pool_target can be detected by process B but
    _new_mapping_cache initialization has not ended.

    To fix dm-thin-pool, and other targets (cache, multipath, and snapshot)
    with the same problem, simply call dm_register_target() after all
    resources created during module init (as labelled with __init) have been
    set up.

    Signed-off-by: monty
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    monty_pavel@sina.com
     

15 Sep, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Some request-based DM core and DM multipath fixes and cleanups

    - Constify a few variables in DM core and DM integrity

    - Add bufio optimization and checksum failure accounting to DM
    integrity

    - Fix DM integrity to avoid checking integrity of failed reads

    - Fix DM integrity to use init_completion

    - A couple DM log-writes target fixes

    - Simplify DAX flushing by eliminating the unnecessary flush
    abstraction that was stood up for DM's use.

    * tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dax: remove the pmem_dax_ops->flush abstraction
    dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
    dm integrity: make blk_integrity_profile structure const
    dm integrity: do not check integrity for failed read operations
    dm log writes: fix >512b sectorsize support
    dm log writes: don't use all the cpu while waiting to log blocks
    dm ioctl: constify ioctl lookup table
    dm: constify argument arrays
    dm integrity: count and display checksum failures
    dm integrity: optimize writing dm-bufio buffers that are partially changed
    dm rq: do not update rq partially in each ending bio
    dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
    dm mpath: complain about unsupported __multipath_map_bio() return values
    dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through

    Linus Torvalds
     

28 Aug, 2017

1 commit

  • The arrays of 'struct dm_arg' are never modified by the device-mapper
    core, so constify them so that they are placed in .rodata.

    (Exception: the args array in dm-raid cannot be constified because it is
    allocated on the stack and modified.)

    Signed-off-by: Eric Biggers
    Signed-off-by: Mike Snitzer

    Eric Biggers
     

24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different lifetime rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 Jul, 2017

1 commit

  • Pull core block/IO updates from Jens Axboe:
    "This is the main pull request for the block layer for 4.13. Not a huge
    round in terms of features, but there's a lot of churn related to some
    core cleanups.

    Note this depends on the UUID tree pull request, that Christoph
    already sent out.

    This pull request contains:

    - A series from Christoph, unifying the error/stats codes in the
    block layer. We now use blk_status_t everywhere, instead of using
    different schemes for different places.

    - Also from Christoph, some cleanups around request allocation and IO
    scheduler interactions in blk-mq.

    - And yet another series from Christoph, cleaning up how we handle
    and do bounce buffering in the block layer.

    - A blk-mq debugfs series from Bart, further improving on the support
    we have for exporting internal information to aid debugging IO
    hangs or stalls.

    - Also from Bart, a series that cleans up the request initialization
    differences across types of devices.

    - A series from Goldwyn Rodrigues, allowing the block layer to return
    failure if we will block and the user asked for non-blocking.

    - Patch from Hannes for supporting setting loop devices block size to
    that of the underlying device.

    - Two series of patches from Javier, fixing various issues with
    lightnvm, particular around pblk.

    - A series from me, adding support for write hints. This comes with
    NVMe support as well, so applications can help guide data placement
    on flash to improve performance, latencies, and write
    amplification.

    - A series from Ming, improving and hardening blk-mq support for
    stopping/starting and quiescing hardware queues.

    - Two pull requests for NVMe updates. Nothing major on the feature
    side, but lots of cleanups and bug fixes. From the usual crew.

    - A series from Neil Brown, greatly improving the bio rescue set
    support. Most notably, this kills the bio rescue work queues, if we
    don't really need them.

    - Lots of other little bug fixes that are all over the place"

    * 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
    lightnvm: pblk: set line bitmap check under debug
    lightnvm: pblk: verify that cache read is still valid
    lightnvm: pblk: add initialization check
    lightnvm: pblk: remove target using async. I/Os
    lightnvm: pblk: use vmalloc for GC data buffer
    lightnvm: pblk: use right metadata buffer for recovery
    lightnvm: pblk: schedule if data is not ready
    lightnvm: pblk: remove unused return variable
    lightnvm: pblk: fix double-free on pblk init
    lightnvm: pblk: fix bad le64 assignations
    nvme: Makefile: remove dead build rule
    blk-mq: map all HWQ also in hyperthreaded system
    nvmet-rdma: register ib_client to not deadlock in device removal
    nvme_fc: fix error recovery on link down.
    nvmet_fc: fix crashes on bad opcodes
    nvme_fc: Fix crash when nvme controller connection fails.
    nvme_fc: replace ioabort msleep loop with completion
    nvme_fc: fix double calls to nvme_cleanup_cmd()
    nvme-fabrics: verify that a controller returns the correct NQN
    nvme: simplify nvme_dev_attrs_are_visible
    ...

    Linus Torvalds
     

28 Jun, 2017

1 commit

  • process_prepared_discard_passdown_pt1() should cleanup
    dm_thin_new_mapping in cases of error.

    dm_pool_inc_data_range() can fail trying to get a block reference:

    metadata operation 'dm_pool_inc_data_range' failed: error = -61

    When dm_pool_inc_data_range() fails, dm thin aborts the current metadata
    transaction and marks the pool as PM_READ_ONLY. Memory for the thin
    mapping is released as well. However, the current thin mapping will be
    queued onto the next stage as part of queue_passdown_pt2() or
    passdown_endio(). This dangling thin mapping memory, when processed and
    accessed in the next stage, will lead to device mapper crashing.

    Code flow without fix:
    -> process_prepared_discard_passdown_pt1(m)
    -> dm_thin_remove_range()
    -> discard passdown
    --> passdown_endio(m) queues m onto next stage
    -> dm_pool_inc_data_range() fails, frees memory m
    but does not remove it from next stage queue

    -> process_prepared_discard_passdown_pt2(m)
    -> processes freed memory m and crashes

    One such stack:

    Call Trace:
    [] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
    [] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
    [] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
    [] process_prepared+0x81/0xa0 [dm_thin_pool]
    [] do_worker+0xc5/0x820 [dm_thin_pool]
    [] ? __schedule+0x244/0x680
    [] ? pwq_activate_delayed_work+0x42/0xb0
    [] process_one_work+0x153/0x3f0
    [] worker_thread+0x12b/0x4b0
    [] ? rescuer_thread+0x350/0x350
    [] kthread+0xca/0xe0
    [] ? kthread_park+0x60/0x60
    [] ret_from_fork+0x25/0x30

    The fix is to first take the block ref count for the discarded block and
    then do a passdown discard of this block. If taking the block ref count
    fails, then bail out, aborting the current metadata transaction, mark the
    pool as PM_READ_ONLY and also free the current thin mapping memory
    (existing error handling code) without queueing this thin mapping onto
    the next stage of processing. If taking the block ref count succeeds,
    then pass down the discard of this block. The discard callback of
    passdown_endio() will queue this thin mapping onto the next stage of
    processing.

    Code flow with fix:
    -> process_prepared_discard_passdown_pt1(m)
    -> dm_thin_remove_range()
    -> dm_pool_inc_data_range()
    --> if fails, free memory m and bail out
    -> discard passdown
    --> passdown_endio(m) queues m onto next stage

    Cc: stable # v4.9+
    Reviewed-by: Eduardo Valentin
    Reviewed-by: Cristian Gafton
    Reviewed-by: Anchal Agarwal
    Signed-off-by: Vallish Vaidyeshwara
    Reviewed-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Vallish Vaidyeshwara
     

09 Jun, 2017

2 commits

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Turn the error parameter into a pointer so that target drivers can change
    the value, and make sure only DM_ENDIO_* values are returned from the
    methods.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 May, 2017

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - A major update for DM cache that reduces the latency for deciding
    whether blocks should migrate to/from the cache. The bio-prison-v2
    interface supports this improvement by enabling direct dispatch of
    work to workqueues rather than having to delay the actual work
    dispatch to the DM cache core. So the dm-cache policies are much more
    nimble by being able to drive IO as they see fit. One immediate
    benefit from the improved latency is a cache that should be much more
    adaptive to changing workloads.

    - Add a new DM integrity target that emulates a block device that has
    additional per-sector tags that can be used for storing integrity
    information.

    - Add a new authenticated encryption feature to the DM crypt target
    that builds on the capabilities provided by the DM integrity target.

    - Add MD interface for switching the raid4/5/6 journal mode and update
    the DM raid target to use it to enable raid4/5/6 journal write-back
    support.

    - Switch the DM verity target over to using the asynchronous hash
    crypto API (this helps work better with architectures that have
    access to off-CPU algorithm providers, which should reduce CPU
    utilization).

    - Various request-based DM and DM multipath fixes and improvements from
    Bart and Christoph.

    - A DM thinp target fix for a bio structure leak that occurs for each
    discard IFF discard passdown is enabled.

    - A fix for a possible deadlock in DM bufio and a fix to re-check the
    new buffer allocation watermark in the face of competing admin
    changes to the 'max_cache_size_bytes' tunable.

    - A couple DM core cleanups.

    * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
    dm bufio: check new buffer allocation watermark every 30 seconds
    dm bufio: avoid a possible ABBA deadlock
    dm mpath: make it easier to detect unintended I/O request flushes
    dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
    dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
    dm: introduce enum dm_queue_mode to cleanup related code
    dm mpath: verify __pg_init_all_paths locking assumptions at runtime
    dm: verify suspend_locking assumptions at runtime
    dm block manager: remove an unused argument from dm_block_manager_create()
    dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
    dm mpath: delay requeuing while path initialization is in progress
    dm mpath: avoid that path removal can trigger an infinite loop
    dm mpath: split and rename activate_path() to prepare for its expanded use
    dm ioctl: prevent stack leak in dm ioctl call
    dm integrity: use previously calculated log2 of sectors_per_block
    dm integrity: use hex2bin instead of open-coded variant
    dm crypt: replace custom implementation of hex2bin()
    dm crypt: remove obsolete references to per-CPU state
    dm verity: switch to using asynchronous hash crypto API
    dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
    ...

    Linus Torvalds
     

25 Apr, 2017

1 commit

  • dm-thin does not free the discard_parent bio after all chained sub
    bios have finished. The following kmemleak report could be observed after
    a pool with the discard_passdown option processes discard bios on
    linux v4.11-rc7. To fix this, we drop the discard_parent bio reference
    when its endio (passdown_endio) is called.

    unreferenced object 0xffff8803d6b29700 (size 256):
    comm "kworker/u8:0", pid 30349, jiffies 4379504020 (age 143002.776s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    01 00 00 00 00 00 00 f0 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x49/0xa0
    [] kmem_cache_alloc+0xb4/0x100
    [] mempool_alloc_slab+0x10/0x20
    [] mempool_alloc+0x55/0x150
    [] bio_alloc_bioset+0xb9/0x260
    [] process_prepared_discard_passdown_pt1+0x40/0x1c0 [dm_thin_pool]
    [] break_up_discard_bio+0x1a9/0x200 [dm_thin_pool]
    [] process_discard_cell_passdown+0x24/0x40 [dm_thin_pool]
    [] process_discard_bio+0xdd/0xf0 [dm_thin_pool]
    [] do_worker+0xa76/0xd50 [dm_thin_pool]
    [] process_one_work+0x139/0x370
    [] worker_thread+0x61/0x450
    [] kthread+0xd6/0xf0
    [] ret_from_fork+0x3f/0x70
    [] 0xffffffffffffffff

    Cc: stable@vger.kernel.org
    Signed-off-by: Dennis Yang
    Signed-off-by: Mike Snitzer

    Dennis Yang
     

02 Feb, 2017

1 commit

  • We will want to have struct backing_dev_info allocated separately from
    struct request_queue. As the first step add pointer to backing_dev_info
    to request_queue and convert all users touching it. No functional
    changes in this patch.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

21 Jul, 2016

1 commit

  • The discard passdown was being issued after the block was unmapped,
    which meant the block could be reprovisioned whilst the passdown discard
    was still in flight.

    We can only identify unshared blocks (safe to pass a discard down to)
    once they're unmapped and their ref count hits zero. Block ref counts are
    now used to guard against concurrent allocation of these blocks that are
    being discarded. So now we unmap the block, issue passdown discards, and
    then immediately increment ref counts for the regions that have been
    discarded via passdown (this is safe because allocation occurs within
    the same thread). We then decrement ref counts
    once the passdown discard IO is complete -- signaling these blocks may
    now be allocated.

    This fixes the potential for corruption that was reported here:
    https://www.redhat.com/archives/dm-devel/2016-June/msg00311.html

    Reported-by: Dennis Yang
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

13 May, 2016

3 commits

  • There is little benefit to doing this but it does structure DM thinp's
    code to more cleanly use the __blkdev_issue_discard() interface --
    particularly in passdown_double_checking_shared_status().

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     
  • With commit 38f25255330 ("block: add __blkdev_issue_discard") DM thinp
    no longer needs to carry its own async discard method.

    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber
    Reviewed-by: Christoph Hellwig

    Mike Snitzer
     
  • DM thinp's use of bio_inc_remaining() is critical to ensure the original
    parent discard bio isn't completed before its sub-discards have
    completed. DM thinp
    needs this due to the extra quiescing that occurs, via multiple DM thinp
    mappings, while processing large discards. As such DM thinp must build
    the async discard bio chain after some delay -- so bio_inc_remaining()
    is used to enable DM thinp to take a reference on the original parent
    discard bio for each mapping. This allows the immediate use of
    bio_endio() on that discard bio; but with the understanding that the
    actual completion won't occur until each of the sub-discards'
    per-mapping references are dropped.

    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber

    Mike Snitzer
     

12 Mar, 2016

1 commit

  • Commit 0a927c2f02 ("dm thin: return -ENOSPC when erroring retry list due
    to out of data space") was a step in the right direction but didn't go
    far enough.

    Add a new 'out_of_data_space' flag to 'struct pool' and set it if/when
    the pool runs out of data space. This fixes cell_error() and
    error_retry_list() to not blindly return -EIO.

    We cannot rely on the 'error_if_no_space' feature flag since it is
    transient (in that it can be reset once space is added, plus it only
    controls whether errors are issued, it doesn't reflect whether the
    pool is actually out of space).

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

18 Dec, 2015

1 commit

  • When a thin pool is being destroyed, delayed work items are
    cancelled using cancel_delayed_work(), which doesn't guarantee that on
    return the delayed item isn't running. This can cause the work item to
    requeue itself on an already destroyed workqueue. Fix this by using
    cancel_delayed_work_sync() which guarantees that on return the work item
    is not running anymore.

    Fixes: 905e51b39a555 ("dm thin: commit outstanding data every second")
    Fixes: 85ad643b7e7e5 ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
    Signed-off-by: Nikolay Borisov
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Nikolay Borisov
     

24 Nov, 2015

1 commit

  • When establishing a thin device's discard limits we cannot rely on the
    underlying thin-pool device's discard capabilities (which are inherited
    from the thin-pool's underlying data device) given that DM thin devices
    must provide discard support even when the thin-pool's underlying data
    device doesn't support discards.

    Users were exposed to this thin device discard limits regression if
    their thin-pool's underlying data device does _not_ support discards.
    This regression caused all upper-layers that called the
    blkdev_issue_discard() interface to not be able to issue discards to
    thin devices (because discard_granularity was 0). This regression
    wasn't caught earlier because the device-mapper-test-suite's extensive
    'thin-provisioning' discard tests are only ever performed against
    thin-pools with data devices that support discards.

    Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature
    rather than inferring whether or not a thin device's discard support
    should be enabled by looking at the thin-pool's discard_granularity.

    Fixes: 216076705 ("dm thin: disable discard support for thin devices if pool's is disabled")
    Reported-by: Mike Gerber
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org # 4.1+

    Mike Snitzer
     

16 Nov, 2015

1 commit

  • A thin-pool that is in out-of-data-space (OODS) mode may transition back
    to write mode -- without the admin adding more space to the thin-pool --
    if/when blocks are released (either by deleting thin devices or
    discarding provisioned blocks).

    But as part of the thin-pool's earlier transition to out-of-data-space
    mode the thin-pool may have set the 'error_if_no_space' flag to true if
    the no_space_timeout expires without more space having been made
    available. That implementation detail, of changing the pool's
    error_if_no_space setting, needs to be reset back to the default that
    the user specified when the thin-pool's table was loaded.

    Otherwise we'll drop the user-requested behaviour on the floor when this
    out-of-data-space to write mode transition occurs.

    Reported-by: Vivek Goyal
    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber
    Fixes: 2c43fd26e4 ("dm thin: fix missing out-of-data-space to write mode transition if blocks are released")
    Cc: stable@vger.kernel.org

    Mike Snitzer
     

03 Sep, 2015

2 commits

  • Pull device mapper update from Mike Snitzer:

    - a couple small cleanups in dm-cache, dm-verity, persistent-data's
    dm-btree, and DM core.

    - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio
    prison cells

    - a 4.2-stable fix that adds feature reporting for the dm-stats
    features added in 4.2

    - improve DM-snapshot to not invalidate the on-disk snapshot if
    snapshot device write overflow occurs; but a write overflow triggered
    through the origin device will still invalidate the snapshot.

    - optimize DM-thinp's async discard submission a bit now that late bio
    splitting has been included in block core.

    - switch DM-cache's SMQ policy lock from using a mutex to a spinlock;
    improves performance on very low latency devices (eg. NVMe SSD).

    - document DM RAID 4/5/6's discard support

    [ I did not pull the slab changes, which weren't appropriate for this
    tree, and weren't obviously the right thing to do anyway. At the very
    least they need some discussion and explanation before getting merged.

    Because this is a partial pull rather than a pull of the actual
    tagged commit, this merge commit is also, obviously, missing the git
    signature from the original tag ]

    * tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm cache: fix use after freeing migrations
    dm cache: small cleanups related to deferred prison cell cleanup
    dm cache: fix leaking of deferred bio prison cells
    dm raid: document RAID 4/5/6 discard support
    dm stats: report precise_timestamps and histogram in @stats_list output
    dm thin: optimize async discard submission
    dm snapshot: don't invalidate on-disk image on snapshot write overflow
    dm: remove unlikely() before IS_ERR()
    dm: do not override error code returned from dm_get_device()
    dm: test return value for DM_MAPIO_SUBMITTED
    dm verity: remove unused mempool
    dm cache: move wake_waker() from free_migrations() to where it is needed
    dm btree remove: remove unused function get_nr_entries()
    dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()
    dm cache policy smq: change the mutex to a spinlock

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:
    "This first core part of the block IO changes contains:

    - Cleanup of the bio IO error signaling from Christoph. We used to
    rely on the uptodate bit and passing around of an error, now we
    store the error in the bio itself.

    - Improvement of the above from myself, by shrinking the bio size
    down again to fit in two cachelines on x86-64.

    - Revert of the max_hw_sectors cap removal from a revision again,
    from Jeff Moyer. This caused performance regressions in various
    tests. Reinstate the limit, bump it to a more reasonable size
    instead.

    - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
    Most devices have huge trim limits, which can cause nasty latencies
    when deleting files. Enable the admin to configure the size down.
    We will look into having a more sane default instead of UINT_MAX
    sectors.

    - Improvement of the SGP gaps logic from Keith Busch.

    - Enable the block core to handle arbitrarily sized bios, which
    enables a nice simplification of bio_add_page() (which is an IO hot
    path). From Kent.

    - Improvements to the partition io stats accounting, making it
    faster. From Ming Lei.

    - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
    file in blk-mq, as well as a fix for a blk-mq timeout race
    condition.

    - Ming Lin has been carrying Kent's above-mentioned patches forward
    for a while, and testing them. Ming also did a few fixes around
    that.

    - Sasha Levin found and fixed a use-after-free problem introduced by
    the bio->bi_error changes from Christoph.

    - Small blk cgroup cleanup from Viresh Kumar"
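    The bi_error change in the first bullet can be modeled in isolation:
    instead of threading an error code alongside the bio through every
    completion call, the error is stored in the bio itself and the endio
    callback reads it from there. The struct and functions below are a
    hypothetical standalone sketch that borrows the kernel's names, not
    the kernel's actual definitions.

```c
#include <assert.h>

/* Minimal model of the new-style completion path: bio_endio() takes no
 * error argument because the error already lives in bio->bi_error
 * (0 on success, negative errno on failure). */

struct bio {
	int bi_error;
	void (*bi_end_io)(struct bio *);
};

static void bio_endio(struct bio *bio)
{
	if (bio->bi_end_io)
		bio->bi_end_io(bio); /* error travels inside the bio */
}

static int seen_error;

static void my_end_io(struct bio *bio)
{
	seen_error = bio->bi_error; /* read the error from the bio */
}
```

    With the error carried in the bio, every intermediate layer that
    forwards a completion has one less parameter to get wrong.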

    * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
    blk: Fix bio_io_vec index when checking bvec gaps
    block: Replace SG_GAPS with new queue limits mask
    block: bump BLK_DEF_MAX_SECTORS to 2560
    Revert "block: remove artifical max_hw_sectors cap"
    blk-mq: fix race between timeout and freeing request
    blk-mq: fix buffer overflow when reading sysfs file of 'pending'
    Documentation: update notes in biovecs about arbitrarily sized bios
    block: remove bio_get_nr_vecs()
    fs: use helper bio_add_page() instead of open coding on bi_io_vec
    block: kill merge_bvec_fn() completely
    md/raid5: get rid of bio_fits_rdev()
    md/raid5: split bio for chunk_aligned_read
    block: remove split code in blkdev_issue_{discard,write_same}
    btrfs: remove bio splitting and merge_bvec_fn() calls
    bcache: remove driver private bio splitting code
    block: simplify bio_add_page()
    block: make generic_make_request handle arbitrarily sized bios
    blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
    block: don't access bio->bi_error after bio_put()
    block: shrink struct bio down to 2 cache lines again
    ...

    Linus Torvalds
     

18 Aug, 2015

1 commit

  • __blkdev_issue_discard_async() doesn't need to worry about further
    splitting because the upper layer, blkdev_issue_discard(), will have
    already split the bios such that bi_size isn't overflowed.
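    The upper-layer splitting referred to above exists because bi_size is
    a 32-bit byte count, so a discard covering more sectors than fit in
    32 bits of bytes has to be issued as multiple bios. The chunking loop
    below is a hypothetical standalone sketch of that idea; issue_one()
    stands in for submitting a single bio, and the 8-sector rounding is
    an assumed alignment choice, not the kernel's exact constant.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Largest sector count whose byte size still fits in a 32-bit bi_size,
 * rounded down to an 8-sector boundary to keep chunks aligned. */
#define MAX_BIO_SECTORS ((UINT32_MAX >> SECTOR_SHIFT) & ~7u)

static unsigned issued_bios;

/* Stand-in for building and submitting one discard bio. */
static void issue_one(uint64_t sector, uint32_t nr_sects)
{
	(void)sector;
	/* Each chunk's byte count must fit in a 32-bit bi_size. */
	assert(((uint64_t)nr_sects << SECTOR_SHIFT) <= UINT32_MAX);
	issued_bios++;
}

/* Split an arbitrarily large discard into bi_size-safe chunks. */
static void issue_discard(uint64_t sector, uint64_t nr_sects)
{
	while (nr_sects) {
		uint32_t n = nr_sects > MAX_BIO_SECTORS ?
			     MAX_BIO_SECTORS : (uint32_t)nr_sects;
		issue_one(sector, n);
		sector += n;
		nr_sects -= n;
	}
}
```

    Once this splitting is done in one place at the top of the stack, a
    lower-level helper such as the async variant can trust every bio it
    receives to be within range.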

    Signed-off-by: Mike Snitzer
    Acked-by: Joe Thornber

    Mike Snitzer