24 Aug, 2017
1 commit
-
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
04 Jul, 2017
1 commit
-
Pull core block/IO updates from Jens Axboe:
"This is the main pull request for the block layer for 4.13. Not a huge
round in terms of features, but there's a lot of churn related to some
core cleanups.Note this depends on the UUID tree pull request, that Christoph
already sent out.This pull request contains:
- A series from Christoph, unifying the error/stats codes in the
block layer. We now use blk_status_t everywhere, instead of using
different schemes for different places.- Also from Christoph, some cleanups around request allocation and IO
scheduler interactions in blk-mq.- And yet another series from Christoph, cleaning up how we handle
and do bounce buffering in the block layer.- A blk-mq debugfs series from Bart, further improving on the support
we have for exporting internal information to aid debugging IO
hangs or stalls.- Also from Bart, a series that cleans up the request initialization
differences across types of devices.- A series from Goldwyn Rodrigues, allowing the block layer to return
failure if we will block and the user asked for non-blocking.- Patch from Hannes for supporting setting loop devices block size to
that of the underlying device.- Two series of patches from Javier, fixing various issues with
lightnvm, particular around pblk.- A series from me, adding support for write hints. This comes with
NVMe support as well, so applications can help guide data placement
on flash to improve performance, latencies, and write
amplification.- A series from Ming, improving and hardening blk-mq support for
stopping/starting and quiescing hardware queues.- Two pull requests for NVMe updates. Nothing major on the feature
side, but lots of cleanups and bug fixes. From the usual crew.- A series from Neil Brown, greatly improving the bio rescue set
support. Most notably, this kills the bio rescue work queues, if we
don't really need them.- Lots of other little bug fixes that are all over the place"
* 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
lightnvm: pblk: set line bitmap check under debug
lightnvm: pblk: verify that cache read is still valid
lightnvm: pblk: add initialization check
lightnvm: pblk: remove target using async. I/Os
lightnvm: pblk: use vmalloc for GC data buffer
lightnvm: pblk: use right metadata buffer for recovery
lightnvm: pblk: schedule if data is not ready
lightnvm: pblk: remove unused return variable
lightnvm: pblk: fix double-free on pblk init
lightnvm: pblk: fix bad le64 assignations
nvme: Makefile: remove dead build rule
blk-mq: map all HWQ also in hyperthreaded system
nvmet-rdma: register ib_client to not deadlock in device removal
nvme_fc: fix error recovery on link down.
nvmet_fc: fix crashes on bad opcodes
nvme_fc: Fix crash when nvme controller connection fails.
nvme_fc: replace ioabort msleep loop with completion
nvme_fc: fix double calls to nvme_cleanup_cmd()
nvme-fabrics: verify that a controller returns the correct NQN
nvme: simplify nvme_dev_attrs_are_visible
...
22 Jun, 2017
1 commit
-
If only a subset of the devices associated with multiple regions support
a given special operation (eg. DISCARD) then the dec_count() that is
used to set error for the region must increment the io->count.Otherwise, when the dec_count() is called it can cause the dm-io
caller's bio to be completed multiple times. As was reported against
the dm-mirror target that had mirror legs with a mix of discard
capabilities.Bug: https://bugzilla.kernel.org/show_bug.cgi?id=196077
Reported-by: Zhang Yi
Signed-off-by: Mike Snitzer
19 Jun, 2017
2 commits
-
This patch converts bioset_create() to not create a workqueue by
default, so alloctions will never trigger punt_bios_to_rescuer(). It
also introduces a new flag BIOSET_NEED_RESCUER which tells
bioset_create() to preserve the old behavior.All callers of bioset_create() that are inside block device drivers,
are given the BIOSET_NEED_RESCUER flag.biosets used by filesystems or other top-level users do not
need rescuing as the bio can never be queued behind other
bios. This includes fs_bio_set, blkdev_dio_pool,
btrfs_bioset, xfs_ioend_bioset, and one allocated by
target_core_iblock.c.biosets used by md/raid do not need rescuing as
their usage was recently audited and revised to never
risk deadlock.It is hoped that most, if not all, of the remaining biosets
can end up being the non-rescued version.Reviewed-by: Christoph Hellwig
Credit-to: Ming Lei (minor fixes)
Reviewed-by: Ming Lei
Signed-off-by: NeilBrown
Signed-off-by: Jens Axboe -
"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().Suggested-by: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Reviewed-by: Ming Lei
Signed-off-by: NeilBrown
Signed-off-by: Jens Axboe
09 Jun, 2017
1 commit
-
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
09 Apr, 2017
2 commits
-
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Fix up do_region to not allocate a bio_vec for discards. We've
got rid of the discard payload allocated by the caller years ago.Obviously this wasn't actually harmful given how long it's been
there, but it's still good to avoid the pointless allocation.Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
21 Nov, 2016
1 commit
-
Firstly we have mature bvec/bio iterator helper for iterate each
page in one bio, not necessary to reinvent a wheel to do that.Secondly the coming multipage bvecs requires this patch.
Also add comments about the direct access to bvec table.
Signed-off-by: Ming Lei
Signed-off-by: Mike Snitzer
08 Aug, 2016
1 commit
-
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.No intended functional changes in this commit.
Signed-off-by: Jens Axboe
11 Jun, 2016
1 commit
-
Add some seperation between bio-based and request-based DM core code.
'struct mapped_device' and other DM core only structures and functions
have been moved to dm-core.h and all relevant DM core .c files have been
updated to include dm-core.h rather than dm.hDM targets should _never_ include dm-core.h!
[block core merge conflict resolution from Stephen Rothwell]
Signed-off-by: Mike Snitzer
Signed-off-by: Stephen Rothwell
08 Jun, 2016
4 commits
-
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
Separate the op from the rq_flag_bits and have dm
set/get the bio using bio_set_op_attrs/bio_op.Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect a operation direction like writesame by testing if REQ_WRITE is
set.This has dm use the op_is_write helper which will do the right
thing.Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe -
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.Signed-off-by: Mike Christie
Fixed up fs/ext4/crypto.c
Signed-off-by: Jens Axboe
04 Jan, 2016
1 commit
-
Signed-off-by: Al Viro
01 Nov, 2015
1 commit
-
Remove DM's unneeded NULL tests before calling these destroy functions,
now that they check for NULL, thanks to these v4.3 commits:
3942d2991 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()")
4e3ca3e03 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)//
@@ expression x; @@
-if (x != NULL)
\(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
//Signed-off-by: Julia Lawall
Signed-off-by: Mike Snitzer
14 Aug, 2015
1 commit
-
We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.Acked-by: Steven Whitehouse
Signed-off-by: Kent Overstreet
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig
Signed-off-by: Ming Lin
Signed-off-by: Jens Axboe
12 Aug, 2015
1 commit
-
Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
such as:[521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
[521120.720638] Read of size 4 by task mount.ocfs2/9644
[521120.721212] =============================================================================
[521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
[521120.722968] -----------------------------------------------------------------------------
[521120.722968]
[521120.723915] Disabling lock debugging due to kernel taint
[521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
[521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
[521120.726037]
[521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............
[521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
[521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................
[521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................
[521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6....
[521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.781465] ^
[521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[521120.784818] ==================================================================This patch fixes a few of those places that I caught while auditing the patch, but the
original patch should be audited further for more occurences of this issue since I'm
not too familiar with the code.Signed-off-by: Sasha Levin
Signed-off-by: Jens Axboe
29 Jul, 2015
1 commit
-
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callbackThe first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: NeilBrown
Signed-off-by: Jens Axboe
28 Feb, 2015
1 commit
-
Since it's possible for the discard and write same queue limits to
change while the upper level command is being sliced and diced, fix up
both of them (a) to reject IO if the special command is unsupported at
the start of the function and (b) read the limits once and let the
commands error out on their own if the status happens to change.Signed-off-by: Darrick J. Wong
Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer
Cc: stable@vger.kernel.org
14 Feb, 2015
1 commit
-
I created a dm-raid1 device backed by a device that supports DISCARD
and another device that does NOT support DISCARD with the following
dm configuration:# echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 4K 1G 0
`-moo (dm-0) 0 4K 1G 0
sdb 0 0B 0B 0
`-moo (dm-0) 0 4K 1G 0Notice that the mirror device /dev/mapper/moo advertises DISCARD
support even though one of the mirror halves doesn't.If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
loop in do_region() when it tries to issue a DISCARD request to sdb.
The problem is that when we call do_region() against sdb, num_sectors
is set to zero because q->limits.max_discard_sectors is zero.
Therefore, "remaining" never decreases and the loop never terminates.To fix this: before entering the loop, check for the combination of
REQ_DISCARD and no discard and return -EOPNOTSUPP to avoid hanging up
the mirror device.This bug was found by the unfortunate coincidence of pvmove and a
discard operation in the RHEL 6.5 kernel; upstream is also affected.Signed-off-by: Darrick J. Wong
Acked-by: "Martin K. Petersen"
Signed-off-by: Mike Snitzer
Cc: stable@vger.kernel.org
02 Aug, 2014
1 commit
-
Remove the io struct off the stack in sync_io() and allocate it from
the mempool like is done in async_io().dec_count() now always calls a callback function and always frees the io
struct back to the mempool (so sync_io and async_io share this pattern).Signed-off-by: Joe Thornber
Signed-off-by: Mike Snitzer
11 Jul, 2014
1 commit
-
There's a race condition between the atomic_dec_and_test(&io->count)
in dec_count() and the waking of the sync_io() thread. If the thread
is spuriously woken immediately after the decrement it may exit,
making the on stack io struct invalid, yet the dec_count could still
be using it.Fix this race by using a completion in sync_io() and dec_count().
Reported-by: Minfei Huang
Signed-off-by: Joe Thornber
Signed-off-by: Mike Snitzer
Acked-by: Mikulas Patocka
Cc: stable@vger.kernel.org
18 Feb, 2014
1 commit
-
Commit 003b5c5719f159f4f4bf97511c4702a0638313dd ("block: Convert drivers
to immutable biovecs") broke dm-mirror due to dm-io breakage.dm-io had three possible iterators (DM_IO_PAGE_LIST, DM_IO_BVEC,
DM_IO_VMA) that iterate over pages where the I/O should be performed.The switch to immutable biovecs changed the DM_IO_BVEC iterator to
DM_IO_BIO. Before this change the iterator stored the pointer to a bio
vector in the dpages structure. The iterator incremented the pointer in
the dpages structure as it advanced over the pages. After the immutable
biovecs change, the DM_IO_BIO iterator stores a pointer to the bio in
the dpages structure and uses bio_advance to change the bio as it
advances.The problem is that the function dispatch_io stores the content of the
dpages structure into the variable old_pages and restores it before
issuing I/O to each of the devices. Before the change, the statement
"*dp = old_pages;" restored the iterator to its starting position.
After the change, struct dpages holds a pointer to the bio, thus the
statement "*dp = old_pages;" doesn't restore the iterator.Consequently, in the context of dm-mirror: only the first mirror leg is
written correctly, the kernel locks up when trying to write the other
mirror legs because the number of sectors to write in the where->count
variable doesn't match the number of sectors returned by the iterator.This patch fixes the bug by partially reverting the original patch - it
changes the code so that struct dpages holds a pointer to the bio vector,
so that the statement "*dp = old_pages;" restores the iterator correctly.The field "context_u" holds the offset from the beginning of the current
bio vector entry, just like the "bio->bi_iter.bi_bvec_done" field.Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer
24 Nov, 2013
2 commits
-
Now that we've got a mechanism for immutable biovecs -
bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
respect it instead of using the bvec array directly.Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: NeilBrown
Cc: Alasdair Kergon
Cc: dm-devel@redhat.com -
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: Geert Uytterhoeven
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Ed L. Cashin"
Cc: Nick Piggin
Cc: Lars Ellenberg
Cc: Jiri Kosina
Cc: Matthew Wilcox
Cc: Geoff Levand
Cc: Yehuda Sadeh
Cc: Sage Weil
Cc: Alex Elder
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris
Cc: Philip Kelleher
Cc: Rusty Russell
Cc: "Michael S. Tsirkin"
Cc: Konrad Rzeszutek Wilk
Cc: Jeremy Fitzhardinge
Cc: Neil Brown
Cc: Alasdair Kergon
Cc: Mike Snitzer
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh
Cc: Benny Halevy
Cc: "James E.J. Bottomley"
Cc: Greg Kroah-Hartman
Cc: "Nicholas A. Bellinger"
Cc: Alexander Viro
Cc: Chris Mason
Cc: "Theodore Ts'o"
Cc: Andreas Dilger
Cc: Jaegeuk Kim
Cc: Steven Whitehouse
Cc: Dave Kleikamp
Cc: Joern Engel
Cc: Prasad Joshi
Cc: Trond Myklebust
Cc: KONISHI Ryusuke
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Ben Myers
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Len Brown
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: Herton Ronaldo Krzesinski
Cc: Ben Hutchings
Cc: Andrew Morton
Cc: Guo Chao
Cc: Tejun Heo
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Wei Yongjun
Cc: "Roger Pau Monné"
Cc: Jan Beulich
Cc: Stefano Stabellini
Cc: Ian Campbell
Cc: Sebastian Ott
Cc: Christian Borntraeger
Cc: Minchan Kim
Cc: Jiang Liu
Cc: Nitin Gupta
Cc: Jerome Marchand
Cc: Joe Perches
Cc: Peng Tao
Cc: Andy Adamson
Cc: fanchaoting
Cc: Jie Liu
Cc: Sunil Mushran
Cc: "Martin K. Petersen"
Cc: Namjae Jeon
Cc: Pankaj Kumar
Cc: Dan Magenheimer
Cc: Mel Gorman 6
23 Sep, 2013
1 commit
-
Allow user to change the number of IOs that are reserved by
bio-based DM's mempools by writing to this file:
/sys/module/dm_mod/parameters/reserved_bio_based_iosThe default value is RESERVED_BIO_BASED_IOS (16). The maximum allowed
value is RESERVED_MAX_IOS (1024).Export dm_get_reserved_bio_based_ios() for use by DM targets and core
code. Switch to sizing dm-io's mempool and bioset using DM core's
configurable 'reserved_bio_based_ios'.Signed-off-by: Mike Snitzer
Signed-off-by: Frank Mayhar
22 Dec, 2012
1 commit
-
Add WRITE SAME support to dm-io and make it accessible to
dm_kcopyd_zero(). dm_kcopyd_zero() provides an asynchronous interface
whereas the blkdev_issue_write_same() interface is synchronous.WRITE SAME is a SCSI command that can be leveraged for more efficient
zeroing of a specified logical extent of a device which supports it.
Only a single zeroed logical block is transfered to the target for each
WRITE SAME and the target then writes that same block across the
specified extent.The dm thin target uses this.
Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon
09 Sep, 2012
1 commit
-
With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.v6: Explain the temporary if statement in bio_put
Signed-off-by: Kent Overstreet
CC: Jens Axboe
CC: NeilBrown
CC: Alasdair Kergon
CC: Nicholas Bellinger
CC: Lars Ellenberg
Acked-by: Tejun Heo
Acked-by: Nicholas Bellinger
Signed-off-by: Jens Axboe
08 Mar, 2012
1 commit
-
This patch fixes a crash by recognising discards in dm_io.
Currently dm_mirror can send REQ_DISCARD bios if running over a
discard-enabled device and without support in dm_io the system
crashes badly.BUG: unable to handle kernel paging request at 00800000
IP: __bio_add_page.part.17+0xf5/0x1e0
...
bio_add_page+0x56/0x70
dispatch_io+0x1cf/0x240 [dm_mod]
? km_get_page+0x50/0x50 [dm_mod]
? vm_next_page+0x20/0x20 [dm_mod]
? mirror_flush+0x130/0x130 [dm_mirror]
dm_io+0xdc/0x2b0 [dm_mod]
...Introduced in 2.6.38-rc1 by commit 5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4
(dm raid1: support discard).Signed-off-by: Milan Broz
Cc: stable@kernel.org
Acked-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon
02 Aug, 2011
1 commit
-
For normal kernel pages, CPU cache is synchronized by the dma layer.
However, this is not done for pages allocated with vmalloc. If we do I/O
to/from vmallocated pages, we must synchronize CPU cache explicitly.Prior to doing I/O on vmallocated page we must call
flush_kernel_vmap_range to flush dirty cache on the virtual address.
After finished read we must call invalidate_kernel_vmap_range to
invalidate cache on the virtual address, so that accesses to the virtual
address return newly read data and not stale data from CPU cache.This patch fixes metadata corruption on dm-snapshots on PA-RISC and
possibly other architectures with caches indexed by virtual address.Cc: stable
Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon
29 May, 2011
1 commit
-
Replace the arbitrary calculation of an initial io struct mempool size
with a constant.The code calculated the number of reserved structures based on the request
size and used a "magic" multiplication constant of 4. This patch changes
it to reserve a fixed number - itself still chosen quite arbitrarily.
Further testing might show if there is a better number to choose.Note that if there is no memory pressure, we can still allocate an
arbitrary number of "struct io" structures. One structure is enough to
process the whole request.Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon
10 Mar, 2011
1 commit
-
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.Signed-off-by: Jens Axboe
10 Sep, 2010
1 commit
-
This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
now deprecated REQ_HARDBARRIER.* -EOPNOTSUPP handling logic dropped.
* Preflush is handled as before but postflush is dropped and replaced
with passing down REQ_FUA to member request_queues. This replaces
one array wide cache flush w/ member specific FUA writes.* __split_and_process_bio() now calls __clone_and_map_flush() directly
for flushes and guarantees all FLUSH bio's going to targets are zero
` length.* It's now guaranteed that all FLUSH bio's which are passed onto dm
targets are zero length. bio_empty_barrier() tests are replaced
with REQ_FLUSH tests.* Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.
* Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
enough to be marked with unlikely().* Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
capability.* Request based dm isn't converted yet. dm_init_request_based_queue()
resets flush support to 0 for now. To avoid disturbing request
based dm code, dm->flush_error is added for bio based dm while
requested based dm continues to use dm->barrier_error.Lightly tested linear, stripe, raid1, snap and crypt targets. Please
proceed with caution as I'm not familiar with the code base.Signed-off-by: Tejun Heo
Cc: dm-devel@redhat.com
Cc: Christoph Hellwig
Signed-off-by: Jens Axboe
08 Aug, 2010
1 commit
-
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
11 Dec, 2009
3 commits
-
Accept empty barriers in dm-io.
dm-io will process empty write barrier requests just like the other
read/write requests.Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon -
Remove the hack where we allocate an extra bi_io_vec to store additional
private data. This hack prevents us from supporting barriers in
dm-raid1 without first making another little block layer change.
Instead of doing that, this patch eliminates the bi_io_vec abuse by
storing the region number directly in the low bits of bi_private.We need to store two things for each bio, the pointer to the main io
structure and, if parallel writes were requested, an index indicating
which of these writes this bio belongs to. There can be at most
BITS_PER_LONG regions - 32 or 64.The index (region number) was stored in the last (hidden) bio vector and
the pointer to struct io was stored in bi_private.This patch now aligns "struct io" on BITS_PER_LONG bytes and stores the
region number in the low BITS_PER_LONG bits of bi_private.Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon -
Allocate "struct io" from a slab.
This patch changes dm-io, so that "struct io" is allocated from a slab cache.
It used to be allocated with kmalloc. Allocating from a slab will be needed
for the next patch, because it requires a special alignment of "struct io"
and kmalloc cannot meet this alignment.Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon
22 Jun, 2009
1 commit
-
If -EOPNOTSUPP was returned and the request was a barrier request, retry it
without barrier.Retry all regions for now. Barriers are submitted only for one-region requests,
so it doesn't matter. (In the future, retries can be limited to the actual
regions that failed.)Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon