24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different lifetime rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

04 Jul, 2017

1 commit

  • Pull core block/IO updates from Jens Axboe:
    "This is the main pull request for the block layer for 4.13. Not a huge
    round in terms of features, but there's a lot of churn related to some
    core cleanups.

    Note this depends on the UUID tree pull request, that Christoph
    already sent out.

    This pull request contains:

    - A series from Christoph, unifying the error/stats codes in the
    block layer. We now use blk_status_t everywhere, instead of using
    different schemes for different places.

    - Also from Christoph, some cleanups around request allocation and IO
    scheduler interactions in blk-mq.

    - And yet another series from Christoph, cleaning up how we handle
    and do bounce buffering in the block layer.

    - A blk-mq debugfs series from Bart, further improving on the support
    we have for exporting internal information to aid debugging IO
    hangs or stalls.

    - Also from Bart, a series that cleans up the request initialization
    differences across types of devices.

    - A series from Goldwyn Rodrigues, allowing the block layer to return
    failure if we will block and the user asked for non-blocking.

    - Patch from Hannes for supporting setting loop devices block size to
    that of the underlying device.

    - Two series of patches from Javier, fixing various issues with
    lightnvm, particularly around pblk.

    - A series from me, adding support for write hints. This comes with
    NVMe support as well, so applications can help guide data placement
    on flash to improve performance, latencies, and write
    amplification.

    - A series from Ming, improving and hardening blk-mq support for
    stopping/starting and quiescing hardware queues.

    - Two pull requests for NVMe updates. Nothing major on the feature
    side, but lots of cleanups and bug fixes. From the usual crew.

    - A series from Neil Brown, greatly improving the bio rescue set
    support. Most notably, this kills the bio rescue work queues, if we
    don't really need them.

    - Lots of other little bug fixes that are all over the place"

    * 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
    lightnvm: pblk: set line bitmap check under debug
    lightnvm: pblk: verify that cache read is still valid
    lightnvm: pblk: add initialization check
    lightnvm: pblk: remove target using async. I/Os
    lightnvm: pblk: use vmalloc for GC data buffer
    lightnvm: pblk: use right metadata buffer for recovery
    lightnvm: pblk: schedule if data is not ready
    lightnvm: pblk: remove unused return variable
    lightnvm: pblk: fix double-free on pblk init
    lightnvm: pblk: fix bad le64 assignations
    nvme: Makefile: remove dead build rule
    blk-mq: map all HWQ also in hyperthreaded system
    nvmet-rdma: register ib_client to not deadlock in device removal
    nvme_fc: fix error recovery on link down.
    nvmet_fc: fix crashes on bad opcodes
    nvme_fc: Fix crash when nvme controller connection fails.
    nvme_fc: replace ioabort msleep loop with completion
    nvme_fc: fix double calls to nvme_cleanup_cmd()
    nvme-fabrics: verify that a controller returns the correct NQN
    nvme: simplify nvme_dev_attrs_are_visible
    ...

    Linus Torvalds
     

22 Jun, 2017

1 commit

  • If only a subset of the devices associated with multiple regions support
    a given special operation (eg. DISCARD) then the dec_count() that is
    used to set error for the region must increment the io->count.

    Otherwise, when the dec_count() is called it can cause the dm-io
    caller's bio to be completed multiple times, as was reported against
    the dm-mirror target that had mirror legs with a mix of discard
    capabilities.

    Bug: https://bugzilla.kernel.org/show_bug.cgi?id=196077
    Reported-by: Zhang Yi
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

19 Jun, 2017

2 commits

  • This patch converts bioset_create() to not create a workqueue by
    default, so allocations will never trigger punt_bios_to_rescuer(). It
    also introduces a new flag BIOSET_NEED_RESCUER which tells
    bioset_create() to preserve the old behavior.

    All callers of bioset_create() that are inside block device drivers,
    are given the BIOSET_NEED_RESCUER flag.

    biosets used by filesystems or other top-level users do not
    need rescuing as the bio can never be queued behind other
    bios. This includes fs_bio_set, blkdev_dio_pool,
    btrfs_bioset, xfs_ioend_bioset, and one allocated by
    target_core_iblock.c.

    biosets used by md/raid do not need rescuing as
    their usage was recently audited and revised to never
    risk deadlock.

    It is hoped that most, if not all, of the remaining biosets
    can end up being the non-rescued version.

    Reviewed-by: Christoph Hellwig
    Credit-to: Ming Lei (minor fixes)
    Reviewed-by: Ming Lei
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
     
  • "flags" arguments are often seen as good API design as they allow
    easy extensibility.
    bioset_create_nobvec() is implemented internally as a variation in
    flags passed to __bioset_create().

    To support future extension, make the internal structure part of the
    API.
    i.e. add a 'flags' argument to bioset_create() and discard
    bioset_create_nobvec().

    Note that the bio_split allocations in drivers/md/raid* do not need
    the bvec mempool - they should have used bioset_create_nobvec().

    Suggested-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

09 Apr, 2017

2 commits

  • Copy & paste from the REQ_OP_WRITE_SAME code.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Fix up do_region to not allocate a bio_vec for discards. We've
    got rid of the discard payload allocated by the caller years ago.

    Obviously this wasn't actually harmful given how long it's been
    there, but it's still good to avoid the pointless allocation.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

21 Nov, 2016

1 commit


08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

11 Jun, 2016

1 commit

  • Add some separation between bio-based and request-based DM core code.

    'struct mapped_device' and other DM core only structures and functions
    have been moved to dm-core.h and all relevant DM core .c files have been
    updated to include dm-core.h rather than dm.h.

    DM targets should _never_ include dm-core.h!

    [block core merge conflict resolution from Stephen Rothwell]
    Signed-off-by: Mike Snitzer
    Signed-off-by: Stephen Rothwell

    Mike Snitzer
     

08 Jun, 2016

4 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • Separate the op from the rq_flag_bits and have dm
    set/get the bio using bio_set_op_attrs/bio_op.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • We currently set REQ_WRITE/WRITE for all non READ IOs
    like discard, flush, writesame, etc. In the next patches where we
    no longer set up the op as a bitmap, we will not be able to
    detect an operation direction like writesame by testing if REQ_WRITE is
    set.

    This has dm use the op_is_write helper which will do the right
    thing.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
    instead of passing it in. This matches how generic_make_request works
    and how we set the other bio fields.

    Signed-off-by: Mike Christie

    Fixed up fs/ext4/crypto.c

    Signed-off-by: Jens Axboe

    Mike Christie
     

04 Jan, 2016

1 commit


01 Nov, 2015

1 commit

  • Remove DM's unneeded NULL tests before calling these destroy functions,
    now that they check for NULL, thanks to these v4.3 commits:
    3942d2991 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()")
    4e3ca3e03 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@ expression x; @@
    -if (x != NULL)
    \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Mike Snitzer

    Julia Lawall
     

14 Aug, 2015

1 commit

  • We can always fill up the bio now, no need to estimate the possible
    size based on queue parameters.

    Acked-by: Steven Whitehouse
    Signed-off-by: Kent Overstreet
    [hch: rebased and wrote a changelog]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ming Lin
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

12 Aug, 2015

1 commit

  • Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
    dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
    such as:

    [521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
    [521120.720638] Read of size 4 by task mount.ocfs2/9644
    [521120.721212] =============================================================================
    [521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
    [521120.722968] -----------------------------------------------------------------------------
    [521120.722968]
    [521120.723915] Disabling lock debugging due to kernel taint
    [521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
    [521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
    [521120.726037]
    [521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............
    [521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
    [521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................
    [521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................
    [521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6....
    [521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..
    ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [521120.781465] ^
    [521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [521120.784818] ==================================================================

    This patch fixes a few of those places that I caught while auditing the
    patch, but the original patch should be audited further for more
    occurrences of this issue since I'm not too familiar with the code.

    Signed-off-by: Sasha Levin
    Signed-off-by: Jens Axboe

    Sasha Levin
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being
    persistent when bios are queued up and not passed along from child to
    parent bio in the ever more popular chaining scenario. Having both
    mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

28 Feb, 2015

1 commit

  • Since it's possible for the discard and write same queue limits to
    change while the upper level command is being sliced and diced, fix up
    both of them (a) to reject IO if the special command is unsupported at
    the start of the function and (b) to read the limits once and let the
    commands error out on their own if the status happens to change.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Darrick J. Wong
     

14 Feb, 2015

1 commit

  • I created a dm-raid1 device backed by a device that supports DISCARD
    and another device that does NOT support DISCARD with the following
    dm configuration:

    # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
    # lsblk -D
    NAME          DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
    sda                  0        4K       1G         0
    `-moo (dm-0)         0        4K       1G         0
    sdb                  0        0B       0B         0
    `-moo (dm-0)         0        4K       1G         0

    Notice that the mirror device /dev/mapper/moo advertises DISCARD
    support even though one of the mirror halves doesn't.

    If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
    BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
    loop in do_region() when it tries to issue a DISCARD request to sdb.
    The problem is that when we call do_region() against sdb, num_sectors
    is set to zero because q->limits.max_discard_sectors is zero.
    Therefore, "remaining" never decreases and the loop never terminates.

    To fix this: before entering the loop, check for the combination of
    REQ_DISCARD and no discard support and return -EOPNOTSUPP to avoid
    hanging up the mirror device.

    This bug was found by the unfortunate coincidence of pvmove and a
    discard operation in the RHEL 6.5 kernel; upstream is also affected.

    Signed-off-by: Darrick J. Wong
    Acked-by: "Martin K. Petersen"
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Darrick J. Wong
     

02 Aug, 2014

1 commit

  • Remove the io struct off the stack in sync_io() and allocate it from
    the mempool, as is done in async_io().

    dec_count() now always calls a callback function and always frees the io
    struct back to the mempool (so sync_io and async_io share this pattern).

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer

    Joe Thornber
     

11 Jul, 2014

1 commit

  • There's a race condition between the atomic_dec_and_test(&io->count)
    in dec_count() and the waking of the sync_io() thread. If the thread
    is spuriously woken immediately after the decrement it may exit,
    making the on stack io struct invalid, yet the dec_count could still
    be using it.

    Fix this race by using a completion in sync_io() and dec_count().

    Reported-by: Minfei Huang
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Acked-by: Mikulas Patocka
    Cc: stable@vger.kernel.org

    Joe Thornber
     

18 Feb, 2014

1 commit

  • Commit 003b5c5719f159f4f4bf97511c4702a0638313dd ("block: Convert drivers
    to immutable biovecs") broke dm-mirror due to dm-io breakage.

    dm-io had three possible iterators (DM_IO_PAGE_LIST, DM_IO_BVEC,
    DM_IO_VMA) that iterate over pages where the I/O should be performed.

    The switch to immutable biovecs changed the DM_IO_BVEC iterator to
    DM_IO_BIO. Before this change the iterator stored the pointer to a bio
    vector in the dpages structure. The iterator incremented the pointer in
    the dpages structure as it advanced over the pages. After the immutable
    biovecs change, the DM_IO_BIO iterator stores a pointer to the bio in
    the dpages structure and uses bio_advance to change the bio as it
    advances.

    The problem is that the function dispatch_io stores the content of the
    dpages structure into the variable old_pages and restores it before
    issuing I/O to each of the devices. Before the change, the statement
    "*dp = old_pages;" restored the iterator to its starting position.
    After the change, struct dpages holds a pointer to the bio, thus the
    statement "*dp = old_pages;" doesn't restore the iterator.

    Consequently, in the context of dm-mirror: only the first mirror leg is
    written correctly, the kernel locks up when trying to write the other
    mirror legs because the number of sectors to write in the where->count
    variable doesn't match the number of sectors returned by the iterator.

    This patch fixes the bug by partially reverting the original patch - it
    changes the code so that struct dpages holds a pointer to the bio vector,
    so that the statement "*dp = old_pages;" restores the iterator correctly.

    The field "context_u" holds the offset from the beginning of the current
    bio vector entry, just like the "bio->bi_iter.bi_bvec_done" field.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mikulas Patocka
     

24 Nov, 2013

2 commits

  • Now that we've got a mechanism for immutable biovecs -
    bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
    respect it instead of using the bvec array directly.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: NeilBrown
    Cc: Alasdair Kergon
    Cc: dm-devel@redhat.com

    Kent Overstreet
     
  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     

23 Sep, 2013

1 commit

  • Allow user to change the number of IOs that are reserved by
    bio-based DM's mempools by writing to this file:
    /sys/module/dm_mod/parameters/reserved_bio_based_ios

    The default value is RESERVED_BIO_BASED_IOS (16). The maximum allowed
    value is RESERVED_MAX_IOS (1024).

    Export dm_get_reserved_bio_based_ios() for use by DM targets and core
    code. Switch to sizing dm-io's mempool and bioset using DM core's
    configurable 'reserved_bio_based_ios'.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Frank Mayhar

    Mike Snitzer
     

22 Dec, 2012

1 commit

  • Add WRITE SAME support to dm-io and make it accessible to
    dm_kcopyd_zero(). dm_kcopyd_zero() provides an asynchronous interface
    whereas the blkdev_issue_write_same() interface is synchronous.

    WRITE SAME is a SCSI command that can be leveraged for more efficient
    zeroing of a specified logical extent of a device which supports it.
    Only a single zeroed logical block is transferred to the target for each
    WRITE SAME and the target then writes that same block across the
    specified extent.

    The dm thin target uses this.

    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Mike Snitzer
     

09 Sep, 2012

1 commit

  • With the old code, when you allocate a bio from a bio pool you have to
    implement your own destructor that knows how to find the bio pool the
    bio was originally allocated from.

    This adds a new field to struct bio (bi_pool) and changes
    bio_alloc_bioset() to use it. This makes various bio destructors
    unnecessary, so they're then deleted.

    v6: Explain the temporary if statement in bio_put

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: NeilBrown
    CC: Alasdair Kergon
    CC: Nicholas Bellinger
    CC: Lars Ellenberg
    Acked-by: Tejun Heo
    Acked-by: Nicholas Bellinger
    Signed-off-by: Jens Axboe

    Kent Overstreet
     

08 Mar, 2012

1 commit

  • This patch fixes a crash by recognising discards in dm_io.

    Currently dm_mirror can send REQ_DISCARD bios if running over a
    discard-enabled device and without support in dm_io the system
    crashes badly.

    BUG: unable to handle kernel paging request at 00800000
    IP: __bio_add_page.part.17+0xf5/0x1e0
    ...
    bio_add_page+0x56/0x70
    dispatch_io+0x1cf/0x240 [dm_mod]
    ? km_get_page+0x50/0x50 [dm_mod]
    ? vm_next_page+0x20/0x20 [dm_mod]
    ? mirror_flush+0x130/0x130 [dm_mirror]
    dm_io+0xdc/0x2b0 [dm_mod]
    ...

    Introduced in 2.6.38-rc1 by commit 5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4
    (dm raid1: support discard).

    Signed-off-by: Milan Broz
    Cc: stable@kernel.org
    Acked-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     

02 Aug, 2011

1 commit

  • For normal kernel pages, CPU cache is synchronized by the dma layer.
    However, this is not done for pages allocated with vmalloc. If we do I/O
    to/from vmallocated pages, we must synchronize CPU cache explicitly.

    Prior to doing I/O on vmallocated page we must call
    flush_kernel_vmap_range to flush dirty cache on the virtual address.
    After finished read we must call invalidate_kernel_vmap_range to
    invalidate cache on the virtual address, so that accesses to the virtual
    address return newly read data and not stale data from CPU cache.

    This patch fixes metadata corruption on dm-snapshots on PA-RISC and
    possibly other architectures with caches indexed by virtual address.

    Cc: stable
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

29 May, 2011

1 commit

  • Replace the arbitrary calculation of an initial io struct mempool size
    with a constant.

    The code calculated the number of reserved structures based on the request
    size and used a "magic" multiplication constant of 4. This patch changes
    it to reserve a fixed number - itself still chosen quite arbitrarily.
    Further testing might show if there is a better number to choose.

    Note that if there is no memory pressure, we can still allocate an
    arbitrary number of "struct io" structures. One structure is enough to
    process the whole request.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

10 Mar, 2011

1 commit

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

10 Sep, 2010

1 commit

  • This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
    now deprecated REQ_HARDBARRIER.

    * -EOPNOTSUPP handling logic dropped.

    * Preflush is handled as before but postflush is dropped and replaced
    with passing down REQ_FUA to member request_queues. This replaces
    one array wide cache flush w/ member specific FUA writes.

    * __split_and_process_bio() now calls __clone_and_map_flush() directly
    for flushes and guarantees all FLUSH bio's going to targets are zero
    length.

    * It's now guaranteed that all FLUSH bio's which are passed onto dm
    targets are zero length. bio_empty_barrier() tests are replaced
    with REQ_FLUSH tests.

    * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

    * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
    enough to be marked with unlikely().

    * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
    doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
    capability.

    * Request based dm isn't converted yet. dm_init_request_based_queue()
    resets flush support to 0 for now. To avoid disturbing request
    based dm code, dm->flush_error is added for bio based dm while
    requested based dm continues to use dm->barrier_error.

    Lightly tested linear, stripe, raid1, snap and crypt targets. Please
    proceed with caution as I'm not familiar with the code base.

    Signed-off-by: Tejun Heo
    Cc: dm-devel@redhat.com
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Tejun Heo
     

08 Aug, 2010

1 commit

  • Remove the current bio flags and reuse the request flags for the bio, too.
    This allows us to more easily trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

11 Dec, 2009

3 commits

  • Accept empty barriers in dm-io.

    dm-io will process empty write barrier requests just like the other
    read/write requests.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Remove the hack where we allocate an extra bi_io_vec to store additional
    private data. This hack prevents us from supporting barriers in
    dm-raid1 without first making another little block layer change.
    Instead of doing that, this patch eliminates the bi_io_vec abuse by
    storing the region number directly in the low bits of bi_private.

    We need to store two things for each bio, the pointer to the main io
    structure and, if parallel writes were requested, an index indicating
    which of these writes this bio belongs to. There can be at most
    BITS_PER_LONG regions - 32 or 64.

    The index (region number) was stored in the last (hidden) bio vector and
    the pointer to struct io was stored in bi_private.

    This patch now aligns "struct io" on BITS_PER_LONG bytes and stores the
    region number in the low BITS_PER_LONG bits of bi_private.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Allocate "struct io" from a slab.

    This patch changes dm-io, so that "struct io" is allocated from a slab cache.
    It used to be allocated with kmalloc. Allocating from a slab will be needed
    for the next patch, because it requires a special alignment of "struct io"
    and kmalloc cannot meet this alignment.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     

22 Jun, 2009

1 commit

  • If -EOPNOTSUPP was returned and the request was a barrier request, retry it
    without barrier.

    Retry all regions for now. Barriers are submitted only for one-region requests,
    so it doesn't matter. (In the future, retries can be limited to the actual
    regions that failed.)

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka