24 Aug, 2017

1 commit

  • This way we don't need a block_device structure to submit I/O. The
    block_device has different lifetime rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
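
The remapping described in the commit above can be sketched in userspace C. This is only a toy model: `toy_gendisk`, `toy_partition`, `toy_bio` and `toy_remap()` are hypothetical names made up for illustration and are not the kernel's structures or the code in generic_make_request. It shows why a disk pointer plus a partition index is enough to turn a partition-relative sector into a whole-disk sector.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; every name and field here is an illustrative assumption. */
struct toy_partition {
    uint64_t start_sect;   /* first sector of the partition on the disk */
    uint64_t nr_sects;     /* partition length in sectors */
};

struct toy_gendisk {
    const struct toy_partition *parts;  /* index 0 models the whole disk */
    size_t nr_parts;
};

struct toy_bio {
    struct toy_gendisk *disk;  /* exists once per block device */
    unsigned int partno;       /* partition index used for remapping */
    uint64_t sector;           /* sector relative to that partition */
};

/*
 * Remap a partition-relative bio to a whole-disk sector.
 * Returns 0 on success, -1 if the partition index or sector is out of range.
 */
int toy_remap(struct toy_bio *bio)
{
    const struct toy_partition *p;

    if (bio->partno >= bio->disk->nr_parts)
        return -1;
    p = &bio->disk->parts[bio->partno];
    if (bio->sector >= p->nr_sects)
        return -1;
    bio->sector += p->start_sect;
    bio->partno = 0;  /* the bio now addresses the whole disk */
    return 0;
}
```

In this model no open block device node is needed at submission time; the disk and the index carry everything the remap step uses.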
     

04 Jul, 2017

1 commit

  • Pull core block/IO updates from Jens Axboe:
    "This is the main pull request for the block layer for 4.13. Not a huge
    round in terms of features, but there's a lot of churn related to some
    core cleanups.

    Note this depends on the UUID tree pull request, that Christoph
    already sent out.

    This pull request contains:

    - A series from Christoph, unifying the error/stats codes in the
    block layer. We now use blk_status_t everywhere, instead of using
    different schemes for different places.

    - Also from Christoph, some cleanups around request allocation and IO
    scheduler interactions in blk-mq.

    - And yet another series from Christoph, cleaning up how we handle
    and do bounce buffering in the block layer.

    - A blk-mq debugfs series from Bart, further improving on the support
    we have for exporting internal information to aid debugging IO
    hangs or stalls.

    - Also from Bart, a series that cleans up the request initialization
    differences across types of devices.

    - A series from Goldwyn Rodrigues, allowing the block layer to return
    failure if we will block and the user asked for non-blocking.

    - Patch from Hannes for supporting setting loop devices block size to
    that of the underlying device.

    - Two series of patches from Javier, fixing various issues with
    lightnvm, particularly around pblk.

    - A series from me, adding support for write hints. This comes with
    NVMe support as well, so applications can help guide data placement
    on flash to improve performance, latencies, and write
    amplification.

    - A series from Ming, improving and hardening blk-mq support for
    stopping/starting and quiescing hardware queues.

    - Two pull requests for NVMe updates. Nothing major on the feature
    side, but lots of cleanups and bug fixes. From the usual crew.

    - A series from Neil Brown, greatly improving the bio rescue set
    support. Most notably, this kills the bio rescue work queues, if we
    don't really need them.

    - Lots of other little bug fixes that are all over the place"

    * 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
    lightnvm: pblk: set line bitmap check under debug
    lightnvm: pblk: verify that cache read is still valid
    lightnvm: pblk: add initialization check
    lightnvm: pblk: remove target using async. I/Os
    lightnvm: pblk: use vmalloc for GC data buffer
    lightnvm: pblk: use right metadata buffer for recovery
    lightnvm: pblk: schedule if data is not ready
    lightnvm: pblk: remove unused return variable
    lightnvm: pblk: fix double-free on pblk init
    lightnvm: pblk: fix bad le64 assignations
    nvme: Makefile: remove dead build rule
    blk-mq: map all HWQ also in hyperthreaded system
    nvmet-rdma: register ib_client to not deadlock in device removal
    nvme_fc: fix error recovery on link down.
    nvmet_fc: fix crashes on bad opcodes
    nvme_fc: Fix crash when nvme controller connection fails.
    nvme_fc: replace ioabort msleep loop with completion
    nvme_fc: fix double calls to nvme_cleanup_cmd()
    nvme-fabrics: verify that a controller returns the correct NQN
    nvme: simplify nvme_dev_attrs_are_visible
    ...

    Linus Torvalds
     

15 Jun, 2017

1 commit

  • This reverts commit 12a7cf5ba6c776a2621d8972c7d42e8d3d959d20.

    This commit apparently attempted to fix an issue that didn't really
    exist. Furthermore, it is the source of deadlocks and crashes seen in
    multiple cases related to failing the primary mirror dev while
    syncing.

    Reported-by: Jonathan Brassow
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

12 Jun, 2017

1 commit

  • We've already got a few conflicts and upcoming work depends on some of the
    changes that have gone into mainline as regression fixes for this series.

    Pull in 4.12-rc5 to resolve these conflicts and make it easier on downstream
    trees to continue working on 4.13 changes.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Jun, 2017

4 commits

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Turn the error parameter into a pointer so that target drivers can change
    the value, and make sure only DM_ENDIO_* values are returned from the
    methods.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
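
A minimal userspace sketch of the calling convention this commit describes: the completion error is passed by pointer so the target can rewrite it, while the return value is reserved for an instruction to the core. The `TOY_ENDIO_*` values, `toy_end_io()` and its `can_retry` argument are hypothetical stand-ins, not the device-mapper API.

```c
#include <errno.h>

/* Hypothetical stand-ins for the DM_ENDIO_* return codes. */
enum toy_endio {
    TOY_ENDIO_DONE = 0,        /* core may complete the bio with *error */
    TOY_ENDIO_INCOMPLETE = 1   /* target keeps the bio and will finish it later */
};

/*
 * Toy end_io method. It never returns an errno; it only returns a
 * TOY_ENDIO_* value and changes the error through the pointer.
 */
enum toy_endio toy_end_io(int *error, int can_retry)
{
    if (*error == 0)
        return TOY_ENDIO_DONE;
    if (can_retry)
        return TOY_ENDIO_INCOMPLETE;  /* target resubmits on its own */
    *error = -EIO;  /* assumption: this toy target reports every failure as -EIO */
    return TOY_ENDIO_DONE;
}
```

Keeping errno values out of the return path means a caller can never confuse "what went wrong" with "what to do next".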
     
  • Instead use the special DM_MAPIO_KILL return value to return -EIO just
    like we do for the request based path. Note that dm-log-writes returned
    -ENOMEM in a few places, which now becomes -EIO instead. No consumer
    treats -ENOMEM special so this shouldn't be an issue (and it should
    use a mempool to start with to make guaranteed progress).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • A few (but not all) dm targets use a special EWOULDBLOCK error code for
    failing REQ_RAHEAD requests that fail due to a lack of available resources.
    But no one else knows about this magic code, and lower level drivers also
    don't generate it when failing read-ahead requests for similar reasons.

    So remove this special casing and ignore all additional error handling for
    REQ_RAHEAD - if this was a real underlying error we'd get a normal read
    once the real read comes in.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

31 May, 2017

1 commit

  • Commit b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as
    synchronous") removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
    definitions. generic_make_request_checks() however strips REQ_FUA and
    REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
    write cache and thus write effectively becomes asynchronous which can
    lead to performance regressions.

    Fix the problem by making sure all bios which are synchronous are
    properly marked with REQ_SYNC.

    Fixes: b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Signed-off-by: Mike Snitzer

    Jan Kara
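
The regression described above can be modelled with a few flag bits. This is a sketch under stated assumptions: the `TOY_REQ_*` bit values are arbitrary, and `toy_is_sync()` and `toy_strip_cache_flags()` are hypothetical helpers that only mimic the behaviour the commit message describes (FUA and PREFLUSH count as synchronous, and both are stripped when the device reports no volatile write cache).

```c
/* Arbitrary bit values chosen for this example, not the kernel's. */
#define TOY_REQ_SYNC     (1u << 0)
#define TOY_REQ_FUA      (1u << 1)
#define TOY_REQ_PREFLUSH (1u << 2)

/* Models the rule that FUA and PREFLUSH are treated as synchronous. */
int toy_is_sync(unsigned int flags)
{
    return (flags & (TOY_REQ_SYNC | TOY_REQ_FUA | TOY_REQ_PREFLUSH)) != 0;
}

/* Models the stripping done for devices without a volatile write cache. */
unsigned int toy_strip_cache_flags(unsigned int flags, int has_volatile_cache)
{
    if (!has_volatile_cache)
        flags &= ~(TOY_REQ_FUA | TOY_REQ_PREFLUSH);
    return flags;
}
```

A write submitted with only `TOY_REQ_FUA | TOY_REQ_PREFLUSH` stops looking synchronous once the flags are stripped; adding `TOY_REQ_SYNC` at submission keeps it synchronous either way, which is the shape of the fix.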
     

09 Apr, 2017

1 commit


14 Dec, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main block pull request this series. Contrary to previous
    release, I've kept the core and driver changes in the same branch. We
    always ended up having dependencies between the two for obvious
    reasons, so makes more sense to keep them together. That said, I'll
    probably try and keep more topical branches going forward, especially
    for cycles that end up being as busy as this one.

    The major parts of this pull request are:

    - Improved support for O_DIRECT on block devices, with a small
    private implementation instead of using the pig that is
    fs/direct-io.c. From Christoph.

    - Request completion tracking in a scalable fashion. This is utilized
    by two components in this pull, the new hybrid polling and the
    writeback queue throttling code.

    - Improved support for polling with O_DIRECT, adding a hybrid mode
    that combines pure polling with an initial sleep. From me.

    - Support for automatic throttling of writeback queues on the block
    side. This uses feedback from the device completion latencies to
    scale the queue on the block side up or down. From me.

    - Support for SMR drives in the block layer and for SD. From Hannes
    and Shaun.

    - Multi-connection support for nbd. From Josef.

    - Cleanup of request and bio flags, so we have a clear split between
    which are bio (or rq) private, and which ones are shared. From
    Christoph.

    - A set of patches from Bart, that improve how we handle queue
    stopping and starting in blk-mq.

    - Support for WRITE_ZEROES from Chaitanya.

    - Lightnvm updates from Javier/Matias.

    - Support for FC for the nvme-over-fabrics code. From James Smart.

    - A bunch of fixes from a whole slew of people, too many to name
    here"

    * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
    blk-stat: fix a few cases of missing batch flushing
    blk-flush: run the queue when inserting blk-mq flush
    elevator: make the rqhash helpers exported
    blk-mq: abstract out blk_mq_dispatch_rq_list() helper
    blk-mq: add blk_mq_start_stopped_hw_queue()
    block: improve handling of the magic discard payload
    blk-wbt: don't throttle discard or write zeroes
    nbd: use dev_err_ratelimited in io path
    nbd: reset the setup task for NBD_CLEAR_SOCK
    nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
    nvme-fabrics: Add target support for FC transport
    nvme-fabrics: Add host support for FC transport
    nvme-fabrics: Add FC transport LLDD api definitions
    nvme-fabrics: Add FC transport FC-NVME definitions
    nvme-fabrics: Add FC transport error codes to nvme.h
    Add type 0x28 NVME type code to scsi fc headers
    nvme-fabrics: patch target code in prep for FC transport support
    nvme-fabrics: set sqe.command_id in core not transports
    parser: add u64 number parser
    nvme-rdma: align to generic ib_event logging helper
    ...

    Linus Torvalds
     

01 Nov, 2016

1 commit


14 Oct, 2016

2 commits

  • When any leg(s) have failed, any read will cause a new operational
    default leg to be selected and the read is resubmitted to it. If that
    new default leg fails the read too, no other still accessible legs are
    used to resubmit the read again -- thus failing the io.

    Fix by allowing the read to get resubmitted until all operational legs
    have been exhausted. Also, remove any details.bi_dev use as a flag.

    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
     
  • If a default leg has failed, any read will cause a new operational
    default leg to be selected and the read is resubmitted. But until now
    the read will return failure even though it was successful due to
    resubmission. The reason for this is bio->bi_error was not being
    cleared before resubmitting the bio.

    Fix by clearing bio->bi_error before resubmission.

    Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio")
    Cc: stable@vger.kernel.org # 4.3+
    Signed-off-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Heinz Mauelshagen
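
The two fixes under this date can be illustrated together with a toy retry loop. It is a sketch, not dm-raid1 code: `toy_leg`, `toy_bio`, `toy_read()` and `toy_read_with_retry()` are hypothetical. The simulated read only ever sets an error and never clears one, which is what makes a stale error from an earlier attempt survive a successful resubmission unless it is reset first.

```c
#include <errno.h>
#include <stddef.h>

struct toy_leg {
    int failed;      /* leg is already known to be bad */
    int read_fails;  /* simulation knob: reads from this leg fail */
};

struct toy_bio {
    int error;       /* models bio->bi_error */
};

/* Simulated read: sets an error on failure, leaves the field alone on success. */
static void toy_read(const struct toy_leg *leg, struct toy_bio *bio)
{
    if (leg->read_fails)
        bio->error = -EIO;
}

/*
 * Try every operational leg in turn. Returns the index of the leg that
 * served the read, or -1 once all operational legs have been exhausted.
 */
int toy_read_with_retry(struct toy_leg *legs, size_t n, struct toy_bio *bio)
{
    for (size_t i = 0; i < n; i++) {
        if (legs[i].failed)
            continue;
        bio->error = 0;          /* clear the stale error before resubmitting */
        toy_read(&legs[i], bio);
        if (bio->error == 0)
            return (int)i;
        legs[i].failed = 1;      /* do not pick this leg again */
    }
    bio->error = -EIO;           /* nothing left to try */
    return -1;
}
```

Dropping the `bio->error = 0` line reproduces the second bug in this model: the read that finally succeeds would still report the error left behind by an earlier leg.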
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

21 Jul, 2016

1 commit

  • These two are confusing leftovers of the old world order, combining
    values of the REQ_OP_ and REQ_ namespaces. For callers that don't
    special case we mostly just replace bi_rw with bio_data_dir or
    op_is_write, except for the few cases where a switch over the REQ_OP_
    values makes more sense. Any check for READA is replaced with an
    explicit check for REQ_RAHEAD. Also remove the READA alias for
    REQ_RAHEAD.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

08 Jun, 2016

2 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • Separate the op from the rq_flag_bits and have dm
    set/get the bio using bio_set_op_attrs/bio_op.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

23 Feb, 2016

1 commit


03 Sep, 2015

1 commit

  • Pull device mapper update from Mike Snitzer:

    - a couple small cleanups in dm-cache, dm-verity, persistent-data's
    dm-btree, and DM core.

    - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio
    prison cells

    - a 4.2-stable fix that adds feature reporting for the dm-stats
    features added in 4.2

    - improve DM-snapshot to not invalidate the on-disk snapshot if
    snapshot device write overflow occurs; but a write overflow triggered
    through the origin device will still invalidate the snapshot.

    - optimize DM-thinp's async discard submission a bit now that late bio
    splitting has been included in block core.

    - switch DM-cache's SMQ policy lock from using a mutex to a spinlock;
    improves performance on very low latency devices (eg. NVMe SSD).

    - document DM RAID 4/5/6's discard support

    [ I did not pull the slab changes, which weren't appropriate for this
    tree, and weren't obviously the right thing to do anyway. At the very
    least they need some discussion and explanation before getting merged.

    Because not pulling the actual tagged commit but doing a partial pull
    instead, this merge commit thus also obviously is missing the git
    signature from the original tag ]

    * tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm cache: fix use after freeing migrations
    dm cache: small cleanups related to deferred prison cell cleanup
    dm cache: fix leaking of deferred bio prison cells
    dm raid: document RAID 4/5/6 discard support
    dm stats: report precise_timestamps and histogram in @stats_list output
    dm thin: optimize async discard submission
    dm snapshot: don't invalidate on-disk image on snapshot write overflow
    dm: remove unlikely() before IS_ERR()
    dm: do not override error code returned from dm_get_device()
    dm: test return value for DM_MAPIO_SUBMITTED
    dm verity: remove unused mempool
    dm cache: move wake_waker() from free_migrations() to where it is needed
    dm btree remove: remove unused function get_nr_entries()
    dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()
    dm cache policy smq: change the mutex to a spinlock

    Linus Torvalds
     

12 Aug, 2015

1 commit

  • Some of the device mapper targets override the error code returned by
    dm_get_device() and return either -EINVAL or -ENXIO. There is nothing
    gained by this override. It is better to propagate the returned error
    code unchanged to caller.

    This work was motivated by hitting an issue where the underlying device
    was busy but -EINVAL was being returned. After this change we get
    -EBUSY instead and it is easier to figure out the problem.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Mike Snitzer

    Vivek Goyal
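
A small before/after sketch of the change described above. `toy_get_device()` is a hypothetical stand-in for the device lookup and the two constructors are invented for illustration; only the error-handling shape is taken from the commit message.

```c
#include <errno.h>

/* Hypothetical lookup: returns 0 on success or a negative errno. */
static int toy_get_device(int exists, int busy)
{
    if (!exists)
        return -ENXIO;
    if (busy)
        return -EBUSY;
    return 0;
}

/* Old style: whatever went wrong is reported as -EINVAL. */
int toy_ctr_before(int exists, int busy)
{
    if (toy_get_device(exists, busy))
        return -EINVAL;
    return 0;
}

/* New style: the lookup's own error code reaches the caller unchanged. */
int toy_ctr_after(int exists, int busy)
{
    int r = toy_get_device(exists, busy);

    if (r)
        return r;
    return 0;
}
```

With a busy device the old constructor reports -EINVAL while the new one reports -EBUSY, matching the debugging scenario that motivated the change.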
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

30 May, 2015

1 commit

  • Currently if there is a leg failure, the bio will be put into the hold
    list until userspace does a remove/replace on the leg. Doing so in a
    cluster config (clvmd) is problematic because there may be a temporary
    path failure that results in cluster raid1 remove/replace. Such
    recovery takes a long time due to a full resync.

    Update dm-raid1 to optionally ignore these failures so bios continue
    being issued without interruption. To enable this feature userspace
    must pass "keep_log" when creating the dm-raid1 device.

    Signed-off-by: Lidong Zhong
    Tested-by: Liuhua Wang
    Acked-by: Heinz Mauelshagen
    Signed-off-by: Mike Snitzer

    Lidong Zhong
     

22 May, 2015

1 commit

  • Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
    non-chains") regressed all existing callers that followed this pattern:
    1) saving a bio's original bi_end_io
    2) wiring up an intermediate bi_end_io
    3) restoring the original bi_end_io from intermediate bi_end_io
    4) calling bio_endio() to execute the restored original bi_end_io

    The regression was due to BIO_CHAIN only ever getting set if
    bio_inc_remaining() is called. For the above pattern it isn't set until
    step 3 above (step 2 would've needed to establish BIO_CHAIN). As such
    the first bio_endio(), in step 2 above, never decremented __bi_remaining
    before calling the intermediate bi_end_io -- leaving __bi_remaining with
    the value 1 instead of 0. When bio_inc_remaining() occurred during step
    3 it brought it to a value of 2. When the second bio_endio() was
    called, in step 4 above, it should've called the original bi_end_io but
    it didn't because there was an extra reference that wasn't dropped (due
    to atomic operations being optimized away since BIO_CHAIN wasn't set
    upfront).

    Fix this issue by removing the __bi_remaining management complexity for
    all callers that use the above pattern -- bio_chain() is the only
    interface that _needs_ to be concerned with __bi_remaining. For the
    above pattern callers just expect the bi_end_io they set to get called!
    Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
    that aren't associated with the bio_chain() interface.

    Also, the bio_inc_remaining() interface has been moved local to bio.c.

    Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
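
The counter arithmetic in the commit above is easier to follow when replayed in a toy model. Everything below is a userspace sketch with hypothetical names (`toy_bio`, `toy_bio_endio()`, `toy_bio_inc_remaining()`, `toy_run_pattern()`); the `chained` field models the BIO_CHAIN flag and `remaining` models __bi_remaining, which starts at 1.

```c
#include <stddef.h>

struct toy_bio;
typedef void (*toy_end_io_t)(struct toy_bio *);

struct toy_bio {
    int remaining;             /* models __bi_remaining */
    int chained;               /* models the BIO_CHAIN flag */
    toy_end_io_t end_io;       /* models bi_end_io */
    toy_end_io_t saved_end_io; /* step 1: the saved original callback */
    int use_inc;               /* 1 = caller still bumps the counter in step 3 */
    int original_calls;        /* how often the original callback ran */
};

void toy_bio_inc_remaining(struct toy_bio *bio)
{
    bio->chained = 1;
    bio->remaining++;
}

void toy_bio_endio(struct toy_bio *bio)
{
    /* Only flagged bios pay for the decrement. */
    if (bio->chained && --bio->remaining > 0)
        return;
    if (bio->end_io)
        bio->end_io(bio);
}

static void toy_original_end_io(struct toy_bio *bio)
{
    bio->original_calls++;
}

static void toy_intermediate_end_io(struct toy_bio *bio)
{
    bio->end_io = bio->saved_end_io;  /* step 3: restore the original */
    if (bio->use_inc)
        toy_bio_inc_remaining(bio);   /* the call the fix removes */
    toy_bio_endio(bio);               /* step 4 */
}

/* Runs steps 1 to 4 and returns how many times the original end_io ran. */
int toy_run_pattern(int use_inc)
{
    struct toy_bio bio = { .remaining = 1, .end_io = toy_original_end_io,
                           .use_inc = use_inc };

    bio.saved_end_io = bio.end_io;         /* step 1 */
    bio.end_io = toy_intermediate_end_io;  /* step 2 */
    toy_bio_endio(&bio);                   /* lower layer completes the bio */
    return bio.original_calls;
}
```

With `use_inc` set, the first completion skips the decrement because the flag is not yet set, the increment raises the counter from 1 to 2, and the second completion only brings it back to 1, so the original callback never runs. Without the increment the flag is never set and the restored callback is called directly.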
     

06 May, 2015

1 commit

  • Struct bio has an atomic ref count for chained bio's, and we use this
    to know when to end IO on the bio. However, most bio's are not chained,
    so we don't need to always introduce this atomic operation as part of
    ending IO.

    Add a helper to elevate the bi_remaining count, and flag the bio as
    now actually needing the decrement at end_io time. Rename the field
    to __bi_remaining to catch any current users of this doing the
    incrementing manually.

    For high IOPS workloads, this reduces the overhead of bio_endio()
    substantially.

    Tested-by: Robert Elliott
    Acked-by: Kent Overstreet
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Feb, 2015

1 commit

  • It may be possible that a device claims discard support but it rejects
    discards with -EOPNOTSUPP. It happens when using loopback on ext2/ext3
    filesystem driven by the ext4 driver. It may also happen if the
    underlying devices are moved from one disk to another.

    If a discard error happens, we reject the bio with -EOPNOTSUPP, but we do
    not degrade the array.

    This patch fixes failed test shell/lvconvert-repair-transient.sh in the
    lvm2 testsuite if the testsuite is extracted on an ext2 or ext3
    filesystem and it is being driven by the ext4 driver.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org

    Mikulas Patocka
     

18 Feb, 2014

1 commit


24 Nov, 2013

2 commits

  • Now that we've got a mechanism for immutable biovecs -
    bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
    respect it instead of using the bvec array directly.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: NeilBrown
    Cc: Alasdair Kergon
    Cc: dm-devel@redhat.com

    Kent Overstreet
     
  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     

23 Aug, 2013

1 commit

  • dbf2576e37 ("workqueue: make all workqueues non-reentrant") made
    WQ_NON_REENTRANT no-op and the flag is going away. Remove its usages.

    This patch doesn't introduce any behavior changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Mike Snitzer
    Signed-off-by: Alasdair G Kergon
    Acked-by: Joe Thornber

    Tejun Heo
     

24 Mar, 2013

1 commit

  • Bunch of places in the code weren't using it where they could be -
    this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
    into a struct bvec_iter.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: "Ed L. Cashin"
    CC: Nick Piggin
    CC: Jiri Kosina
    CC: Jim Paris
    CC: Geoff Levand
    CC: Alasdair Kergon
    CC: dm-devel@redhat.com
    CC: Neil Brown
    CC: Steven Rostedt
    Acked-by: Ed Cashin

    Kent Overstreet
     

02 Mar, 2013

3 commits

  • This patch allows the administrator to reduce the rate at which kcopyd
    issues I/O.

    Each module that uses kcopyd acquires a throttle parameter that can be
    set in /sys/module/*/parameters.

    We maintain a history of kcopyd usage by each module in the variables
    io_period and total_period in struct dm_kcopyd_throttle. The actual
    kcopyd activity is calculated as a percentage of time equal to
    "(100 * io_period / total_period)". This is compared with the user-defined
    throttle percentage threshold and if it is exceeded, we sleep.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
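
The throttle decision described in the commit above boils down to one percentage comparison. The function below is a minimal sketch of that formula only; `toy_should_throttle()` is a hypothetical name, the zero-period guard is an assumption added to avoid dividing by zero, and the real accounting of io_period and total_period over time is not modelled.

```c
/*
 * Returns nonzero when the measured kcopyd activity, expressed as
 * (100 * io_period / total_period), exceeds the configured percentage.
 */
int toy_should_throttle(unsigned long long io_period,
                        unsigned long long total_period,
                        unsigned int throttle_percent)
{
    unsigned long long activity;

    if (total_period == 0)
        return 0;  /* assumption: no history yet means no throttling */
    activity = 100 * io_period / total_period;
    return activity > throttle_percent;
}
```

For example, 60 units of I/O time out of 100 with a 50 percent threshold asks for a sleep, while 40 out of 100 does not.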
     
  • Use 'bio' in the name of variables and functions that deal with
    bios rather than 'request' to avoid confusion with the normal
    block layer use of 'request'.

    No functional changes.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Avoid returning a truncated table or status string instead of setting
    the DM_BUFFER_FULL_FLAG when the last target of a table fills the
    buffer.

    When processing a table or status request, the function retrieve_status
    calls ti->type->status. If ti->type->status returns non-zero,
    retrieve_status assumes that the buffer overflowed and sets
    DM_BUFFER_FULL_FLAG.

    However, targets don't return non-zero values from their status method
    on overflow. Most targets always return zero.

    If a buffer overflow happens in a target that is not the last in the
    table, it gets noticed during the next iteration of the loop in
    retrieve_status; but if a buffer overflow happens in the last target, it
    goes unnoticed and erroneously truncated data is returned.

    In the current code, the targets behave in the following way:
    * dm-crypt returns -ENOMEM if there is not enough space to store the
    key, but it returns 0 on all other overflows.
    * dm-thin returns errors from the status method if a disk error happened.
    This is incorrect because retrieve_status doesn't check the error
    code, it assumes that all non-zero values mean buffer overflow.
    * all the other targets always return 0.

    This patch changes the ti->type->status function to return void (because
    most targets don't use the return code). Overflow is detected in
    retrieve_status: if the status method fills up the remaining space
    completely, it is assumed that buffer overflow happened.

    Cc: stable@vger.kernel.org
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
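
The overflow heuristic from the commit above, sketched in userspace C. `toy_status()` and `toy_retrieve_status()` are hypothetical stand-ins for a target's status method and for the caller that inspects the buffer; the check itself follows the rule stated in the message: if the status method filled the remaining space completely, assume the output was truncated.

```c
#include <stdio.h>
#include <string.h>

/* Toy status method: returns void and writes at most maxlen bytes, NUL included. */
static void toy_status(char *buf, size_t maxlen, const char *text)
{
    if (maxlen)
        snprintf(buf, maxlen, "%s", text);
}

/* Returns 1 when the output must be assumed truncated, 0 otherwise. */
int toy_retrieve_status(char *buf, size_t maxlen, const char *text)
{
    if (maxlen == 0)
        return 1;
    toy_status(buf, maxlen, text);
    /* String plus terminator used up all remaining space: treat as overflow. */
    return strlen(buf) + 1 == maxlen;
}
```

The check is conservative: a string that happens to fit exactly is also reported as an overflow, which costs the caller a retry with a larger buffer but never returns silently truncated data.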
     

22 Dec, 2012

6 commits

  • This patch removes map_info from bio-based device mapper targets.
    map_info is still used for request-based targets.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Don't use map_info any more in dm-raid1.

    map_info was used for writes to hold the region number. For this purpose
    we add a new field dm_bio_details to dm_raid1_bio_record.

    map_info was used for reads to hold a pointer to dm_raid1_bio_record (if
    the pointer was non-NULL, bio details were saved; if the pointer was
    NULL, bio details were not saved). We use
    dm_raid1_bio_record.details->bi_bdev for this purpose. If bi_bdev is
    NULL, details were not saved, if bi_bdev is non-NULL, details were
    saved.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Rename struct read_record to bio_record in dm-raid1.

    In the following patch, the structure will be used for both read and
    write bios, so rename it.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Replace read_record_pool with per_bio_data in dm-raid1.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • mempool_alloc can't fail if __GFP_WAIT is specified, so the condition
    that tests if read_record is non-NULL is always true.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka