08 Sep, 2017

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     

24 Aug, 2017

2 commits

  • In the dm-integrity target we register an integrity profile that has
    both generate_fn and verify_fn callbacks set to NULL.

    This is used if dm-integrity is stacked under a dm-crypt device
    for authenticated encryption (integrity payload contains authentication
    tag and IV seed).

    In this case the verification is done through dm-crypt's own crypto API
    processing; the integrity profile is only the holder of this data.
    (The memory is owned by dm-crypt as well.)

    After the commit (and previous changes)
    Commit 7c20f11680a441df09de7235206f70115fbf6290
    Author: Christoph Hellwig
    Date: Mon Jul 3 16:58:43 2017 -0600

    bio-integrity: stop abusing bi_end_io

    we get this crash:

    : BUG: unable to handle kernel NULL pointer dereference at (null)
    : IP: (null)
    : *pde = 00000000
    ...
    :
    : Workqueue: kintegrityd bio_integrity_verify_fn
    : task: f48ae180 task.stack: f4b5c000
    : EIP: (null)
    : EFLAGS: 00210286 CPU: 0
    : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
    : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
    : DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
    : Call Trace:
    : ? bio_integrity_process+0xe3/0x1e0
    : bio_integrity_verify_fn+0xea/0x150
    : process_one_work+0x1c7/0x5c0
    : worker_thread+0x39/0x380
    : kthread+0xd6/0x110
    : ? process_one_work+0x5c0/0x5c0
    : ? kthread_worker_fn+0x100/0x100
    : ? kthread_worker_fn+0x100/0x100
    : ret_from_fork+0x19/0x24
    : Code: Bad EIP value.
    : EIP: (null) SS:ESP: 0068:f4b5dea4
    : CR2: 0000000000000000

    This patch just skips the whole verify workqueue if verify_fn is set to NULL.

    Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
    Signed-off-by: Milan Broz
    [hch: trivial whitespace fix]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Milan Broz
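
A minimal userspace sketch of the check this commit describes: completion is only deferred to the verify worker when the profile actually has a verify callback. The struct and function names here are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an integrity profile. */
struct integrity_profile {
	int (*generate_fn)(void *buf, size_t len);
	int (*verify_fn)(const void *buf, size_t len);
};

/*
 * Decide whether a completed I/O must be handed to the verify worker.
 * A profile with verify_fn == NULL (the dm-integrity case) is never
 * deferred, so no NULL function pointer is ever called.
 */
static bool needs_deferred_verify(const struct integrity_profile *profile,
				  bool is_read, bool io_succeeded)
{
	if (!is_read || !io_succeeded)
		return false;
	return profile != NULL && profile->verify_fn != NULL;
}

/* Trivial verifier used only to exercise the non-NULL path. */
static int always_ok(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}
```

With both callbacks NULL the helper returns false, so the I/O would complete directly instead of being queued for verification.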
     
  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
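
The idea of addressing I/O by a disk plus a partition index, and remapping to whole-disk sectors at submission time, can be sketched in userspace as follows. All types and names are hypothetical models; the bounds check is an illustrative assumption rather than a copy of the kernel's remapping logic.

```c
#include <stdint.h>

#define MAX_PARTS 4

/* Hypothetical model: one partition's extent, in 512-byte sectors. */
struct part_extent {
	uint64_t start_sect;
	uint64_t nr_sects;
};

/* Hypothetical stand-in for a gendisk; slot 0 is the whole disk. */
struct disk_model {
	struct part_extent parts[MAX_PARTS];
};

/* An I/O addressed by (disk, partition index), not by a block_device. */
struct io_model {
	struct disk_model *disk;
	uint8_t partno;
	uint64_t sector;
	uint64_t nr_sectors;
};

/* Translate a partition-relative sector to a whole-disk sector. */
static int remap_to_whole_disk(struct io_model *io)
{
	const struct part_extent *p;

	if (io->partno == 0)
		return 0;		/* already whole-disk relative */
	if (io->partno >= MAX_PARTS)
		return -1;
	p = &io->disk->parts[io->partno];
	if (io->nr_sectors > p->nr_sects ||
	    io->sector > p->nr_sects - io->nr_sectors)
		return -1;		/* would run past the partition */
	io->sector += p->start_sect;
	io->partno = 0;			/* remapping happens only once */
	return 0;
}
```

Resetting the partition index to 0 after remapping keeps the operation idempotent when an I/O passes through the submission path more than once.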
     

10 Aug, 2017

3 commits

  • This gets us back to the behavior in 4.12 and earlier.

    Signed-off-by: Christoph Hellwig
    Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • In the dm-integrity target we register an integrity profile that has
    both generate_fn and verify_fn callbacks set to NULL.

    This is used if dm-integrity is stacked under a dm-crypt device
    for authenticated encryption (integrity payload contains authentication
    tag and IV seed).

    In this case the verification is done through dm-crypt's own crypto API
    processing; the integrity profile is only the holder of this data.
    (The memory is owned by dm-crypt as well.)

    After the commit (and previous changes)
    Commit 7c20f11680a441df09de7235206f70115fbf6290
    Author: Christoph Hellwig
    Date: Mon Jul 3 16:58:43 2017 -0600

    bio-integrity: stop abusing bi_end_io

    we get this crash:

    : BUG: unable to handle kernel NULL pointer dereference at (null)
    : IP: (null)
    : *pde = 00000000
    ...
    :
    : Workqueue: kintegrityd bio_integrity_verify_fn
    : task: f48ae180 task.stack: f4b5c000
    : EIP: (null)
    : EFLAGS: 00210286 CPU: 0
    : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
    : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
    : DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
    : Call Trace:
    : ? bio_integrity_process+0xe3/0x1e0
    : bio_integrity_verify_fn+0xea/0x150
    : process_one_work+0x1c7/0x5c0
    : worker_thread+0x39/0x380
    : kthread+0xd6/0x110
    : ? process_one_work+0x5c0/0x5c0
    : ? kthread_worker_fn+0x100/0x100
    : ? kthread_worker_fn+0x100/0x100
    : ret_from_fork+0x19/0x24
    : Code: Bad EIP value.
    : EIP: (null) SS:ESP: 0068:f4b5dea4
    : CR2: 0000000000000000

    This patch just skips the whole verify workqueue if verify_fn is set to NULL.

    Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
    Signed-off-by: Milan Broz
    [hch: trivial whitespace fix]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Milan Broz
     
  • This makes the code more obvious, and moves the most likely branch first
    in the function.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

05 Jul, 2017

1 commit

  • block/bio-integrity.c:318:10-11: WARNING: return of 0/1 in function 'bio_integrity_prep' with return type bool

    Return statements in functions returning bool should use
    true/false instead of 1/0.
    Generated by: scripts/coccinelle/misc/boolreturn.cocci

    Fixes: e23947bd76f0 ("bio-integrity: fold bio_integrity_enabled to bio_integrity_prep")
    CC: Dmitry Monakhov
    Signed-off-by: Fengguang Wu
    Signed-off-by: Jens Axboe

    kbuild test robot
     

04 Jul, 2017

5 commits

  • And instead call directly into the integrity code from bio_end_io.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently ->verify_fn does not work at all because at the moment it is
    called bio->bi_iter.bi_size == 0, so we do not iterate over the
    integrity bvecs at all.

    In order to perform verification we need to know the original data
    vector; with the new bvec rewind API this is trivial.

    testcase: https://github.com/dmonakhov/xfstests/commit/3c6509eaa83b9c17cd0bc95d73fcdd76e1c54a85

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dmitry Monakhov
    [hch: adopted for new status values]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
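
To illustrate why a rewind is needed, here is a simplified userspace model of an iterator that tracks how much it has consumed. After completion nothing remains to iterate, and rewinding by the consumed amount restores the original range. The struct and helpers are hypothetical and much simpler than the kernel's bvec iterator.

```c
#include <stddef.h>

/* Hypothetical, simplified iterator over a flat data buffer. */
struct iter_model {
	size_t offset;		/* current position in the buffer */
	size_t remaining;	/* bytes left to process */
	size_t done;		/* bytes consumed so far */
};

/* Consume up to 'bytes' bytes, never more than what remains. */
static void iter_advance(struct iter_model *it, size_t bytes)
{
	if (bytes > it->remaining)
		bytes = it->remaining;
	it->offset += bytes;
	it->remaining -= bytes;
	it->done += bytes;
}

/* Undo all progress so the original data range can be walked again. */
static void iter_rewind_all(struct iter_model *it)
{
	it->offset -= it->done;
	it->remaining += it->done;
	it->done = 0;
}
```

A verifier that runs at completion time would rewind a private copy of the iterator first, then walk the data and compare it against the protection information.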
     
  • Currently all integrity prep hooks are open-coded, and if prepare fails
    we ignore its code and fail the bio with EIO. Let's return the real
    error to the upper layer, so that callers may react accordingly.

    In fact no one wants to use bio_integrity_prep() without
    bio_integrity_enabled(), so it is reasonable to fold them into one
    function.

    Signed-off-by: Dmitry Monakhov
    Reviewed-by: Martin K. Petersen
    [hch: merged with the latest block tree,
    return bool from bio_integrity_prep]
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
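
A hedged sketch of the folded interface: one prepare function that first performs the former "enabled" check and, on failure, records the real error instead of a blanket EIO. The struct, the allocator hook, and the error field are assumptions made for illustration; the kernel version operates on a bio and completes it on failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct io_model {
	bool integrity_capable;	/* device has an integrity profile */
	size_t payload_len;	/* bytes of protection data needed */
	void *payload;		/* allocated protection buffer, or NULL */
	int error;		/* 0 or a negative errno for the caller */
};

/* Hypothetical allocator hook so the failure path can be exercised. */
static void *(*payload_alloc)(size_t) = malloc;

/*
 * Returns true if submission should continue. Returns false only when
 * preparation failed; io->error then carries the real reason.
 */
static bool integrity_prep(struct io_model *io)
{
	if (!io->integrity_capable)
		return true;	/* the former "enabled" check, folded in */

	io->payload = payload_alloc(io->payload_len);
	if (!io->payload) {
		io->error = -ENOMEM;
		return false;
	}
	return true;
}

/* Allocator that always fails, for demonstrating the error path. */
static void *failing_alloc(size_t len)
{
	(void)len;
	return NULL;
}
```

A caller then needs a single test: if the function returns false, it stops submitting and the stored error describes what went wrong.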
     
  • bio_integrity_trim inherited its interface from bio_trim and accepts
    an offset and size, but this API is error-prone because the data offset
    must always be in sync with the bio's data offset. That is why we have
    an integrity update hook in bio_advance().

    So the only meaningful values are: offset == 0, sectors == bio_sectors(bio).
    Let's just remove them completely.

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
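
The argument-free trim can be sketched as deriving the protection payload length purely from the current data length. The model below assumes one tuple of protection data per interval and an interval stored as a power-of-two exponent; the names are hypothetical.

```c
#include <stdint.h>

struct profile_model {
	uint8_t tuple_size;	/* protection bytes per interval */
	uint8_t interval_exp;	/* log2 of the protection interval, >= 9 */
};

struct io_model {
	uint64_t data_sectors;		/* data length in 512-byte sectors */
	uint32_t integrity_bytes;	/* protection payload length */
};

/* Number of protection intervals covered by 'sectors' 512-byte sectors. */
static uint32_t integrity_intervals(const struct profile_model *p,
				    uint64_t sectors)
{
	return (uint32_t)(sectors >> (p->interval_exp - 9));
}

/* No offset/size arguments: the payload length follows the data length. */
static void integrity_trim(const struct profile_model *p, struct io_model *io)
{
	io->integrity_bytes =
		integrity_intervals(p, io->data_sectors) * p->tuple_size;
}
```

Because the only inputs are the profile and the I/O itself, a caller cannot pass an offset that disagrees with the data iterator.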
     
  • SCSI drivers do care about bip_seed so we must update it accordingly.

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
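
As an illustration of keeping the seed in step with the data, the sketch below bumps the seed by the number of protection intervals that were consumed. The struct is a hypothetical model, and it assumes advances are multiples of the interval size.

```c
#include <stdint.h>

struct integrity_state {
	uint64_t seed;		/* reference tag of the first remaining interval */
	uint8_t interval_exp;	/* log2 of the protection interval in bytes */
};

/* Advance past bytes_done bytes of data and keep the seed in step. */
static void integrity_advance(struct integrity_state *st, uint64_t bytes_done)
{
	st->seed += bytes_done >> st->interval_exp;
}
```

With a 512-byte interval, advancing by 4096 bytes moves the seed forward by 8; with a 4096-byte interval the same advance moves it by 1.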
     

12 Jun, 2017

1 commit

  • We've already got a few conflicts and upcoming work depends on some of the
    changes that have gone into mainline as regression fixes for this series.

    Pull in 4.12-rc5 to resolve these conflicts and make it easier on downstream
    trees to continue working on 4.13 changes.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

09 Jun, 2017

1 commit

  • Replace bi_error with a new bi_status to allow for a clear conversion.
    Note that device mapper overloaded bi_error with a private value, which
    we'll have to keep around at least for now and thus propagate to a
    proper blk_status_t value.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
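
A dedicated status type needs a translation to and from errno values at the boundaries. The following userspace sketch shows one way such a table-driven conversion could look. The enum values and the mapping are illustrative assumptions and do not reproduce the kernel's blk_status_t definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical status codes; the names only loosely mirror the kernel's. */
enum io_status {
	STS_OK = 0,
	STS_NOTSUPP,
	STS_NOSPC,
	STS_RESOURCE,
	STS_IOERR,
};

static const struct {
	enum io_status status;
	int err;
} status_map[] = {
	{ STS_OK,	0 },
	{ STS_NOTSUPP,	-EOPNOTSUPP },
	{ STS_NOSPC,	-ENOSPC },
	{ STS_RESOURCE,	-ENOMEM },
	{ STS_IOERR,	-EIO },
};

#define MAP_LEN (sizeof(status_map) / sizeof(status_map[0]))

/* Unknown errno values collapse to a generic I/O error. */
static enum io_status errno_to_status(int err)
{
	for (size_t i = 0; i < MAP_LEN; i++)
		if (status_map[i].err == err)
			return status_map[i].status;
	return STS_IOERR;
}

static int status_to_errno(enum io_status status)
{
	for (size_t i = 0; i < MAP_LEN; i++)
		if (status_map[i].status == status)
			return status_map[i].err;
	return -EIO;
}
```

Using a distinct enum type rather than a plain int makes accidental mixing of errno values and status codes easier to spot during review and static analysis.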
     

03 Jun, 2017

1 commit

  • If a bio has no data, such as the ones from blkdev_issue_flush(),
    then we have nothing to protect.

    This patch prevents a BUG like the following:

    kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
    kernel BUG at mm/slab.c:2773!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: bcache
    CPU: 0 PID: 4428 Comm: xfs_io Tainted: G W 4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
    Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
    task: ffff880137786440 task.stack: ffffc90000ba8000
    RIP: 0010:kfree_debugcheck+0x25/0x2a
    RSP: 0018:ffffc90000babde0 EFLAGS: 00010082
    RAX: 0000000000000034 RBX: ac1fa1d106742a5a RCX: 0000000000000007
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013f3ccb40
    RBP: ffffc90000babde8 R08: 0000000000000000 R09: 0000000000000000
    R10: 00000000fcb76420 R11: 00000000725172ed R12: 0000000000000282
    R13: ffffffff8150e766 R14: ffff88013a145e00 R15: 0000000000000001
    FS: 00007fb09384bf40(0000) GS:ffff88013f200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fd0172f9e40 CR3: 0000000137fa9000 CR4: 00000000000006f0
    Call Trace:
    kfree+0xc8/0x1b3
    bio_integrity_free+0xc3/0x16b
    bio_free+0x25/0x66
    bio_put+0x14/0x26
    blkdev_issue_flush+0x7a/0x85
    blkdev_fsync+0x35/0x42
    vfs_fsync_range+0x8e/0x9f
    vfs_fsync+0x1c/0x1e
    do_fsync+0x31/0x4a
    SyS_fsync+0x10/0x14
    entry_SYSCALL_64_fastpath+0x1f/0xc2

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

28 Oct, 2016

1 commit

  • With the addition of the zoned operations the tests in this function
    became incorrect. But I think it's much better to just open-code the
    allowed operations in the only caller anyway.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Shaun Tancheff
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

27 Jul, 2016

1 commit

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a follow-up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     

21 Jul, 2016

1 commit

  • Instead of a flag and an index just make sure an index of 0 means
    no need to free the bvec array. Also move the constants related
    to the bvec pools together and use a consistent naming scheme for
    them.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
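
The "index 0 means nothing to free" convention can be sketched in userspace like this. Small requests use embedded storage and record index 0; pooled allocations record the pool number plus one. Pool sizes, names, and the use of calloc in place of real pools are all assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define INLINE_VECS 4

/* Hypothetical pool sizes, in vector entries. */
static const unsigned short pool_entries[] = { 16, 64, 128 };
#define NR_POOLS (sizeof(pool_entries) / sizeof(pool_entries[0]))

struct vec_owner {
	int inline_vecs[INLINE_VECS];	/* embedded storage, never freed */
	int *vecs;
	unsigned char pool_idx;		/* 0: nothing to free; n: pool n - 1 */
};

static int vecs_alloc(struct vec_owner *o, unsigned int nr)
{
	if (nr <= INLINE_VECS) {
		o->vecs = o->inline_vecs;
		o->pool_idx = 0;
		return 0;
	}
	for (size_t i = 0; i < NR_POOLS; i++) {
		if (nr <= pool_entries[i]) {
			o->vecs = calloc(pool_entries[i], sizeof(int));
			if (!o->vecs)
				return -1;
			o->pool_idx = (unsigned char)(i + 1);
			return 0;
		}
	}
	return -1;			/* larger than the biggest pool */
}

/* Returns 1 if memory was released, 0 if there was nothing to free. */
static int vecs_free(struct vec_owner *o)
{
	if (o->pool_idx == 0)
		return 0;
	free(o->vecs);
	o->vecs = NULL;
	o->pool_idx = 0;
	return 1;
}
```

A single field thus answers both "does this need freeing?" and "which pool did it come from?", which is what replaces the separate flag.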
     

14 Jun, 2016

1 commit


09 Dec, 2015

1 commit


04 Dec, 2015

1 commit

  • This patch moves the blk_integrity_payload definition outside the
    CONFIG_BLK_DEV_INTEGRITY dependency and provides empty function
    implementations when the kernel configuration disables integrity
    extensions. This simplifies drivers that make use of these to map user
    data so they don't need to repeat the same configuration checks.

    Signed-off-by: Keith Busch

    Updated by Jens to pass an error pointer return from
    bio_integrity_alloc(), otherwise if CONFIG_BLK_DEV_INTEGRITY isn't
    set, we return a weird ENOMEM from __nvme_submit_user_cmd()
    if a meta buffer is set.

    Signed-off-by: Jens Axboe

    Keith Busch
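
The pattern of always defining the type and providing stub functions when a feature is compiled out can be sketched as below. FEATURE_INTEGRITY, the meta_* functions, and the userspace error-pointer helpers are hypothetical; the choice of -EINVAL for the stub is an assumption, picked only to show an error that is distinguishable from an allocation failure.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal userspace imitation of error pointers (hypothetical helpers). */
static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Defined unconditionally so callers compile either way. */
struct meta_payload {
	unsigned int len;
};

#ifdef FEATURE_INTEGRITY
static inline struct meta_payload *meta_alloc(unsigned int len)
{
	struct meta_payload *m = malloc(sizeof(*m));

	if (!m)
		return err_ptr(-ENOMEM);
	m->len = len;
	return m;
}

static inline void meta_free(struct meta_payload *m)
{
	free(m);
}
#else
/* Feature compiled out: report "not possible", not "out of memory". */
static inline struct meta_payload *meta_alloc(unsigned int len)
{
	(void)len;
	return err_ptr(-EINVAL);
}

static inline void meta_free(struct meta_payload *m)
{
	(void)m;
}
#endif
```

Callers can then use meta_alloc() without their own #ifdef blocks and simply propagate whatever error the pointer encodes.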
     

22 Oct, 2015

3 commits

  • Since they lack requests to pin the request_queue active, synchronous
    bio-based drivers may have in-flight integrity work from
    bio_integrity_endio() that is not flushed by blk_freeze_queue(). Flush
    that work to prevent races to free the queue and the final usage of the
    blk_integrity profile.

    This is temporary unless/until bio-based drivers start to generically
    take a q_usage_counter reference while a bio is in-flight.

    Cc: Martin K. Petersen
    [martin: fix the CONFIG_BLK_DEV_INTEGRITY=n case]
    Tested-by: Ross Zwisler
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     
  • The per-device properties in the blk_integrity structure were previously
    unsigned short. However, most of the values fit inside a char. The only
    exception is the data interval size and we can work around that by
    storing it as a power of two.

    This cuts the size of the dynamic portion of blk_integrity in half.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
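
Storing the interval as a power of two lets a value such as 512 or 4096 fit in a single byte. A small sketch of the conversion in both directions, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Convert an interval size in bytes to its base-2 exponent. */
static uint8_t interval_to_exp(uint32_t interval_bytes)
{
	uint8_t exp = 0;

	/* Only exact powers of two can be stored this way. */
	assert(interval_bytes != 0 &&
	       (interval_bytes & (interval_bytes - 1)) == 0);
	while (interval_bytes > 1) {
		interval_bytes >>= 1;
		exp++;
	}
	return exp;
}

/* Recover the interval size in bytes from the stored exponent. */
static uint32_t exp_to_interval(uint8_t exp)
{
	return UINT32_C(1) << exp;
}
```

The exponent form also turns divisions by the interval size into right shifts.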
     
  • We previously made a complete copy of a device's data integrity profile
    even though several of the fields inside the blk_integrity struct are
    pointers to fixed template entries in t10-pi.c.

    Split the static and per-device portions so that we can reference the
    template directly.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Cc: Dan Williams
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

11 Sep, 2015

1 commit

  • This is only theoretical at the moment given that the only
    subsystems that generate integrity payloads are the block layer
    itself and the scsi target (which generate well-aligned integrity
    payloads). But once we expose integrity metadata to user space,
    we'll need to refuse appending a page with a gap (if the queue
    virtual boundary is set).

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up, and of not being passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
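
Storing the error in the I/O object itself makes it persistent and lets it travel from a chained child to its parent. The userspace model below sketches that idea; the struct layout, the pending counter, and the "first error wins" policy are assumptions for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct io_model {
	int error;			/* 0 or negative errno, kept in the I/O */
	int pending;			/* completions outstanding, incl. own */
	struct io_model *parent;	/* set when chained to a parent I/O */
	void (*end_io)(struct io_model *io);
};

/* Complete one reference; the last one propagates the error upward. */
static void io_complete(struct io_model *io)
{
	if (--io->pending > 0)
		return;
	if (io->parent) {
		/* The first error reported by any child is preserved. */
		if (io->error && !io->parent->error)
			io->parent->error = io->error;
		io_complete(io->parent);
		return;
	}
	if (io->end_io)
		io->end_io(io);
}

/* Demonstration callback: remembers the error seen at final completion. */
static int last_error = 1;	/* 1 means "callback not run yet" */

static void record_error(struct io_model *io)
{
	last_error = io->error;
}
```

The completion callback no longer needs an error argument, because it can read the field from the object it is handed.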
     

07 Jul, 2015

1 commit

  • bio_integrity_alloc() and bio_integrity_free() assume that if a bio was
    allocated from a bioset that that bioset also had its bio_integrity_pool
    allocated using bioset_integrity_create(). This is a very bad
    assumption given that bioset_create() and bioset_integrity_create() are
    completely disjoint. Not all callers of bioset_create() have been
    trained to also call bioset_integrity_create() -- and they may not care
    to be.

    Fix this by falling back to kmalloc'ing 'struct bio_integrity_payload'
    rather than force all bioset consumers to (wastefully) preallocate a
    bio_integrity_pool that they very likely won't actually need (given the
    niche nature of the current block integrity support).

    Otherwise, a NULL pointer "Kernel BUG" with a trace like the following
    will be observed (as seen on s390x using zfcp storage) because dm-io
    doesn't use bioset_integrity_create() when creating its bioset:

    [ 791.643338] Call Trace:
    [ 791.643339] ([] 0x3df98b848)
    [ 791.643341] [] bio_integrity_alloc+0x48/0xf8
    [ 791.643348] [] bio_integrity_prep+0xae/0x2f0
    [ 791.643349] [] blk_queue_bio+0x1c8/0x3d8
    [ 791.643355] [] generic_make_request+0xc0/0x100
    [ 791.643357] [] submit_bio+0xa2/0x198
    [ 791.643406] [] dispatch_io+0x15c/0x3b0 [dm_mod]
    [ 791.643419] [] dm_io+0x176/0x2f0 [dm_mod]
    [ 791.643423] [] do_reads+0x13a/0x1a8 [dm_mirror]
    [ 791.643425] [] do_mirror+0x142/0x298 [dm_mirror]
    [ 791.643428] [] process_one_work+0x18a/0x3f8
    [ 791.643432] [] worker_thread+0x132/0x3b0
    [ 791.643435] [] kthread+0xd2/0xd8
    [ 791.643438] [] kernel_thread_starter+0x6/0xc
    [ 791.643446] [] kernel_thread_starter+0x0/0xc

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Mike Snitzer
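
The fallback described here can be modelled in userspace: allocate from the owner's pool when one was set up, otherwise from the general allocator, and have the free path mirror that choice. The pool implementation, the from_pool flag, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

struct payload {
	bool from_pool;		/* remembers which free path to take */
	char data[32];
};

/* Hypothetical fixed pool; owners that never set one up leave it NULL. */
struct payload_pool {
	struct payload slots[2];
	bool used[2];
};

struct owner {
	struct payload_pool *pool;
};

static struct payload *payload_alloc(struct owner *o)
{
	struct payload *p;

	if (o->pool) {
		for (int i = 0; i < 2; i++) {
			if (!o->pool->used[i]) {
				o->pool->used[i] = true;
				p = &o->pool->slots[i];
				p->from_pool = true;
				return p;
			}
		}
		return NULL;		/* pool exhausted */
	}
	/* No pool was created for this owner: fall back to malloc. */
	p = malloc(sizeof(*p));
	if (p)
		p->from_pool = false;
	return p;
}

static void payload_free(struct owner *o, struct payload *p)
{
	if (p->from_pool)
		o->pool->used[p - o->pool->slots] = false;
	else
		free(p);
}
```

Owners that never create a pool pay nothing up front, at the cost of losing the allocation guarantee a preallocated pool would give.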
     

22 May, 2015

1 commit

  • Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
    non-chains") regressed all existing callers that followed this pattern:
    1) saving a bio's original bi_end_io
    2) wiring up an intermediate bi_end_io
    3) restoring the original bi_end_io from intermediate bi_end_io
    4) calling bio_endio() to execute the restored original bi_end_io

    The regression was due to BIO_CHAIN only ever getting set if
    bio_inc_remaining() is called. For the above pattern it isn't set until
    step 3 above (step 2 would've needed to establish BIO_CHAIN). As such
    the first bio_endio(), in step 2 above, never decremented __bi_remaining
    before calling the intermediate bi_end_io -- leaving __bi_remaining with
    the value 1 instead of 0. When bio_inc_remaining() occurred during step
    3 it brought it to a value of 2. When the second bio_endio() was
    called, in step 4 above, it should've called the original bi_end_io but
    it didn't because there was an extra reference that wasn't dropped (due
    to atomic operations being optimized away since BIO_CHAIN wasn't set
    upfront).

    Fix this issue by removing the __bi_remaining management complexity for
    all callers that use the above pattern -- bio_chain() is the only
    interface that _needs_ to be concerned with __bi_remaining. For the
    above pattern callers just expect the bi_end_io they set to get called!
    Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
    that aren't associated with the bio_chain() interface.

    Also, the bio_inc_remaining() interface has been moved local to bio.c.

    Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
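
The four-step pattern from this commit message can be sketched in userspace as follows. After the fix, callers following this pattern only expect the restored callback to run and do no reference-count bookkeeping of their own. All names are hypothetical models of the pattern, not kernel interfaces.

```c
#include <stddef.h>

struct io_model;
typedef void (*end_io_fn)(struct io_model *io);

struct io_model {
	end_io_fn end_io;
	void *private_data;
};

/* Run whatever completion callback is currently installed. */
static void io_endio(struct io_model *io)
{
	if (io->end_io)
		io->end_io(io);
}

struct hook {
	end_io_fn saved_end_io;
	void *saved_private;
	int intermediate_calls;
};

static void intermediate_end_io(struct io_model *io)
{
	struct hook *h = io->private_data;

	h->intermediate_calls++;
	/* Step 3: restore the original completion callback. */
	io->end_io = h->saved_end_io;
	io->private_data = h->saved_private;
	/* Step 4: complete again so the original callback runs. */
	io_endio(io);
}

static void hook_install(struct io_model *io, struct hook *h)
{
	/* Step 1: save the original callback and its context. */
	h->saved_end_io = io->end_io;
	h->saved_private = io->private_data;
	h->intermediate_calls = 0;
	/* Step 2: wire up the intermediate callback. */
	io->end_io = intermediate_end_io;
	io->private_data = h;
}

/* Demonstration callback: counts calls through its private pointer. */
static void count_end_io(struct io_model *io)
{
	(*(int *)io->private_data)++;
}
```

One completion of the hooked I/O runs the intermediate callback once and then the original callback once.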
     

02 Dec, 2014

1 commit

  • bio integrity handling is broken on a system with LVM layered atop a
    DIF/DIX SCSI drive because device mapper clones the bio, modifies the
    clone, and sends the clone to the lower layers for processing.
    However, the clone bio has bi_vcnt == 0, which means that when the sd
    driver calls bio_integrity_process to attach DIX data, the
    for_each_segment_all() call (which uses bi_vcnt) returns immediately
    and random garbage is sent to the disk on a disk write. The disk of
    course returns an error.

    Therefore, teach bio_integrity_process() to use bio_for_each_segment()
    to iterate the bio_vecs, since the per-bio iterator tracks which
    bio_vecs are associated with that particular bio. The integrity
    handling code is effectively part of the "driver" (it's not the bio
    owner), so it must use the correct iterator function.

    v2: Fix a compiler warning about abandoned local variables. This
    patch supersedes "block: bio_integrity_process uses wrong bio_vec
    iterator". Patch applies against 3.18-rc6.

    Signed-off-by: Darrick J. Wong
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Darrick J. Wong
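
The difference between counting segments with a per-owner vector count and walking them with a per-I/O iterator can be shown with a small userspace model. A clone shares the segment array but has a count of zero, so only the iterator-based walk sees its data. The structs are hypothetical and assume non-empty segments.

```c
#include <stddef.h>

struct seg {
	size_t len;		/* assumed to be non-zero */
};

struct iter {
	unsigned int idx;	/* first segment belonging to this I/O */
	size_t size;		/* bytes belonging to this I/O */
};

struct io_model {
	const struct seg *segs;	/* may be shared with a clone */
	unsigned int vcnt;	/* only meaningful to the owner; 0 in clones */
	struct iter iter;
};

/* Owner-only view: sees nothing when called on a clone. */
static size_t bytes_by_vcnt(const struct io_model *io)
{
	size_t total = 0;

	for (unsigned int i = 0; i < io->vcnt; i++)
		total += io->segs[i].len;
	return total;
}

/* Iterator view: covers exactly the range this I/O is responsible for. */
static size_t bytes_by_iter(const struct io_model *io)
{
	struct iter it = io->iter;	/* walk a copy, keep the original */
	size_t total = 0;

	while (it.size > 0) {
		size_t n = io->segs[it.idx].len;

		if (n > it.size)
			n = it.size;
		total += n;
		it.size -= n;
		it.idx++;
	}
	return total;
}
```

Code that processes data on behalf of a lower layer would use the iterator walk, since it may be handed a clone rather than the original.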
     

14 Oct, 2014

1 commit


27 Sep, 2014

9 commits

  • Make the choice of checksum a per-I/O property by introducing a flag
    that can be inspected by the SCSI layer. There are several reasons for
    this:

    1. It allows us to switch choice of checksum without unloading and
    reloading the HBA driver.

    2. During error recovery we need to be able to tell the HBA that
    checksums read from disk should not be verified and converted to IP
    checksums.

    3. For error injection purposes we need to be able to write a bad guard
    tag to storage. Since the storage device only supports T10 CRC we
    need to be able to disable IP checksum conversion on the HBA.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Move flags affecting the integrity code out of the bio bi_flags and into
    the block integrity payload.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Add a BLK_ prefix to the integrity profile flags. Also rename the flags
    to be more consistent with the generate/verify terminology in the rest
    of the integrity code.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Instead of the "operate" parameter we pass in a seed value and a pointer
    to a function that can be used to process the integrity metadata. The
    generation function is changed to have a return value to fit into this
    scheme.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Now that the protection interval has been detached from the sector size
    we need to be able to handle sizes that are different from 4K and
    512. Make the interval calculation generic.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • The protection interval is not necessarily tied to the logical block
    size of a block device. Stop using the terms "sector" and "sectors".

    Going forward we will use the term "seed" to describe the initial
    reference tag value for a given I/O. "Interval" will be used to describe
    the portion of the data buffer that a given piece of protection
    information is associated with.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • bip_buf is not really needed so we can remove it.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • None of the filesystems appear interested in using the integrity tagging
    feature. Potentially because very few storage devices actually permit
    using the application tag space.

    Remove the tagging functions.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • For commands like REQ_COPY we need a way to pass extra information along
    with each bio. Like integrity metadata this information must be
    available at the bottom of the stack so bi_private does not suffice.

    Rename the existing bi_integrity field to bi_special and make it a union
    so we can have different bio extensions for each class of command.

    We previously used bi_integrity != NULL as a way to identify whether a
    bio had integrity metadata or not. Introduce a REQ_INTEGRITY to be the
    indicator now that bi_special can contain different things.

    In addition, bio_integrity(bio) will now return a pointer to the
    integrity payload (when applicable).

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen