22 Oct, 2015

3 commits

  • Since they lack requests to pin the request_queue active, synchronous
    bio-based drivers may have in-flight integrity work from
    bio_integrity_endio() that is not flushed by blk_freeze_queue(). Flush
    that work to prevent races to free the queue and the final usage of the
    blk_integrity profile.

    This is temporary unless/until bio-based drivers start to generically
    take a q_usage_counter reference while a bio is in-flight.

    Cc: Martin K. Petersen
    [martin: fix the CONFIG_BLK_DEV_INTEGRITY=n case]
    Tested-by: Ross Zwisler
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     
  • The per-device properties in the blk_integrity structure were previously
    unsigned short. However, most of the values fit inside a char. The only
    exception is the data interval size and we can work around that by
    storing it as a power of two.

    This cuts the size of the dynamic portion of blk_integrity in half.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • We previously made a complete copy of a device's data integrity profile
    even though several of the fields inside the blk_integrity struct are
    pointers to fixed template entries in t10-pi.c.

    Split the static and per-device portions so that we can reference the
    template directly.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Cc: Dan Williams
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

11 Sep, 2015

1 commit

  • This is only theoretical at the moment given that the only
    subsystems that generate integrity payloads are the block layer
    itself and the scsi target (which generate well aligned integrity
    payloads). But when we will expose integrity meta-data to user-space,
    we'll need to refuse appending a page with a gap (if the queue
    virtual boundary is set).

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

07 Jul, 2015

1 commit

  • bio_integrity_alloc() and bio_integrity_free() assume that if a bio was
    allocated from a bioset that that bioset also had its bio_integrity_pool
    allocated using bioset_integrity_create(). This is a very bad
    assumption given that bioset_create() and bioset_integrity_create() are
    completely disjoint. Not all callers of bioset_create() have been
    trained to also call bioset_integrity_create() -- and they may not care
    to be.

    Fix this by falling back to kmalloc'ing 'struct bio_integrity_payload'
    rather than force all bioset consumers to (wastefully) preallocate a
    bio_integrity_pool that they very likely won't actually need (given the
    niche nature of the current block integrity support).

    Otherwise, a NULL pointer "Kernel BUG" with a trace like the following
    will be observed (as seen on s390x using zfcp storage) because dm-io
    doesn't use bioset_integrity_create() when creating its bioset:

    [ 791.643338] Call Trace:
    [ 791.643339] ([] 0x3df98b848)
    [ 791.643341] [] bio_integrity_alloc+0x48/0xf8
    [ 791.643348] [] bio_integrity_prep+0xae/0x2f0
    [ 791.643349] [] blk_queue_bio+0x1c8/0x3d8
    [ 791.643355] [] generic_make_request+0xc0/0x100
    [ 791.643357] [] submit_bio+0xa2/0x198
    [ 791.643406] [] dispatch_io+0x15c/0x3b0 [dm_mod]
    [ 791.643419] [] dm_io+0x176/0x2f0 [dm_mod]
    [ 791.643423] [] do_reads+0x13a/0x1a8 [dm_mirror]
    [ 791.643425] [] do_mirror+0x142/0x298 [dm_mirror]
    [ 791.643428] [] process_one_work+0x18a/0x3f8
    [ 791.643432] [] worker_thread+0x132/0x3b0
    [ 791.643435] [] kthread+0xd2/0xd8
    [ 791.643438] [] kernel_thread_starter+0x6/0xc
    [ 791.643446] [] kernel_thread_starter+0x0/0xc

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

22 May, 2015

1 commit

  • Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
    non-chains") regressed all existing callers that followed this pattern:
    1) saving a bio's original bi_end_io
    2) wiring up an intermediate bi_end_io
    3) restoring the original bi_end_io from intermediate bi_end_io
    4) calling bio_endio() to execute the restored original bi_end_io

    The regression was due to BIO_CHAIN only ever getting set if
    bio_inc_remaining() is called. For the above pattern it isn't set until
    step 3 above (step 2 would've needed to establish BIO_CHAIN). As such
    the first bio_endio(), in step 2 above, never decremented __bi_remaining
    before calling the intermediate bi_end_io -- leaving __bi_remaining with
    the value 1 instead of 0. When bio_inc_remaining() occurred during step
    3 it brought it to a value of 2. When the second bio_endio() was
    called, in step 4 above, it should've called the original bi_end_io but
    it didn't because there was an extra reference that wasn't dropped (due
    to atomic operations being optimized away since BIO_CHAIN wasn't set
    upfront).

    Fix this issue by removing the __bi_remaining management complexity for
    all callers that use the above pattern -- bio_chain() is the only
    interface that _needs_ to be concerned with __bi_remaining. For the
    above pattern callers just expect the bi_end_io they set to get called!
    Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
    that aren't associated with the bio_chain() interface.

    Also, the bio_inc_remaining() interface has been moved local to bio.c.

    Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

02 Dec, 2014

1 commit

  • bio integrity handling is broken on a system with LVM layered atop a
    DIF/DIX SCSI drive because device mapper clones the bio, modifies the
    clone, and sends the clone to the lower layers for processing.
    However, the clone bio has bi_vcnt == 0, which means that when the sd
    driver calls bio_integrity_process to attach DIX data, the
    for_each_segment_all() call (which uses bi_vcnt) returns immediately
    and random garbage is sent to the disk on a disk write. The disk of
    course returns an error.

    Therefore, teach bio_integrity_process() to use bio_for_each_segment()
    to iterate the bio_vecs, since the per-bio iterator tracks which
    bio_vecs are associated with that particular bio. The integrity
    handling code is effectively part of the "driver" (it's not the bio
    owner), so it must use the correct iterator function.

    v2: Fix a compiler warning about abandoned local variables. This
    patch supersedes "block: bio_integrity_process uses wrong bio_vec
    iterator". Patch applies against 3.18-rc6.

    Signed-off-by: Darrick J. Wong
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Darrick J. Wong
     

14 Oct, 2014

1 commit


27 Sep, 2014

10 commits

  • Make the choice of checksum a per-I/O property by introducing a flag
    that can be inspected by the SCSI layer. There are several reasons for
    this:

    1. It allows us to switch choice of checksum without unloading and
    reloading the HBA driver.

    2. During error recovery we need to be able to tell the HBA that
    checksums read from disk should not be verified and converted to IP
    checksums.

    3. For error injection purposes we need to be able to write a bad guard
    tag to storage. Since the storage device only supports T10 CRC we
    need to be able to disable IP checksum conversion on the HBA.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Move flags affecting the integrity code out of the bio bi_flags and into
    the block integrity payload.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Add a BLK_ prefix to the integrity profile flags. Also rename the flags
    to be more consistent with the generate/verify terminology in the rest
    of the integrity code.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Instead of the "operate" parameter we pass in a seed value and a pointer
    to a function that can be used to process the integrity metadata. The
    generation function is changed to have a return value to fit into this
    scheme.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Now that the protection interval has been detached from the sector size
    we need to be able to handle sizes that are different from 4K and
    512. Make the interval calculation generic.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • The protection interval is not necessarily tied to the logical block
    size of a block device. Stop using the terms "sector" and "sectors".

    Going forward we will use the term "seed" to describe the initial
    reference tag value for a given I/O. "Interval" will be used to describe
    the portion of the data buffer that a given piece of protection
    information is associated with.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • bip_buf is not really needed so we can remove it.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • None of the filesystems appear interested in using the integrity tagging
    feature. Potentially because very few storage devices actually permit
    using the application tag space.

    Remove the tagging functions.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • For commands like REQ_COPY we need a way to pass extra information along
    with each bio. Like integrity metadata this information must be
    available at the bottom of the stack so bi_private does not suffice.

    Rename the existing bi_integrity field to bi_special and make it a union
    so we can have different bio extensions for each class of command.

    We previously used bi_integrity != NULL as a way to identify whether a
    bio had integrity metadata or not. Introduce a REQ_INTEGRITY to be the
    indicator now that bi_special can contain different things.

    In addition, bio_integrity(bio) will now return a pointer to the
    integrity payload (when applicable).

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • bdev_integrity_enabled() is only used by bio_integrity_enabled().
    Combine these two functions.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

22 Aug, 2014

1 commit

  • When getting a pi error we get to bio_integrity_end_io with
    bi_remaining already decremented to 0 where we will eventually
    need to call bio_endio with restored original bio completion handler.
    Calling bio_endio invokes a BUG_ON(). We should call bio_endio_nodec
    instead, like what is done in bio_integrity_verify_fn.

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     

02 Jul, 2014

1 commit

  • Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
    Martin introduces the function bip_integrity_vecs(get the useful vectors)
    to fix the issue about nr_vecs for inline integrity vectors that reported
    by David Milburn.

    But it seems that bip_integrity_vecs() will return the wrong number if the
    bio is not based on any bio_set for some reason(bio->bi_pool == NULL),
    because in that case, the bip_inline_vecs[0] is malloced directly. So
    here we add the bip_max_vcnt to record the count of vector slots, and
    cleanup the function bip_integrity_vecs().

    Signed-off-by: Gu Zheng
    Cc: Martin K. Petersen
    Cc: Kent Overstreet
    Signed-off-by: Jens Axboe

    Gu Zheng
     

19 May, 2014

1 commit