08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
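
    A minimal sketch of the packing described above, under stated
    assumptions: the bit width, the shift, and every SKETCH_/sketch
    name here are hypothetical and chosen for illustration, not taken
    from the kernel headers. It shows why storing a bare op value into
    the combined word no longer means what old code expects.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical layout, for illustration only: the top OP_BITS bits of
 * the word hold the operation, the remaining low bits hold flags. */
#define OP_BITS   3u
#define OP_SHIFT  (sizeof(unsigned int) * CHAR_BIT - OP_BITS)
#define FLAG_MASK ((1u << OP_SHIFT) - 1u)

enum sketch_op { SKETCH_OP_READ = 0, SKETCH_OP_WRITE = 1, SKETCH_OP_DISCARD = 2 };

#define SKETCH_FLAG_SYNC (1u << 0)
#define SKETCH_FLAG_META (1u << 1)

/* Pack an op code and a flag set into one word. */
static unsigned int pack_opf(enum sketch_op op, unsigned int flags)
{
    assert((flags & ~FLAG_MASK) == 0);  /* flags must not spill into the op bits */
    return ((unsigned int)op << OP_SHIFT) | flags;
}

/* Extract the op code from the high bits. */
static enum sketch_op opf_op(unsigned int opf)
{
    return (enum sketch_op)(opf >> OP_SHIFT);
}

/* Extract the flags from the low bits. */
static unsigned int opf_flags(unsigned int opf)
{
    return opf & FLAG_MASK;
}
```

    With this layout, a caller that assigns SKETCH_OP_WRITE directly to
    the word sets a low flag bit instead of the op, which is the kind
    of silent runtime breakage the rename turns into a compile error.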
     

27 Jul, 2016

1 commit

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a followup to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     

21 Jul, 2016

1 commit

  • Instead of a flag and an index just make sure an index of 0 means
    no need to free the bvec array. Also move the constants related
    to the bvec pools together and use a consistent naming scheme for
    them.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Mike Christie
    Signed-off-by: Jens Axboe

    Christoph Hellwig
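
    The "index of 0 means nothing to free" convention can be sketched
    as follows. This is an illustration under assumptions: the struct,
    the pool count, and the counter array are hypothetical, and plain
    malloc()/free() stand in for the kernel's pools.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NR_POOLS 3   /* hypothetical number of size classes */

struct sketch_bio {
    unsigned char pool_idx;  /* 0: nothing to free; n > 0: pool slot n - 1 */
    void *vecs;
};

/* Counts frees per pool slot so the behavior can be observed. */
static unsigned int sketch_frees[SKETCH_NR_POOLS];

/* Record that vecs came from pool slot `slot`; the stored index is slot + 1. */
static void sketch_set_pool(struct sketch_bio *bio, unsigned int slot, void *vecs)
{
    assert(slot < SKETCH_NR_POOLS);
    bio->pool_idx = (unsigned char)(slot + 1);
    bio->vecs = vecs;
}

/* Free the vector array only when a non-zero index says it is pool-owned. */
static void sketch_free_vecs(struct sketch_bio *bio)
{
    if (bio->pool_idx == 0)
        return;              /* inline or caller-owned array */
    sketch_frees[bio->pool_idx - 1]++;
    free(bio->vecs);
    bio->vecs = NULL;
    bio->pool_idx = 0;
}
```

    Folding the "needs freeing" flag into the index removes one piece
    of state that could otherwise disagree with the other.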
     

14 Jun, 2016

1 commit


09 Dec, 2015

1 commit


04 Dec, 2015

1 commit

  • This patch moves the blk_integrity_payload definition outside the
    CONFIG_BLK_DEV_INTEGRITY dependency and provides empty function
    implementations when the kernel configuration disables integrity
    extensions. This simplifies drivers that make use of these to map user
    data so they don't need to repeat the same configuration checks.

    Signed-off-by: Keith Busch

    Updated by Jens to pass an error pointer return from
    bio_integrity_alloc(), otherwise if CONFIG_BLK_DEV_INTEGRITY isn't
    set, we return a weird ENOMEM from __nvme_submit_user_cmd()
    if a meta buffer is set.

    Signed-off-by: Jens Axboe

    Keith Busch
     

22 Oct, 2015

3 commits

  • Since they lack requests to pin the request_queue active, synchronous
    bio-based drivers may have in-flight integrity work from
    bio_integrity_endio() that is not flushed by blk_freeze_queue(). Flush
    that work to prevent races to free the queue and the final usage of the
    blk_integrity profile.

    This is temporary unless/until bio-based drivers start to generically
    take a q_usage_counter reference while a bio is in-flight.

    Cc: Martin K. Petersen
    [martin: fix the CONFIG_BLK_DEV_INTEGRITY=n case]
    Tested-by: Ross Zwisler
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Dan Williams
     
  • The per-device properties in the blk_integrity structure were previously
    unsigned short. However, most of the values fit inside a char. The only
    exception is the data interval size and we can work around that by
    storing it as a power of two.

    This cuts the size of the dynamic portion of blk_integrity in half.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
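
    As a rough illustration of storing the data interval size as a
    power of two, the sketch below keeps only the exponent in a single
    byte. The struct and helper names are hypothetical and not the
    kernel's definitions; the sketch assumes the interval is always a
    power of two.

```c
#include <assert.h>

struct sketch_profile {
    unsigned char interval_exp;  /* log2 of the data interval in bytes */
};

/* Integer log2 of a value that must be a power of two. */
static unsigned char sketch_log2(unsigned int v)
{
    unsigned char e = 0;

    assert(v != 0 && (v & (v - 1)) == 0);
    while (v > 1) {
        v >>= 1;
        e++;
    }
    return e;
}

/* Store the interval as an exponent that fits in one byte. */
static void sketch_set_interval(struct sketch_profile *p, unsigned int bytes)
{
    p->interval_exp = sketch_log2(bytes);
}

/* Recover the interval size in bytes from the stored exponent. */
static unsigned int sketch_interval(const struct sketch_profile *p)
{
    return 1u << p->interval_exp;
}
```

    A 4096-byte interval does not fit in a char, but its exponent (12)
    does, which is what allows the field to shrink.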
     
  • We previously made a complete copy of a device's data integrity profile
    even though several of the fields inside the blk_integrity struct are
    pointers to fixed template entries in t10-pi.c.

    Split the static and per-device portions so that we can reference the
    template directly.

    Signed-off-by: Martin K. Petersen
    Reported-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Cc: Dan Williams
    Signed-off-by: Dan Williams
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

11 Sep, 2015

1 commit

  • This is only theoretical at the moment given that the only
    subsystems that generate integrity payloads are the block layer
    itself and the scsi target (which generate well aligned integrity
    payloads). But once we expose integrity meta-data to user-space,
    we'll need to refuse appending a page with a gap (if the queue
    virtual boundary is set).

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not being persistent
    when bios are queued up, and of not being passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
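
    A minimal sketch of why an error field stored in the bio itself
    helps with chaining, assuming a simplified model: the struct, the
    pending counter, and the "first error wins" rule are illustrative
    choices here, not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_bio {
    int error;                   /* 0 on success, negative errno otherwise */
    int pending;                 /* children that have not completed yet */
    struct sketch_bio *parent;   /* NULL for a top-level bio */
    void (*end_io)(struct sketch_bio *bio);
};

/* Sentinel value 1 means "completion callback has not run yet". */
static int sketch_last_error = 1;

/* Example completion callback: reads the error straight from the bio. */
static void sketch_record(struct sketch_bio *bio)
{
    sketch_last_error = bio->error;
}

/* Complete a bio; a child hands its error to its parent, first error wins. */
static void sketch_endio(struct sketch_bio *bio)
{
    struct sketch_bio *parent;

    assert(bio != NULL);
    parent = bio->parent;
    if (bio->end_io)
        bio->end_io(bio);
    if (!parent)
        return;
    if (parent->error == 0)
        parent->error = bio->error;
    if (--parent->pending == 0)
        sketch_endio(parent);
}
```

    Because the errno lives in the bio, it survives until the last
    child completes and can carry values other than -EIO.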
     

07 Jul, 2015

1 commit

  • bio_integrity_alloc() and bio_integrity_free() assume that if a bio was
    allocated from a bioset that that bioset also had its bio_integrity_pool
    allocated using bioset_integrity_create(). This is a very bad
    assumption given that bioset_create() and bioset_integrity_create() are
    completely disjoint. Not all callers of bioset_create() have been
    trained to also call bioset_integrity_create() -- and they may not care
    to be.

    Fix this by falling back to kmalloc'ing 'struct bio_integrity_payload'
    rather than force all bioset consumers to (wastefully) preallocate a
    bio_integrity_pool that they very likely won't actually need (given the
    niche nature of the current block integrity support).

    Otherwise, a NULL pointer "Kernel BUG" with a trace like the following
    will be observed (as seen on s390x using zfcp storage) because dm-io
    doesn't use bioset_integrity_create() when creating its bioset:

    [ 791.643338] Call Trace:
    [ 791.643339] ([] 0x3df98b848)
    [ 791.643341] [] bio_integrity_alloc+0x48/0xf8
    [ 791.643348] [] bio_integrity_prep+0xae/0x2f0
    [ 791.643349] [] blk_queue_bio+0x1c8/0x3d8
    [ 791.643355] [] generic_make_request+0xc0/0x100
    [ 791.643357] [] submit_bio+0xa2/0x198
    [ 791.643406] [] dispatch_io+0x15c/0x3b0 [dm_mod]
    [ 791.643419] [] dm_io+0x176/0x2f0 [dm_mod]
    [ 791.643423] [] do_reads+0x13a/0x1a8 [dm_mirror]
    [ 791.643425] [] do_mirror+0x142/0x298 [dm_mirror]
    [ 791.643428] [] process_one_work+0x18a/0x3f8
    [ 791.643432] [] worker_thread+0x132/0x3b0
    [ 791.643435] [] kthread+0xd2/0xd8
    [ 791.643438] [] kernel_thread_starter+0x6/0xc
    [ 791.643446] [] kernel_thread_starter+0x0/0xc

    Signed-off-by: Mike Snitzer
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Mike Snitzer
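
    The fallback described above can be sketched in user space as
    follows. Everything here is a stand-in under stated assumptions: a
    tiny fixed-slot pool replaces the mempool, malloc() replaces
    kmalloc(), and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_POOL_SLOTS 4

struct sketch_payload { unsigned int seed; };

struct sketch_pool {
    struct sketch_payload slots[SKETCH_POOL_SLOTS];
    unsigned char busy[SKETCH_POOL_SLOTS];
};

struct sketch_bioset {
    struct sketch_pool *integrity_pool;  /* NULL if never set up */
};

/* Allocate from the set's pool if it has one, else fall back to malloc(). */
static struct sketch_payload *sketch_payload_alloc(struct sketch_bioset *bs)
{
    if (bs && bs->integrity_pool) {
        struct sketch_pool *pool = bs->integrity_pool;

        for (size_t i = 0; i < SKETCH_POOL_SLOTS; i++) {
            if (!pool->busy[i]) {
                pool->busy[i] = 1;
                return &pool->slots[i];
            }
        }
        return NULL;  /* pool exhausted */
    }
    return malloc(sizeof(struct sketch_payload));
}

/* Free with the same test used at allocation time. */
static void sketch_payload_free(struct sketch_bioset *bs, struct sketch_payload *p)
{
    if (!p)
        return;
    if (bs && bs->integrity_pool) {
        size_t i = (size_t)(p - bs->integrity_pool->slots);

        assert(i < SKETCH_POOL_SLOTS);
        bs->integrity_pool->busy[i] = 0;
        return;
    }
    free(p);
}
```

    The point of the sketch is that a set without an integrity pool no
    longer leads to a NULL pool dereference; both paths return a usable
    payload.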
     

22 May, 2015

1 commit

  • Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
    non-chains") regressed all existing callers that followed this pattern:
    1) saving a bio's original bi_end_io
    2) wiring up an intermediate bi_end_io
    3) restoring the original bi_end_io from intermediate bi_end_io
    4) calling bio_endio() to execute the restored original bi_end_io

    The regression was due to BIO_CHAIN only ever getting set if
    bio_inc_remaining() is called. For the above pattern it isn't set until
    step 3 above (step 2 would've needed to establish BIO_CHAIN). As such
    the first bio_endio(), in step 2 above, never decremented __bi_remaining
    before calling the intermediate bi_end_io -- leaving __bi_remaining with
    the value 1 instead of 0. When bio_inc_remaining() occurred during step
    3 it brought it to a value of 2. When the second bio_endio() was
    called, in step 4 above, it should've called the original bi_end_io but
    it didn't because there was an extra reference that wasn't dropped (due
    to atomic operations being optimized away since BIO_CHAIN wasn't set
    upfront).

    Fix this issue by removing the __bi_remaining management complexity for
    all callers that use the above pattern -- bio_chain() is the only
    interface that _needs_ to be concerned with __bi_remaining. For the
    above pattern callers just expect the bi_end_io they set to get called!
    Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
    that aren't associated with the bio_chain() interface.

    Also, the bio_inc_remaining() interface has been moved local to bio.c.

    Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

02 Dec, 2014

1 commit

  • bio integrity handling is broken on a system with LVM layered atop a
    DIF/DIX SCSI drive because device mapper clones the bio, modifies the
    clone, and sends the clone to the lower layers for processing.
    However, the clone bio has bi_vcnt == 0, which means that when the sd
    driver calls bio_integrity_process to attach DIX data, the
    for_each_segment_all() call (which uses bi_vcnt) returns immediately
    and random garbage is sent to the disk on a disk write. The disk of
    course returns an error.

    Therefore, teach bio_integrity_process() to use bio_for_each_segment()
    to iterate the bio_vecs, since the per-bio iterator tracks which
    bio_vecs are associated with that particular bio. The integrity
    handling code is effectively part of the "driver" (it's not the bio
    owner), so it must use the correct iterator function.

    v2: Fix a compiler warning about abandoned local variables. This
    patch supersedes "block: bio_integrity_process uses wrong bio_vec
    iterator". Patch applies against 3.18-rc6.

    Signed-off-by: Darrick J. Wong
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Darrick J. Wong
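
    To illustrate the difference between the two ways of walking a
    bio, here is a simplified model. It is a sketch under assumptions:
    the iterator tracks only a start index and a byte count (no
    partial-vector offset), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_vec { unsigned int len; };

struct sketch_iter {
    unsigned int idx;        /* first vector belonging to this bio */
    unsigned int remaining;  /* bytes belonging to this bio */
};

struct sketch_bio {
    const struct sketch_vec *vecs;  /* may be shared with the original bio */
    unsigned short vcnt;            /* owner's count; 0 in a clone */
    struct sketch_iter iter;
};

/* Owner-style walk: trusts vcnt, so it sees nothing in a clone. */
static unsigned int sketch_bytes_by_vcnt(const struct sketch_bio *bio)
{
    unsigned int total = 0;

    for (unsigned short i = 0; i < bio->vcnt; i++)
        total += bio->vecs[i].len;
    return total;
}

/* Driver-style walk: follows a private copy of the per-bio iterator. */
static unsigned int sketch_bytes_by_iter(const struct sketch_bio *bio)
{
    struct sketch_iter it = bio->iter;
    unsigned int total = 0;

    while (it.remaining) {
        unsigned int n = bio->vecs[it.idx].len;

        assert(n > 0);  /* a zero-length vector would never make progress */
        if (n > it.remaining)
            n = it.remaining;
        total += n;
        it.remaining -= n;
        it.idx++;
    }
    return total;
}
```

    For a clone with vcnt == 0, the first walk visits no data at all,
    while the second still covers exactly the bytes the clone
    describes.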
     

14 Oct, 2014

1 commit


27 Sep, 2014

10 commits

  • Make the choice of checksum a per-I/O property by introducing a flag
    that can be inspected by the SCSI layer. There are several reasons for
    this:

    1. It allows us to switch choice of checksum without unloading and
    reloading the HBA driver.

    2. During error recovery we need to be able to tell the HBA that
    checksums read from disk should not be verified and converted to IP
    checksums.

    3. For error injection purposes we need to be able to write a bad guard
    tag to storage. Since the storage device only supports T10 CRC we
    need to be able to disable IP checksum conversion on the HBA.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Move flags affecting the integrity code out of the bio bi_flags and into
    the block integrity payload.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Add a BLK_ prefix to the integrity profile flags. Also rename the flags
    to be more consistent with the generate/verify terminology in the rest
    of the integrity code.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Instead of the "operate" parameter we pass in a seed value and a pointer
    to a function that can be used to process the integrity metadata. The
    generation function is changed to have a return value to fit into this
    scheme.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
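
    One way to picture the "seed plus processing function" interface
    is the sketch below. It is only an illustration: the context
    struct, the function names, and the toy checksum are assumptions
    (the checksum is not a T10 CRC or IP checksum), and generate
    returns a status so both callbacks share one signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_ctx {
    const unsigned char *data;  /* one interval of data */
    size_t interval;            /* bytes per interval */
    uint32_t *tag;              /* protection word for this interval */
    uint64_t seed;              /* reference value for this interval */
};

typedef int (*sketch_proc_fn)(struct sketch_ctx *ctx);

/* Toy checksum seeded with the reference value; for illustration only. */
static uint32_t sketch_sum(const struct sketch_ctx *ctx)
{
    uint32_t s = (uint32_t)ctx->seed;

    for (size_t i = 0; i < ctx->interval; i++)
        s = s * 31u + ctx->data[i];
    return s;
}

/* Generate: write the protection word; returns 0 to match the verify signature. */
static int sketch_generate(struct sketch_ctx *ctx)
{
    *ctx->tag = sketch_sum(ctx);
    return 0;
}

/* Verify: compare the stored protection word; returns -1 on mismatch. */
static int sketch_verify(struct sketch_ctx *ctx)
{
    return *ctx->tag == sketch_sum(ctx) ? 0 : -1;
}

/* Walk the buffer one interval at a time, bumping the seed per interval. */
static int sketch_process(const unsigned char *buf, size_t len, size_t interval,
                          uint32_t *tags, uint64_t seed, sketch_proc_fn fn)
{
    assert(interval != 0 && len % interval == 0);
    for (size_t i = 0; i < len / interval; i++) {
        struct sketch_ctx ctx = { buf + i * interval, interval, &tags[i], seed + i };
        int ret = fn(&ctx);

        if (ret)
            return ret;
    }
    return 0;
}
```

    The caller picks the behavior by passing sketch_generate or
    sketch_verify, so the walker needs no "operate" switch of its own.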
     
  • Now that the protection interval has been detached from the sector size
    we need to be able to handle sizes that are different from 4K and
    512. Make the interval calculation generic.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • The protection interval is not necessarily tied to the logical block
    size of a block device. Stop using the terms "sector" and "sectors".

    Going forward we will use the term "seed" to describe the initial
    reference tag value for a given I/O. "Interval" will be used to describe
    the portion of the data buffer that a given piece of protection
    information is associated with.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • bip_buf is not really needed so we can remove it.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • None of the filesystems appear interested in using the integrity tagging
    feature. Potentially because very few storage devices actually permit
    using the application tag space.

    Remove the tagging functions.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • For commands like REQ_COPY we need a way to pass extra information along
    with each bio. Like integrity metadata this information must be
    available at the bottom of the stack so bi_private does not suffice.

    Rename the existing bi_integrity field to bi_special and make it a union
    so we can have different bio extensions for each class of command.

    We previously used bi_integrity != NULL as a way to identify whether a
    bio had integrity metadata or not. Introduce a REQ_INTEGRITY to be the
    indicator now that bi_special can contain different things.

    In addition, bio_integrity(bio) will now return a pointer to the
    integrity payload (when applicable).

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
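
    A small sketch of the union-plus-flag arrangement described above.
    The field layout, flag values, and struct names are hypothetical;
    only the idea (a flag says which union member is valid, and the
    accessor returns the payload pointer or NULL) comes from the commit
    message.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_REQ_INTEGRITY (1u << 0)
#define SKETCH_REQ_COPY      (1u << 1)

struct sketch_integrity { unsigned int seed; };
struct sketch_copy { unsigned long src_sector; };

struct sketch_bio {
    unsigned int flags;
    union {
        struct sketch_integrity *integrity;  /* valid if SKETCH_REQ_INTEGRITY */
        struct sketch_copy *copy;            /* valid if SKETCH_REQ_COPY */
    } special;
};

/* Return the integrity payload, or NULL when the bio carries none. */
static struct sketch_integrity *sketch_bio_integrity(const struct sketch_bio *bio)
{
    assert(bio != NULL);
    if (bio->flags & SKETCH_REQ_INTEGRITY)
        return bio->special.integrity;
    return NULL;
}
```

    Checking the pointer for NULL is no longer enough once the union
    can hold other payloads, which is why the flag becomes the
    indicator.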
     
  • bdev_integrity_enabled() is only used by bio_integrity_enabled().
    Combine these two functions.

    Signed-off-by: Martin K. Petersen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

22 Aug, 2014

1 commit

  • When a PI error occurs, we reach bio_integrity_end_io() with
    bi_remaining already decremented to 0, and we eventually need to
    call bio_endio() with the original bio completion handler restored.
    Calling bio_endio() at that point triggers a BUG_ON(). We should call
    bio_endio_nodec() instead, as is done in bio_integrity_verify_fn().

    Signed-off-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Sagi Grimberg
     

02 Jul, 2014

1 commit

  • Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
    Martin introduced the function bip_integrity_vecs() (which returns the
    number of usable vectors) to fix the nr_vecs issue for inline integrity
    vectors that was reported by David Milburn.

    But it seems that bip_integrity_vecs() will return the wrong number if the
    bio is not based on any bio_set for some reason (bio->bi_pool == NULL),
    because in that case bip_inline_vecs[0] is malloced directly. So
    add bip_max_vcnt to record the count of vector slots, and
    clean up the function bip_integrity_vecs().

    Signed-off-by: Gu Zheng
    Cc: Martin K. Petersen
    Cc: Kent Overstreet
    Signed-off-by: Jens Axboe

    Gu Zheng
     

19 May, 2014

1 commit