03 Sep, 2020

2 commits

  • commit e4b469c66f3cbb81c2e94d31123d7bcdf3c1dabd upstream.

    A previous commit aligning splits to physical block sizes inadvertently
    modified one return case such that it now returns 0 length splits
    when the number of sectors doesn't exceed the physical offset. This
    later hits a BUG in bio_split(). Restore the previous working behavior.

    Fixes: 9cc5169cd478b ("block: Improve physical block alignment of split bios")
    Reported-by: Eric Deal
    Signed-off-by: Keith Busch
    Cc: Bart Van Assche
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Keith Busch
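
The zero-length case described above can be reproduced in a small userspace sketch. Everything here is an assumption made for illustration: max_io_sectors() is a hypothetical stand-in and not the kernel function, sizes are in 512-byte sectors, and pbs and lbs (physical and logical block size in sectors) must be powers of two. The sketch assumes the fallback branch is supposed to round the sector count down to the logical block size.

```c
#include <stdint.h>

/*
 * Hypothetical model of a per-bio size limit that prefers to end a split
 * on a physical block boundary. Not the kernel implementation.
 */
static unsigned max_io_sectors(uint64_t start_sector, unsigned sectors,
                               unsigned pbs, unsigned lbs)
{
    /* Distance of the start sector from the previous physical boundary. */
    unsigned start_offset = (unsigned)(start_sector & (pbs - 1));
    /* Last physical boundary reachable within the limit. */
    unsigned aligned_end = (sectors + start_offset) & ~(pbs - 1);

    if (aligned_end > start_offset)
        return aligned_end - start_offset;

    /*
     * The limit does not reach past the physical offset, so no aligned
     * end exists. Round down to the logical block size instead of
     * returning 0; masking with (lbs - 1) here (without the ~) would
     * yield a 0 length split for lbs == 1.
     */
    return sectors & ~(lbs - 1);
}
```

For example, with 512-byte logical blocks (lbs = 1), 4 KB physical blocks (pbs = 8), a start sector of 3 and a limit of 2 sectors, the sketch returns 2 rather than 0.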
     
  • [ Upstream commit 943b40c832beb71115e38a1c4d99b640b5342738 ]

    When queue_max_discard_segments(q) is 1, blk_discard_mergable() returns
    false for a discard request, so the normal request merge is applied.
    However, only queue_max_segments() is checked there, so the max discard
    segment limit isn't respected.

    Check the max discard segment limit in the request merge code to fix
    the issue.

    This fixes a discard request failure in virtio_blk.

    Fixes: 69840466086d ("block: fix the DISCARD request merge")
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Stefano Garzarella
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Ming Lei
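
A minimal sketch of the idea, assuming a simplified model: the enum, struct and function names below are hypothetical and do not match kernel types. The point is only that the segment limit used by the merge check depends on the operation.

```c
#include <stdbool.h>

/* Hypothetical operation kinds and queue limits, for illustration only. */
enum req_op_kind { OP_WRITE, OP_DISCARD };

struct queue_limits_sketch {
    unsigned max_segments;          /* limit for normal requests */
    unsigned max_discard_segments;  /* limit for discard requests */
};

/* Pick the segment limit that applies to this kind of request. */
static unsigned max_segments_for(const struct queue_limits_sketch *lim,
                                 enum req_op_kind op)
{
    return op == OP_DISCARD ? lim->max_discard_segments : lim->max_segments;
}

/* A merge is allowed only if the combined segment count fits the limit. */
static bool can_merge_segments(const struct queue_limits_sketch *lim,
                               enum req_op_kind op,
                               unsigned cur_segs, unsigned new_segs)
{
    return cur_segs + new_segs <= max_segments_for(lim, op);
}
```

With max_segments = 128 and max_discard_segments = 1, merging two single-segment discards is rejected, while the same merge for writes is accepted.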
     

22 Jul, 2020

2 commits

  • commit 4a2f704eb2d831a2d73d7f4cdd54f45c49c3c353 upstream.

    Commit 429120f3df2d started taking the segment's start DMA address into
    account when computing the max segment size, and uses the 'unsigned long'
    data type to do that. However, the segment mask may be 0xffffffff, so
    the computed segment size may overflow when the physical address is zero
    on a 32-bit arch.

    Fix the issue by returning queue_max_segment_size() directly when that
    happens.

    Fixes: 429120f3df2d ("block: fix splitting segments on boundary masks")
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Cc: Christoph Hellwig
    Tested-by: Steven Rostedt (VMware)
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Ming Lei
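
The overflow can be modelled in userspace by using uint32_t in place of a 32-bit 'unsigned long'. This is a sketch under that assumption; the helper names are hypothetical, and falling back to the queue limit when the computed size wraps to zero is one way to express "return the queue's max segment size when that happens".

```c
#include <stdint.h>

/* Smaller of two values, ignoring a value of 0 (hypothetical helper). */
static uint32_t min_not_zero_u32(uint32_t a, uint32_t b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/*
 * Max segment size for a segment starting at phys_addr, with a 32-bit
 * type standing in for 'unsigned long' on a 32-bit architecture.
 */
static uint32_t max_seg_size_32(uint32_t boundary_mask, uint32_t phys_addr,
                                uint32_t queue_max)
{
    uint32_t offset = boundary_mask & phys_addr;
    /* Wraps to 0 when the mask is 0xffffffff and offset is 0. */
    uint32_t to_boundary = boundary_mask - offset + 1;

    /* A wrapped (zero) result falls back to the queue limit. */
    return min_not_zero_u32(to_boundary, queue_max);
}
```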
     
  • commit 429120f3df2dba2bf3a4a19f4212a53ecefc7102 upstream.

    We ran into a problem with a mpt3sas based controller, where we would
    see random (and hard to reproduce) file corruption. The issue seemed
    specific to this controller, but wasn't specific to the file system.
    After a lot of debugging, we found out that it's caused by segments
    spanning a 4G memory boundary. This shouldn't happen, as the default
    setting for segment boundary masks is 4G.

    Turns out there are two issues in get_max_segment_size():

    1) The default segment boundary mask is bypassed

    2) The segment start address isn't taken into account when checking
    segment boundary limit

    Fix these two issues by removing the bypass of the segment boundary
    check even if the mask is set to the default value, and taking into
    account the actual start address of the request when checking if a
    segment needs splitting.

    Cc: stable@vger.kernel.org # v5.1+
    Reviewed-by: Chris Mason
    Tested-by: Chris Mason
    Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Signed-off-by: Ming Lei
    Signed-off-by: Greg Kroah-Hartman

    Dropped const on the page pointer, ppc page_to_phys() doesn't mark the
    page as const...

    Signed-off-by: Jens Axboe

    Ming Lei
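
The effect of taking the start address into account can be sketched as follows. The function names are hypothetical and the addresses are plain 64-bit integers, so this is an illustration of the arithmetic rather than the kernel code: a segment is clamped so that it ends at or before the next boundary, and a separate helper checks whether a given range crosses one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes available from phys_addr up to the next boundary, capped at queue_max. */
static uint64_t seg_size_up_to_boundary(uint64_t boundary_mask,
                                        uint64_t phys_addr,
                                        uint64_t queue_max)
{
    uint64_t offset = boundary_mask & phys_addr;
    uint64_t to_boundary = boundary_mask - offset + 1;

    return to_boundary < queue_max ? to_boundary : queue_max;
}

/* True if [phys_addr, phys_addr + len) spans a boundary; len must be > 0. */
static bool crosses_boundary(uint64_t boundary_mask, uint64_t phys_addr,
                             uint64_t len)
{
    return (phys_addr | boundary_mask) !=
           ((phys_addr + len - 1) | boundary_mask);
}
```

With the default 4G mask (0xffffffff), a 16 KB segment starting 8 KB below a 4G boundary crosses it, whereas the clamped size of 8 KB does not.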
     

05 Aug, 2019

5 commits

  • Consider the following example:
    * The logical block size is 4 KB.
    * The physical block size is 8 KB.
    * max_sectors equals (16 KB >> 9) sectors.
    * A non-aligned 4 KB and an aligned 64 KB bio are merged into a single
    non-aligned 68 KB bio.

    The current behavior is to split such a bio into (16 KB + 16 KB + 16 KB
    + 16 KB + 4 KB). The start of none of these five bios is aligned to a
    physical block boundary.

    This patch ensures that such a bio is split into four aligned and
    one non-aligned bio instead of being split into five non-aligned bios.
    This improves performance because most block devices can handle aligned
    requests faster than non-aligned requests.

    Since the physical block size is larger than or equal to the logical
    block size, this patch preserves the guarantee that the returned
    value is a multiple of the logical block size.

    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
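
The example above can be worked through with a small userspace sketch. It assumes the bio starts 4 KB past a physical block boundary (sector 8), measures everything in 512-byte sectors, and uses hypothetical helper names; it illustrates the alignment arithmetic and is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest split size that ends on a physical block boundary, if one fits. */
static unsigned aligned_split_sectors(uint64_t start, unsigned max_sectors,
                                      unsigned pbs, unsigned lbs)
{
    unsigned start_offset = (unsigned)(start & (pbs - 1));
    unsigned aligned_end = (max_sectors + start_offset) & ~(pbs - 1);

    if (aligned_end > start_offset)
        return aligned_end - start_offset;
    return max_sectors & ~(lbs - 1);
}

/* Split len sectors starting at start; returns the number of pieces. */
static size_t split_bio(uint64_t start, unsigned len, unsigned max_sectors,
                        unsigned pbs, unsigned lbs,
                        unsigned *out, size_t cap)
{
    size_t n = 0;

    while (len > 0 && n < cap) {
        unsigned chunk = aligned_split_sectors(start, max_sectors, pbs, lbs);

        /* The last piece takes whatever is left; also guards against 0. */
        if (chunk == 0 || chunk > len)
            chunk = len;
        out[n++] = chunk;
        start += chunk;
        len -= chunk;
    }
    return n;
}
```

Under these assumptions a 68 KB bio (136 sectors) with pbs = 16, lbs = 8 and max_sectors = 32 is split into 24, 32, 32, 32 and 16 sectors, that is 12 KB + 16 KB + 16 KB + 16 KB + 8 KB. Only the first piece starts off a physical block boundary.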
     
  • Move the max_sectors check into bvec_split_segs() such that a single
    call to that function can do all the necessary checks. This patch
    optimizes the fast path further, namely if a bvec fits in a page.

    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Simplify this function by removing two if-tests. Other than requiring
    that the @sectors pointer is not NULL, this patch does not change the
    behavior of bvec_split_segs().

    Reviewed-by: Johannes Thumshirn
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Since what the bio splitting functions do is nontrivial, document these
    functions.

    Reviewed-by: Johannes Thumshirn
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Make it clear to the compiler and also to humans that the functions
    that query request queue properties do not modify any member of the
    request_queue data structure.

    Reviewed-by: Johannes Thumshirn
    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

03 Jul, 2019

1 commit


21 Jun, 2019

3 commits

  • Now that we don't need to assign the front/back segment sizes, we can
    stop duplicating the segs assignment for the split vs no-split case and
    remove a whole chunk of boilerplate code.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Return the segments and let the callers assign them, which makes the code
    a little more obvious. Also pass the request instead of q plus bio
    chain, allowing for the use of rq_for_each_bvec.

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio, and return it from a variant of
    blk_queue_split instead so that it can be passed as an argument to
    those functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove
    pointless struct request_queue arguments from any of the functions
    that had it and grew a nr_segs argument.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 May, 2019

3 commits

  • At this point these fields aren't used for anything, so we can remove
    them.

    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We fundamentally do not have a maximum segment size for devices with a
    virt boundary. So don't bother checking it, especially given that the
    existing checks didn't properly work to start with as we never fully
    update the front/back segment size and miss the bi_seg_front_size that
    would have been required for some cases.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently ll_merge_requests_fn, unlike all other merge functions,
    reduces nr_phys_segments by one if the last segment of the previous
    request and the first segment of the next request are contiguous. While
    this seems like a nice solution to avoid building smaller than possible
    requests, it causes a mismatch between the segments actually present
    in the request and those iterated over by the bvec iterators, including
    __rq_for_each_bio. This can for example mistrigger the single segment
    optimization in the nvme-pci driver, and might lead to a mismatching
    nr_phys_segments number when recalculating the number of segments
    when inserting a cloned request.

    We could possibly work around this by making the bvec iterators take
    the front and back segment size into account, but that would require
    moving them from the bio to the bio_iter and spreading this mess
    over all users of bvecs. Or we could simply remove this optimization
    under the assumption that most users already build good enough bvecs,
    and that the bio merge path never cared about this optimization
    either. The latter is what this patch does.

    dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests").
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

22 Apr, 2019

1 commit

  • While we generally allow scatterlists to have offsets larger than page
    size for an entry, and other subsystems like the crypto code make use of
    that, the block layer isn't quite ready for that. Flip the switch back
    to avoid them for now, and revisit that decision early in a merge window
    once the known offenders are fixed.

    Fixes: 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a nth_page call")
    Reviewed-by: Ming Lei
    Tested-by: Guenter Roeck
    Reported-by: Guenter Roeck
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

12 Apr, 2019

1 commit


09 Apr, 2019

1 commit

  • Commit f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can
    be mergeable") changes bvec merge by only considering two bvecs from
    different bios. However, if the former bio doesn't include any io bvec,
    then the following warning may be triggered:

    warning: ‘bvec.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]

    In practice, it shouldn't be triggered.

    Fix it by adding a check on the former bio; the check shouldn't add any
    cost given 'bio->bi_iter' can be hit in cache.

    Reported-by: Jens Axboe
    Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

02 Apr, 2019

4 commits


07 Mar, 2019

1 commit

  • blk_recount_segments() can be called in bio_add_pc_page() for
    calculating how many segments this bio will have after one page is added
    to this bio. If the resulting segment number is beyond the queue limit,
    the added page will be removed.

    The try-and-fix policy requires blk_recount_segments(__blk_recalc_rq_segments)
    to not consider the segment number limit. Unfortunately bvec_split_segs()
    does check this limit, which causes a too small segment number to be
    returned to bio_add_pc_page(), so the page may still be added to the bio
    even though the segment number limit is exceeded.

    Fix this issue by not considering the segment number limit when
    calculating the bio's segment number.

    Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Cc: Christoph Hellwig
    Cc: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

03 Mar, 2019

1 commit

  • When the current bvec can be merged to the 1st segment, the bio's front
    segment size has to be updated.

    However, dcebd755926b doesn't consider that case, then bio's front
    segment size may not be correct.

    This patch fixes this issue.

    Cc: Christoph Hellwig
    Cc: Omar Sandoval
    Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

27 Feb, 2019

3 commits


20 Feb, 2019

1 commit

  • rq->bio can be NULL sometimes, such as flush request, so don't
    read bio->bi_seg_front_size until this 'bio' is checked as valid.

    Cc: Bart Van Assche
    Reported-by: Bart Van Assche
    Fixes: dcebd755926b0f39dd1e ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

15 Feb, 2019

4 commits

  • Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"),
    physical segment number is mainly figured out in blk_queue_split() for
    fast path, and the flag of BIO_SEG_VALID is set there too.

    Now only blk_recount_segments() and blk_recalc_rq_segments() use this
    flag.

    Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID
    is set in blk_queue_split().

    For another user of blk_recalc_rq_segments():

    - run in partial completion branch of blk_update_request, which is an unusual case

    - run in blk_cloned_rq_check_limits(), still not a big problem if the flag is killed
    since dm-rq is the only user.

    Multi-page bvec is enabled now, not doing S/G merging is rather pointless with the
    current setup of the I/O path, as it isn't going to save you a significant amount
    of cycles.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • It is more efficient to use bio_for_each_bvec() to map sg; at the same
    time we have to consider splitting multipage bvecs as done in
    blk_bio_segment_split().

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • First, it is more efficient to use bio_for_each_bvec() in both
    blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how
    many multi-page bvecs there are in the bio.

    Secondly, once bio_for_each_bvec() is used, the bvec may need to be
    split because its length can be much longer than the max segment size,
    so we have to split the big bvec into several segments.

    Thirdly, when splitting a multi-page bvec into segments, the max segment
    limit may be reached, so the bio split needs to be considered in
    this situation too.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
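
The second and third points can be illustrated with a simplified sketch. It assumes a single bvec described only by its length, ignores segment boundary masks, and uses a hypothetical function name; max_seg_size must be non-zero. The return value signals that the segment limit was hit before the whole bvec was consumed, which is the situation where the bio itself has to be split.

```c
#include <stdbool.h>

/*
 * Carve one multi-page bvec of len bytes into segments of at most
 * max_seg_size bytes, counting them in *nsegs. Returns true if the
 * segment limit max_segs was reached with bytes still left over.
 */
static bool split_bvec_into_segs(unsigned len, unsigned max_seg_size,
                                 unsigned max_segs, unsigned *nsegs)
{
    while (len > 0 && *nsegs < max_segs) {
        unsigned seg = len < max_seg_size ? len : max_seg_size;

        (*nsegs)++;
        len -= seg;
    }
    return len > 0;
}
```

For a 200 KB bvec and a 64 KB max segment size this counts four segments when the limit is generous, and stops after two segments with leftover data when max_segs is 2.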
     
  • It is wrong to use bio->bi_vcnt to figure out how many segments
    there are in the bio even though CLONED flag isn't set on this bio,
    because this bio may be split or advanced.

    So always use bio_segments() in blk_recount_segments(), and it shouldn't
    cause any performance loss now because the physical segment number is figured
    out in blk_queue_split() and BIO_SEG_VALID is set meantime since
    bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max segments")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

27 Jan, 2019

1 commit


23 Jan, 2019

1 commit

  • Besides blk_queue_split(), bio_split() is also used for splitting bios,
    and the remaining bio is often resubmitted to the queue via
    generic_make_request(). So the same queue enter recursion exists in this
    case too. Unfortunately commit cd4a4ae4683dc2 doesn't help this case.

    This patch covers the above case by setting BIO_QUEUE_ENTERED before calling
    q->make_request_fn.

    In theory the per-bio flag is used to simulate one stack variable, it is
    just fine to clear it after q->make_request_fn is returned. Especially
    the same bio can't be submitted from another context.

    Fixes: cd4a4ae4683dc2 ("block: don't use blocking queue entered for recursive bio submits")
    Cc: Tetsuo Handa
    Cc: NeilBrown
    Reviewed-by: Mike Snitzer
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

29 Dec, 2018

1 commit

  • Pull SCSI updates from James Bottomley:
    "This is mostly update of the usual drivers: smartpqi, lpfc, qedi,
    megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.

    Additionally, we have a pile of annotation, unused variable and minor
    updates.

    The big API change is the updates for Christoph's DMA rework which
    include removing the DISABLE_CLUSTERING flag.

    And finally there are a couple of target tree updates"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
    scsi: isci: request: mark expected switch fall-through
    scsi: isci: remote_node_context: mark expected switch fall-throughs
    scsi: isci: remote_device: Mark expected switch fall-throughs
    scsi: isci: phy: Mark expected switch fall-through
    scsi: iscsi: Capture iscsi debug messages using tracepoints
    scsi: myrb: Mark expected switch fall-throughs
    scsi: megaraid: fix out-of-bound array accesses
    scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
    scsi: fcoe: remove set but not used variable 'port'
    scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
    scsi: smartpqi: fix build warnings
    scsi: smartpqi: update driver version
    scsi: smartpqi: add ofa support
    scsi: smartpqi: increase fw status register read timeout
    scsi: smartpqi: bump driver version
    scsi: smartpqi: add smp_utils support
    scsi: smartpqi: correct lun reset issues
    scsi: smartpqi: correct volume status
    scsi: smartpqi: do not offline disks for transient did no connect conditions
    scsi: smartpqi: allow for larger raid maps
    ...

    Linus Torvalds
     

19 Dec, 2018

1 commit

  • Now that the SCSI layer replaced the use of the cluster flag with
    segment size limits and the DMA boundary we can remove the cluster flag
    from the block layer.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jens Axboe
    Signed-off-by: Martin K. Petersen

    Christoph Hellwig
     

14 Dec, 2018

1 commit


10 Dec, 2018

2 commits

  • We want to convert to per-cpu in_flight counters.

    The function part_round_stats needs the in_flight counter every jiffy. It
    would be too costly to sum all the percpu variables every jiffy, so it
    must be deleted. part_round_stats is used to calculate two counters -
    time_in_queue and io_ticks.

    time_in_queue can be calculated without part_round_stats, by adding the
    duration of the I/O when the I/O ends (the value is almost as exact as the
    previously calculated value, except that time for in-progress I/Os is not
    counted).

    io_ticks can be approximated by increasing the value when I/O is started
    or ended and the jiffies value has changed. If the I/Os take less than a
    jiffy, the value is as exact as the previously calculated value. If the
    I/Os take more than a jiffy, io_ticks can drift behind the previously
    calculated value.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mikulas Patocka
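
The two approximations described above can be sketched in a single-threaded model. The struct and function names are hypothetical, time is a plain jiffies counter passed in by the caller, and no per-cpu or atomic handling is shown.

```c
/* Hypothetical per-partition statistics, for illustration only. */
struct part_stats_sketch {
    unsigned long stamp;          /* jiffies value of the last update */
    unsigned long io_ticks;       /* approximate busy time in jiffies */
    unsigned long time_in_queue;  /* summed duration of completed I/Os */
};

/* Bump io_ticks at most once per distinct jiffies value. */
static void update_io_ticks(struct part_stats_sketch *p, unsigned long now)
{
    if (p->stamp != now) {
        p->stamp = now;
        p->io_ticks++;
    }
}

static void io_start(struct part_stats_sketch *p, unsigned long now)
{
    update_io_ticks(p, now);
}

/* On completion, account the whole duration of this I/O at once. */
static void io_end(struct part_stats_sketch *p, unsigned long now,
                   unsigned long start)
{
    update_io_ticks(p, now);
    p->time_in_queue += now - start;
}
```

An I/O that starts and ends within jiffy 100 adds one tick. A second I/O running from jiffy 101 to 105 adds only two more ticks, one at start and one at end, which shows how io_ticks can drift behind for I/Os longer than a jiffy, while time_in_queue still grows by the full 4 jiffies.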
     
  • All of part_stat_* and related methods are used with preempt disabled,
    so there is no need to pass cpu around to all of them. Just call
    smp_processor_id() as needed.

    Suggested-by: Jens Axboe
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer