06 Feb, 2020

1 commit

  • Pull more block updates from Jens Axboe:
    "Some later arrivals, but all fixes at this point:

    - bcache fix series (Coly)

    - Series of BFQ fixes (Paolo)

    - NVMe pull request from Keith with a few minor NVMe fixes

    - Various little tweaks"

    * tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block: (23 commits)
    nvmet: update AEN list and array at one place
    nvmet: Fix controller use after free
    nvmet: Fix error print message at nvmet_install_queue function
    brd: check and limit max_part par
    nvme-pci: remove nvmeq->tags
    nvmet: fix dsm failure when payload does not match sgl descriptor
    nvmet: Pass lockdep expression to RCU lists
    block, bfq: clarify the goal of bfq_split_bfqq()
    block, bfq: get a ref to a group when adding it to a service tree
    block, bfq: remove ifdefs from around gets/puts of bfq groups
    block, bfq: extend incomplete name of field on_st
    block, bfq: get extra ref to prevent a queue from being freed during a group move
    block, bfq: do not insert oom queue into position tree
    block, bfq: do not plug I/O for bfq_queues with no proc refs
    bcache: check return value of prio_read()
    bcache: fix incorrect data type usage in btree_flush_write()
    bcache: add readahead cache policy options via sysfs interface
    bcache: explicity type cast in bset_bkey_last()
    bcache: fix memory corruption in bch_cache_accounting_clear()
    xen/blkfront: limit allocated memory size to actual use case
    ...

    Linus Torvalds
     

03 Feb, 2020

7 commits

  • The exact, general goal of the function bfq_split_bfqq() is not that
    apparent. Add a comment to make it clear.

    Tested-by: Oleksandr Natalenko
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • BFQ schedules generic entities, which may represent either bfq_queues
    or groups of bfq_queues. When an entity is inserted into a service
    tree, a reference must be taken, to make sure that the entity does not
    disappear while still referred in the tree. Unfortunately, such a
    reference is mistakenly taken only if the entity represents a
    bfq_queue. This commit takes a reference also in case the entity
    represents a group.

    Tested-by: Oleksandr Natalenko
    Tested-by: Chris Evich
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • ifdefs around gets and puts of bfq groups reduce readability; remove them.

    Tested-by: Oleksandr Natalenko
    Reported-by: Jens Axboe
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • The flag on_st in the bfq_entity data structure is true if the entity
    is on a service tree or is in service. Yet the name of the field,
    confusingly, does not mention the second, very important case. Extend
    the name to mention the second case too.

    Tested-by: Oleksandr Natalenko
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • In bfq_bfqq_move(), the bfq_queue, say Q, to be moved to a new group
    may happen to be deactivated in the scheduling data structures of the
    source group (and then activated in the destination group). If Q is
    referred only by the data structures in the source group when the
    deactivation happens, then Q is freed upon the deactivation.

    This commit addresses this issue by getting an extra reference before
    the possible deactivation, and releasing this extra reference after Q
    has been moved.
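
    The pattern described above can be sketched in plain userspace C. This is
    only an illustration under stated assumptions: struct fake_queue and all
    helper names below are hypothetical and are not the BFQ code.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted queue (not struct bfq_queue). */
struct fake_queue {
    int refs;
    int freed;
};

static void fake_queue_get(struct fake_queue *q)
{
    q->refs++;
}

static void fake_queue_put(struct fake_queue *q)
{
    assert(q->refs > 0);
    if (--q->refs == 0)
        q->freed = 1;   /* the last reference is gone: the object is freed */
}

/* The source group's scheduling structures drop their reference. */
static void deactivate_in_source(struct fake_queue *q)
{
    fake_queue_put(q);
}

/* The destination group's scheduling structures take a reference. */
static void activate_in_dest(struct fake_queue *q)
{
    fake_queue_get(q);
}

/* Returns 1 if the queue stayed alive for the whole move. */
static int move_with_extra_ref(struct fake_queue *q)
{
    int alive;

    fake_queue_get(q);          /* extra reference before the deactivation */
    deactivate_in_source(q);
    alive = !q->freed;
    activate_in_dest(q);
    fake_queue_put(q);          /* release the extra reference after the move */
    return alive;
}
```

    Without the extra get/put pair, a queue whose only reference is held by
    the source group would reach a count of zero inside deactivate_in_source().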

    Tested-by: Chris Evich
    Tested-by: Oleksandr Natalenko
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • BFQ maintains an ordered list, implemented with an RB tree, of
    head-request positions of non-empty bfq_queues. This position tree,
    inherited from CFQ, is used to find bfq_queues that contain I/O close
    to each other. BFQ merges these bfq_queues into a single shared queue,
    if this boosts throughput on the device at hand.

    There is, however, a special-purpose bfq_queue that does not participate
    in queue merging: the oom bfq_queue. Yet this bfq_queue, too, could be
    wrongly added to the position tree. So bfqq_find_close() could return
    the oom bfq_queue, which is a source of further trouble in an
    out-of-memory situation. This commit prevents the oom bfq_queue from
    being inserted into the position tree.

    Tested-by: Patrick Dung
    Tested-by: Oleksandr Natalenko
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     
  • Commit 478de3380c1c ("block, bfq: deschedule empty bfq_queues not
    referred by any process") fixed commit 3726112ec731 ("block, bfq:
    re-schedule empty queues if they deserve I/O plugging") by
    descheduling an empty bfq_queue when it remains with no process
    reference. Yet, this still left a case uncovered: an empty bfq_queue
    with no process reference that remains in service. This happens for
    an in-service sync bfq_queue that is deemed to deserve I/O-dispatch
    plugging when it remains empty. Yet no new requests will arrive for
    such a bfq_queue if no process sends requests to it any longer. Even
    worse, the bfq_queue may happen to be prematurely freed while still in
    service (because there may remain no reference to it any longer).

    This commit solves this problem by preventing I/O dispatch from being
    plugged for the in-service bfq_queue, if the latter has no process
    reference (the bfq_queue is then prevented from remaining in service).

    Fixes: 3726112ec731 ("block, bfq: re-schedule empty queues if they deserve I/O plugging")
    Tested-by: Oleksandr Natalenko
    Reported-by: Patrick Dung
    Tested-by: Patrick Dung
    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     

30 Jan, 2020

1 commit

  • Pull SCSI updates from James Bottomley:
    "This series is slightly unusual because it includes Arnd's compat
    ioctl tree here:

    1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue

    Excluding Arnd's changes, this is mostly an update of the usual
    drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.

    There are a couple of core and base updates around error propagation
    and atomicity in the attribute container base we use for the SCSI
    transport classes.

    The rest is minor changes and updates"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits)
    scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
    scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity
    scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
    scsi: hisi_sas: Replace magic number when handle channel interrupt
    scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
    scsi: hisi_sas: use threaded irq to process CQ interrupts
    scsi: ufs: Use UFS device indicated maximum LU number
    scsi: ufs: Add max_lu_supported in struct ufs_dev_info
    scsi: ufs: Delete is_init_prefetch from struct ufs_hba
    scsi: ufs: Inline two functions into their callers
    scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init()
    scsi: ufs: Split ufshcd_probe_hba() based on its called flow
    scsi: ufs: Delete struct ufs_dev_desc
    scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
    scsi: ufs-mediatek: enable low-power mode for hibern8 state
    scsi: ufs: export some functions for vendor usage
    scsi: ufs-mediatek: add dbg_register_dump implementation
    scsi: qla2xxx: Fix a NULL pointer dereference in an error path
    scsi: qla1280: Make checking for 64bit support consistent
    scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
    ...

    Linus Torvalds
     

28 Jan, 2020

1 commit

  • Pull core block updates from Jens Axboe:
    "This may be the most quiet round we've had in years. I'm not
    complaining. Really not a lot to detail here, outside of spelling and
    documentation improvements/fixes, we have:

    - Allow t10-pi to be modular (Herbert)

    - Remove dead code in bfq (Alex)

    - Mark zone management requests with REQ_SYNC (Chaitanya)

    - BFQ division improvement (Wen)

    - Small series improving plugging (Pavel)"

    * tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-block:
    partitions/ldm: fix spelling mistake "to" -> "too"
    block, bfq: improve arithmetic division in bfq_delta()
    block/bfq: remove unused bfq_class_rt which never used
    block: mark zone-mgmt bios with REQ_SYNC
    blk-mq: Document functions for sending request
    block: Allow t10-pi to be modular
    blk-mq: optimise blk_mq_flush_plug_list()
    list: introduce list_for_each_continue()
    blk-mq: optimise rq sort function

    Linus Torvalds
     

27 Jan, 2020

1 commit

  • Host-aware SMR drives can be used with the commands to explicitly manage
    zone state, but they can also be used as normal disks. In the former
    case it makes perfect sense to allow partitions on them, in the latter
    it does not, just like for host managed devices. Add a check to
    add_partition to allow partitions on host aware devices, but give
    up any zone management capabilities in that case, which also catches
    the previously missed case of adding a partition vs just scanning it.

    Because sd can rescan the attribute at runtime it needs to check if
    a disk has partitions, for which a new helper is added to genhd.h.

    Fixes: 5eac3eb30c9a ("block: Remove partition support for zoned block devices")
    Reported-by: Borislav Petkov
    Signed-off-by: Christoph Hellwig
    Tested-by: Damien Le Moal
    Reviewed-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Jan, 2020

1 commit


23 Jan, 2020

2 commits

  • do_div() does a 64-by-32 division. Use div64_ul() instead when the
    divisor is an unsigned long, to avoid truncating the divisor to 32 bits.
    As a nice side effect, this also cleans up the function a bit.
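
    The truncation can be illustrated in userspace C. This is a sketch, not
    the kernel helpers: div_64_by_32() and div_64_by_64() are hypothetical
    names that only model a narrowed and a full-width divisor, assuming a
    platform where the divisor may be wider than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-by-32 division: the divisor is silently narrowed to 32 bits. */
static uint64_t div_64_by_32(uint64_t dividend, uint64_t divisor)
{
    uint32_t narrowed = (uint32_t)divisor;  /* upper 32 bits are lost here */

    assert(narrowed != 0);
    return dividend / narrowed;
}

/* Models a full-width division, where the divisor keeps all 64 bits. */
static uint64_t div_64_by_64(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}
```

    With a divisor of 2^32 + 2, the narrowed variant divides by 2 instead,
    so the two functions return very different quotients.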

    Signed-off-by: Wen Yang
    Cc: Paolo Valente
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Jens Axboe

    Wen Yang
     
  • This macro has never been used since it was introduced by commit aee69d78dec0
    ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler").

    Remove it.

    Signed-off-by: Alex Shi
    Cc: Paolo Valente
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Jens Axboe

    Alex Shi
     

16 Jan, 2020

1 commit

  • Logical block size has type unsigned short. That means that it can be at
    most 32768. However, there are architectures that can run with 64k pages
    (for example arm64) and on these architectures, it may be possible to
    create block devices with 64k block size.

    For example (run this on an architecture with 64k pages):

    Mount will fail with this error because it tries to read the superblock using 2-sector
    access:
    device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
    EXT4-fs (dm-0): unable to read superblock

    This patch changes the logical block size from unsigned short to unsigned
    int to avoid the overflow.
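
    The overflow itself is easy to reproduce in userspace C. The sketch below
    assumes a 16-bit unsigned short, modelled with uint16_t; the struct and
    function names are hypothetical and are not the kernel's queue limits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of a limits structure before and after widening. */
struct limits_narrow {
    uint16_t logical_block_size;        /* stands in for unsigned short */
};

struct limits_wide {
    unsigned int logical_block_size;
};

static void set_narrow(struct limits_narrow *l, unsigned int size)
{
    assert(l);
    l->logical_block_size = (uint16_t)size;     /* 65536 wraps to 0 */
}

static void set_wide(struct limits_wide *l, unsigned int size)
{
    assert(l);
    l->logical_block_size = size;               /* 65536 is preserved */
}
```

    A 64k block size stored through set_narrow() reads back as 0, while a
    32768-byte block size still fits.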

    Cc: stable@vger.kernel.org
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Ming Lei
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     

15 Jan, 2020

1 commit

  • Commit 429120f3df2d started taking the segment's start DMA address into
    account when computing the max segment size, and uses the data type
    'unsigned long' to do that. However, the segment mask may be 0xffffffff,
    so the computed segment size may overflow when the physical address is
    zero on a 32-bit architecture.

    Fix the issue by returning queue_max_segment_size() directly when that
    happens.
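
    A minimal sketch of the arithmetic, assuming a 32-bit 'unsigned long'
    (modelled with uint32_t). The function names are hypothetical and this is
    not the kernel code; it only shows why the wrapped value must be special-cased.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: the "+ 1" wraps to 0 for mask 0xffffffff and offset 0. */
static uint32_t naive_max_segment_size(uint32_t mask, uint32_t start,
                                       uint32_t queue_max)
{
    uint32_t to_boundary = mask - (start & mask) + 1;

    return to_boundary < queue_max ? to_boundary : queue_max;
}

/* Guarded version: fall back to the queue's limit when the sum wrapped. */
static uint32_t guarded_max_segment_size(uint32_t mask, uint32_t start,
                                         uint32_t queue_max)
{
    uint32_t to_boundary = mask - (start & mask) + 1;

    assert(queue_max != 0);
    if (to_boundary == 0)               /* overflow: no boundary is in reach */
        return queue_max;
    return to_boundary < queue_max ? to_boundary : queue_max;
}
```

    For a start address of 0 the naive version yields a segment size of 0,
    while the guarded version returns the queue's own limit.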

    Fixes: 429120f3df2d ("block: fix splitting segments on boundary masks")
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Cc: Christoph Hellwig
    Tested-by: Steven Rostedt (VMware)
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

09 Jan, 2020

2 commits

  • Commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    adds bio_truncate() for handling bio EOD. However, bio_truncate()
    doesn't use the passed 'op' parameter from guard_bio_eod's callers.

    So bio_truncate() may retrieve the wrong 'op', and zeroing pages may
    not be done for a READ bio.

    Fix this issue by moving guard_bio_eod() after bio_set_op_attrs()
    in submit_bh_wbc() so that bio_truncate() can always retrieve the
    correct op info.

    Meanwhile, remove the 'op' parameter from guard_bio_eod() because it
    isn't used any more.

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixes: 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    Signed-off-by: Ming Lei

    Fold in kerneldoc and bio_op() change.

    Signed-off-by: Jens Axboe

    Ming Lei
     
  • In the current implementation, the final zone-mgmt request is issued with
    submit_bio_wait(), which marks the bio REQ_SYNC. This is needed since
    immediate action is expected for zone-mgmt requests, as these are
    blocking operations. This also bypasses the scheduler in
    blk_mq_make_request() and dispatches the request directly into the
    hw ctx.

    This patch marks all the chained bios REQ_SYNC so that the non-final
    bios get the same behavior.

    Reviewed-by: Damien Le Moal
    Reviewed-by: Bob Liu
    Signed-off-by: Chaitanya Kulkarni
    Signed-off-by: Jens Axboe

    Chaitanya Kulkarni
     

07 Jan, 2020

2 commits

  • Add or improve documentation for the functions that create and send
    I/O requests to the hardware.

    Signed-off-by: André Almeida
    Signed-off-by: Jens Axboe

    André Almeida
     
  • Currently t10-pi can only be built into the block layer which via
    crc-t10dif pulls in a whole chunk of the Crypto API. In fact all
    users of t10-pi work as modules and there is no reason for it to
    always be built-in.

    This patch adds a new hidden option for t10-pi that is selected
    automatically based on BLK_DEV_INTEGRITY and whether the users
    of t10-pi are built-in or not.

    Signed-off-by: Herbert Xu
    Signed-off-by: Jens Axboe

    Herbert Xu
     

03 Jan, 2020

10 commits

  • Having separate implementations of blkdev_ioctl() often leads to these
    getting out of sync, despite the comment at the top.

    Since most of the ioctl commands are compatible, and we try very hard
    not to add any new incompatible ones, move all the common bits into a
    shared function and leave only the ones that are historically different
    in separate functions for native/compat mode.

    To deal with the compat_ptr() conversion, pass both the integer
    argument and the pointer argument into the new blkdev_common_ioctl()
    and make sure to always use the correct one of these.

    blkdev_ioctl() is now only kept as a separate exported interface
    for drivers/char/raw.c, which lacks a compat_ioctl variant.
    We should probably either move raw.c to staging if there are no
    more users, or export blkdev_compat_ioctl() as well.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • There is no need to go through a compat_alloc_user_space()
    copy any more, just wrap the function in a small helper that
    works the same way for native and compat mode.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • Having both in the same file allows a number of simplifications
    to the compat path, and makes it more likely that changes to
    the native path get applied to the compat version as well.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • Most of the HDIO ioctls are only used by the obsolete drivers/ide
    subsystem. These can be handled by making ide_cmd_ioctl() aware of
    compat mode, doing the correct transformations in place, and using
    it as both the native and the compat handler for all drivers.

    The SCSI drivers implementing the same commands are already doing
    this in the drivers, so the compat_blkdev_driver_ioctl() function
    is no longer needed now.

    The BLKSECTSET and HDIO_GETGEO_BIG ioctls are not implemented
    in any driver any more and no longer need any conversion.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • There is no need for the special cases for the cdrom ioctls any more now,
    so make sure that each cdrom driver has a .compat_ioctl() callback and
    calls cdrom_compat_ioctl() directly there.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • bsg_ioctl() calls into scsi_cmd_ioctl() for a couple of generic commands
    and relies on fs/compat_ioctl.c to handle it correctly in compat mode.

    Adding a private compat_ioctl() handler avoids that round-trip and lets
    us get rid of the generic emulation once this is done.

    Note that bsg implements an SG_IO command that is different from the
    other drivers and does not need emulation.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • Again, there is only one file that needs this, so move the conversion
    handler into the native implementation.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • There is only one implementation of this ioctl, so move the handling out
    of the common block layer code into the place where it's actually needed.

    It also gets called indirectly through pktcdvd, which needs to be aware
    of this change.

    As I noticed, the old implementation of the compat handler failed to
    convert the structure on the way out, so the updated fields never got
    written back to user space. This is either not important, or it has
    never worked and should be fixed now.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • A lot of block drivers need only a trivial .compat_ioctl callback.

    Add a helper function that can be set as the callback pointer
    to only convert the argument using the compat_ptr() conversion
    and otherwise assume all input and output data is compatible,
    or handled using in_compat_syscall() checks.

    This mirrors the compat_ptr_ioctl() helper function used in
    character devices.

    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • In the v5.4 merge window, a cleanup patch from Al Viro conflicted
    with my rework of the compat handling for sg.c read(). Linus Torvalds
    did a correct merge but pointed out that the resulting code is still
    unsatisfactory.

    I later noticed that the sg_new_read() function still gets the compat
    mode wrong, when the 'count' argument is large enough to pass a
    compat_sg_io_hdr object, but not a native sg_io_hdr.

    To address both of these, move the definition of compat_sg_io_hdr
    into a scsi/sg.h to make it visible to sg.c and rewrite the logic
    for reading req_pack_id as well as the size check to a simpler
    version that gets the expected results.

    Fixes: c35a5cfb4150 ("scsi: sg: sg_read(): simplify reading ->pack_id of userland sg_io_hdr_t")
    Fixes: 98aaaec4a150 ("compat_ioctl: reimplement SG_IO handling")
    Reviewed-by: Ben Hutchings
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

30 Dec, 2019

1 commit

  • We ran into a problem with an mpt3sas based controller, where we would
    see random (and hard to reproduce) file corruption. The issue seemed
    specific to this controller, but wasn't specific to the file system.
    After a lot of debugging, we found out that it's caused by segments
    spanning a 4G memory boundary. This shouldn't happen, as the default
    setting for segment boundary masks is 4G.

    Turns out there are two issues in get_max_segment_size():

    1) The default segment boundary mask is bypassed

    2) The segment start address isn't taken into account when checking
    segment boundary limit

    Fix these two issues by removing the bypass of the segment boundary
    check even if the mask is set to the default value, and taking into
    account the actual start address of the request when checking if a
    segment needs splitting.
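
    What "spanning a 4G boundary" means, and how the start address limits a
    segment, can be sketched in userspace C. The helper names are hypothetical
    and 64-bit physical addresses are assumed; this is not the kernel
    implementation.

```c
#include <assert.h>
#include <stdint.h>

/* True if [start, start + len) touches more than one boundary window. */
static int spans_boundary(uint64_t start, uint64_t len, uint64_t mask)
{
    assert(len > 0);
    return (start & ~mask) != ((start + len - 1) & ~mask);
}

/* Largest length at `start` that still ends at or before the boundary. */
static uint64_t clamp_to_boundary(uint64_t start, uint64_t len, uint64_t mask)
{
    uint64_t room = mask - (start & mask) + 1;  /* bytes left in this window */

    return len < room ? len : room;
}
```

    With a mask of 0xffffffff, an 8 KiB segment that starts 4 KiB below 4G
    spans the boundary and would be clamped to 4 KiB.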

    Cc: stable@vger.kernel.org # v5.1+
    Reviewed-by: Chris Mason
    Tested-by: Chris Mason
    Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
    Signed-off-by: Ming Lei

    Dropped const on the page pointer, ppc page_to_phys() doesn't mark the
    page as const...

    Signed-off-by: Jens Axboe

    Ming Lei
     

29 Dec, 2019

1 commit

  • Some filesystems, such as vfat, may send a bio that crosses the device
    boundary. Worse, an I/O request that starts within the device boundaries
    can contain more than one segment past EOD.

    Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    tries to fix this issue by returning -EIO in this situation. However,
    this gives fs user code no chance to handle -EIO, so sync_inodes_sb()
    may hang forever.

    Also, the current truncation of the last segment is dangerous because it
    updates the last bvec: the bvec table is then no longer immutable, and fs
    bio users may not be able to retrieve the truncated pages via
    bio_for_each_segment_all() in their .end_io callback.

    Fix this issue by supporting multi-segment truncation. The approach is
    also simpler:

    - just update the bio size, since the block layer can build the correct
    bvec from the updated bio size. The bvec table then becomes really immutable.

    - zero all truncated segments for read bio
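
    The two steps above can be sketched on a flat buffer in userspace C. This
    is a simplified model under stated assumptions: truncate_at_eod() is a
    hypothetical helper working on bytes, not on bios or bvecs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of an I/O of io_len bytes that starts at byte `start`
 * on a device of dev_size bytes. Returns the truncated I/O length.
 */
static size_t truncate_at_eod(unsigned char *buf, size_t io_len,
                              size_t start, size_t dev_size, int is_read)
{
    size_t kept;

    assert(start < dev_size);           /* the I/O starts inside the device */
    if (io_len <= dev_size - start)
        return io_len;                  /* nothing crosses the end of device */

    kept = dev_size - start;            /* only the size shrinks */
    if (is_read)
        memset(buf + kept, 0, io_len - kept);   /* zero the part past EOD */
    return kept;
}
```

    The buffer layout itself is left untouched; only the reported length
    changes, and the bytes past the end of the device are zeroed for reads.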

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

21 Dec, 2019

7 commits

  • These were added to blkdev_ioctl() in linux-5.5 but not
    blkdev_compat_ioctl, so add them now.

    Cc: # v4.4+
    Fixes: bbd3e064362e ("block: add an API for Persistent Reservations")
    Signed-off-by: Arnd Bergmann

    Fold in followup patch from Arnd with missing pr.h header include.

    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • These were added to blkdev_ioctl() in linux-5.5 but not
    blkdev_compat_ioctl, so add them now.

    Fixes: e876df1fe0ad ("block: add zone open, close and finish ioctl support")
    Reviewed-by: Damien Le Moal
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • These were added to blkdev_ioctl() in v4.20 but not blkdev_compat_ioctl,
    so add them now.

    Cc: # v4.20+
    Fixes: 72cd87576d1d ("block: Introduce BLKGETZONESZ ioctl")
    Fixes: 65e4e3eee83d ("block: Introduce BLKGETNRZONES ioctl")
    Reviewed-by: Damien Le Moal
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • These were added to blkdev_ioctl() but not blkdev_compat_ioctl,
    so add them now.

    Cc: # v4.10+
    Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls")
    Reviewed-by: Damien Le Moal
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • While fuzz testing, I got the following memleak report:

    BUG: memory leak
    unreferenced object 0xffff88837af80000 (size 4096):
    comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00 ...............
    backtrace:
    [] bio_alloc_bioset+0x393/0x590
    [] bio_copy_user_iov+0x300/0xcd0
    [] blk_rq_map_user_iov+0x2f1/0x5f0
    [] blk_rq_map_user+0xf2/0x160
    [] sg_common_write.isra.21+0x1094/0x1870
    [] sg_write.part.25+0x5d9/0x950
    [] sg_write+0x5f/0x8c
    [] __vfs_write+0x7c/0x100
    [] vfs_write+0x1c3/0x500
    [] ksys_write+0xf9/0x200
    [] do_syscall_64+0x9f/0x4f0
    [] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    If __blk_rq_map_user_iov() fails in blk_rq_map_user_iov(), the bio(s)
    allocated before the failure will leak. The refcount of each bio is
    initialized to 1 and increased to 2 by bio_get(), but
    __blk_rq_unmap_user() only decreases it to 1, so the bio cannot be
    freed. Fix it by calling blk_rq_unmap_user().
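
    The reference counting described above can be modelled in a few lines of
    userspace C. All names below are hypothetical stand-ins; the sketch only
    mirrors the counts given in the text (1 after allocation, 2 after the get).

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted bio. */
struct fake_bio {
    int refs;
    int freed;
};

static void fake_bio_get(struct fake_bio *b)
{
    b->refs++;
}

static void fake_bio_put(struct fake_bio *b)
{
    assert(b->refs > 0);
    if (--b->refs == 0)
        b->freed = 1;
}

/* Error path that drops only one reference: the count stays at 1. */
static void unmap_partial(struct fake_bio *b)
{
    fake_bio_put(b);
}

/* Error path that also drops the initial reference: the object is freed. */
static void unmap_full(struct fake_bio *b)
{
    unmap_partial(b);
    fake_bio_put(b);
}
```

    Starting from a count of 2, unmap_partial() leaves the object allocated
    with one reference that nobody will drop, which is the leak.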

    Reviewed-by: Bob Liu
    Reported-by: Hulk Robot
    Signed-off-by: Yang Yingliang
    Signed-off-by: Jens Axboe

    Yang Yingliang
     
  • Avoid that running test nvme/012 from the blktests suite triggers the
    following false positive lockdep complaint:

    ============================================
    WARNING: possible recursive locking detected
    5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
    --------------------------------------------
    ksoftirqd/1/16 is trying to acquire lock:
    000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

    but task is already holding lock:
    00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(&fq->mq_flush_lock)->rlock);
    lock(&(&fq->mq_flush_lock)->rlock);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    1 lock held by ksoftirqd/1/16:
    #0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

    stack backtrace:
    CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    dump_stack+0x67/0x90
    __lock_acquire.cold.45+0x2b4/0x313
    lock_acquire+0x98/0x160
    _raw_spin_lock_irqsave+0x3b/0x80
    flush_end_io+0x4e/0x1d0
    blk_mq_complete_request+0x76/0x110
    nvmet_req_complete+0x15/0x110 [nvmet]
    nvmet_bio_done+0x27/0x50 [nvmet]
    blk_update_request+0xd7/0x2d0
    blk_mq_end_request+0x1a/0x100
    blk_flush_complete_seq+0xe5/0x350
    flush_end_io+0x12f/0x1d0
    blk_done_softirq+0x9f/0xd0
    __do_softirq+0xca/0x440
    run_ksoftirqd+0x24/0x50
    smpboot_thread_fn+0x113/0x1e0
    kthread+0x121/0x140
    ret_from_fork+0x3a/0x50

    Cc: Christoph Hellwig
    Cc: Ming Lei
    Cc: Hannes Reinecke
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • This patch fixes the following sparse warnings:

    block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
    block/bsg-lib.c:269:19: expected int sts
    block/bsg-lib.c:269:19: got restricted blk_status_t [usertype]
    block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
    block/bsg-lib.c:286:16: expected restricted blk_status_t
    block/bsg-lib.c:286:16: got int [assigned] sts

    Cc: Martin Wilck
    Fixes: d46fe2cb2dce ("block: drop device references in bsg_queue_rq()")
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

19 Dec, 2019

1 commit

  • Instead of using list_del_init() in a loop, which generates a lot of
    unnecessary memory reads/writes, iterate from the first request of a
    batch and cut out a sublist with list_cut_before().

    Apart from removing the list node initialisation part, this is more
    register-friendly, and the assembly uses the stack less intensively.

    list_empty() at the beginning is done in the hope that the compiler can
    optimise out the same check in the following list_splice_init().
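
    The idea of cutting out a sublist in one step can be sketched with a
    minimal userspace doubly linked list. This is an illustration only: the
    list type and cut_before() below are simplified stand-ins written for
    this example, not the kernel's list.h.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/*
 * Move every node in front of `entry` from `head` to `out` with a constant
 * number of pointer writes, instead of unlinking the nodes one by one.
 */
static void cut_before(struct list_head *out, struct list_head *head,
                       struct list_head *entry)
{
    assert(out && head && entry);
    if (head->next == entry) {          /* nothing in front of entry */
        list_init(out);
        return;
    }
    out->next = head->next;
    out->next->prev = out;
    out->prev = entry->prev;
    out->prev->next = out;
    head->next = entry;
    entry->prev = head;
}

static int list_len(const struct list_head *h)
{
    int n = 0;
    const struct list_head *p;

    for (p = h->next; p != h; p = p->next)
        n++;
    return n;
}
```

    The moved nodes keep their own next/prev links between each other; only
    the two ends of the sublist and the two list heads are rewritten.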

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov