27 Oct, 2021

1 commit

  • Commit a33df75c6328 ("block: use an xarray for disk->part_tbl") changed
    the partition existence check for host-aware zoned block devices from a
    disk_has_partitions() helper call to an emptiness check of the xarray
    disk->part_tbl. However, disk->part_tbl always holds a single entry for
    disk->part0 and never becomes empty. As a result, host-aware zoned
    devices were always judged to have partitions, which made the sysfs
    queue/zoned attribute report "none" instead of "host-aware" regardless
    of whether the devices actually had partitions.

    This also triggered DEBUG_LOCKS_WARN_ON(lock->magic != lock) for
    sdkp->rev_mutex in the SCSI layer when the kernel detected a host-aware
    zoned device. Since the block layer handled host-aware zoned devices as
    non-zoned devices, the SCSI layer never got a chance to initialize the
    mutex for zone revalidation, so the warning was triggered.

    To fix these issues, call the helper function disk_has_partitions() in
    place of the disk->part_tbl emptiness check. Since the function was
    removed by commit a33df75c6328, reimplement it to walk through the
    entries in the xarray disk->part_tbl.
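
    A minimal sketch of such a reimplementation, along the lines the
    message describes (the exact upstream fix may differ in detail): walk
    the xarray and count only real partitions, so the ever-present
    disk->part0 entry no longer matters.

        static bool disk_has_partitions(struct gendisk *disk)
        {
                struct block_device *part;
                unsigned long idx;
                bool ret = false;

                rcu_read_lock();
                /* disk->part0 is always present; look for real partitions. */
                xa_for_each(&disk->part_tbl, idx, part) {
                        if (bdev_is_partition(part)) {
                                ret = true;
                                break;
                        }
                }
                rcu_read_unlock();

                return ret;
        }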

    Fixes: a33df75c6328 ("block: use an xarray for disk->part_tbl")
    Signed-off-by: Shin'ichiro Kawasaki
    Cc: stable@vger.kernel.org # v5.14+
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Link: https://lore.kernel.org/r/20211026060115.753746-1-shinichiro.kawasaki@wdc.com
    Signed-off-by: Jens Axboe

    Shin'ichiro Kawasaki

24 Aug, 2021

1 commit

  • Replace the magic lookup through the kobject tree with an explicit
    backpointer, given that the device model links are set up and torn
    down at times when I/O is still possible, leading to potential
    NULL or invalid pointer dereferences.
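
    A sketch of the idea, assuming the backpointer lives on the
    request_queue and is set at allocation time (field placement and
    helper name are illustrative, per the commit description):

        struct request_queue {
                /* ... */
                struct gendisk  *disk;  /* explicit backpointer to the disk */
                /* ... */
        };

        /* Replaces a lookup that climbed the kobject tree to find the
         * gendisk, which could race with device model link teardown. */
        static inline struct gendisk *queue_to_disk(struct request_queue *q)
        {
                return q->disk;
        }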

    Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
    Reported-by: syzbot
    Signed-off-by: Christoph Hellwig
    Tested-by: Sven Schnelle
    Link: https://lore.kernel.org/r/20210816134624.GA24234@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig

10 Aug, 2021

2 commits

  • The backing device information only makes sense for file system I/O,
    and thus belongs in the gendisk rather than the lower-level
    request_queue structure. Move it there.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig
  • .. and rename the function to disk_update_readahead. This is in
    preparation for moving the BDI from the request_queue to the gendisk.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Johannes Thumshirn
    Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig

09 May, 2021

1 commit

  • This reverts commit cd2c7545ae1beac3b6aae033c7f31193b3255946.

    Alex reports that the commit causes corruption with LUKS on ext4. Revert
    it for now so that this can be investigated properly.

    Link: https://lore.kernel.org/linux-block/1620493841.bxdq8r5haw.none@localhost/
    Reported-by: Alex Xu (Hello71)
    Signed-off-by: Jens Axboe

    Jens Axboe

08 May, 2021

1 commit

  • Pull block fixes from Jens Axboe:

    - dasd spelling fixes (Bhaskar)

    - Limit bio max size on multi-page bvecs to the hardware limit, to
    avoid overly large bios (and hence latencies). Originally queued for
    the merge window, but needed a fix and was dropped from the initial
    pull (Changheun)

    - NVMe pull request (Christoph):
        - reset the bdev to ns head when failover (Daniel Wagner)
        - remove unsupported command noise (Keith Busch)
        - misc passthrough improvements (Kanchan Joshi)
        - fix controller ioctl through ns_head (Minwoo Im)
        - fix controller timeouts during reset (Tao Chiu)

    - rnbd fixes/cleanups (Gioh, Md, Dima)

    - Fix iov_iter re-expansion (yangerkun)

    * tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-block:
    block: reexpand iov_iter after read/write
    nvmet: remove unsupported command noise
    nvme-multipath: reset bdev to ns head when failover
    nvme-pci: fix controller reset hang when racing with nvme_timeout
    nvme: move the fabrics queue ready check routines to core
    nvme: avoid memset for passthrough requests
    nvme: add nvme_get_ns helper
    nvme: fix controller ioctl through ns_head
    bio: limit bio max size
    RDMA/rtrs: fix uninitialized symbol 'cnt'
    s390: dasd: Mundane spelling fixes
    block/rnbd: Remove all likely and unlikely
    block/rnbd-clt: Check the return value of the function rtrs_clt_query
    block/rnbd: Fix style issues
    block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t

    Linus Torvalds

07 May, 2021

1 commit

  • My UEK-derived config has 1030 files depending on pagemap.h before this
    change. Afterwards, just 326 files need to be rebuilt when I touch
    pagemap.h. I think blkdev.h is probably included too widely, but
    untangling that dependency is harder and this solves my problem. x86
    allmodconfig builds, but there may be implicit include problems on other
    architectures.

    Link: https://lkml.kernel.org/r/20210309195747.283796-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Acked-by: Dan Williams [nvdimm]
    Acked-by: Jens Axboe [block]
    Reviewed-by: Christoph Hellwig
    Acked-by: Coly Li [bcache]
    Acked-by: Martin K. Petersen [scsi]
    Reviewed-by: William Kucharski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)

04 May, 2021

1 commit

  • bio size can grow up to 4GB when multi-page bvec is enabled, but this
    sometimes leads to inefficient behavior. In the case of large chunk
    direct I/O - a 32MB chunk read in user space - all pages for the 32MB
    are merged into one bio structure if the pages' physical addresses are
    contiguous. This delays submission until the merge completes, so the
    bio max size should be limited to a proper size.

    When a 32MB chunk read with the direct I/O option comes from userspace,
    the current kernel behavior in the do_direct_IO() loop is shown in the
    timeline below.

    | bio merge for 32MB. total 8,192 pages are merged.
    | total elapsed time is over 2ms.
    |------------------ ... ----------------------->|
    | 8,192 pages are merged into one bio.
    | at this time, the first bio submit is done.
    | 1 bio is split into 32 read requests and issued.
    |--------------->
    |--------------->
    |--------------->
    ......
    |--------------->
    |--------------->|
    total 19ms elapsed to complete the 32MB read from the device. |

    If the bio max size is limited to 1MB, the behavior changes as below.

    | bio merge for 1MB. 256 pages are merged for each bio.
    | in total, 32 bios will be made.
    | total elapsed time is over 2ms, the same as before.
    | but the first bio submit happens fast, after about 100us.
    |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
    | 256 pages are merged into each bio.
    | at this time, the first bio submit is done.
    | and 1 read request is issued per bio.
    |--------------->
    |--------------->
    |--------------->
    ......
    |--------------->
    |--------------->|
    total 17ms elapsed to complete the 32MB read from the device. |

    As a result, read requests are issued sooner when the bio max size is
    limited. With the current multi-page bvec behavior a super large bio
    can be created, and that delays issuing the first I/O request.
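
    A sketch of the idea (the helper name and derivation are illustrative,
    not necessarily the exact upstream interface): cap how large a bio may
    grow at the queue's max_sectors limit instead of the absolute 4GB
    ceiling.

        /* Hypothetical helper: the size at which page merging should stop
         * and the bio should be submitted, derived from the queue limits. */
        static inline unsigned int bio_max_size(struct bio *bio)
        {
                struct request_queue *q = bio->bi_bdev->bd_disk->queue;

                return queue_max_sectors(q) << SECTOR_SHIFT;
        }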

    Signed-off-by: Changheun Lee
    Reviewed-by: Bart Van Assche
    Link: https://lore.kernel.org/r/20210503095203.29076-1-nanich.lee@samsung.com
    Signed-off-by: Jens Axboe

    Changheun Lee

06 Apr, 2021

2 commits

  • Get rid of all the PFN arithmetic and just use an enum for the two
    remaining options, and use PageHighMem for the actual bounce decision.

    Add a fast path to entirely avoid the call for the common case of a queue
    not using the legacy bouncing code.
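
    The two remaining options map naturally onto an enum, per the
    description above (a sketch):

        enum blk_bounce {
                BLK_BOUNCE_NONE,        /* never bounce (the common case) */
                BLK_BOUNCE_HIGH,        /* bounce pages in high memory */
        };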

    Signed-off-by: Christoph Hellwig
    Acked-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Link: https://lore.kernel.org/r/20210331073001.46776-8-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig
  • Remove the BLK_BOUNCE_ISA support now that all users are gone.

    Signed-off-by: Christoph Hellwig
    Acked-by: Martin K. Petersen
    Reviewed-by: Hannes Reinecke
    Link: https://lore.kernel.org/r/20210331073001.46776-7-hch@lst.de
    Signed-off-by: Jens Axboe

    Christoph Hellwig

24 Feb, 2021

1 commit

  • We get I/O errors when we run md-raid1 on top of dm-integrity on top
    of a ramdisk.
    device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
    device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
    device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
    device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
    device-mapper: integrity: Bio not aligned on 8 sectors: 0x8048, 0xff
    device-mapper: integrity: Bio not aligned on 8 sectors: 0x8147, 0xff
    device-mapper: integrity: Bio not aligned on 8 sectors: 0x8246, 0xff
    device-mapper: integrity: Bio not aligned on 8 sectors: 0x8345, 0xbb

    The ramdisk device has logical_block_size 512 and max_sectors 255. The
    dm-integrity device uses logical_block_size 4096 and it doesn't affect the
    "max_sectors" value - thus, it inherits 255 from the ramdisk. So, we have
    a device with max_sectors not aligned on logical_block_size.

    The md-raid device sees that the underlying leg has max_sectors 255 and
    it will split the bios on a 255-sector boundary, making the bios
    unaligned on logical_block_size.

    In order to fix the bug, we round down max_sectors to logical_block_size.
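
    A sketch of the rounding implied by the description, assuming it is
    applied wherever max_sectors limits are set and stacked:

        /* Round a sector limit down to a logical-block-size multiple,
         * keeping at least one page worth of sectors. */
        static unsigned int blk_round_down_sectors(unsigned int sectors,
                                                   unsigned int lbs)
        {
                sectors = round_down(sectors, lbs >> SECTOR_SHIFT);
                if (sectors < PAGE_SIZE >> SECTOR_SHIFT)
                        sectors = PAGE_SIZE >> SECTOR_SHIFT;
                return sectors;
        }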

    Cc: stable@vger.kernel.org
    Reviewed-by: Ming Lei
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka

10 Feb, 2021

2 commits

  • Introduce the internal function blk_queue_clear_zone_settings() to
    clean up all limits and resources related to zoned block devices. This
    new function is called from blk_queue_set_zoned() when a disk zoned
    model is set to BLK_ZONED_NONE. This particular case can happen when a
    partition is created on a host-aware SCSI disk.

    Signed-off-by: Damien Le Moal
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Damien Le Moal
  • Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
    all writes into sequential write required zones be aligned to the device
    physical block size. However, NVMe ZNS does not have this constraint and
    allows write operations into sequential zones to be aligned to the
    device logical block size. This inconsistency does not help with
    software portability across device types.

    To solve this, introduce the zone_write_granularity queue limit to
    indicate the alignment constraint, in bytes, of write operations into
    zones of a zoned block device. This new limit is exported as a
    read-only sysfs queue attribute, and the helper
    blk_queue_zone_write_granularity() is introduced for drivers to set it.

    The function blk_queue_set_zoned() is modified to set this new limit to
    the device logical block size by default. NVMe ZNS devices as well as
    zoned nullb devices use this default value as is. The scsi disk driver
    is modified to execute the blk_queue_zone_write_granularity() helper to
    set the zone write granularity of host-managed SMR disks to the disk
    physical block size.

    The accessor functions queue_zone_write_granularity() and
    bdev_zone_write_granularity() are also introduced.
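
    A sketch of the setter, assuming it simply stores the value in the
    queue limits after a sanity check (illustrative):

        void blk_queue_zone_write_granularity(struct request_queue *q,
                                              unsigned int size)
        {
                if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
                        return;

                q->limits.zone_write_granularity = size;
        }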

    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Damien Le Moal

17 Dec, 2020

1 commit

  • Pull block driver updates from Jens Axboe:
    "Nothing major in here:

    - NVMe pull request from Christoph:
        - nvmet passthrough improvements (Chaitanya Kulkarni)
        - fcloop error injection support (James Smart)
        - read-only support for zoned namespaces without Zone Append
          (Javier González)
        - improve some error messages (Minwoo Im)
        - reject I/O to offline fabrics namespaces (Victor Gladkov)
        - PCI queue allocation cleanups (Niklas Schnelle)
        - remove an unused allocation in nvmet (Amit Engel)
        - a Kconfig spelling fix (Colin Ian King)
        - nvme_req_qid simplification (Baolin Wang)

    - MD pull request from Song:
        - Fix race condition in md_ioctl() (Dae R. Jeong)
        - Initialize read_slot properly for raid10 (Kevin Vigor)
        - Code cleanup (Pankaj Gupta)
        - md-cluster resync/reshape fix (Zhao Heming)

    - Move null_blk into its own directory (Damien Le Moal)

    - null_blk zone and discard improvements (Damien Le Moal)

    - bcache race fix (Dongsheng Yang)

    - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang,
    Lutz Pogrell, Md Haris Iqbal)

    - lightnvm NULL pointer deref fix (tangzhenhao)

    - sr in_interrupt() removal (Sebastian Andrzej Siewior)

    - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian
    Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included
    as it made it easier for them to funnel the feature through the
    block driver tree.

    - Follow up fixes (Colin Ian King)"

    * tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits)
    block: drop dead assignments in loop_init()
    sr: Remove in_interrupt() usage in sr_init_command().
    sr: Switch the sector size back to 2048 if sr_read_sector() changed it.
    cdrom: Reset sector_size back it is not 2048.
    drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c
    null_blk: Move driver into its own directory
    null_blk: Allow controlling max_hw_sectors limit
    null_blk: discard zones on reset
    null_blk: cleanup discard handling
    null_blk: Improve implicit zone close
    null_blk: improve zone locking
    block: Align max_hw_sectors to logical blocksize
    null_blk: Fail zone append to conventional zones
    null_blk: Fix zone size initialization
    bcache: fix race between setting bdev state to none and new write request direct to backing
    block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
    block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
    block/rnbd: call kobject_put in the failure path
    Documentation/ABI/rnbd-srv: add document for force_close
    block/rnbd-srv: close a mapped device from server side.
    ...

    Linus Torvalds

08 Dec, 2020

1 commit

  • Block device drivers do not have to call blk_queue_max_hw_sectors() to
    set a limit on request size if the default limit BLK_SAFE_MAX_SECTORS
    is acceptable. However, this limit (255 sectors) may not be aligned to
    the device logical block size, in which case it cannot be used as is as
    the maximum request size. This is the case for the null_blk device
    driver.

    Modify blk_queue_max_hw_sectors() to make sure that the request size
    limits specified by the max_hw_sectors and max_sectors queue limits
    are always aligned to the device logical block size. Additionally, to
    avoid introducing a dependence on the execution order of this function
    with blk_queue_logical_block_size(), also modify
    blk_queue_logical_block_size() to perform the same alignment when the
    logical block size is set after max_hw_sectors.
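
    Sketched, the tail of blk_queue_max_hw_sectors() then aligns both
    limits to the logical block size (illustrative of the described
    behavior):

        max_hw_sectors = round_down(max_hw_sectors,
                                    limits->logical_block_size >> SECTOR_SHIFT);
        limits->max_hw_sectors = max_hw_sectors;

        max_sectors = min_not_zero(limits->max_dev_sectors, max_hw_sectors);
        max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);
        max_sectors = round_down(max_sectors,
                                 limits->logical_block_size >> SECTOR_SHIFT);
        limits->max_sectors = max_sectors;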

    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Damien Le Moal

02 Dec, 2020

1 commit

  • commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
    chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
    reflect the most limited of all devices in the IO stack.

    Otherwise malformed IO may result. E.g.: prior to this fix,
    ->chunk_sectors = lcm_not_zero(8, 128) would result in
    blk_max_size_offset() splitting IO at 128 sectors rather than the
    required more restrictive 8 sectors.

    And since commit 07d098e6bbad ("block: allow 'chunk_sectors' to be
    non-power-of-2") care must be taken to properly stack chunk_sectors to
    be compatible with the possibility that a non-power-of-2 chunk_sectors
    may be stacked. This is why gcd() is used instead of reverting back
    to using min_not_zero().
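
    The stacking rule after this fix, in essence (a sketch of the relevant
    line in blk_stack_limits()):

        /* gcd() keeps the result compatible with both devices even for
         * non-power-of-2 values, and handles a zero on either side:
         * gcd(8, 128) = 8, whereas lcm_not_zero(8, 128) = 128. */
        t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);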

    Fixes: 22ada802ede8 ("block: use lcm_not_zero() when stacking chunk_sectors")
    Fixes: 07d098e6bbad ("block: allow 'chunk_sectors' to be non-power-of-2")
    Reported-by: John Dorminy
    Reported-by: Bruce Johnston
    Signed-off-by: Mike Snitzer
    Reviewed-by: John Dorminy
    Cc: stable@vger.kernel.org
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Mike Snitzer

14 Oct, 2020

1 commit

  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds

25 Sep, 2020

1 commit

  • Drivers shouldn't really mess with the readahead size, as that is a VM
    concept. Instead set it based on the optimal I/O size by lifting the
    algorithm from the md driver when registering the disk. Also set
    bdi->io_pages there as well by applying the same scheme based on
    max_sectors. To ensure the limits work well for stacking drivers a
    new helper is added to update the readahead limits from the block
    limits, which is also called from disk_stack_limits.
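
    The scheme, sketched (assuming this is the helper that the 10 Aug, 2021
    entry above later renames to disk_update_readahead; constants per the
    description):

        void blk_queue_update_readahead(struct request_queue *q)
        {
                /* Readahead follows twice the optimal I/O size, with the
                 * VM default as a floor; io_pages follows max_sectors. */
                q->backing_dev_info->ra_pages =
                        max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
                q->backing_dev_info->io_pages =
                        queue_max_sectors(q) >> (PAGE_SHIFT - 9);
        }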

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Jan Kara
    Reviewed-by: Mike Snitzer
    Reviewed-by: Martin K. Petersen
    Acked-by: Coly Li
    Signed-off-by: Jens Axboe

    Christoph Hellwig

24 Sep, 2020

2 commits

  • It is possible, albeit unlikely, for a block device to have a
    non-power-of-2 chunk_sectors (e.g. a 10+2 RAID6 with 128K chunk_sectors
    results in a full-stripe size of 1280K). This causes the RAID6's io_opt
    to be advertised as 1280K, and a stacked device _could_ then be made to
    use a blocksize, aka chunk_sectors, that matches the non-power-of-2
    io_opt of the underlying RAID6 -- resulting in the stacked device's
    chunk_sectors being a non-power-of-2.

    Update blk_queue_chunk_sectors() and blk_max_size_offset() to
    accommodate drivers that need a non-power-of-2 chunk_sectors.
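
    A sketch of the blk_max_size_offset() change (illustrative): keep the
    cheap mask for the power-of-2 case, and fall back to a division
    otherwise.

        static inline unsigned int blk_max_size_offset(struct request_queue *q,
                                                       sector_t offset)
        {
                unsigned int chunk_sectors = q->limits.chunk_sectors;

                if (!chunk_sectors)
                        return q->limits.max_sectors;

                if (likely(is_power_of_2(chunk_sectors)))
                        chunk_sectors -= offset & (chunk_sectors - 1);
                else
                        chunk_sectors -= sector_div(offset, chunk_sectors);

                return min(q->limits.max_sectors, chunk_sectors);
        }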

    Reviewed-by: Ming Lei
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
  • Like 'io_opt', blk_stack_limits() should stack 'chunk_sectors' using
    lcm_not_zero() rather than min_not_zero() -- otherwise the final
    'chunk_sectors' could result in sub-optimal alignment of IO to
    component devices in the IO stack.

    Also, if 'chunk_sectors' isn't a multiple of 'physical_block_size'
    then it is a bug in the driver and the device should be flagged as
    'misaligned'.

    Reviewed-by: Ming Lei
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer

16 Sep, 2020

1 commit

  • When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks
    as regular disks. In this case, ensure that command completion is
    correctly executed by changing sd_zbc_complete() to return good_bytes
    instead of 0, which caused a hang during device probe (endless retries).

    When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected
    to have partitions, it will be used as a regular disk. In this case,
    make sure not to do anything in sd_zbc_revalidate_zones() as that
    triggers warnings.

    Since all these different cases result in subtle settings of the disk queue
    zoned model, introduce the block layer helper function
    blk_queue_set_zoned() to generically implement setting up the effective
    zoned model according to the disk type, the presence of partitions on the
    disk and CONFIG_BLK_DEV_ZONED configuration.
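
    A sketch of what such a helper can look like, following the description
    (the exact upstream logic may differ):

        void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
        {
                switch (model) {
                case BLK_ZONED_HM:
                        /* Host-managed requires zoned support built in. */
                        WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED));
                        break;
                case BLK_ZONED_HA:
                        /* A partitioned host-aware disk, or any host-aware
                         * disk without CONFIG_BLK_DEV_ZONED, is used as a
                         * regular disk. */
                        if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED) ||
                            disk_has_partitions(disk))
                                model = BLK_ZONED_NONE;
                        break;
                case BLK_ZONED_NONE:
                default:
                        break;
                }

                disk->queue->limits.zoned = model;
        }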

    Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com
    Fixes: b72053072c0b ("block: allow partitions on host aware zone devices")
    Cc:
    Reported-by: Borislav Petkov
    Suggested-by: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Damien Le Moal
    Signed-off-by: Martin K. Petersen

    Damien Le Moal

21 Jul, 2020

3 commits

  • This function is just a tiny wrapper around blk_stack_limits. Open code
    it in the two callers.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
  • This function is just a tiny wrapper around blk_stack_limits and has
    two callers. Simplify the stack a bit by open coding it in the two
    callers.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
  • Lift the code from device mapper into blk_stack_limits to inherit the
    stacking limitations. This ensures we do the right thing for all
    stacked zoned block devices.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

13 May, 2020

1 commit

  • Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned
    block device. This is a no-merge write operation.

    A zone append write BIO must:
    * target a zoned block device
    * have a sector position indicating the start sector of the target zone
    * target a sequential write zone
    * not cross a zone boundary
    * not be split, to ensure that a single range of LBAs is written with a
      single command.

    Implement these checks in generic_make_request_checks() using the
    helper function blk_check_zone_append(). To avoid write append BIO
    splitting, introduce the new max_zone_append_sectors queue limit
    attribute and ensure that a BIO size is always lower than this limit.
    Export this new limit through sysfs and check these limits in bio_full().

    Also when a LLDD can't dispatch a request to a specific zone, it
    will return BLK_STS_ZONE_RESOURCE indicating this request needs to
    be delayed, e.g. because the zone it will be dispatched to is still
    write-locked. If this happens, set the request aside in a local list
    to continue trying to dispatch requests such as READ requests or
    WRITE/ZONE_APPEND requests targeting other zones. This way we can
    still keep a high queue depth without starving other requests, even if
    one request can't be served due to zone write-locking.

    Finally, make sure that the bio sector position indicates the actual
    write position as indicated by the device on completion.
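
    The new queue limit, as a hedged sketch (assuming the effective append
    limit is also bounded by the existing max_hw_sectors limit):

        static inline unsigned int
        queue_max_zone_append_sectors(struct request_queue *q)
        {
                return min(q->limits.max_hw_sectors,
                           q->limits.max_zone_append_sectors);
        }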

    Signed-off-by: Keith Busch
    [ jth: added zone-append specific add_page and merge_page helpers ]
    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Keith Busch

23 Apr, 2020

1 commit

  • Don't burden the common block code with specifics of the libata DMA
    draining mechanism. Instead move most of the code to the scsi midlayer.

    That also means the nr_phys_segments adjustments in the blk-mq fast path
    can go away entirely, given that SCSI never looks at nr_phys_segments
    after mapping the request to a scatterlist.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

31 Mar, 2020

1 commit

  • Pull block driver updates from Jens Axboe:

    - floppy driver cleanup series from Willy

    - NVMe updates and fixes (Various)

    - null_blk trace improvements (Chaitanya)

    - bcache fixes (Coly)

    - md fixes (via Song)

    - loop block size change optimizations (Martijn)

    - scnprintf() use (Takashi)

    * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits)
    null_blk: add trace in null_blk_zoned.c
    null_blk: add tracepoint helpers for zoned mode
    block: add a zone condition debug helper
    nvme: cleanup namespace identifier reporting in nvme_init_ns_head
    nvme: rename __nvme_find_ns_head to nvme_find_ns_head
    nvme: refactor nvme_identify_ns_descs error handling
    nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl
    nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl
    nvme: Fix controller creation races with teardown flow
    nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
    nvme: Fix ctrl use-after-free during sysfs deletion
    nvme-pci: Re-order nvme_pci_free_ctrl
    nvme: Remove unused return code from nvme_delete_ctrl_sync
    nvme: Use nvme_state_terminal helper
    nvme: release ida resources
    nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO
    nvmet-tcp: optimize tcp stack TX when data digest is used
    nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow
    nvme-multipath: do not reset on unknown status
    nvmet-rdma: allocate RW ctxs according to mdts
    ...

    Linus Torvalds

28 Mar, 2020

1 commit

  • Current make_request based drivers use either blk_alloc_queue_node or
    blk_alloc_queue to allocate a queue, and then set up the make_request_fn
    function pointer and a few parameters using the blk_queue_make_request
    helper. Simplify this by passing the make_request pointer to
    blk_alloc_queue, and while at it merge the _node variant into the main
    helper by always passing a node_id, and remove the superfluous gfp_mask
    parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.
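
    The resulting interface, sketched from the description (signature
    illustrative):

        /* One allocator: callers pass their make_request handler and a NUMA
         * node id; blk-mq uses the lower-level __blk_alloc_queue instead. */
        struct request_queue *blk_alloc_queue(make_request_fn make_request,
                                              int node_id);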

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

18 Mar, 2020

1 commit

  • Field bdi->io_pages, added in commit 9491ae4aade6 ("mm: don't cap
    request size based on read-ahead setting"), removes an unneeded split
    of read requests.

    Stacked drivers do not call blk_queue_max_hw_sectors(). Instead they set
    the limits of their devices via blk_set_stacking_limits() +
    disk_stack_limits(). Field bdi->io_pages stays zero until the user sets
    max_sectors_kb via sysfs.

    This patch updates io_pages after merging limits in disk_stack_limits().

    Commit c6d6e9b0f6b4 ("dm: do not allow readahead to limit IO size") fixed
    the same problem for device-mapper devices, this one fixes MD RAIDs.
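
    The essence of the change, sketched (assuming it lands at the end of
    disk_stack_limits(), after the limits are merged):

        /* t is the stacked (top) disk's request_queue; keep bdi->io_pages
         * in sync with the merged max_sectors limit. */
        t->backing_dev_info->io_pages =
                t->limits.max_sectors >> (PAGE_SHIFT - 9);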

    Fixes: 9491ae4aade6 ("mm: don't cap request size based on read-ahead setting")
    Reviewed-by: Paul Menzel
    Reviewed-by: Bob Liu
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Song Liu

    Konstantin Khlebnikov

16 Jan, 2020

1 commit

  • Logical block size has type unsigned short. That means that it can be
    at most 32768. However, there are architectures that can run with 64k
    pages (for example arm64), and on these architectures it may be
    possible to create block devices with a 64k block size.

    For example, on an architecture with 64k pages, mount will fail with
    the following error, because it tries to read the superblock using a
    2-sector access:
    device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
    EXT4-fs (dm-0): unable to read superblock

    This patch changes the logical block size from unsigned short to
    unsigned int to avoid the overflow.
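
    The change itself is a type widening in the queue limits (a sketch):

        struct queue_limits {
                /* ... */
                unsigned int            logical_block_size;  /* was unsigned short */
                /* ... */
        };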

    Cc: stable@vger.kernel.org
    Reviewed-by: Martin K. Petersen
    Reviewed-by: Ming Lei
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

    Mikulas Patocka

20 Sep, 2019

1 commit

  • Pull dma-mapping updates from Christoph Hellwig:

    - add dma-mapping and block layer helpers to take care of IOMMU merging
    for mmc plus subsequent fixups (Yoshihiro Shimoda)

    - rework handling of the pgprot bits for remapping (me)

    - take care of the dma direct infrastructure for swiotlb-xen (me)

    - improve the dma noncoherent remapping infrastructure (me)

    - better defaults for ->mmap, ->get_sgtable and ->get_required_mask
    (me)

    - cleanup mmaping of coherent DMA allocations (me)

    - various misc cleanups (Andy Shevchenko, me)

    * tag 'dma-mapping-5.4' of git://git.infradead.org/users/hch/dma-mapping: (41 commits)
    mmc: renesas_sdhi_internal_dmac: Add MMC_CAP2_MERGE_CAPABLE
    mmc: queue: Fix bigger segments usage
    arm64: use asm-generic/dma-mapping.h
    swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page
    swiotlb-xen: simplify cache maintainance
    swiotlb-xen: use the same foreign page check everywhere
    swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable
    xen: remove the exports for xen_{create,destroy}_contiguous_region
    xen/arm: remove xen_dma_ops
    xen/arm: simplify dma_cache_maint
    xen/arm: use dev_is_dma_coherent
    xen/arm: consolidate page-coherent.h
    xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintainance
    arm: remove wrappers for the generic dma remap helpers
    dma-mapping: introduce a dma_common_find_pages helper
    dma-mapping: always use VM_DMA_COHERENT for generic DMA remap
    vmalloc: lift the arm flag for coherent mappings to common code
    dma-mapping: provide a better default ->get_required_mask
    dma-mapping: remove the dma_declare_coherent_memory export
    remoteproc: don't allow modular build
    ...

    Linus Torvalds

06 Sep, 2019

1 commit

  • Introduce the definition of elevator features through the
    elevator_features flags in the elevator_type structure. Each flag can
    represent a feature supported by an elevator. The first feature defined
    by this patch is support for zoned block device sequential write
    constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
    by the mq-deadline elevator using zone write locking.

    Other possible features are IO priorities, write hints, latency targets
    or single-LUN dual-actuator disks (for which the elevator could maintain
    one LBA ordered list per actuator).

    The required_elevator_features field is also added to the request_queue
    structure to allow a device driver to specify elevator feature flags
    that an elevator must support for the correct operation of the device
    (e.g. device drivers for zoned block devices can have the
    ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
    The helper function blk_queue_required_elevator_features() is
    defined for setting this new field.

    With these two new fields in place, the elevator functions
    elevator_match() and elevator_find() are modified to allow a user to set
    only an elevator with a set of features that satisfies the device
    required features. Elevators not matching the device requirements are
    not shown in the device sysfs queue/scheduler file to prevent their use.

    The "none" elevator can always be selected as before.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal

27 Jul, 2019

1 commit

  • We should only set the max segment size to unlimited if we actually
    have a virt boundary. Otherwise we accidentally clear that limit
    when called from the SCSI midlayer, which always calls
    blk_queue_virt_boundary, even if that mask is 0.
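
    Sketched, the guard looks like this (illustrative):

        void blk_queue_virt_boundary(struct request_queue *q,
                                     unsigned long mask)
        {
                q->limits.virt_boundary_mask = mask;

                /* Devices with a virt boundary don't use segments as such,
                 * so an unlimited segment size is safe -- but only when a
                 * boundary mask is actually set. */
                if (mask)
                        q->limits.max_segment_size = UINT_MAX;
        }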

    Fixes: 7ad388d8e4c7 ("scsi: core: add a host / host template field for the virt boundary")
    Reported-by: Guenter Roeck
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig

24 May, 2019

1 commit

  • We currently fail to update the front/back segment size in the bio when
    deciding to allow an otherwise gappy segment to a device with a virt
    boundary. The reason why this did not cause problems is that devices
    with a virt boundary fundamentally don't use segments as we know them
    and thus don't care. Make that assumption formal by forcing an
    unlimited segment size in this case.

    Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable")
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Christoph Hellwig

01 May, 2019

1 commit