Eric Lee / smarc-fsl-linux-kernel

13 Nov, 2020

1 commit

c01a21b77 loop: Fix occasional uevent drop

Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
causes occasional drops of loop device uevents, which are no longer
triggered in loop_set_size() but in a different part of the code.

The bug is reproducible with the LTP test uevent01 [1]:

i=0; while true; do
    i=$((i+1)); echo "== $i =="
    lsmod | grep -q loop && rmmod -f loop
    ./uevent01 || break
done

Put back the triggering through code called in loop_set_size().

The fix requires adding yet another parameter to
set_capacity_revalidate_and_notify().

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c

[hch: rebased on a different change to the prototype of
set_capacity_revalidate_and_notify]

Cc: stable@vger.kernel.org # v5.9
Fixes: 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
Reported-by:
Signed-off-by: Petr Vorel
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Petr Vorel
2020-11-13 04:59:04 +0800

02 Sep, 2020

1 commit

611bee526 block: replace bd_set_size with bd_set_nr_sectors

Replace bd_set_size with a version that takes the number of sectors
instead, as that fits most of the current and future callers much better.

Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Sagi Grimberg
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-09-02 06:49:25 +0800

29 Aug, 2020

1 commit

4d41ead6e Merge tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- nbd timeout fix (Hou)

- device size fix for loop LOOP_CONFIGURE (Martijn)

- MD pull from Song with raid5 stripe size fix (Yufen)

* tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block:
md/raid5: make sure stripe_size as power of two
loop: Set correct device size when using LOOP_CONFIGURE
nbd: restore default timeout when setting it to zero

Linus Torvalds
2020-08-29 07:38:29 +0800

26 Aug, 2020

1 commit

79e5dc59e loop: Set correct device size when using LOOP_CONFIGURE

The device size calculation was done before processing the loop
configuration, which meant that we set the size on the underlying
block device incorrectly in case lo_offset/lo_sizelimit were set in the
configuration. Delay computing the size until we've set up the device
parameters correctly.
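
As an illustration of why the ordering matters, here is a minimal userspace sketch of a loop size calculation that depends on lo_offset and lo_sizelimit. The function name loop_size_sectors() and its parameters are hypothetical; the arithmetic is an assumption modelled on the description above, not a copy of the driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size of a loop device in 512-byte sectors,
 * given the backing file size, lo_offset and lo_sizelimit (all in bytes).
 * A sizelimit of 0 is treated as "no limit" (assumption).
 */
uint64_t loop_size_sectors(int64_t backing_bytes, int64_t offset,
                           int64_t sizelimit)
{
    int64_t size = backing_bytes;

    assert(backing_bytes >= 0);

    if (offset > 0)
        size -= offset;          /* the device starts at the offset */
    if (size < 0)
        return 0;                /* offset lies beyond the end of the file */
    if (sizelimit > 0 && sizelimit < size)
        size = sizelimit;        /* cap the visible size */

    return (uint64_t)size >> 9;  /* bytes to 512-byte sectors */
}
```

Computing the size before the configuration is applied amounts to calling this with offset and sizelimit still at 0, so a 1 MiB backing file with a 4096-byte offset would be reported as 2048 sectors instead of 2040.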

Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl")
Reported-by: Lennart Poettering
Tested-by: Yang Xu
Signed-off-by: Martijn Coenen
Signed-off-by: Jens Axboe

Martijn Coenen
2020-08-26 23:30:31 +0800

25 Aug, 2020

1 commit

c41c3ec4a Merge tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request from Sagi:
- nvme completion rework from Christoph and Chao that mostly came
from a bit of divergence of how we classify errors related to
pathing/retry etc.
- nvmet passthru fixes from Chaitanya
- minor nvmet fixes from Amit and I
- mpath round-robin path selection fix from Martin
- ignore noiob for zoned devices from Keith
- minor nvme-fc fix from Tianjia

- BFQ cgroup leak fix (Dmitry)

- block layer MAINTAINERS addition (Geert)

- fix null_blk FUA checking (Hou)

- get_max_io_size() size fix (Keith)

- fix block page_is_mergeable() for compound pages (Matthew)

- discard granularity fixes (Ming)

- IO scheduler ordering fix (Ming)

- misc fixes

* tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block: (31 commits)
null_blk: fix passing of REQ_FUA flag in null_handle_rq
nvmet: Disable keep-alive timer when kato is cleared to 0h
nvme: redirect commands on dying queue
nvme: just check the status code type in nvme_is_path_error
nvme: refactor command completion
nvme: rename and document nvme_end_request
nvme: skip noiob for zoned devices
nvme-pci: fix PRP pool size
nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth
nvme: Use spin_lock_irq() when taking the ctrl->lock
nvmet: call blk_mq_free_request() directly
nvmet: fix oops in pt cmd execution
nvmet: add ns tear down label for pt-cmd handling
nvme: multipath: round-robin: eliminate "fallback" variable
nvme: multipath: round-robin: fix single non-optimized path case
nvme-fc: Fix wrong return value in __nvme_fc_init_request()
nvmet-passthru: Reject commands with non-sgl flags set
nvmet: fix a memory leak
blkcg: fix memleak for iolatency
MAINTAINERS: Add missing header files to BLOCK LAYER section
...

Linus Torvalds
2020-08-25 02:53:15 +0800

24 Aug, 2020

1 commit

df561f668 treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva

Gustavo A. R. Silva
2020-08-24 06:36:59 +0800

17 Aug, 2020

1 commit

bcb21c8cc block: loop: set discard granularity and alignment for block device backed loop

In the case of a block device backend, if the backend supports write zeroes,
the loop device sets the queue flag QUEUE_FLAG_DISCARD. However,
limits.discard_granularity isn't set up, which is wrong;
see the following description in Documentation/ABI/testing/sysfs-block:

A discard_granularity of 0 means that the device does not support
discard functionality.

In particular, 9b15d109a6b2 ("block: improve discard bio alignment in
__blkdev_issue_discard()") started using q->limits.discard_granularity
to compute max discard sectors, so a zero discard granularity may cause a
kernel oops or failed discard requests even though the loop queue claims
discard support via QUEUE_FLAG_DISCARD.

Fix the issue by setting up discard granularity and alignment.

Fixes: c52abf563049 ("loop: Better discard support for block devices")
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
Acked-by: Coly Li
Cc: Hannes Reinecke
Cc: Xiao Ni
Cc: Martin K. Petersen
Cc: Evan Green
Cc: Gwendal Grignou
Cc: Chaitanya Kulkarni
Cc: Andrzej Pietrasiewicz
Cc: Christoph Hellwig
Cc:
Signed-off-by: Jens Axboe

Ming Lei
2020-08-17 21:58:36 +0800

11 Aug, 2020

1 commit

fe6a8fc5e loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE

When LOOP_CONFIGURE is used with LO_FLAGS_PARTSCAN we need to propagate
this into the GENHD_FL_NO_PART_SCAN. LOOP_SETSTATUS does this,
LOOP_CONFIGURE doesn't so far. Effect is that setting up a loopback
device with partition scanning doesn't actually work when LOOP_CONFIGURE
is issued, though it works fine with LOOP_SETSTATUS.

Let's correct that and propagate the flag in LOOP_CONFIGURE too.

Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl")

Signed-off-by: Lennart Poettering
Acked-by: Martijn Coenen
Signed-off-by: Jens Axboe

Lennart Poettering
2020-08-11 21:13:06 +0800

16 Jul, 2020

1 commit

ecbe6bc00 block: use bd_prepare_to_claim directly in the loop driver

The arcane magic in bd_start_claiming is only needed to be able to claim
a block_device that hasn't been fully set up. Switch the loop driver
that claims from the ioctl path with a fully set up struct block_device
to just use the much simpler bd_prepare_to_claim directly.

Signed-off-by: Christoph Hellwig
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-07-16 23:35:44 +0800

24 Jun, 2020

2 commits

200f93377 loop: be paranoid on exit and prevent new additions / removals

Be pedantic on removal as well and hold the mutex.
This should prevent uses of addition while we exit.

Signed-off-by: Luis Chamberlain
Reviewed-by: Ming Lei
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Luis Chamberlain
2020-06-24 23:15:58 +0800
15f73f5b3 blk-mq: move failure injection out of blk_mq_complete_request

Move the call to blk_should_fake_timeout out of blk_mq_complete_request
and into the drivers, skipping call sites that are obvious error
handlers, and remove the now superfluous blk_mq_force_complete_rq helper.
This ensures we don't keep injecting errors into completions that just
terminate the Linux request after the hardware has been reset or the
command has been aborted.

Reviewed-by: Daniel Wagner
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2020-06-24 23:15:57 +0800

18 Jun, 2020

1 commit

f4bd34b13 loop: replace kill_bdev with invalidate_bdev

When a filesystem is mounted on a loop device and the LOOP_SET_STATUS64
ioctl is issued, kill_bdev destroys the buffer_head mappings:

kill_bdev
  truncate_inode_pages
    truncate_inode_pages_range
      do_invalidatepage
        block_invalidatepage
          discard_buffer  --> clear BH_Mapped flag

sb_bread
  __bread_gfp
    bh = __getblk_gfp
    --> discard_buffer clear BH_Mapped flag
    __bread_slow
      submit_bh
        submit_bh_wbc
          BUG_ON(!buffer_mapped(bh))  --> hit this BUG_ON

Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed")
Signed-off-by: Zheng Bin
Reviewed-by: Christoph Hellwig
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe

Zheng Bin
2020-06-18 23:24:35 +0800

12 Jun, 2020

1 commit

a58dfea29 Merge tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Some followup fixes for this merge window. In particular:

- Seqcount write missing preemption disable for stats (Ahmed)

- blktrace fixes (Chaitanya)

- Redundant initializations (Colin)

- Various small NVMe fixes (Chaitanya, Christoph, Daniel, Max,
Niklas, Rikard)

- loop flag bug regression fix (Martijn)

- blk-mq tagging fixes (Christoph, Ming)"

* tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block:
umem: remove redundant initialization of variable ret
pktcdvd: remove redundant initialization of variable ret
nvmet: fail outstanding host posted AEN req
nvme-pci: use simple suspend when a HMB is enabled
nvme-fc: don't call nvme_cleanup_cmd() for AENs
nvmet-tcp: constify nvmet_tcp_ops
nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops
nvme: do not call del_gendisk() on a disk that was never added
blk-mq: fix blk_mq_all_tag_iter
blk-mq: split out a __blk_mq_get_driver_tag helper
blktrace: fix endianness for blk_log_remap()
blktrace: fix endianness in get_pdu_int()
blktrace: use errno instead of bi_status
block: nr_sects_write(): Disable preemption on seqcount write
block: remove the error argument to the block_bio_complete tracepoint
loop: Fix wrong masking of status flags
block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed

Linus Torvalds
2020-06-12 07:07:33 +0800

05 Jun, 2020

1 commit

6ac92fb5c loop: Fix wrong masking of status flags

In faf1d25440d6, loop_set_status() now assigns lo_status directly from
the passed in lo_flags, but then fixes it up by masking out flags that
can't be set by LOOP_SET_STATUS; unfortunately the mask was negated.
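
The effect of the negated mask can be sketched in userspace as follows. The SK_* constants and both function names are hypothetical; the flag values are assumed to mirror the loop UAPI header, and the three-step merge is reconstructed from the commit descriptions rather than copied from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the LO_FLAGS_* values in the loop UAPI header. */
#define SK_LO_FLAGS_READ_ONLY  1u
#define SK_LO_FLAGS_AUTOCLEAR  4u
#define SK_LO_FLAGS_PARTSCAN   8u
#define SK_LO_FLAGS_DIRECT_IO 16u

/* Flags LOOP_SET_STATUS may set, and the subset it may also clear. */
#define SK_SETTABLE_FLAGS  (SK_LO_FLAGS_AUTOCLEAR | SK_LO_FLAGS_PARTSCAN)
#define SK_CLEARABLE_FLAGS (SK_LO_FLAGS_AUTOCLEAR)

/* Buggy variant: the negated mask discards exactly the settable flags. */
uint32_t merge_flags_buggy(uint32_t prev, uint32_t requested)
{
    uint32_t flags = requested;

    flags &= ~SK_SETTABLE_FLAGS;          /* wrong: mask is inverted */
    flags |= prev & ~SK_SETTABLE_FLAGS;
    flags |= prev & ~SK_CLEARABLE_FLAGS;
    return flags;
}

/* Fixed variant: keep only the flags the caller is allowed to set. */
uint32_t merge_flags_fixed(uint32_t prev, uint32_t requested)
{
    uint32_t flags = requested;

    /* Every clearable flag must also be settable. */
    assert((SK_CLEARABLE_FLAGS & ~SK_SETTABLE_FLAGS) == 0);

    flags &= SK_SETTABLE_FLAGS;           /* drop flags that cannot be set */
    flags |= prev & ~SK_SETTABLE_FLAGS;   /* other flags keep the old value */
    flags |= prev & ~SK_CLEARABLE_FLAGS;  /* non-clearable flags stay set */
    return flags;
}
```

With no flags previously set and a request for AUTOCLEAR | PARTSCAN, the fixed variant yields 12, which matches the "get expected lo_flag 12" line in the test log below, while the buggy variant yields 0.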

Re-ran all ltp ioctl_loop tests, and they all passed.

Pass run of the previously failing one:

tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
tst_device.c:88: INFO: Found free device 0 '/dev/loop0'
ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0
ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0
ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file =
'/tmp/ZRJ6H4/test.img'
ioctl_loop01.c:65: PASS: get expected lo_flag 12
ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1
ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1
ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds
ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds

Summary:
passed 8
failed 0
skipped 0
warnings 0

Fixes: faf1d25440d6 ("loop: Clean up LOOP_SET_STATUS lo_flags handling")
Reported-by: Naresh Kamboju
Signed-off-by: Martijn Coenen
Tested-by: Naresh Kamboju
Signed-off-by: Jens Axboe

Martijn Coenen
2020-06-05 11:13:45 +0800

03 Jun, 2020

4 commits

96ed320d5 Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull DAX updates part one from Darrick Wong:
"After many years of LKML-wrangling about how to enable programs to
query and influence the file data access mode (DAX) when a filesystem
resides on storage devices such as persistent memory, Ira Weiny has
emerged with a proposed set of standard behaviors that has not been
shot down by anyone! We're more or less standardizing on the current
XFS behavior and adapting ext4 to do the same.

This is the first of a handful pull requests that will make ext4 and
XFS present a consistent interface for user programs that care about
DAX. We add a statx attribute that programs can check to see if DAX is
enabled on a particular file. Then, we update the DAX documentation to
spell out the user-visible behaviors that filesystems will guarantee
(until the next storage industry shakeup). The on-disk inode flag has
been in XFS for a few years now.

Summary:

- Clean up io_is_direct.

- Add a new statx flag to indicate when file data access is being
done via DAX (as opposed to the page cache).

- Update the documentation for how system administrators and
application programmers can take advantage of the (still
experimental DAX) feature"

Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/

* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
Documentation/dax: Update Usage section
fs/stat: Define DAX statx attribute
fs: Remove unneeded IS_DAX() check in io_is_direct()

Linus Torvalds
2020-06-03 10:45:12 +0800
bce159d73 Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
"On top of the core changes, here are the block driver changes for this
merge window:

- NVMe changes:
- NVMe over Fibre Channel protocol updates, which also reach
over to drivers/scsi/lpfc (James Smart)
- namespace revalidation support on the target (Anthony
Iliopoulos)
- gcc zero length array fix (Arnd Bergmann)
- nvmet cleanups (Chaitanya Kulkarni)
- misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
- use a SRQ per completion vector (Max Gurtovoy)
- fix handling of runtime changes to the queue count (Weiping
Zhang)
- t10 protection information support for nvme-rdma and
nvmet-rdma (Israel Rukshin and Max Gurtovoy)
- target side AEN improvements (Chaitanya Kulkarni)
- various fixes and minor improvements all over, including the
nvme part of the lpfc driver"

- Floppy code cleanup series (Willy, Denis)

- Floppy contention fix (Jiri)

- Loop CONFIGURE support (Martijn)

- bcache fixes/improvements (Coly, Joe, Colin)

- q->queuedata cleanups (Christoph)

- Get rid of ioctl_by_bdev (Christoph, Stefan)

- md/raid5 allocation fixes (Coly)

- zero length array fixes (Gustavo)

- swim3 task state fix (Xu)"

* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
bcache: configure the asynchronous registertion to be experimental
bcache: asynchronous devices registration
bcache: fix refcount underflow in bcache_device_free()
bcache: Convert pr_ uses to a more typical style
bcache: remove redundant variables i and n
lpfc: Fix return value in __lpfc_nvme_ls_abort
lpfc: fix axchg pointer reference after free and double frees
lpfc: Fix pointer checks and comments in LS receive refactoring
nvme: set dma alignment to qword
nvmet: cleanups the loop in nvmet_async_events_process
nvmet: fix memory leak when removing namespaces and controllers concurrently
nvmet-rdma: add metadata/T10-PI support
nvmet: add metadata support for block devices
nvmet: add metadata/T10-PI support
nvme: add Metadata Capabilities enumerations
nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
nvmet: rename nvmet_rw_len to nvmet_rw_data_len
nvmet: add metadata characteristics for a namespace
nvme-rdma: add metadata/T10-PI support
nvme-rdma: introduce nvme_rdma_sgl structure
...

Linus Torvalds
2020-06-03 06:37:03 +0800
750a02ab8 Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
"Core block changes that have been queued up for this release:

- Remove dead blk-throttle and blk-wbt code (Guoqing)

- Include pid in blktrace note traces (Jan)

- Don't spew I/O errors on wouldblock termination (me)

- Zone append addition (Johannes, Keith, Damien)

- IO accounting improvements (Konstantin, Christoph)

- blk-mq hardware map update improvements (Ming)

- Scheduler dispatch improvement (Salman)

- Inline block encryption support (Satya)

- Request map fixes and improvements (Weiping)

- blk-iocost tweaks (Tejun)

- Fix for timeout failing with error injection (Keith)

- Queue re-run fixes (Douglas)

- CPU hotplug improvements (Christoph)

- Queue entry/exit improvements (Christoph)

- Move DMA drain handling to the few drivers that use it (Christoph)

- Partition handling cleanups (Christoph)"

* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
block: mark bio_wouldblock_error() bio with BIO_QUIET
blk-wbt: rename __wbt_update_limits to wbt_update_limits
blk-wbt: remove wbt_update_limits
blk-throttle: remove tg_drain_bios
blk-throttle: remove blk_throtl_drain
null_blk: force complete for timeout request
blk-mq: drain I/O when all CPUs in a hctx are offline
blk-mq: add blk_mq_all_tag_iter
blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
blk-mq: use BLK_MQ_NO_TAG in more places
blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
blk-mq: move more request initialization to blk_mq_rq_ctx_init
blk-mq: simplify the blk_mq_get_request calling convention
blk-mq: remove the bio argument to ->prepare_request
nvme: force complete cancelled requests
blk-mq: blk-mq: provide forced completion method
block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
block: blk-crypto-fallback: remove redundant initialization of variable err
block: reduce part_stat_lock() scope
block: use __this_cpu_add() instead of access by smp_processor_id()
...

Linus Torvalds
2020-06-03 06:29:19 +0800
a37b0715d mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE

PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
daemon needs to write to one bdi (the final bdi) in order to free up
writes queued to another bdi (the client bdi).

The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
pages, so that it can still dirty pages after other processes have been
throttled. The purpose of this is to avoid the deadlock that happens when
the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
but it is being throttled and cannot write.

This approach was designed when all threads were blocked equally,
regardless of which device they were writing to, or how fast it was.
Since that time the writeback algorithm has changed substantially with
different threads getting different allowances based on non-trivial
heuristics. This means the simple "add 25%" heuristic is no longer
reliable.

The important issue is not that the daemon needs a *larger* dirty page
allowance, but that it needs a *private* dirty page allowance, so that
dirty pages for the "client" bdi that it is helping to clear (the bdi
for an NFS filesystem or loop block device etc) do not affect the
throttling of the daemon writing to the "final" bdi.

This patch changes the heuristic so that the task is not throttled when
the bdi it is writing to has a dirty page count below (or equal
to) the free-run threshold for that bdi. This ensures it will always be
able to have some pages in flight, and so will not deadlock.

In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
still be throttled by global threshold, but that is acceptable as it is
only the deadlock state that is interesting for this flag.

This approach of "only throttle when target bdi is busy" is consistent
with the other use of PF_LESS_THROTTLE in current_may_throttle(), where
it causes attention to be focussed only on the target bdi.

So this patch
- renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
- removes the 25% bonus that that flag gives, and
- if PF_LOCAL_THROTTLE is set, doesn't delay at all unless the
global and the local free-run thresholds are exceeded.
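
The decision rule summarised in this list can be sketched as a small predicate. This is a deliberately simplified model under stated assumptions: the function name should_throttle() and its parameters are hypothetical, dirty counts and thresholds are plain page counts, and cgroup-aware accounting is ignored. It is not the kernel's balance_dirty_pages() logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: returns true when a task would be delayed.
 *   local_throttle  - task has PF_LOCAL_THROTTLE set
 *   global_dirty    - dirty pages system-wide
 *   global_freerun  - global free-run threshold
 *   bdi_dirty       - dirty pages on the bdi the task writes to
 *   bdi_freerun     - free-run threshold of that bdi
 */
bool should_throttle(bool local_throttle,
                     unsigned long global_dirty, unsigned long global_freerun,
                     unsigned long bdi_dirty, unsigned long bdi_freerun)
{
    /* Assumption: at or below the global free-run threshold nobody waits. */
    if (global_dirty <= global_freerun)
        return false;

    /*
     * A PF_LOCAL_THROTTLE task is additionally exempt while its own bdi
     * is at or below its free-run threshold, so it can always keep some
     * pages in flight.
     */
    if (local_throttle && bdi_dirty <= bdi_freerun)
        return false;

    return true;
}
```

In this model a flagged task is delayed only when both the global and the local threshold are exceeded, while an unflagged task is delayed as soon as the global one is.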

Note that previously realtime threads were treated the same as
PF_LESS_THROTTLE threads. This patch does *not* change the behaviour
for real-time threads, so it is now different from the behaviour of nfsd
and loop tasks. I don't know what is wanted for realtime.

[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: NeilBrown
Signed-off-by: Andrew Morton
Reviewed-by: Jan Kara
Acked-by: Chuck Lever [nfsd]
Cc: Christoph Hellwig
Cc: Michal Hocko
Cc: Trond Myklebust
Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds

NeilBrown
2020-06-03 01:59:08 +0800

30 May, 2020

1 commit

bf0beec06 blk-mq: drain I/O when all CPUs in a hctx are offline

Most blk-mq drivers depend on managed IRQs' auto-affinity to set up
queue mapping. Thomas mentioned the following point [1]:

"That was the constraint of managed interrupts from the very beginning:

The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again."

However, the current blk-mq implementation doesn't quiesce the hw queue
before the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD
is a cpuhp state handled after the CPU is down, so there isn't any chance to
quiesce the hctx before shutting down the CPU.

Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
where the last CPU goes away, and wait for completion of in-flight
requests. This guarantees that there is no inflight I/O before shutting
down the managed IRQ.

Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need
to wait for completion of in-flight requests from these drivers to avoid
a potential dead-lock. It is safe to do this for stacking drivers as those
do not use interrupts at all and their I/O completions are triggered by
underlying devices I/O completion.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

[hch: different retry mechanism, merged two patches, minor cleanups]

Signed-off-by: Ming Lei
Signed-off-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Reviewed-by: Daniel Wagner
Signed-off-by: Jens Axboe

Ming Lei
2020-05-30 00:23:25 +0800

25 May, 2020

1 commit

d29b92f57 loop: remove redundant assignment to variable error

The variable error is being assigned a value that is never
read so the assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King
Signed-off-by: Jens Axboe

Colin Ian King
2020-05-25 01:13:36 +0800

21 May, 2020

11 commits

3448914e8 loop: Add LOOP_CONFIGURE ioctl

This allows userspace to completely set up a loop device with a single
ioctl, removing the in-between state where the device can be partially
configured, e.g. the loop device has a backing file associated with it,
but is reading from the wrong offset.

Besides removing the intermediate state, another big benefit of this
ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
freeze the associated queue; this requires waiting for RCU
synchronization, which I've measured can take about 15-20ms on this
device on average.

In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
also be used to:
- Set the correct block size immediately by setting
loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
- Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
- Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
in loop_config.info.lo_flags
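
A minimal userspace sketch of the single-ioctl setup described above, assuming kernel headers from v5.8 or later that provide struct loop_config and LOOP_CONFIGURE in <linux/loop.h>. The helper names make_loop_config() and configure_loop() are hypothetical and are not part of losetup or the kernel; error handling is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

/* Hypothetical helper: describe the whole device in one structure. */
struct loop_config make_loop_config(int backing_fd, uint64_t offset,
                                    uint32_t block_size,
                                    bool direct_io, bool read_only)
{
    struct loop_config cfg;

    assert(backing_fd >= 0);

    /* Zero everything first so unused and reserved fields are 0. */
    memset(&cfg, 0, sizeof(cfg));

    cfg.fd = (uint32_t)backing_fd;   /* backing file, as with LOOP_SET_FD */
    cfg.block_size = block_size;     /* avoids LOOP_SET_BLOCK_SIZE */
    cfg.info.lo_offset = offset;     /* would otherwise need LOOP_SET_STATUS64 */

    if (direct_io)
        cfg.info.lo_flags |= LO_FLAGS_DIRECT_IO;  /* avoids LOOP_SET_DIRECT_IO */
    if (read_only)
        cfg.info.lo_flags |= LO_FLAGS_READ_ONLY;

    return cfg;
}

/* Hypothetical helper: apply the configuration with a single ioctl. */
int configure_loop(int loop_fd, const struct loop_config *cfg)
{
    return ioctl(loop_fd, LOOP_CONFIGURE, cfg);
}
```

A caller would open the loop device and the backing file, build the configuration with make_loop_config(backing_fd, 4096, 512, false, true), and pass it to configure_loop(); a return value of -1 indicates failure with the reason in errno.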

Here's setting up ~70 regular loop devices with an offset on an x86
Android device, using LOOP_SET_FD and LOOP_SET_STATUS:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m03.40s real 0m00.02s user 0m00.03s system

Here's configuring ~70 devices in the same way, but using a modified
losetup that uses the new LOOP_CONFIGURE ioctl:

vsoc_x86:/system/apex # time for i in `seq 30 100`;
do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
0m01.94s real 0m00.01s user 0m00.01s system

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:35 +0800
faf1d2544 loop: Clean up LOOP_SET_STATUS lo_flags handling

LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in
particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas
LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this
explicit by updating the UAPI to include the flags that can be
set/cleared using this ioctl.

The implementation can then blindly take over the passed in flags,
and use the previous flags for those flags that can't be set / cleared
using LOOP_SET_STATUS.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:35 +0800
571fae6e2 loop: Rework lo_ioctl() __user argument casting

In preparation for a new ioctl that needs to copy_from_user(); makes the
code easier to read as well.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:35 +0800
62ab466ca loop: Move loop_set_status_from_info() and friends up

So we can use it without forward declaration. This is a separate commit
to make it easier to verify that this is just a move, without functional
modifications.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:35 +0800
0c3796c24 loop: Factor out configuring loop from status

Factor out this code into a separate function, so it can be reused by
other code more easily.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:34 +0800
0a6ed1b5f loop: Remove figure_loop_size()

This function was now only used by loop_set_capacity(). Just open code
the remaining code in the caller instead.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:34 +0800
b0bd158dd loop: Refactor loop_set_status() size calculation

figure_loop_size() calculates the loop size based on the passed in
parameters, but at the same time it updates the offset and sizelimit
parameters in the loop device configuration. That is a somewhat
unexpected side effect of a function with this name, and it is only
needed by one of the two callers of this function - loop_set_status().

Move the lo_offset and lo_sizelimit assignment back into loop_set_status(),
and use the newly factored out functions to validate and apply the newly
calculated size. This allows us to get rid of figure_loop_size() in a
follow-up commit.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:34 +0800
716ad0986 loop: Switch to set_capacity_revalidate_and_notify()

This was recently added to block/genhd.c, and takes care of both
updating the capacity and notifying userspace of the new size.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:34 +0800
5795b6f56 loop: Factor out setting loop device size

This code is used repeatedly.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:34 +0800
083a6a507 loop: Remove sector_t truncation checks

sector_t is now always u64, so we don't need to check for truncation.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:34 +0800
7c5014b09 loop: Call loop_config_discard() only after new config is applied

loop_set_status() calls loop_config_discard() to configure discard for
the loop device; however, the discard configuration depends on whether
the loop device uses encryption, and when we call it the encryption
configuration has not been updated yet. Move the call down so we apply
the correct discard configuration based on the new configuration.

Signed-off-by: Martijn Coenen
Reviewed-by: Christoph Hellwig
Reviewed-by: Bob Liu
Reviewed-by: Bart Van Assche
Signed-off-by: Jens Axboe

Martijn Coenen
2020-05-21 22:20:34 +0800

04 May, 2020

1 commit

efbe3c249 fs: Remove unneeded IS_DAX() check in io_is_direct()

Remove the check because DAX now has its own read/write methods and
file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
their own. Therefore, it does not matter if the file state is DAX when
the iocb flags are created.

Also remove io_is_direct() as it is just a simple flag check.

Reviewed-by: Dave Chinner
Reviewed-by: Jan Kara
Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Ira Weiny
Signed-off-by: Darrick J. Wong

Ira Weiny
2020-05-04 23:49:39 +0800

04 Apr, 2020

2 commits

c52abf563 loop: Better discard support for block devices

If the backing device for a loop device is itself a block device,
then mirror the "write zeroes" capabilities of the underlying
block device into the loop device. Copy this capability into both
max_write_zeroes_sectors and max_discard_sectors of the loop device.

The reason for this is that REQ_OP_DISCARD on a loop device translates
into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This
presents a consistent interface for loop devices (that discarded data
is zeroed), regardless of the backing device type of the loop device.
There should be no behavior change for loop devices backed by regular
files.

This change fixes blktest block/003, and removes an extraneous
error print in block/013 when testing on a loop device backed
by a block device that does not support discard.

Signed-off-by: Evan Green
Reviewed-by: Gwendal Grignou
Reviewed-by: Chaitanya Kulkarni
[used updated version of Evan's comment in loop_config_discard()]
[moved backingq to local scope, removed redundant braces]
Signed-off-by: Andrzej Pietrasiewicz
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Evan Green
2020-04-04 03:44:22 +0800
8cd55087d loop: Report EOPNOTSUPP properly

Properly plumb out EOPNOTSUPP from loop driver operations, which may
get returned when for instance a discard operation is attempted but not
supported by the underlying block device. Before this change, everything
was reported in the log as an I/O error, which is scary and not
helpful in debugging.

Signed-off-by: Evan Green
Reviewed-by: Gwendal Grignou
Reviewed-by: Bart Van Assche
Signed-off-by: Andrzej Pietrasiewicz
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Evan Green
2020-04-04 03:44:20 +0800

11 Mar, 2020

2 commits

0fbcf5798 loop: Only freeze block queue when needed.

__loop_update_dio() can be called as a part of loop_set_fd(), when the
block queue is not yet up and running; avoid freezing the block queue in
that case, since that is an expensive operation.

Reviewed-by: Christoph Hellwig
Reviewed-by: Chaitanya Kulkarni
Signed-off-by: Martijn Coenen
Signed-off-by: Jens Axboe

Martijn Coenen
2020-03-11 04:10:43 +0800

7e81f99af loop: Only change blocksize when needed.

Return early in loop_set_block_size() if the requested block size is
identical to the one we already have; this avoids expensive calls to
freeze the block queue.
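
The early return can be modelled in user space as follows; `model_queue` and `model_set_block_size` are hypothetical names, and the counter again stands in for the queue freeze.

```c
#include <assert.h>

/* Hypothetical model of the loop queue; not the kernel's request_queue. */
struct model_queue {
	unsigned int logical_block_size;
	unsigned int freeze_count; /* how often the queue was frozen */
};

int model_set_block_size(struct model_queue *q, unsigned int bsize)
{
	assert(q);

	/* Nothing to do: skip the freeze entirely. */
	if (q->logical_block_size == bsize)
		return 0;

	q->freeze_count++; /* stands in for freezing the queue */
	q->logical_block_size = bsize;
	return 0;
}
```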

Reviewed-by: Christoph Hellwig
Signed-off-by: Martijn Coenen
Signed-off-by: Jens Axboe

Martijn Coenen
2020-03-11 04:10:41 +0800

14 Nov, 2019

1 commit

f0b870df8 block: remove (__)blkdev_reread_part as an exported API

In general drivers should never mess with partition tables directly.
Unfortunately s390 and loop do for somewhat historic reasons, but they
can use bdev_disk_changed directly instead when we export it as they
satisfy the sanity checks we have in __blkdev_reread_part.

Signed-off-by: Christoph Hellwig
Reviewed-by: Stefan Haberland [dasd]
Reviewed-by: Jan Kara
Signed-off-by: Jens Axboe

Christoph Hellwig
2019-11-14 22:43:59 +0800

01 Nov, 2019

1 commit

efcfec579 loop: fix no-unmap write-zeroes request behavior

Currently, if the loop device receives a WRITE_ZEROES request, it asks
the underlying filesystem to punch out the range. This behavior is
correct if unmapping is allowed. However, a NOUNMAP request means that
the caller doesn't want us to free the storage backing the range, so
punching out the range is incorrect behavior.

To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
the fallocate documentation) required to ensure that the entire range is
backed by real storage, which suffices for our purposes.
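
The mode selection described above can be sketched with the fallocate(2) flags from `<linux/falloc.h>`. The helper name and the `nounmap` parameter are hypothetical. OR-ing in FALLOC_FL_KEEP_SIZE is an assumption of this sketch: fallocate(2) requires it together with FALLOC_FL_PUNCH_HOLE, and for FALLOC_FL_ZERO_RANGE it keeps the backing file's size unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/falloc.h>

/*
 * Pick the fallocate() mode for a write-zeroes request.
 * nounmap == true models a request that must keep its storage allocated.
 */
int model_write_zeroes_mode(bool nounmap)
{
	/* Keep blocks allocated and zeroed, or free them by punching a hole. */
	int mode = nounmap ? FALLOC_FL_ZERO_RANGE : FALLOC_FL_PUNCH_HOLE;

	/* Assumption of this sketch: never change the backing file's size. */
	return mode | FALLOC_FL_KEEP_SIZE;
}
```

The returned value would be passed as the `mode` argument of `fallocate(fd, mode, offset, len)` on the backing file.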

Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Darrick J. Wong
2019-11-01 22:43:20 +0800

01 Oct, 2019

1 commit

85560117d loop: change queue block size to match when using DIO

The loop driver assumes that if the passed in fd is opened with
O_DIRECT, the caller wants to use direct I/O on the loop device.
However, if the underlying block device has a different block size than
the loop block queue, direct I/O can't be enabled. Instead of requiring
userspace to manually change the blocksize and re-enable direct I/O,
just change the queue block sizes to match, as well as the io_min size.
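
A user-space sketch of the adjustment, with hypothetical names throughout. It assumes that "queue block sizes" means the logical and physical block size, and that a backing block size of 0 means none is known.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the loop queue's limits. */
struct model_queue_limits {
	unsigned int logical_block_size;
	unsigned int physical_block_size;
	unsigned int io_min;
};

void model_match_dio_block_size(struct model_queue_limits *q,
				bool fd_is_direct,
				unsigned int backing_bsize)
{
	assert(q);

	/* Only adjust when the fd asked for direct I/O and a size is known. */
	if (!fd_is_direct || backing_bsize == 0)
		return;

	/* Match the backing device so direct I/O can be enabled. */
	q->logical_block_size = backing_bsize;
	q->physical_block_size = backing_bsize;
	q->io_min = backing_bsize;
}
```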

Reviewed-by: Christoph Hellwig
Signed-off-by: Martijn Coenen
Signed-off-by: Jens Axboe

Martijn Coenen
2019-10-01 23:36:01 +0800

18 Sep, 2019

1 commit

7ad67ca55 Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

- Two NVMe pull requests:
    - ana log parse fix from Anton
    - nvme quirks support for Apple devices from Ben
    - fix missing bio completion tracing for multipath stack devices
      from Hannes and Mikhail
    - IP TOS settings for nvme rdma and tcp transports from Israel
    - rq_dma_dir cleanups from Israel
    - tracing for Get LBA Status command from Minwoo
    - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
    - Some consolidation between the fabrics transports for handling
      the CAP register
    - reset race with ns scanning fix for fabrics (move fabrics
      commands to a dedicated request queue with a different lifetime
      from the admin request queue).
    - controller reset and namespace scan races fixes
    - nvme discovery log change uevent support
    - naming improvements from Keith
    - multiple discovery controllers reject fix from James
    - some regular cleanups from various people

- Series fixing (and re-fixing) null_blk debug printing and nr_devices
checks (André)

- A few pull requests from Song, with fixes from Andy, Guoqing,
Guilherme, Neil, Nigel, and Yufen.

- REQ_OP_ZONE_RESET_ALL support (Chaitanya)

- Bio merge handling unification (Christoph)

- Pick default elevator correctly for devices with special needs
(Damien)

- Block stats fixes (Hou)

- Timeout and support devices nbd fixes (Mike)

- Series fixing races around elevator switching and device add/remove
(Ming)

- sed-opal cleanups (Revanth)

- Per device weight support for BFQ (Fam)

- Support for blk-iocost, a new model that can properly account cost of
IO workloads. (Tejun)

- blk-cgroup writeback fixes (Tejun)

- paride queue init fixes (zhengbin)

- blk_set_runtime_active() cleanup (Stanley)

- Block segment mapping optimizations (Bart)

- lightnvm fixes (Hans/Minwoo/YueHaibing)

- Various little fixes and cleanups

* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
null_blk: format pr_* logs with pr_fmt
null_blk: match the type of parameter nr_devices
null_blk: do not fail the module load with zero devices
block: also check RQF_STATS in blk_mq_need_time_stamp()
block: make rq sector size accessible for block stats
bfq: Fix bfq linkage error
raid5: use bio_end_sector in r5_next_bio
raid5: remove STRIPE_OPS_REQ_PENDING
md: add feature flag MD_FEATURE_RAID0_LAYOUT
md/raid0: avoid RAID0 data corruption due to layout confusion.
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
raid5: don't increment read_errors on EILSEQ return
nvmet: fix a wrong error status returned in error log page
nvme: send discovery log page change events to userspace
nvme: add uevent variables for controller devices
nvme: enable aen regardless of the presence of I/O queues
nvme-fabrics: allow discovery subsystems accept a kato
nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
nvme: Remove redundant assignment of cq vector
nvme: Assign subsys instance from first ctrl
...

Linus Torvalds
2019-09-18 07:57:47 +0800