13 Nov, 2020

1 commit

  • Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
    causes loop device uevents to be dropped occasionally; they are no longer
    triggered in loop_set_size() but in a different part of the code.

    Bug is reproducible with LTP test uevent01 [1]:

    i=0; while true; do
    i=$((i+1)); echo "== $i =="
    lsmod |grep -q loop && rmmod -f loop
    ./uevent01 || break
    done

    Put back triggering through code called in loop_set_size().

    The fix required adding yet another parameter to
    set_capacity_revalidate_and_notify().

    [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c

    [hch: rebased on a different change to the prototype of
    set_capacity_revalidate_and_notify]

    Cc: stable@vger.kernel.org # v5.9
    Fixes: 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
    Reported-by:
    Signed-off-by: Petr Vorel
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Petr Vorel
     

02 Sep, 2020

1 commit


29 Aug, 2020

1 commit

  • Pull block fixes from Jens Axboe:

    - nbd timeout fix (Hou)

    - device size fix for loop LOOP_CONFIGURE (Martijn)

    - MD pull from Song with raid5 stripe size fix (Yufen)

    * tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block:
    md/raid5: make sure stripe_size as power of two
    loop: Set correct device size when using LOOP_CONFIGURE
    nbd: restore default timeout when setting it to zero

    Linus Torvalds
     

26 Aug, 2020

1 commit

  • The device size calculation was done before processing the loop
    configuration, which meant that we set the size on the underlying
    block device incorrectly when lo_offset/lo_sizelimit were set in the
    configuration. Delay computing the size until we've set up the device
    parameters correctly.

    Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl")
    Reported-by: Lennart Poettering
    Tested-by: Yang Xu
    Signed-off-by: Martijn Coenen
    Signed-off-by: Jens Axboe

    Martijn Coenen
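    A minimal C sketch of why the ordering matters. All names here
    (loop_size_sectors, struct toy_loop, configure_buggy, configure_fixed)
    are hypothetical. The size arithmetic is an assumption about what the
    driver does: subtract lo_offset from the backing size, clamp to a
    non-zero lo_sizelimit, and convert bytes to 512-byte sectors. It is not
    a copy of drivers/block/loop.c.

```c
#include <stdint.h>

/* Hypothetical helper: loop size in 512-byte sectors (assumed arithmetic). */
static int64_t loop_size_sectors(int64_t backing_bytes, int64_t offset,
                                 int64_t sizelimit)
{
    int64_t size = backing_bytes;

    if (offset > 0)
        size -= offset;              /* data starts at lo_offset */
    if (size < 0)
        return 0;                    /* offset beyond end of backing file */
    if (sizelimit > 0 && sizelimit < size)
        size = sizelimit;            /* honour lo_sizelimit */
    return size >> 9;                /* bytes -> 512-byte sectors */
}

/* Hypothetical stand-in for the relevant parts of struct loop_device. */
struct toy_loop {
    int64_t offset;
    int64_t sizelimit;
    int64_t sectors;
};

/* Buggy order: the size is computed from the old (zero) offset/sizelimit. */
static void configure_buggy(struct toy_loop *lo, int64_t backing_bytes,
                            int64_t offset, int64_t sizelimit)
{
    lo->sectors = loop_size_sectors(backing_bytes, lo->offset, lo->sizelimit);
    lo->offset = offset;
    lo->sizelimit = sizelimit;
}

/* Fixed order: apply the configuration first, then compute the size. */
static void configure_fixed(struct toy_loop *lo, int64_t backing_bytes,
                            int64_t offset, int64_t sizelimit)
{
    lo->offset = offset;
    lo->sizelimit = sizelimit;
    lo->sectors = loop_size_sectors(backing_bytes, lo->offset, lo->sizelimit);
}
```

    Consider a 1 MiB backing file with lo_offset = 4096. The buggy order
    reports 2048 sectors, which is the whole file. The fixed order reports
    2040 sectors.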
     

25 Aug, 2020

1 commit

  • Pull block fixes from Jens Axboe:

    - NVMe pull request from Sagi:
       - nvme completion rework from Christoph and Chao that mostly came
         from a bit of divergence of how we classify errors related to
         pathing/retry etc.
       - nvmet passthru fixes from Chaitanya
       - minor nvmet fixes from Amit and I
       - mpath round-robin path selection fix from Martin
       - ignore noiob for zoned devices from Keith
       - minor nvme-fc fix from Tianjia"

    - BFQ cgroup leak fix (Dmitry)

    - block layer MAINTAINERS addition (Geert)

    - fix null_blk FUA checking (Hou)

    - get_max_io_size() size fix (Keith)

    - fix block page_is_mergeable() for compound pages (Matthew)

    - discard granularity fixes (Ming)

    - IO scheduler ordering fix (Ming)

    - misc fixes

    * tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block: (31 commits)
    null_blk: fix passing of REQ_FUA flag in null_handle_rq
    nvmet: Disable keep-alive timer when kato is cleared to 0h
    nvme: redirect commands on dying queue
    nvme: just check the status code type in nvme_is_path_error
    nvme: refactor command completion
    nvme: rename and document nvme_end_request
    nvme: skip noiob for zoned devices
    nvme-pci: fix PRP pool size
    nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth
    nvme: Use spin_lock_irq() when taking the ctrl->lock
    nvmet: call blk_mq_free_request() directly
    nvmet: fix oops in pt cmd execution
    nvmet: add ns tear down label for pt-cmd handling
    nvme: multipath: round-robin: eliminate "fallback" variable
    nvme: multipath: round-robin: fix single non-optimized path case
    nvme-fc: Fix wrong return value in __nvme_fc_init_request()
    nvmet-passthru: Reject commands with non-sgl flags set
    nvmet: fix a memory leak
    blkcg: fix memleak for iolatency
    MAINTAINERS: Add missing header files to BLOCK LAYER section
    ...

    Linus Torvalds
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
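    For readers new to the macro, here is a userspace sketch modeled on the
    kernel's definition. The definition is assumed to live in
    include/linux/compiler_attributes.h. fallthrough expands to the
    compiler's __fallthrough__ attribute when that attribute is available,
    and to an empty statement otherwise. remaining_steps() is a made-up
    example function.

```c
/* Modeled on the kernel's macro; falls back to an empty statement. */
#ifdef __has_attribute
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: count the setup stages left from `stage` onward. */
static int remaining_steps(int stage)
{
    int steps = 0;

    switch (stage) {
    case 0:
        steps++;
        fallthrough;    /* intentional: stage 0 also runs stage 1 */
    case 1:
        steps++;
        fallthrough;    /* intentional: stage 1 also runs stage 2 */
    case 2:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

    Building with -Wimplicit-fallthrough warns about an unannotated case
    that falls through. It stays quiet on the annotated ones above.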
     

17 Aug, 2020

1 commit

  • With a block device backend that supports write zeroes, the loop
    device sets the QUEUE_FLAG_DISCARD queue flag. However,
    limits.discard_granularity isn't set up, which is wrong; see the
    following description in Documentation/ABI/testing/sysfs-block:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

    In particular, since 9b15d109a6b2 ("block: improve discard bio alignment in
    __blkdev_issue_discard()"), q->limits.discard_granularity is used to
    compute the maximum discard sectors. A zero discard granularity may
    therefore cause a kernel oops, or make discard requests fail even though
    the loop queue claims discard support via QUEUE_FLAG_DISCARD.

    Fix the issue by setting up discard granularity and alignment.

    Fixes: c52abf563049 ("loop: Better discard support for block devices")
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Acked-by: Coly Li
    Cc: Hannes Reinecke
    Cc: Xiao Ni
    Cc: Martin K. Petersen
    Cc: Evan Green
    Cc: Gwendal Grignou
    Cc: Chaitanya Kulkarni
    Cc: Andrzej Pietrasiewicz
    Cc: Christoph Hellwig
    Cc:
    Signed-off-by: Jens Axboe

    Ming Lei
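    The following C sketch shows why a zero granularity is fatal. It also
    shows what mirroring a block-device backend might look like. struct
    toy_limits and both helpers are hypothetical. Two details are
    assumptions and were not taken from the patch: the fallback to the
    backing device's physical block size when it reports no granularity,
    and resetting the alignment to 0.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Hypothetical subset of struct queue_limits. */
struct toy_limits {
    uint32_t discard_granularity;      /* bytes; 0 means "no discard" */
    uint32_t discard_alignment;        /* bytes */
    uint32_t physical_block_size;      /* bytes */
    uint32_t max_write_zeroes_sectors;
    uint32_t max_discard_sectors;
};

/* Power-of-two round_down(), as the kernel's round_down() macro does. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/*
 * Largest granularity-aligned discard size, in sectors, that fits in a
 * 32-bit byte count. With granularity 0 the mask becomes 0, so the result
 * is 0: no discard bio could ever make progress.
 */
static uint64_t max_aligned_discard_sectors(uint32_t granularity)
{
    return round_down_pow2(UINT32_MAX, granularity) >> SECTOR_SHIFT;
}

/* Mirror a block-device backend onto the loop queue (assumed fallback). */
static void loop_mirror_discard(struct toy_limits *loop,
                                const struct toy_limits *backing)
{
    loop->max_discard_sectors = backing->max_write_zeroes_sectors;
    loop->max_write_zeroes_sectors = backing->max_write_zeroes_sectors;
    loop->discard_granularity = backing->discard_granularity
                                    ? backing->discard_granularity
                                    : backing->physical_block_size;
    loop->discard_alignment = 0;   /* assumption of this sketch */
}
```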
     

11 Aug, 2020

1 commit

  • When LOOP_CONFIGURE is used with LO_FLAGS_PARTSCAN, we need to propagate
    this into GENHD_FL_NO_PART_SCAN (i.e. clear it). LOOP_SETSTATUS does
    this, but LOOP_CONFIGURE so far doesn't. The effect is that setting up a
    loopback device with partition scanning doesn't actually work when
    LOOP_CONFIGURE is issued, though it works fine with LOOP_SETSTATUS.

    Let's correct that and propagate the flag in LOOP_CONFIGURE too.

    Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl")

    Signed-off-by: Lennart Poettering
    Acked-by: Martijn Coenen
    Signed-off-by: Jens Axboe

    Lennart Poettering
     

16 Jul, 2020

1 commit

  • The arcane magic in bd_start_claiming is only needed to be able to claim
    a block_device that hasn't been fully set up. Switch the loop driver
    that claims from the ioctl path with a fully set up struct block_device
    to just use the much simpler bd_prepare_to_claim directly.

    Signed-off-by: Christoph Hellwig
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Jun, 2020

2 commits


18 Jun, 2020

1 commit

  • When a filesystem is mounted on a loop device and LOOP_SET_STATUS64 is
    issued on that device, kill_bdev destroys the buffer_head mappings:

    kill_bdev
      truncate_inode_pages
        truncate_inode_pages_range
          do_invalidatepage
            block_invalidatepage
              discard_buffer --> clear BH_Mapped flag

    sb_bread
      __bread_gfp
        bh = __getblk_gfp
        --> discard_buffer clear BH_Mapped flag
        __bread_slow
          submit_bh
            submit_bh_wbc
              BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON

    Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed")
    Signed-off-by: Zheng Bin
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Zheng Bin
     

12 Jun, 2020

1 commit

  • Pull block fixes from Jens Axboe:
    "Some followup fixes for this merge window. In particular:

    - Seqcount write missing preemption disable for stats (Ahmed)

    - blktrace fixes (Chaitanya)

    - Redundant initializations (Colin)

    - Various small NVMe fixes (Chaitanya, Christoph, Daniel, Max,
    Niklas, Rikard)

    - loop flag bug regression fix (Martijn)

    - blk-mq tagging fixes (Christoph, Ming)"

    * tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block:
    umem: remove redundant initialization of variable ret
    pktcdvd: remove redundant initialization of variable ret
    nvmet: fail outstanding host posted AEN req
    nvme-pci: use simple suspend when a HMB is enabled
    nvme-fc: don't call nvme_cleanup_cmd() for AENs
    nvmet-tcp: constify nvmet_tcp_ops
    nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops
    nvme: do not call del_gendisk() on a disk that was never added
    blk-mq: fix blk_mq_all_tag_iter
    blk-mq: split out a __blk_mq_get_driver_tag helper
    blktrace: fix endianness for blk_log_remap()
    blktrace: fix endianness in get_pdu_int()
    blktrace: use errno instead of bi_status
    block: nr_sects_write(): Disable preemption on seqcount write
    block: remove the error argument to the block_bio_complete tracepoint
    loop: Fix wrong masking of status flags
    block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed

    Linus Torvalds
     

05 Jun, 2020

1 commit

  • In faf1d25440d6, loop_set_status() now assigns lo_status directly from
    the passed in lo_flags, but then fixes it up by masking out flags that
    can't be set by LOOP_SET_STATUS; unfortunately the mask was negated.

    Re-ran all LTP ioctl_loop tests, and they all passed.

    Pass run of the previously failing one:

    tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
    tst_device.c:88: INFO: Found free device 0 '/dev/loop0'
    ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0
    ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0
    ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file = '/tmp/ZRJ6H4/test.img'
    ioctl_loop01.c:65: PASS: get expected lo_flag 12
    ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1
    ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1
    ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds
    ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds

    Summary:
    passed 8
    failed 0
    skipped 0
    warnings 0

    Fixes: faf1d25440d6 ("loop: Clean up LOOP_SET_STATUS lo_flags handling")
    Reported-by: Naresh Kamboju
    Signed-off-by: Martijn Coenen
    Tested-by: Naresh Kamboju
    Signed-off-by: Jens Axboe

    Martijn Coenen
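    The bug can be reproduced in a few lines of C. The LO_FLAGS_* values
    match include/uapi/linux/loop.h: AUTOCLEAR | PARTSCAN = 4 | 8 = 12,
    which is the "expected lo_flag 12" in the LTP output above. SETTABLE
    and CLEARABLE are shorthand for the settable/clearable flag masks that
    faf1d25440d6 added to the UAPI. As assumed here, these are AUTOCLEAR |
    PARTSCAN and AUTOCLEAR, respectively. Both merge functions are
    hypothetical models, not the driver code.

```c
#include <stdint.h>

/* Flag values as in include/uapi/linux/loop.h. */
#define LO_FLAGS_READ_ONLY  1
#define LO_FLAGS_AUTOCLEAR  4
#define LO_FLAGS_PARTSCAN   8
#define LO_FLAGS_DIRECT_IO 16

/* Assumed contents of the settable/clearable masks. */
#define SETTABLE  (LO_FLAGS_AUTOCLEAR | LO_FLAGS_PARTSCAN)
#define CLEARABLE (LO_FLAGS_AUTOCLEAR)

/* Intended merge of requested flags into the previous ones. */
static uint32_t merge_flags_fixed(uint32_t prev, uint32_t requested)
{
    uint32_t flags = requested & SETTABLE;  /* take only settable bits */
    flags |= prev & ~SETTABLE;              /* other bits keep old values */
    flags |= prev & ~CLEARABLE;             /* non-clearable bits stay set */
    return flags;
}

/* The regression: the first mask was negated. */
static uint32_t merge_flags_buggy(uint32_t prev, uint32_t requested)
{
    uint32_t flags = requested & ~SETTABLE; /* BUG: drops settable bits */
    flags |= prev & ~SETTABLE;
    flags |= prev & ~CLEARABLE;
    return flags;
}
```

    In this model, a device with no flags that asks for AUTOCLEAR | PARTSCAN
    gets 12 from the fixed version and 0 from the buggy one. Clearing both
    flags only clears AUTOCLEAR, because PARTSCAN is not clearable.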
     

03 Jun, 2020

4 commits

  • Pull DAX updates part one from Darrick Wong:
    "After many years of LKML-wrangling about how to enable programs to
    query and influence the file data access mode (DAX) when a filesystem
    resides on storage devices such as persistent memory, Ira Weiny has
    emerged with a proposed set of standard behaviors that has not been
    shot down by anyone! We're more or less standardizing on the current
    XFS behavior and adapting ext4 to do the same.

    This is the first of a handful of pull requests that will make ext4 and
    XFS present a consistent interface for user programs that care about
    DAX. We add a statx attribute that programs can check to see if DAX is
    enabled on a particular file. Then, we update the DAX documentation to
    spell out the user-visible behaviors that filesystems will guarantee
    (until the next storage industry shakeup). The on-disk inode flag has
    been in XFS for a few years now.

    Summary:

    - Clean up io_is_direct.

    - Add a new statx flag to indicate when file data access is being
    done via DAX (as opposed to the page cache).

    - Update the documentation for how system administrators and
    application programmers can take advantage of the (still
    experimental DAX) feature"

    Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/

    * tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    Documentation/dax: Update Usage section
    fs/stat: Define DAX statx attribute
    fs: Remove unneeded IS_DAX() check in io_is_direct()

    Linus Torvalds
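    The new statx attribute can be checked from userspace roughly as
    follows. This sketch assumes glibc 2.28 or later for the statx()
    wrapper. The fallback value for STATX_ATTR_DAX (0x00200000) is taken
    from the Linux UAPI header and is only used if the libc headers lack
    it. file_is_dax() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* statx(), struct statx */

#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00200000  /* from linux/stat.h, kernel v5.8+ */
#endif

/*
 * Returns 1 if the kernel reports DAX for `path`, 0 if it reports non-DAX
 * or does not support the attribute there, and -1 if statx() fails.
 */
static int file_is_dax(const char *path)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
        return -1;
    if (!(stx.stx_attributes_mask & STATX_ATTR_DAX))
        return 0;   /* kernel/filesystem cannot report DAX state */
    return (stx.stx_attributes & STATX_ATTR_DAX) ? 1 : 0;
}
```

    Checking stx_attributes_mask first distinguishes "not DAX" from "this
    kernel or filesystem cannot tell".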
     
  • Pull block driver updates from Jens Axboe:
    "On top of the core changes, here are the block driver changes for this
    merge window:

    - NVMe changes:
       - NVMe over Fibre Channel protocol updates, which also reach
         over to drivers/scsi/lpfc (James Smart)
       - namespace revalidation support on the target (Anthony
         Iliopoulos)
       - gcc zero length array fix (Arnd Bergmann)
       - nvmet cleanups (Chaitanya Kulkarni)
       - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
       - use a SRQ per completion vector (Max Gurtovoy)
       - fix handling of runtime changes to the queue count (Weiping
         Zhang)
       - t10 protection information support for nvme-rdma and
         nvmet-rdma (Israel Rukshin and Max Gurtovoy)
       - target side AEN improvements (Chaitanya Kulkarni)
       - various fixes and minor improvements all over, including the
         nvme part of the lpfc driver"

    - Floppy code cleanup series (Willy, Denis)

    - Floppy contention fix (Jiri)

    - Loop CONFIGURE support (Martijn)

    - bcache fixes/improvements (Coly, Joe, Colin)

    - q->queuedata cleanups (Christoph)

    - Get rid of ioctl_by_bdev (Christoph, Stefan)

    - md/raid5 allocation fixes (Coly)

    - zero length array fixes (Gustavo)

    - swim3 task state fix (Xu)"

    * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
    bcache: configure the asynchronous registertion to be experimental
    bcache: asynchronous devices registration
    bcache: fix refcount underflow in bcache_device_free()
    bcache: Convert pr_ uses to a more typical style
    bcache: remove redundant variables i and n
    lpfc: Fix return value in __lpfc_nvme_ls_abort
    lpfc: fix axchg pointer reference after free and double frees
    lpfc: Fix pointer checks and comments in LS receive refactoring
    nvme: set dma alignment to qword
    nvmet: cleanups the loop in nvmet_async_events_process
    nvmet: fix memory leak when removing namespaces and controllers concurrently
    nvmet-rdma: add metadata/T10-PI support
    nvmet: add metadata support for block devices
    nvmet: add metadata/T10-PI support
    nvme: add Metadata Capabilities enumerations
    nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
    nvmet: rename nvmet_rw_len to nvmet_rw_data_len
    nvmet: add metadata characteristics for a namespace
    nvme-rdma: add metadata/T10-PI support
    nvme-rdma: introduce nvme_rdma_sgl structure
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "Core block changes that have been queued up for this release:

    - Remove dead blk-throttle and blk-wbt code (Guoqing)

    - Include pid in blktrace note traces (Jan)

    - Don't spew I/O errors on wouldblock termination (me)

    - Zone append addition (Johannes, Keith, Damien)

    - IO accounting improvements (Konstantin, Christoph)

    - blk-mq hardware map update improvements (Ming)

    - Scheduler dispatch improvement (Salman)

    - Inline block encryption support (Satya)

    - Request map fixes and improvements (Weiping)

    - blk-iocost tweaks (Tejun)

    - Fix for timeout failing with error injection (Keith)

    - Queue re-run fixes (Douglas)

    - CPU hotplug improvements (Christoph)

    - Queue entry/exit improvements (Christoph)

    - Move DMA drain handling to the few drivers that use it (Christoph)

    - Partition handling cleanups (Christoph)"

    * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
    block: mark bio_wouldblock_error() bio with BIO_QUIET
    blk-wbt: rename __wbt_update_limits to wbt_update_limits
    blk-wbt: remove wbt_update_limits
    blk-throttle: remove tg_drain_bios
    blk-throttle: remove blk_throtl_drain
    null_blk: force complete for timeout request
    blk-mq: drain I/O when all CPUs in a hctx are offline
    blk-mq: add blk_mq_all_tag_iter
    blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
    blk-mq: use BLK_MQ_NO_TAG in more places
    blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
    blk-mq: move more request initialization to blk_mq_rq_ctx_init
    blk-mq: simplify the blk_mq_get_request calling convention
    blk-mq: remove the bio argument to ->prepare_request
    nvme: force complete cancelled requests
    blk-mq: blk-mq: provide forced completion method
    block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
    block: blk-crypto-fallback: remove redundant initialization of variable err
    block: reduce part_stat_lock() scope
    block: use __this_cpu_add() instead of access by smp_processor_id()
    ...

    Linus Torvalds
     
  • PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
    loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
    daemon needs to write to one bdi (the final bdi) in order to free up
    writes queued to another bdi (the client bdi).

    The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
    pages, so that it can still dirty pages after other processes have been
    throttled. The purpose of this is to avoid deadlocks that happen when
    the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
    but it is being throttled and cannot write.

    This approach was designed when all threads were blocked equally,
    independently of which device they were writing to, or how fast it was.
    Since that time the writeback algorithm has changed substantially with
    different threads getting different allowances based on non-trivial
    heuristics. This means the simple "add 25%" heuristic is no longer
    reliable.

    The important issue is not that the daemon needs a *larger* dirty page
    allowance, but that it needs a *private* dirty page allowance, so that
    dirty pages for the "client" bdi that it is helping to clear (the bdi
    for an NFS filesystem or loop block device etc) do not affect the
    throttling of the daemon writing to the "final" bdi.

    This patch changes the heuristic so that the task is not throttled when
    the bdi it is writing to has a dirty page count at or below the
    free-run threshold for that bdi. This ensures it will always be
    able to have some pages in flight, and so will not deadlock.

    In a steady state, it is expected that PF_LOCAL_THROTTLE tasks might
    still be throttled by the global threshold, but that is acceptable as it
    is only the deadlock state that is interesting for this flag.

    This approach of "only throttle when target bdi is busy" is consistent
    with the other use of PF_LESS_THROTTLE in current_may_throttle(), where
    it causes attention to be focussed only on the target bdi.

    So this patch
    - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
    - removes the 25% bonus that that flag gives, and
    - If PF_LOCAL_THROTTLE is set, don't delay at all unless the
    global and the local free-run thresholds are exceeded.

    Note that previously realtime threads were treated the same as
    PF_LESS_THROTTLE threads. This patch does *not* change the behaviour
    for real-time threads, so it is now different from the behaviour of nfsd
    and loop tasks. I don't know what is wanted for realtime.

    [akpm@linux-foundation.org: coding style fixes]
    Signed-off-by: NeilBrown
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Acked-by: Chuck Lever [nfsd]
    Cc: Christoph Hellwig
    Cc: Michal Hocko
    Cc: Trond Myklebust
    Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
    Signed-off-by: Linus Torvalds

    NeilBrown
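    A deliberately tiny C model of the new rule. It makes two assumptions.
    First, each domain has a single dirty count. Second, the free-run
    ceiling is the midpoint of the hard and background thresholds, which is
    my understanding of dirty_freerun_ceiling() in mm/page-writeback.c.
    struct dirty_state, below_freerun() and should_throttle() are
    hypothetical names. The real balance_dirty_pages() logic is far more
    involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed free-run ceiling: midpoint of hard and background thresholds. */
static uint64_t freerun_ceiling(uint64_t thresh, uint64_t bg_thresh)
{
    return (thresh + bg_thresh) / 2;
}

/* Hypothetical dirty-page state for one domain (global or one bdi). */
struct dirty_state {
    uint64_t dirty;
    uint64_t thresh;
    uint64_t bg_thresh;
};

static bool below_freerun(const struct dirty_state *s)
{
    return s->dirty <= freerun_ceiling(s->thresh, s->bg_thresh);
}

/* Should a writer be delayed under this simplified model? */
static bool should_throttle(bool local_throttle,
                            const struct dirty_state *global,
                            const struct dirty_state *bdi)
{
    if (below_freerun(global))
        return false;   /* nobody is delayed below the global free-run */
    if (local_throttle && below_freerun(bdi))
        return false;   /* PF_LOCAL_THROTTLE: target bdi is not busy */
    return true;
}
```

    In this model, an nfsd or loop worker writing to an idle "final" bdi
    keeps going while the system as a whole is over its free-run ceiling.
    An ordinary task in the same situation is delayed.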
     

30 May, 2020

1 commit

  • Most blk-mq drivers depend on managed IRQs' auto-affinity to set up
    the queue mapping. Thomas mentioned the following point[1]:

    "That was the constraint of managed interrupts from the very beginning:

    The driver/subsystem has to quiesce the interrupt line and the associated
    queue _before_ it gets shutdown in CPU unplug and not fiddle with it
    until it's restarted by the core when the CPU is plugged in again."

    However, the current blk-mq implementation doesn't quiesce the hw queue
    before the last CPU in the hctx is shut down. Even worse,
    CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so
    there isn't any chance to quiesce the hctx before shutting down the CPU.

    Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
    where the last CPU goes away, and wait for completion of in-flight
    requests. This guarantees that there is no inflight I/O before shutting
    down the managed IRQ.

    Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need
    to wait for completion of in-flight requests from these drivers to avoid
    a potential deadlock. It is safe to do this for stacking drivers as those
    do not use interrupts at all and their I/O completions are triggered by
    the underlying devices' I/O completion.

    [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

    [hch: different retry mechanism, merged two patches, minor cleanups]

    Signed-off-by: Ming Lei
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Daniel Wagner
    Signed-off-by: Jens Axboe

    Ming Lei
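    A C sketch of the decision described above. The cpumask is modeled as a
    64-bit bitmap, so at most 64 CPUs are supported. TOY_F_STACKING stands
    in for BLK_MQ_F_STACKING, and must_drain_before_offline() is a
    hypothetical helper. The kernel's actual CPU-hotplug callbacks are not
    reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_F_STACKING (1u << 0)   /* hypothetical BLK_MQ_F_STACKING stand-in */

/*
 * Must taking `cpu` (< 64) offline first stop allocations on this hctx and
 * drain its in-flight requests? Only if the driver is not stacking, the
 * cpu is mapped to the hctx, and it is the last online cpu mapped there.
 */
static bool must_drain_before_offline(uint64_t hctx_cpus, uint64_t online_cpus,
                                      unsigned int cpu, unsigned int flags)
{
    uint64_t bit = UINT64_C(1) << cpu;

    if (flags & TOY_F_STACKING)
        return false;   /* dm-rq/loop: completions come from below */
    if (!(hctx_cpus & bit))
        return false;   /* cpu is not mapped to this hctx */
    /* No other online cpu would be left to serve this hctx. */
    return (hctx_cpus & online_cpus & ~bit) == 0;
}
```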
     

25 May, 2020

1 commit


21 May, 2020

11 commits

  • This allows userspace to completely set up a loop device with a single
    ioctl, removing the in-between state where the device can be partially
    configured, e.g. the loop device has a backing file associated with it
    but is reading from the wrong offset.

    Besides removing the intermediate state, another big benefit of this
    ioctl is that LOOP_SET_STATUS can be slow; the main reason for this
    slowness is that LOOP_SET_STATUS(64) calls blk_mq_freeze_queue() to
    freeze the associated queue; this requires waiting for RCU
    synchronization, which I've measured can take about 15-20ms on this
    device on average.

    In addition to doing what LOOP_SET_STATUS can do, LOOP_CONFIGURE can
    also be used to:
    - Set the correct block size immediately by setting
    loop_config.block_size (avoids LOOP_SET_BLOCK_SIZE)
    - Explicitly request direct I/O mode by setting LO_FLAGS_DIRECT_IO
    in loop_config.info.lo_flags (avoids LOOP_SET_DIRECT_IO)
    - Explicitly request read-only mode by setting LO_FLAGS_READ_ONLY
    in loop_config.info.lo_flags

    Here's setting up ~70 regular loop devices with an offset on an x86
    Android device, using LOOP_SET_FD and LOOP_SET_STATUS:

    vsoc_x86:/system/apex # time for i in `seq 30 100`;
    do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
    0m03.40s real 0m00.02s user 0m00.03s system

    Here's configuring ~70 devices in the same way, but using a modified
    losetup that uses the new LOOP_CONFIGURE ioctl:

    vsoc_x86:/system/apex # time for i in `seq 30 100`;
    do losetup -r -o 4096 /dev/block/loop$i com.android.adbd.apex; done
    0m01.94s real 0m00.01s user 0m00.01s system

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • LOOP_SET_STATUS(64) will actually allow some lo_flags to be modified; in
    particular, LO_FLAGS_AUTOCLEAR can be set and cleared, whereas
    LO_FLAGS_PARTSCAN can be set to request a partition scan. Make this
    explicit by updating the UAPI to include the flags that can be
    set/cleared using this ioctl.

    The implementation can then blindly take over the passed in flags,
    and use the previous flags for those flags that can't be set / cleared
    using LOOP_SET_STATUS.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • In preparation for a new ioctl that needs to copy_from_user(); makes the
    code easier to read as well.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • So we can use it without forward declaration. This is a separate commit
    to make it easier to verify that this is just a move, without functional
    modifications.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • Factor out this code into a separate function, so it can be reused by
    other code more easily.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • This function was now only used by loop_set_capacity(). Just open code
    the remaining code in the caller instead.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • figure_loop_size() calculates the loop size based on the passed-in
    parameters, but at the same time it updates the offset and sizelimit
    parameters in the loop device configuration. That is a somewhat
    unexpected side effect of a function with this name, and it is only
    needed by one of the two callers of this function, loop_set_status().

    Move the lo_offset and lo_sizelimit assignment back into loop_set_status(),
    and use the newly factored out functions to validate and apply the newly
    calculated size. This allows us to get rid of figure_loop_size() in a
    follow-up commit.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • This was recently added to block/genhd.c, and takes care of both
    updating the capacity and notifying userspace of the new size.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • This code is used repeatedly.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • sector_t is now always u64, so we don't need to check for truncation.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • loop_set_status() calls loop_config_discard() to configure discard for
    the loop device; however, the discard configuration depends on whether
    the loop device uses encryption, and when we call it the encryption
    configuration has not been updated yet. Move the call down so we apply
    the correct discard configuration based on the new configuration.

    Signed-off-by: Martijn Coenen
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bob Liu
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Martijn Coenen
     

04 May, 2020

1 commit

  • Remove the check because DAX now has its own read/write methods and
    file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
    their own. Therefore, it does not matter if the file state is DAX when
    the iocb flags are created.

    Also remove io_is_direct() as it is just a simple flag check.

    Reviewed-by: Dave Chinner
    Reviewed-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ira Weiny
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     

04 Apr, 2020

2 commits

  • If the backing device for a loop device is itself a block device,
    then mirror the "write zeroes" capabilities of the underlying
    block device into the loop device. Copy this capability into both
    max_write_zeroes_sectors and max_discard_sectors of the loop device.

    The reason for this is that REQ_OP_DISCARD on a loop device translates
    into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This
    presents a consistent interface for loop devices (that discarded data
    is zeroed), regardless of the backing device type of the loop device.
    There should be no behavior change for loop devices backed by regular
    files.

    This change fixes blktest block/003, and removes an extraneous
    error print in block/013 when testing on a loop device backed
    by a block device that does not support discard.

    Signed-off-by: Evan Green
    Reviewed-by: Gwendal Grignou
    Reviewed-by: Chaitanya Kulkarni
    [used updated version of Evan's comment in loop_config_discard()]
    [moved backingq to local scope, removed redundant braces]
    Signed-off-by: Andrzej Pietrasiewicz
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Evan Green
     
  • Properly plumb out EOPNOTSUPP from loop driver operations, which may
    get returned when for instance a discard operation is attempted but not
    supported by the underlying block device. Before this change, everything
    was reported in the log as an I/O error, which is scary and not
    helpful in debugging.

    Signed-off-by: Evan Green
    Reviewed-by: Gwendal Grignou
    Reviewed-by: Bart Van Assche
    Signed-off-by: Andrzej Pietrasiewicz
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Evan Green
     

11 Mar, 2020

2 commits

  • __loop_update_dio() can be called as a part of loop_set_fd(), when the
    block queue is not yet up and running; avoid freezing the block queue in
    that case, since that is an expensive operation.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Martijn Coenen
    Signed-off-by: Jens Axboe

    Martijn Coenen
     
  • Return early in loop_set_block_size() if the requested block size is
    identical to the one we already have; this avoids expensive calls to
    freeze the block queue.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Martijn Coenen
    Signed-off-by: Jens Axboe

    Martijn Coenen
     

14 Nov, 2019

1 commit

  • In general drivers should never mess with partition tables directly.
    Unfortunately s390 and loop do for somewhat historic reasons, but they
    can use bdev_disk_changed directly instead when we export it as they
    satisfy the sanity checks we have in __blkdev_reread_part.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Stefan Haberland [dasd]
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Nov, 2019

1 commit

  • Currently, if the loop device receives a WRITE_ZEROES request, it asks
    the underlying filesystem to punch out the range. This behavior is
    correct if unmapping is allowed. However, a NOUNMAP request means that
    the caller doesn't want us to free the storage backing the range, so
    punching out the range is incorrect behavior.

    To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
    underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
    the fallocate documentation) required to ensure that the entire range is
    backed by real storage, which suffices for our purposes.

    Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
    Signed-off-by: Darrick J. Wong
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Darrick J. Wong
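    The mode selection described above can be written as a small C sketch.
    enum toy_op and loop_falloc_mode() are hypothetical. OR-ing in
    FALLOC_FL_KEEP_SIZE reflects my assumption that the loop driver never
    wants to grow the backing file. fallocate(2) also requires KEEP_SIZE
    together with PUNCH_HOLE.

```c
#include <linux/falloc.h>   /* FALLOC_FL_* */
#include <stdbool.h>

enum toy_op { TOY_OP_DISCARD, TOY_OP_WRITE_ZEROES };   /* hypothetical */

/* Pick the fallocate(2) mode for a request on a file-backed loop device. */
static int loop_falloc_mode(enum toy_op op, bool nounmap)
{
    int mode;

    if (op == TOY_OP_WRITE_ZEROES && nounmap)
        mode = FALLOC_FL_ZERO_RANGE;   /* zero, but keep blocks allocated */
    else
        mode = FALLOC_FL_PUNCH_HOLE;   /* zero by deallocating the range */
    return mode | FALLOC_FL_KEEP_SIZE; /* never extend the backing file */
}
```

    The result would be passed as the mode argument of
    fallocate(fd, mode, offset, len) on the backing file. Filesystems that
    do not support FALLOC_FL_ZERO_RANGE fail that call with EOPNOTSUPP.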
     

01 Oct, 2019

1 commit

  • The loop driver assumes that if the passed in fd is opened with
    O_DIRECT, the caller wants to use direct I/O on the loop device.
    However, if the underlying block device has a different block size than
    the loop block queue, direct I/O can't be enabled. Instead of requiring
    userspace to manually change the blocksize and re-enable direct I/O,
    just change the queue block sizes to match, as well as the io_min size.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Martijn Coenen
    Signed-off-by: Jens Axboe

    Martijn Coenen
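    A toy C model of the condition and the fix. It makes two assumptions,
    neither taken from the patch: direct I/O requires lo_offset to be
    aligned to the backing device's logical block size, and it requires the
    loop queue's logical block size to be at least that large. struct
    toy_queue_limits and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the loop queue limits, in bytes. */
struct toy_queue_limits {
    uint32_t logical;
    uint32_t physical;
    uint32_t io_min;
};

/* Can the loop device use direct I/O? (backing_lbs must be non-zero) */
static bool loop_can_use_dio(uint64_t lo_offset, uint32_t loop_lbs,
                             uint32_t backing_lbs)
{
    if (lo_offset % backing_lbs)
        return false;               /* offset not block aligned */
    return loop_lbs >= backing_lbs; /* loop I/O must cover whole blocks */
}

/* The commit's approach: make the loop queue match the backing device. */
static void match_backing_block_size(struct toy_queue_limits *q,
                                     uint32_t backing_lbs)
{
    q->logical = backing_lbs;
    q->physical = backing_lbs;
    q->io_min = backing_lbs;
}
```

    With a 4096-byte backing device and the default 512-byte loop queue,
    direct I/O is refused. After matching the block sizes it becomes
    possible, provided lo_offset is 4096-aligned.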
     

18 Sep, 2019

1 commit

  • Pull block updates from Jens Axboe:

    - Two NVMe pull requests:
       - ana log parse fix from Anton
       - nvme quirks support for Apple devices from Ben
       - fix missing bio completion tracing for multipath stack devices
         from Hannes and Mikhail
       - IP TOS settings for nvme rdma and tcp transports from Israel
       - rq_dma_dir cleanups from Israel
       - tracing for Get LBA Status command from Minwoo
       - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
       - Some consolidation between the fabrics transports for handling
         the CAP register
       - reset race with ns scanning fix for fabrics (move fabrics
         commands to a dedicated request queue with a different lifetime
         from the admin request queue)."
       - controller reset and namespace scan races fixes
       - nvme discovery log change uevent support
       - naming improvements from Keith
       - multiple discovery controllers reject fix from James
       - some regular cleanups from various people

    - Series fixing (and re-fixing) null_blk debug printing and nr_devices
    checks (André)

    - A few pull requests from Song, with fixes from Andy, Guoqing,
    Guilherme, Neil, Nigel, and Yufen.

    - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

    - Bio merge handling unification (Christoph)

    - Pick default elevator correctly for devices with special needs
    (Damien)

    - Block stats fixes (Hou)

    - Timeout and support devices nbd fixes (Mike)

    - Series fixing races around elevator switching and device add/remove
    (Ming)

    - sed-opal cleanups (Revanth)

    - Per device weight support for BFQ (Fam)

    - Support for blk-iocost, a new model that can properly account cost of
    IO workloads. (Tejun)

    - blk-cgroup writeback fixes (Tejun)

    - paride queue init fixes (zhengbin)

    - blk_set_runtime_active() cleanup (Stanley)

    - Block segment mapping optimizations (Bart)

    - lightnvm fixes (Hans/Minwoo/YueHaibing)

    - Various little fixes and cleanups

    * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
    null_blk: format pr_* logs with pr_fmt
    null_blk: match the type of parameter nr_devices
    null_blk: do not fail the module load with zero devices
    block: also check RQF_STATS in blk_mq_need_time_stamp()
    block: make rq sector size accessible for block stats
    bfq: Fix bfq linkage error
    raid5: use bio_end_sector in r5_next_bio
    raid5: remove STRIPE_OPS_REQ_PENDING
    md: add feature flag MD_FEATURE_RAID0_LAYOUT
    md/raid0: avoid RAID0 data corruption due to layout confusion.
    raid5: don't set STRIPE_HANDLE to stripe which is in batch list
    raid5: don't increment read_errors on EILSEQ return
    nvmet: fix a wrong error status returned in error log page
    nvme: send discovery log page change events to userspace
    nvme: add uevent variables for controller devices
    nvme: enable aen regardless of the presence of I/O queues
    nvme-fabrics: allow discovery subsystems accept a kato
    nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
    nvme: Remove redundant assignment of cq vector
    nvme: Assign subsys instance from first ctrl
    ...

    Linus Torvalds