25 Jan, 2021

1 commit


20 Jan, 2021

1 commit

  • commit 0378c625afe80eb3f212adae42cc33c9f6f31abf upstream.

    There wasn't ever a real need to log an error in the kernel log for
    ioctls issued with insufficient permissions. Simply return an error;
    if an admin or user is sufficiently motivated, they can enable DM's
    dynamic debugging to see an explanation for why the ioctls were
    disallowed.

    Reported-by: Nir Soffer
    Fixes: e980f62353c6 ("dm: don't allow ioctls to targets that don't map to whole devices")
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Mike Snitzer
     

09 Dec, 2020

1 commit


05 Dec, 2020

3 commits

  • Fixes sparse warnings:
    drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit
    drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit

    Fixes: 971888c46993f ("dm: hold DM table for duration of ioctl rather than use blkdev_get")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     
  • Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error
    path to fix sparse warning:
    drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock

    Fixes: cdf6cdcd3b99a ("dm,dax: Add dax zero_page_range operation")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     
  • Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account
    for target-specific splitting") caused a couple regressions:
    1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
    chunk_sectors must reflect the most limited of all devices in the
    IO stack.
    2) DM targets that set max_io_len but that do _not_ provide an
    .iterate_devices method no longer had their IO split properly.

    And commit 5091cdec56fa ("dm: change max_io_len() to use
    blk_max_size_offset()") also caused a regression where DM no longer
    supported varied (per target) IO splitting. The implication is the
    potential for severely reduced performance for IO stacks that use a DM
    target like dm-cache to hide performance limitations of a slower
    device (e.g. one that requires 4K IO splitting).

    Coming full circle: Fix all these issues by discontinuing stacking
    chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
    add optional chunk_sectors override argument to blk_max_size_offset()
    and update DM's max_io_len() to pass ti->max_io_len to its
    blk_max_size_offset() call.

    Passing in an optional chunk_sectors override to blk_max_size_offset()
    allows for code reuse of block's centralized calculation for max IO
    size based on provided offset and split boundary.

    Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
    Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()")
    Cc: stable@vger.kernel.org
    Reported-by: John Dorminy
    Reported-by: Bruce Johnston
    Reported-by: Kirill Tkhai
    Reviewed-by: John Dorminy
    Signed-off-by: Mike Snitzer
    Reviewed-by: Jens Axboe

    Mike Snitzer
     

02 Dec, 2020

1 commit


25 Oct, 2020

1 commit


24 Oct, 2020

2 commits


15 Oct, 2020

1 commit

  • …/device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - Improve DM core's bio splitting to use blk_max_size_offset(). Also
    fix bio splitting for bios that were deferred to the worker thread
    due to a DM device being suspended.

    - Remove DM core's special handling of NVMe devices now that block core
    has internalized efficiencies drivers previously needed to be
    concerned about (via now removed direct_make_request).

    - Fix request-based DM to not bounce through indirect dm_submit_bio;
    instead have block core make direct call to blk_mq_submit_bio().

    - Various DM core cleanups to simplify and improve code.

    - Update DM crypt to not use drivers that set
    CRYPTO_ALG_ALLOCATES_MEMORY.

    - Fix DM raid's raid1 and raid10 discard limits for the purposes of
    linux-stable. But then remove DM raid's discard limits settings now
    that MD raid can efficiently handle large discards.

    - A couple small cleanups across various targets.

    * tag 'for-5.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm: fix request-based DM to not bounce through indirect dm_submit_bio
    dm: remove special-casing of bio-based immutable singleton target on NVMe
    dm: export dm_copy_name_and_uuid
    dm: fix comment in __dm_suspend()
    dm: fold dm_process_bio() into dm_submit_bio()
    dm: fix missing imposition of queue_limits from dm_wq_work() thread
    dm snap persistent: simplify area_io()
    dm thin metadata: Remove unused local variable when create thin and snap
    dm raid: remove unnecessary discard limits for raid10
    dm raid: fix discard limits for raid1 and raid10
    dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
    dm: use dm_table_get_device_name() where appropriate in targets
    dm table: make 'struct dm_table' definition accessible to all of DM core
    dm: eliminate need for start_io_acct() forward declaration
    dm: simplify __process_abnormal_io()
    dm: push use of on-stack flush_bio down to __send_empty_flush()
    dm: optimize max_io_len() by inlining max_io_len_target_boundary()
    dm: push md->immutable_target optimization down to __process_bio()
    dm: change max_io_len() to use blk_max_size_offset()
    dm table: stack 'chunk_sectors' limit to account for target-specific splitting

    Linus Torvalds
     

14 Oct, 2020

1 commit

  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

08 Oct, 2020

2 commits

  • It is unnecessary to force request-based DM to call into bio-based
    dm_submit_bio (via indirect disk->fops->submit_bio) only to have it then
    call blk_mq_submit_bio().

    Fix this by establishing a request-based DM block_device_operations
    (dm_rq_blk_dops, which doesn't have .submit_bio) and update
    dm_setup_md_queue() to set md->disk->fops to it for
    DM_TYPE_REQUEST_BASED.

    Remove DM_TYPE_REQUEST_BASED conditional in dm_submit_bio and unexport
    blk_mq_submit_bio.

    Fixes: c62b37d96b6eb ("block: move ->make_request_fn to struct block_device_operations")
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     
  • Since commit 5a6c35f9af416 ("block: remove direct_make_request") there
    is no benefit to DM special-casing NVMe. Remove all code used to
    establish DM_TYPE_NVME_BIO_BASED.

    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

06 Oct, 2020

1 commit

  • bio_crypt_clone() assumes its gfp_mask argument always includes
    __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.

    However, bio_crypt_clone() might be called with GFP_ATOMIC via
    setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via
    kcryptd_io_read() in drivers/md/dm-crypt.c.

    Neither case is currently reachable with a bio that actually has an
    encryption context. However, it's fragile to rely on this. Just make
    bio_crypt_clone() able to fail, analogous to bio_integrity_clone().

    Reported-by: Miaohe Lin
    Signed-off-by: Eric Biggers
    Reviewed-by: Mike Snitzer
    Reviewed-by: Satya Tangirala
    Cc: Satya Tangirala
    Signed-off-by: Jens Axboe

    Eric Biggers
     

02 Oct, 2020

2 commits


01 Oct, 2020

1 commit

  • If a DM device was suspended when bios were issued to it, those bios
    would be deferred using queue_io(). Once the DM device was resumed
    dm_process_bio() could be called by dm_wq_work() for an original bio
    that still needs splitting. dm_process_bio()'s check for current->bio_list
    (meaning call chain is within ->submit_bio) as a prerequisite for
    calling blk_queue_split() for "abnormal IO" would result in
    dm_process_bio() never imposing corresponding queue_limits
    (e.g. discard_granularity, discard_max_bytes, etc).

    Fix this by always having dm_wq_work() resubmit deferred bios using
    submit_bio_noacct().

    Side-effect is blk_queue_split() is always called for "abnormal IO" from
    ->submit_bio, be it from application thread or dm_wq_work() workqueue,
    so proper bio splitting and depth-first bio submission is performed.
    For sake of clarity, remove current->bio_list check before call to
    blk_queue_split().

    Also, remove dm_wq_work()'s use of dm_{get,put}_live_table() -- no
    longer needed since IO will be reissued in terms of ->submit_bio.
    And rename bio variable from 'c' to 'bio'.

    Fixes: cf9c37865557 ("dm: fix comment in dm_process_bio()")
    Reported-by: Jeffle Xu
    Reviewed-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

30 Sep, 2020

8 commits


27 Sep, 2020

1 commit


25 Sep, 2020

1 commit


22 Sep, 2020

2 commits

  • Refer to the correct function (->submit_bio instead of ->queue_bio).
    Also, add details about why using blk_queue_split() isn't needed for
    dm_wq_work()'s call to dm_process_bio().

    Fixes: c62b37d96b6eb ("block: move ->make_request_fn to struct block_device_operations")
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     
  • dm_queue_split() is removed because __split_and_process_bio() _must_
    handle splitting bios to ensure proper bio submission and completion
    ordering as a bio is split.

    Otherwise, multiple recursive calls to ->submit_bio will cause multiple
    split bios to be allocated from the same ->bio_split mempool at the same
    time. This would result in deadlock in low memory conditions because no
    progress could be made (only one bio is available in ->bio_split
    mempool).

    This fix has been verified to still avoid the loss of performance, due
    to excess splitting, that commit 120c9257f5f1 fixed.

    Fixes: 120c9257f5f1 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"")
    Cc: stable@vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes
    Reported-by: Ming Lei
    Signed-off-by: Mike Snitzer

    Mike Snitzer
     

21 Sep, 2020

1 commit


20 Sep, 2020

1 commit

  • A recent fix to the dm_dax_supported() flow uncovered a latent bug. When
    dm_get_live_table() fails it is still required to drop the
    srcu_read_lock(). Without this change the lvm2 test-suite triggers this
    warning:

    # lvm2-testsuite --only pvmove-abort-all.sh

    WARNING: lock held when returning to user space!
    5.9.0-rc5+ #251 Tainted: G OE
    ------------------------------------------------
    lvm/1318 is leaving the kernel with locks still held!
    1 lock held by lvm/1318:
    #0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod]

    ...and later on this hang signature:

    INFO: task lvm:1344 blocked for more than 122 seconds.
    Tainted: G OE 5.9.0-rc5+ #251
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:lvm state:D stack: 0 pid: 1344 ppid: 1 flags:0x00004000
    Call Trace:
    __schedule+0x45f/0xa80
    ? finish_task_switch+0x249/0x2c0
    ? wait_for_completion+0x86/0x110
    schedule+0x5f/0xd0
    schedule_timeout+0x212/0x2a0
    ? __schedule+0x467/0xa80
    ? wait_for_completion+0x86/0x110
    wait_for_completion+0xb0/0x110
    __synchronize_srcu+0xd1/0x160
    ? __bpf_trace_rcu_utilization+0x10/0x10
    __dm_suspend+0x6d/0x210 [dm_mod]
    dm_suspend+0xf6/0x140 [dm_mod]

    Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
    Cc:
    Cc: Jan Kara
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Reported-by: Adrian Huang
    Reviewed-by: Ira Weiny
    Tested-by: Adrian Huang
    Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Dan Williams
     

02 Sep, 2020

1 commit

  • Two different callers use two different mutexes for updating the
    block device size, which obviously doesn't help to actually protect
    against concurrent updates from the different callers. In addition
    one of the locks, bd_mutex is rather prone to deadlocks with other
    parts of the block stack that use it for high level synchronization.

    Switch to using a new spinlock protecting just the size updates, as
    that is all we need, and make sure everyone does the update through
    the proper helper.

    This fixes a bug reported with the nvme revalidating disks during a
    hot removal operation, which can currently deadlock on bd_mutex.

    Reported-by: Xianting Tian
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Sep, 2020

1 commit


24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and its variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings when it is the case.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

08 Aug, 2020

2 commits

  • …kernel/git/sre/linux-power-supply") into android-mainline

    Merges along the way to 5.9-rc1

    resolves conflicts in:
    Documentation/ABI/testing/sysfs-class-power
    drivers/power/supply/power_supply_sysfs.c
    fs/crypto/inline_crypt.c

    Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
    Change-Id: Ia087834f54fb4e5269d68c3c404747ceed240701

    Greg Kroah-Hartman
     
  • …device-mapper/linux-dm

    Pull device mapper updates from Mike Snitzer:

    - DM multipath locking fixes around m->flags tests and improvements to
    bio-based code so that it follows patterns established by
    request-based code.

    - Request-based DM core improvement to eliminate unnecessary call to
    blk_mq_queue_stopped().

    - Add "panic_on_corruption" error handling mode to DM verity target.

    - DM bufio fix to perform buffer cleanup from a workqueue rather
    than wait for IO in reclaim context from shrinker.

    - DM crypt improvement to optionally avoid async processing via
    workqueues for reads and/or writes -- via "no_read_workqueue" and
    "no_write_workqueue" features. This more direct IO processing
    improves latency and throughput with faster storage. Avoiding
    workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a
    requirement for adding zoned block device support to DM crypt.

    - Add zoned block device support to DM crypt. Makes use of
    DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
    (DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
    encryption to complete. This allows write ordering to be preserved,
    which is needed for zoned block devices.

    - Fix DM ebs target's check for REQ_OP_FLUSH.

    - Fix DM core's report zones support to not report more zones than were
    requested.

    - A few small compiler warning fixes.

    - DM dust improvements to return output directly to the user rather
    than require they scrape the system log for output.

    * tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
    dm: don't call report zones for more than the user requested
    dm ebs: Fix incorrect checking for REQ_OP_FLUSH
    dm init: Set file local variable static
    dm ioctl: Fix compilation warning
    dm raid: Remove empty if statement
    dm verity: Fix compilation warning
    dm crypt: Enable zoned block device support
    dm crypt: add flags to optionally bypass kcryptd workqueues
    dm bufio: do buffer cleanup from a workqueue
    dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
    dm dust: add interface to list all badblocks
    dm dust: report some message results directly back to user
    dm verity: add "panic_on_corruption" error handling mode
    dm mpath: use double checked locking in fast path
    dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl
    dm mpath: rework __map_bio()
    dm mpath: factor out multipath_queue_bio
    dm mpath: push locking down to must_push_back_rq()
    dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH
    dm mpath: changes from initial m->flags locking audit

    Linus Torvalds
     

07 Aug, 2020

1 commit


05 Aug, 2020

1 commit

  • Don't call report zones for more zones than the user actually requested,
    otherwise this can lead to out-of-bounds accesses in the callback
    functions.

    Such a situation can happen if the target's ->report_zones() callback
    function returns 0 because we've reached the end of the target, and we
    then restart the zone report on the next target.

    We again call into ->report_zones() and ultimately into the user
    supplied callback function, but if we don't subtract the number of
    zones already processed this may lead to out-of-bounds accesses in the
    user callbacks.

    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Fixes: d41003513e61 ("block: rework zone reporting")
    Cc: stable@vger.kernel.org # v5.5+
    Signed-off-by: Mike Snitzer

    Johannes Thumshirn
     

04 Aug, 2020

1 commit

  • Pull core block updates from Jens Axboe:
    "Good amount of cleanups and tech debt removals in here, and as a
    result, the diffstat shows a nice net reduction in code.

    - Softirq completion cleanups (Christoph)

    - Stop using ->queuedata (Christoph)

    - Cleanup bd claiming (Christoph)

    - Use check_events, moving away from the legacy media change
    (Christoph)

    - Use inode i_blkbits consistently (Christoph)

    - Remove old unused writeback congestion bits (Christoph)

    - Cleanup/unify submission path (Christoph)

    - Use bio_uninit consistently, instead of bio_disassociate_blkg
    (Christoph)

    - sbitmap cleared bits handling (John)

    - Request merging blktrace event addition (Jan)

    - sysfs add/remove race fixes (Luis)

    - blk-mq tag fixes/optimizations (Ming)

    - Duplicate words in comments (Randy)

    - Flush deferral cleanup (Yufen)

    - IO context locking/retry fixes (John)

    - struct_size() usage (Gustavo)

    - blk-iocost fixes (Chengming)

    - blk-cgroup IO stats fixes (Boris)

    - Various little fixes"

    * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
    block: blk-timeout: delete duplicated word
    block: blk-mq-sched: delete duplicated word
    block: blk-mq: delete duplicated word
    block: genhd: delete duplicated words
    block: elevator: delete duplicated word and fix typos
    block: bio: delete duplicated words
    block: bfq-iosched: fix duplicated word
    iocost_monitor: start from the oldest usage index
    iocost: Fix check condition of iocg abs_vdebt
    block: Remove callback typedefs for blk_mq_ops
    block: Use non _rcu version of list functions for tag_set_list
    blk-cgroup: show global disk stats in root cgroup io.stat
    blk-cgroup: make iostat functions visible to stat printing
    block: improve discard bio alignment in __blkdev_issue_discard()
    block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
    block: defer flush request no matter whether we have elevator
    block: make blk_timeout_init() static
    block: remove retry loop in ioc_release_fn()
    block: remove unnecessary ioc nested locking
    block: integrate bd_start_claiming into __blkdev_get
    ...

    Linus Torvalds