17 Jan, 2021

1 commit

  • commit aebf5db917055b38f4945ed6d621d9f07a44ff30 upstream.

    Make sure that bdgrab() is called on the 'block_device' instance before
    referring to it, in order to avoid a use-after-free.

    Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Ming Lei
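
    The ordering described above (take a reference first, touch the object
    second) can be sketched in plain userspace C. This is only an
    illustration under stated assumptions: struct obj, obj_grab(), obj_put()
    and obj_read() are hypothetical names, the counter is not atomic, and
    unlike the kernel's bdgrab() the grab here is allowed to fail so that the
    "already released" case is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as a block_device. */
struct obj {
    int refs;     /* 0 means the object has already been released */
    int payload;  /* data that must not be read without a reference */
};

/* Plays the role of the "grab" step: take a reference before any use. */
static bool obj_grab(struct obj *o)
{
    if (o == NULL || o->refs == 0)
        return false;
    o->refs++;
    return true;
}

/* Drop a reference previously taken with obj_grab(). */
static void obj_put(struct obj *o)
{
    assert(o->refs > 0);
    o->refs--;
}

/* Read the payload only while holding our own reference. */
static bool obj_read(struct obj *o, int *out)
{
    if (!obj_grab(o))
        return false;      /* object is gone: do not touch it */
    *out = o->payload;     /* safe: we hold a reference */
    obj_put(o);
    return true;
}
```

    In this sketch a reader that arrives after the last reference was
    dropped gets false back instead of dereferencing a released object.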
     

13 Nov, 2020

1 commit


06 Oct, 2020

1 commit

  • All remaining callers of bdget() outside of fs/block_dev.c want to get a
    reference to the struct block_device for a given struct hd_struct. Add
    a helper just for that and then mark bdget static.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Sep, 2020

1 commit


24 Sep, 2020

2 commits


10 Sep, 2020

1 commit


02 Sep, 2020

5 commits


01 Aug, 2020

1 commit


18 Jul, 2020

1 commit

  • In order to improve consistency and usability in cgroup stat accounting,
    we would like to support the root cgroup's io.stat.

    Since the root cgroup has processes doing io even if the system has no
    explicitly created cgroups, we need to be careful to avoid overhead in
    that case. For that reason, the rstat algorithms don't handle the root
    cgroup, so just turning the file on wouldn't give correct statistics.

    To get around this, we simulate flushing the iostat struct by filling it
    out directly from global disk stats. The result is a root cgroup io.stat
    file consistent with both /proc/diskstats and io.stat.

    Note that in order to collect the disk stats, we need to iterate over
    devices. To facilitate that, we had to change the linkage of disk_type
    to external so that it can be used from blk-cgroup.c to iterate over
    disks.

    Suggested-by: Tejun Heo
    Signed-off-by: Boris Burkov
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Boris Burkov
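
    The idea of filling the root cgroup's io.stat entries directly from
    global disk statistics can be sketched as follows. This is a minimal
    userspace illustration, not the kernel implementation: struct
    disk_stats, struct iostat, fill_root_iostat() and fill_all_root_iostats()
    are hypothetical names, and the only fact relied on is that disk
    statistics count 512-byte sectors while io.stat reports bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BYTES 512u  /* disk statistics count 512-byte sectors */

/* Hypothetical per-disk counters, in the units used by /proc/diskstats. */
struct disk_stats {
    uint64_t read_sectors, write_sectors;
    uint64_t read_ios, write_ios;
};

/* Hypothetical per-device io.stat entry, in bytes and I/O counts. */
struct iostat {
    uint64_t rbytes, wbytes;
    uint64_t rios, wios;
};

/* "Simulated flush": derive one io.stat entry from one disk's counters. */
static void fill_root_iostat(const struct disk_stats *d, struct iostat *out)
{
    out->rbytes = d->read_sectors * SECTOR_BYTES;
    out->wbytes = d->write_sectors * SECTOR_BYTES;
    out->rios   = d->read_ios;
    out->wios   = d->write_ios;
}

/* Iterate over all disks, producing one root io.stat entry per device. */
static void fill_all_root_iostats(const struct disk_stats *disks,
                                  struct iostat *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fill_root_iostat(&disks[i], &out[i]);
}
```

    Because every entry is computed from the same counters that back
    /proc/diskstats, the two views stay consistent without tracking any
    additional per-I/O state for the root cgroup.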
     

09 Jul, 2020

1 commit


24 Jun, 2020

3 commits

  • Commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression"), merged in
    v4.12, moved the work behind blk_release_queue() into a workqueue after a
    splat floated around which indicated that some work in blk_release_queue()
    could sleep in blk_exit_rl(). This splat would be possible when a driver
    called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue()
    as its final call) from an atomic context.

    blk_put_queue() decrements the refcount for the request_queue kobject, and
    upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() has
    since been removed by commit db6d99523560 ("block: remove request_list
    code") in v5.0, we reserve the right to be able to sleep within
    blk_release_queue() context.

    The last reference to the request_queue must not be dropped from atomic
    context. *When* the refcount of the request_queue reaches 0 varies,
    so let's take the opportunity to document when that is expected to
    happen, and also document the context of the related calls as best as
    possible so that we can avoid future issues, in the hope that the
    synchronous request_queue removal sticks.

    We revert to synchronous request_queue removal because asynchronous
    removal creates a regression in the expected userspace interaction with
    several drivers. An example is removing a loopback device: one uses
    ioctls from userspace to do so, and upon a successful return one expects
    the device to be removed. Likewise, if one races to add another device,
    the new one may not be added because the old one is still being removed.
    This was the expected behavior before, and it now fails because the device
    is still present and still busy. Moving to asynchronous request_queue
    removal could have broken many scripts which relied on the removal having
    completed if there was no error. Document this expectation as well so
    that this doesn't regress userspace again.

    Using asynchronous request_queue removal has, however, helped us find
    other bugs. In the future we can test what could break with this
    arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.

    While at it, update the docs with the context expectations for the
    request_queue / gendisk refcount decrement, and make these
    expectations explicit by using might_sleep().

    Fixes: dc9edc44de6c ("block: Fix a blk_exit_rl() regression")
    Suggested-by: Nicolai Stange
    Signed-off-by: Luis Chamberlain
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Cc: Bart Van Assche
    Cc: Omar Sandoval
    Cc: Hannes Reinecke
    Cc: Nicolai Stange
    Cc: Greg Kroah-Hartman
    Cc: Michal Hocko
    Cc: yu kuai
    Signed-off-by: Jens Axboe

    Luis Chamberlain
     
  • Let us clarify the context in which the helpers that increment the
    refcount for the gendisk and request_queue can be called. We make this
    explicit with might_sleep() in the places where we may sleep.

    We don't address the decrement context yet, as that needs some extra
    work and fixes, but will be addressed in the next patch.

    Signed-off-by: Luis Chamberlain
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Luis Chamberlain
     
  • This adds documentation for the gendisk / request_queue refcount
    helpers.

    Signed-off-by: Luis Chamberlain
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Luis Chamberlain
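
    The contract described in the three commits above (the release runs
    synchronously when the last reference is dropped, and the put helper
    states up front that it may sleep) can be sketched in userspace C. All
    names here are hypothetical: in_atomic_ctx and context_violations stand
    in for the kernel's own context tracking, and might_sleep_check() merely
    counts violations where the kernel's might_sleep() would warn.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for "the caller is in atomic context". */
static bool in_atomic_ctx;
/* Number of times a may-sleep helper was called from atomic context. */
static int context_violations;

/* Userspace stand-in for might_sleep(): record misuse instead of warning. */
static void might_sleep_check(void)
{
    if (in_atomic_ctx)
        context_violations++;
}

/* Hypothetical stand-in for a refcounted request_queue. */
struct queue {
    int  refs;
    bool released;
};

/* Release handler: in the kernel this is where sleeping may happen. */
static void queue_release(struct queue *q)
{
    q->released = true;
}

/* Drop a reference; the final put runs the release before returning. */
static void queue_put(struct queue *q)
{
    might_sleep_check();      /* make the context expectation explicit */
    assert(q->refs > 0);
    if (--q->refs == 0)
        queue_release(q);     /* synchronous: done when queue_put() returns */
}
```

    Because the release is synchronous in this sketch, a caller that sees
    the final queue_put() return can rely on the object already being torn
    down, which mirrors the userspace expectation the first commit describes.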
     

27 May, 2020

2 commits


19 May, 2020

2 commits


13 May, 2020

3 commits

  • The gendisk can't go away while there is IO activity, so don't hold
    part0's refcount in the IO path.

    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Yufen Yu
    Cc: Christoph Hellwig
    Cc: Hou Tao
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • The 'nr_sects_seq' seqcount is only needed on 32-bit SMP, so define it
    only for 32-bit SMP.

    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Yufen Yu
    Cc: Christoph Hellwig
    Cc: Hou Tao
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • delete_partition() clears the cached last_lookup partition. However, the
    .last_lookup cache may be overwritten by one IO path after it has been
    cleared by delete_partition(). Another IO path may then use the cached
    partition, which is being deleted, after hd_struct_free() has been
    called, triggering a use-after-free on the cached partition.

    Fix the issue with the following approach:

    1) always get the partition's refcount via hd_struct_try_get() before
    setting .last_lookup

    2) move the clearing of .last_lookup from delete_partition() to
    hd_struct_free(), which is the release handler of the partition's
    percpu-refcount, so that no IO path can cache a partition that is being
    deleted via .last_lookup.

    This is an alternative to Yufen's patch[1], which adds overhead to the
    fast path through an indirect lookup that may introduce one extra
    cacheline in the IO path. This patch also relies on the percpu-refcount's
    protection, and it is easier to understand and verify.

    [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t

    Reported-by: Yufen Yu
    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Christoph Hellwig
    Cc: Hou Tao
    Signed-off-by: Jens Axboe

    Ming Lei
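
    The two steps of the .last_lookup fix above can be sketched in userspace
    C. This is a simplified, single-threaded illustration: struct part,
    part_try_get(), part_put(), part_free() and lookup_and_cache() are
    hypothetical names, and a plain int replaces the kernel's
    percpu-refcount and RCU protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a partition with a one-entry lookup cache. */
struct part {
    int refs;                  /* 0: the release handler has already run */
    struct part **cache_slot;  /* the cache that may point at this part */
};

/* Take a reference only if the partition is still alive. */
static bool part_try_get(struct part *p)
{
    if (p->refs == 0)
        return false;
    p->refs++;
    return true;
}

/* Release handler (step 2): the cache is cleared here, not at delete time. */
static void part_free(struct part *p)
{
    if (*p->cache_slot == p)
        *p->cache_slot = NULL;
}

/* Drop a reference; the last put runs the release handler. */
static void part_put(struct part *p)
{
    assert(p->refs > 0);
    if (--p->refs == 0)
        part_free(p);
}

/*
 * Step 1: only cache a partition after a reference was obtained.
 * Returns the partition with a reference held, or NULL if it is going away.
 */
static struct part *lookup_and_cache(struct part **slot, struct part *p)
{
    if (!part_try_get(p))
        return NULL;           /* dying partition: never enters the cache */
    *slot = p;
    return p;
}
```

    With this ordering, a late lookup in the sketch cannot repopulate the
    cache with a partition whose release handler has already run, because
    part_try_get() fails before the cache slot is written.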
     

10 May, 2020

1 commit

  • Split out a new bdi_set_owner helper to set the owner, and move the policy
    for creating the bdi name back into genhd.c, where it belongs.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Reviewed-by: Greg Kroah-Hartman
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

21 Apr, 2020

3 commits


31 Mar, 2020

1 commit

  • Pull block updates from Jens Axboe:

    - Online capacity resizing (Balbir)

    - Number of hardware queue change fixes (Bart)

    - null_blk fault injection addition (Bart)

    - Cleanup of queue allocation, unifying the node/no-node API
    (Christoph)

    - Cleanup of genhd, moving code to where it makes sense (Christoph)

    - Cleanup of the partition handling code (Christoph)

    - disk stat fixes/improvements (Konstantin)

    - BFQ improvements (Paolo)

    - Various fixes and improvements

    * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits)
    block: return NULL in blk_alloc_queue() on error
    block: move bio_map_* to blk-map.c
    Revert "blkdev: check for valid request queue before issuing flush"
    block: simplify queue allocation
    bcache: pass the make_request methods to blk_queue_make_request
    null_blk: use blk_mq_init_queue_data
    block: add a blk_mq_init_queue_data helper
    block: move the ->devnode callback to struct block_device_operations
    block: move the part_stat* helpers from genhd.h to a new header
    block: move block layer internals out of include/linux/genhd.h
    block: move guard_bio_eod to bio.c
    block: unexport get_gendisk
    block: unexport disk_map_sector_rcu
    block: unexport disk_get_part
    block: mark part_in_flight and part_in_flight_rw static
    block: mark block_depr static
    block: factor out requeue handling from dispatch code
    block/diskstats: replace time_in_queue with sum of request times
    block/diskstats: accumulate all per-cpu counters in one pass
    block/diskstats: more accurate approximation of io_ticks for slow disks
    ...

    Linus Torvalds
     

27 Mar, 2020

1 commit


25 Mar, 2020

7 commits


24 Mar, 2020

2 commits