16 Aug, 2020

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes on the block side of things:

    - Discard granularity fix (Coly)

    - rnbd cleanups (Guoqing)

    - md error handling fix (Dan)

    - md sysfs fix (Junxiao)

    - Fix flush request accounting, which caused an IO slowdown for some
    configurations (Ming)

    - Properly propagate loop flag for partition scanning (Lennart)"

    * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
    block: fix double account of flush request's driver tag
    loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
    rnbd: no need to set bi_end_io in rnbd_bio_map_kern
    rnbd: remove rnbd_dev_submit_io
    md-cluster: Fix potential error pointer dereference in resize_bitmaps()
    block: check queue's limits.discard_granularity in __blkdev_issue_discard()
    md: get sysfs entry after redundancy attr group create

    Linus Torvalds
     

12 Aug, 2020

1 commit

  • In case of none scheduler, we share the data request's driver tag for
    the flush request, so we have to mark the flush request as INFLIGHT to
    avoid double accounting of this driver tag.

    Fixes: 568f27006577 ("blk-mq: centralise related handling into blk_mq_get_driver_tag")
    Reported-by: Matthew Wilcox
    Signed-off-by: Ming Lei
    Tested-by: Matthew Wilcox
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lei
     

11 Aug, 2020

1 commit

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers contrary to seqlock writers must be externally
    serialized, which usually happens via locking - except for strict
    per CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    The sequence count now has lock type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well known reader preempts writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from <asm/smp.h>
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
     

07 Aug, 2020

1 commit

  • Pull SCSI updates from James Bottomley:
    "This consists of the usual driver updates (ufs, qla2xxx, tcmu, lpfc,
    hpsa, zfcp, scsi_debug) and minor bug fixes.

    We also have a huge docbook fix update like most other subsystems and
    no major update to the core (the few non trivial updates are either
    minor fixes or removing an unused feature [scsi_sdb_cache])"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (307 commits)
    scsi: scsi_transport_srp: Sanitize scsi_target_block/unblock sequences
    scsi: ufs-mediatek: Apply DELAY_AFTER_LPM quirk to Micron devices
    scsi: ufs: Introduce device quirk "DELAY_AFTER_LPM"
    scsi: virtio-scsi: Correctly handle the case where all LUNs are unplugged
    scsi: scsi_debug: Implement tur_ms_to_ready parameter
    scsi: scsi_debug: Fix request sense
    scsi: lpfc: Fix typo in comment for ULP
    scsi: ufs-mediatek: Prevent LPM operation on undeclared VCC
    scsi: iscsi: Do not put host in iscsi_set_flashnode_param()
    scsi: hpsa: Correct ctrl queue depth
    scsi: target: tcmu: Make TMR notification optional
    scsi: target: tcmu: Implement tmr_notify callback
    scsi: target: tcmu: Fix and simplify timeout handling
    scsi: target: tcmu: Factor out new helper ring_insert_padding
    scsi: target: tcmu: Do not queue aborted commands
    scsi: target: tcmu: Use priv pointer in se_cmd
    scsi: target: Add tmr_notify backend function
    scsi: target: Modify core_tmr_abort_task()
    scsi: target: iscsi: Fix inconsistent debug message
    scsi: target: iscsi: Fix login error when receiving
    ...

    Linus Torvalds
     

06 Aug, 2020

3 commits

  • If a loop device is created with a backing NVMe SSD, the current loop
    device driver doesn't correctly set its queue's
    limits.discard_granularity and leaves it as 0. For a discard request at
    LBA 0 on this loop device, the req_sects calculated in
    __blkdev_issue_discard() will be 0, and the resulting zero-length
    discard request triggers a BUG() panic in generic block layer code at
    block/blk-mq.c:563.

    [ 955.565006][ C39] ------------[ cut here ]------------
    [ 955.559660][ C39] invalid opcode: 0000 [#1] SMP NOPTI
    [ 955.622171][ C39] CPU: 39 PID: 248 Comm: ksoftirqd/39 Tainted: G E 5.8.0-default+ #40
    [ 955.622171][ C39] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE160M-2.70]- 07/17/2020
    [ 955.622175][ C39] RIP: 0010:blk_mq_end_request+0x107/0x110
    [ 955.622177][ C39] Code: 48 8b 03 e9 59 ff ff ff 48 89 df 5b 5d 41 5c e9 9f ed ff ff 48 8b 35 98 3c f4 00 48 83 c7 10 48 83 c6 19 e8 cb 56 c9 ff eb cb 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 54
    [ 955.622179][ C39] RSP: 0018:ffffb1288701fe28 EFLAGS: 00010202
    [ 955.749277][ C39] RAX: 0000000000000001 RBX: ffff956fffba5080 RCX: 0000000000004003
    [ 955.749278][ C39] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
    [ 955.749279][ C39] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    [ 955.749279][ C39] R10: ffffb1288701fd28 R11: 0000000000000001 R12: ffffffffa8e05160
    [ 955.749280][ C39] R13: 0000000000000004 R14: 0000000000000004 R15: ffffffffa7ad3a1e
    [ 955.749281][ C39] FS: 0000000000000000(0000) GS:ffff95bfbda00000(0000) knlGS:0000000000000000
    [ 955.749282][ C39] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 955.749282][ C39] CR2: 00007f6f0ef766a8 CR3: 0000005a37012002 CR4: 00000000007606e0
    [ 955.749283][ C39] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 955.749284][ C39] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 955.749284][ C39] PKRU: 55555554
    [ 955.749285][ C39] Call Trace:
    [ 955.749290][ C39] blk_done_softirq+0x99/0xc0
    [ 957.550669][ C39] __do_softirq+0xd3/0x45f
    [ 957.550677][ C39] ? smpboot_thread_fn+0x2f/0x1e0
    [ 957.550679][ C39] ? smpboot_thread_fn+0x74/0x1e0
    [ 957.550680][ C39] ? smpboot_thread_fn+0x14e/0x1e0
    [ 957.550684][ C39] run_ksoftirqd+0x30/0x60
    [ 957.550687][ C39] smpboot_thread_fn+0x149/0x1e0
    [ 957.886225][ C39] ? sort_range+0x20/0x20
    [ 957.886226][ C39] kthread+0x137/0x160
    [ 957.886228][ C39] ? kthread_park+0x90/0x90
    [ 957.886231][ C39] ret_from_fork+0x22/0x30
    [ 959.117120][ C39] ---[ end trace 3dacdac97e2ed164 ]---

    This is the procedure to reproduce the panic,
    # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
    # losetup -f /dev/nvme0n1 --direct-io=on
    # blkdiscard /dev/loop0 -o 0 -l 0x200

    This patch fixes the issue by checking q->limits.discard_granularity in
    __blkdev_issue_discard() before composing the discard bio. If the value
    is 0, print a warning with oops information and return -EOPNOTSUPP to
    the caller to indicate that this buggy device driver doesn't support
    discard requests.

    Fixes: 9b15d109a6b2 ("block: improve discard bio alignment in __blkdev_issue_discard()")
    Fixes: c52abf563049 ("loop: Better discard support for block devices")
    Reported-and-suggested-by: Ming Lei
    Signed-off-by: Coly Li
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Jack Wang
    Cc: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Darrick J. Wong
    Cc: Enzo Matsumiya
    Cc: Evan Green
    Cc: Jens Axboe
    Cc: Martin K. Petersen
    Cc: Xiao Ni
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Pull block stacking updates from Jens Axboe:
    "The stacking related fixes depended on both the core block and drivers
    branches, so here's a topic branch with that change.

    Outside of that, a late fix from Johannes for zone revalidation"

    * tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-block:
    block: don't do revalidate zones on invalid devices
    block: remove blk_queue_stack_limits
    block: remove bdev_stack_limits
    block: inherit the zoned characteristics in blk_stack_limits

    Linus Torvalds
     
  • Pull block driver updates from Jens Axboe:

    - NVMe:
    - ZNS support (Aravind, Keith, Matias, Niklas)
    - Misc cleanups, optimizations, fixes (Baolin, Chaitanya, David,
    Dongli, Max, Sagi)

    - null_blk zone capacity support (Aravind)

    - MD:
    - raid5/6 fixes (ChangSyun)
    - Warning fixes (Damien)
    - raid5 stripe fixes (Guoqing, Song, Yufen)
    - sysfs deadlock fix (Junxiao)
    - raid10 deadlock fix (Vitaly)

    - struct_size conversions (Gustavo)

    - Set of bcache updates/fixes (Coly)

    * tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block: (117 commits)
    md/raid5: Allow degraded raid6 to do rmw
    md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
    raid5: don't duplicate code for different paths in handle_stripe
    raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show
    md: print errno in super_written
    md/raid5: remove the redundant setting of STRIPE_HANDLE
    md: register new md sysfs file 'uuid' read-only
    md: fix max sectors calculation for super 1.0
    nvme-loop: remove extra variable in create ctrl
    nvme-loop: set ctrl state connecting after init
    nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths
    nvme-multipath: fix logic for non-optimized paths
    nvme-rdma: fix controller reset hang during traffic
    nvme-tcp: fix controller reset hang during traffic
    nvmet: introduce the passthru Kconfig option
    nvmet: introduce the passthru configfs interface
    nvmet: Add passthru enable/disable helpers
    nvmet: add passthru code to process commands
    nvme: export nvme_find_get_ns() and nvme_put_ns()
    nvme: introduce nvme_ctrl_get_by_path()
    ...

    Linus Torvalds
     

05 Aug, 2020

1 commit

  • Pull uninitialized_var() macro removal from Kees Cook:
    "This is long overdue, and has hidden too many bugs over the years. The
    series has several "by hand" fixes, and then a trivial treewide
    replacement.

    - Clean up non-trivial uses of uninitialized_var()

    - Update documentation and checkpatch for uninitialized_var() removal

    - Treewide removal of uninitialized_var()"

    * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    compiler: Remove uninitialized_var() macro
    treewide: Remove uninitialized_var() usage
    checkpatch: Remove awareness of uninitialized_var() macro
    mm/debug_vm_pgtable: Remove uninitialized_var() usage
    f2fs: Eliminate usage of uninitialized_var() macro
    media: sur40: Remove uninitialized_var() usage
    KVM: PPC: Book3S PR: Remove uninitialized_var() usage
    clk: spear: Remove uninitialized_var() usage
    clk: st: Remove uninitialized_var() usage
    spi: davinci: Remove uninitialized_var() usage
    ide: Remove uninitialized_var() usage
    rtlwifi: rtl8192cu: Remove uninitialized_var() usage
    b43: Remove uninitialized_var() usage
    drbd: Remove uninitialized_var() usage
    x86/mm/numa: Remove uninitialized_var() usage
    docs: deprecated.rst: Add uninitialized_var()

    Linus Torvalds
     

04 Aug, 2020

2 commits

  • Pull io_uring updates from Jens Axboe:
    "Lots of cleanups in here, hardening the code and/or making it easier
    to read and fixing bugs, but a core feature/change too adding support
    for real async buffered reads. With the latter in place, we just need
    buffered write async support and we're done relying on kthreads for
    the fast path. In detail:

    - Cleanup how memory accounting is done on ring setup/free (Bijan)

    - sq array offset calculation fixup (Dmitry)

    - Consistently handle blocking off O_DIRECT submission path (me)

    - Support proper async buffered reads, instead of relying on kthread
    offload for that. This uses the page waitqueue to drive retries
    from task_work, like we handle poll based retry. (me)

    - IO completion optimizations (me)

    - Fix race with accounting and ring fd install (me)

    - Support EPOLLEXCLUSIVE (Jiufei)

    - Get rid of the io_kiocb unionizing, made possible by shrinking
    other bits (Pavel)

    - Completion side cleanups (Pavel)

    - Cleanup REQ_F_ flags handling, and kill off many of them (Pavel)

    - Request environment grabbing cleanups (Pavel)

    - File and socket read/write cleanups (Pavel)

    - Improve kiocb_set_rw_flags() (Pavel)

    - Tons of fixes and cleanups (Pavel)

    - IORING_SQ_NEED_WAKEUP clear fix (Xiaoguang)"

    * tag 'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-block: (127 commits)
    io_uring: flip if handling after io_setup_async_rw
    fs: optimise kiocb_set_rw_flags()
    io_uring: don't touch 'ctx' after installing file descriptor
    io_uring: get rid of atomic FAA for cq_timeouts
    io_uring: consolidate *_check_overflow accounting
    io_uring: fix stalled deferred requests
    io_uring: fix racy overflow count reporting
    io_uring: deduplicate __io_complete_rw()
    io_uring: de-unionise io_kiocb
    io-wq: update hash bits
    io_uring: fix missing io_queue_linked_timeout()
    io_uring: mark ->work uninitialised after cleanup
    io_uring: deduplicate io_grab_files() calls
    io_uring: don't do opcode prep twice
    io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works
    io_uring: batch put_task_struct()
    tasks: add put_task_struct_many()
    io_uring: return locked and pinned page accounting
    io_uring: don't miscount pinned memory
    io_uring: don't open-code recv kbuf managment
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:
    "Good amount of cleanups and tech debt removals in here, and as a
    result, the diffstat shows a nice net reduction in code.

    - Softirq completion cleanups (Christoph)

    - Stop using ->queuedata (Christoph)

    - Cleanup bd claiming (Christoph)

    - Use check_events, moving away from the legacy media change
    (Christoph)

    - Use inode i_blkbits consistently (Christoph)

    - Remove old unused writeback congestion bits (Christoph)

    - Cleanup/unify submission path (Christoph)

    - Use bio_uninit consistently, instead of bio_disassociate_blkg
    (Christoph)

    - sbitmap cleared bits handling (John)

    - Request merging blktrace event addition (Jan)

    - sysfs add/remove race fixes (Luis)

    - blk-mq tag fixes/optimizations (Ming)

    - Duplicate words in comments (Randy)

    - Flush deferral cleanup (Yufen)

    - IO context locking/retry fixes (John)

    - struct_size() usage (Gustavo)

    - blk-iocost fixes (Chengming)

    - blk-cgroup IO stats fixes (Boris)

    - Various little fixes"

    * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
    block: blk-timeout: delete duplicated word
    block: blk-mq-sched: delete duplicated word
    block: blk-mq: delete duplicated word
    block: genhd: delete duplicated words
    block: elevator: delete duplicated word and fix typos
    block: bio: delete duplicated words
    block: bfq-iosched: fix duplicated word
    iocost_monitor: start from the oldest usage index
    iocost: Fix check condition of iocg abs_vdebt
    block: Remove callback typedefs for blk_mq_ops
    block: Use non _rcu version of list functions for tag_set_list
    blk-cgroup: show global disk stats in root cgroup io.stat
    blk-cgroup: make iostat functions visible to stat printing
    block: improve discard bio alignment in __blkdev_issue_discard()
    block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
    block: defer flush request no matter whether we have elevator
    block: make blk_timeout_init() static
    block: remove retry loop in ioc_release_fn()
    block: remove unnecessary ioc nested locking
    block: integrate bd_start_claiming into __blkdev_get
    ...

    Linus Torvalds
     

03 Aug, 2020

1 commit

  • When we lose a device for whatever reason while (re)scanning zones, we
    trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
    log:

    sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
    sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
    sd 0:0:0:0: [sda] Write Protect is off
    sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
    sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
    sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
    sd 0:0:0:0: [sda] Sense Key : 0xb [current]
    sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
    sda: failed to revalidate zones
    sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
    sda: detected capacity change from 14000519643136 to 0
    ==================================================================
    BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
    Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58

    CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
    Workqueue: events_unbound async_run_entry_fn
    Call Trace:
    dump_stack+0x7d/0xb0
    ? blk_revalidate_zone_cb+0x1b7/0x550
    kasan_report.cold+0x5/0x37
    ? blk_revalidate_zone_cb+0x1b7/0x550
    check_memory_region+0x145/0x1a0
    blk_revalidate_zone_cb+0x1b7/0x550
    sd_zbc_parse_report+0x1f1/0x370
    ? blk_req_zone_write_trylock+0x200/0x200
    ? sectors_to_logical+0x60/0x60
    ? blk_req_zone_write_trylock+0x200/0x200
    ? blk_req_zone_write_trylock+0x200/0x200
    sd_zbc_report_zones+0x3c4/0x5e0
    ? sd_dif_config_host+0x500/0x500
    blk_revalidate_disk_zones+0x231/0x44d
    ? _raw_write_lock_irqsave+0xb0/0xb0
    ? blk_queue_free_zone_bitmaps+0xd0/0xd0
    sd_zbc_read_zones+0x8cf/0x11a0
    sd_revalidate_disk+0x305c/0x64e0
    ? __device_add_disk+0x776/0xf20
    ? read_capacity_16.part.0+0x1080/0x1080
    ? blk_alloc_devt+0x250/0x250
    ? create_object.isra.0+0x595/0xa20
    ? kasan_unpoison_shadow+0x33/0x40
    sd_probe+0x8dc/0xcd2
    really_probe+0x20e/0xaf0
    __driver_attach_async_helper+0x249/0x2d0
    async_run_entry_fn+0xbe/0x560
    process_one_work+0x764/0x1290
    ? _raw_read_unlock_irqrestore+0x30/0x30
    worker_thread+0x598/0x12f0
    ? __kthread_parkme+0xc6/0x1b0
    ? schedule+0xed/0x2c0
    ? process_one_work+0x1290/0x1290
    kthread+0x36b/0x440
    ? kthread_create_worker_on_cpu+0xa0/0xa0
    ret_from_fork+0x22/0x30
    ==================================================================

    When the device is already gone we end up with the following scenario:
    The device's capacity is 0 and thus the number of zones will be 0 as well. When
    allocating the bitmap for the conventional zones, we then trip over a NULL
    pointer.

    So if we encounter a zoned block device with a 0 capacity, don't dare to
    revalidate the zone sizes.

    Fixes: 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically")
    Signed-off-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
     

01 Aug, 2020

7 commits


31 Jul, 2020

1 commit


29 Jul, 2020

1 commit

  • A sequence counter write side critical section must be protected by some
    form of locking to serialize writers. A plain seqcount_t does not
    contain the information of which lock must be held when entering a write
    side critical section.

    Use the new seqcount_spinlock_t data type, which allows to associate a
    spinlock with the sequence counter. This enables lockdep to verify that
    the spinlock used for writer serialization is held when the write side
    critical section is entered.

    If lockdep is disabled this lock association is compiled out and has
    neither storage size nor runtime overhead.

    Signed-off-by: Ahmed S. Darwish
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Daniel Wagner
    Link: https://lkml.kernel.org/r/20200720155530.1173732-21-a.darwish@linutronix.de

    Ahmed S. Darwish
     

28 Jul, 2020

1 commit

  • tag_set_list is only accessed under the tag_set_lock lock. There is
    no need for using the _rcu list functions.

    The _rcu list functions were introduced to allow read access to the
    tag_set_list protected under RCU, see 705cda97ee3a ("blk-mq: Make it
    safe to use RCU to iterate over blk_mq_tag_set.tag_list") and
    05b79413946d ("Revert "blk-mq: don't handle TAG_SHARED in restart"").
    Those changes got reverted later but the cleanup commit missed a
    couple of places to undo the changes.

    Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
    Signed-off-by: Daniel Wagner
    Reviewed-by: Hannes Reinecke
    Cc: Ming Lei
    Signed-off-by: Jens Axboe

    Daniel Wagner
     

25 Jul, 2020

1 commit

  • Commit 05d18ae1cc8a ("scsi: pm: Balance pm_only counter of request queue
    during system resume") fixed a problem in the block layer's runtime-PM
    code: blk_set_runtime_active() failed to call blk_clear_pm_only().
    However, the commit's implementation was awkward; it forced the SCSI
    system-resume handler to choose whether to call blk_post_runtime_resume()
    or blk_set_runtime_active(), depending on whether or not the SCSI device
    had previously been runtime suspended.

    This patch simplifies the situation considerably by adding the missing
    function call directly into blk_set_runtime_active() (under the condition
    that the queue is not already in the RPM_ACTIVE state). This allows the
    SCSI routine to revert to its original form. Furthermore, making this
    change reveals that blk_post_runtime_resume() (in its success pathway) does
    exactly the same thing as blk_set_runtime_active(). The duplicate code is
    easily removed by making one routine call the other.

    No functional changes are intended.

    Link: https://lore.kernel.org/r/20200706151436.GA702867@rowland.harvard.edu
    CC: Can Guo
    CC: Bart Van Assche
    Reviewed-by: Bart Van Assche
    Signed-off-by: Alan Stern
    Signed-off-by: Martin K. Petersen

    Alan Stern
     

21 Jul, 2020

5 commits

  • This function is just a tiny wrapper around blk_stack_limits. Open code
    it in the two callers.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This function is just a tiny wrapper around blk_stack_limits and has
    two callers. Simplify the stack a bit by open coding it in the two
    callers.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Lift the code from device mapper into blk_stack_limits to inherit the
    zoned characteristics when stacking limits. This ensures we do the
    right thing for all stacked zoned block devices.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Damien Le Moal
    Tested-by: Damien Le Moal
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • * for-5.9/drivers: (38 commits)
    block: add max_active_zones to blk-sysfs
    block: add max_open_zones to blk-sysfs
    s390/dasd: Use struct_size() helper
    s390/dasd: fix inability to use DASD with DIAG driver
    md-cluster: fix wild pointer of unlock_all_bitmaps()
    md/raid5-cache: clear MD_SB_CHANGE_PENDING before flushing stripes
    md: fix deadlock causing by sysfs_notify
    md: improve io stats accounting
    md: raid0/linear: fix dereference before null check on pointer mddev
    rsxx: switch from 'pci_free_consistent()' to 'dma_free_coherent()'
    nvme: remove ns->disk checks
    nvme-pci: use standard block status symbolic names
    nvme-pci: use the consistent return type of nvme_pci_iod_alloc_size()
    nvme-pci: add a blank line after declarations
    nvme-pci: fix some comments issues
    nvme-pci: remove redundant segment validation
    nvme: document quirked Intel models
    nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs
    nvme: support for zoned namespaces
    nvme: support for multiple Command Sets Supported and Effects log pages
    ...

    Jens Axboe
     
  • * for-5.9/block: (124 commits)
    blk-cgroup: show global disk stats in root cgroup io.stat
    blk-cgroup: make iostat functions visible to stat printing
    block: improve discard bio alignment in __blkdev_issue_discard()
    block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
    block: defer flush request no matter whether we have elevator
    block: make blk_timeout_init() static
    block: remove retry loop in ioc_release_fn()
    block: remove unnecessary ioc nested locking
    block: integrate bd_start_claiming into __blkdev_get
    block: use bd_prepare_to_claim directly in the loop driver
    block: refactor bd_start_claiming
    block: simplify the restart case in __blkdev_get
    Revert "blk-rq-qos: remove redundant finish_wait to rq_qos_wait."
    block: always remove partitions from blk_drop_partitions()
    block: relax jiffies rounding for timeouts
    blk-mq: remove redundant validation in __blk_mq_end_request()
    blk-mq: Remove unnecessary local variable
    writeback: remove bdi->congested_fn
    writeback: remove struct bdi_writeback_congested
    writeback: remove {set,clear}_wb_congested
    ...

    Jens Axboe
     

18 Jul, 2020

2 commits

  • In order to improve consistency and usability in cgroup stat accounting,
    we would like to support the root cgroup's io.stat.

    Since the root cgroup has processes doing io even if the system has no
    explicitly created cgroups, we need to be careful to avoid overhead in
    that case. For that reason, the rstat algorithms don't handle the root
    cgroup, so just turning the file on wouldn't give correct statistics.

    To get around this, we simulate flushing the iostat struct by filling it
    out directly from global disk stats. The result is a root cgroup io.stat
    file consistent with both /proc/diskstats and io.stat.

    Note that in order to collect the disk stats, we needed to iterate over
    devices. To facilitate that, we had to change the linkage of a disk_type
    to external so that it can be used from blk-cgroup.c to iterate over
    disks.

    Suggested-by: Tejun Heo
    Signed-off-by: Boris Burkov
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Boris Burkov
     
  • Previously, the code which printed io.stat only needed access to the
    generic rstat flushing code, but since we plan to write some more
    specific code for preparing root cgroup stats, we need to manipulate
    iostat structs directly. Since declaring static functions ahead does not
    seem like common practice in this file, simply move the iostat functions
    up. We only plan to use blkg_iostat_set, but it seems better to keep them
    all together.

    Signed-off-by: Boris Burkov
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Boris Burkov
     

17 Jul, 2020

6 commits

  • This patch improves discard bio splitting for address and size alignment
    in __blkdev_issue_discard(). Aligned discard bios may help the
    underlying device controller perform better discard and internal
    garbage collection, and avoid unnecessary internal fragmentation.

    The current discard bio split algorithm in __blkdev_issue_discard() may
    leave non-discarded fragments on the device even when the discard bio's
    LBA and size are both aligned to the device's discard granularity.

    Here are the example steps to reproduce the above problem.
    - On a VMWare ESXi 6.5 update3 installation, create a 51GB virtual disk
    in thin mode and give it to a Linux virtual machine.
    - Inside the Linux virtual machine, if the 50GB virtual disk shows up as
    /dev/sdb, fill data into the first 50GB by,
    # dd if=/dev/zero of=/dev/sdb bs=4096 count=13107200
    - Discard the 50GB range from offset 0 on /dev/sdb,
    # blkdiscard /dev/sdb -o 0 -l 53687091200
    - Observe the underlying mapping status of the device
    # sg_get_lba_status /dev/sdb -m 1048 --lba=0
    descriptor LBA: 0x0000000000000000 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000000000800 blocks: 16773120 deallocated
    descriptor LBA: 0x0000000000fff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000001000000 blocks: 8386560 deallocated
    descriptor LBA: 0x00000000017ff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000001800000 blocks: 8386560 deallocated
    descriptor LBA: 0x0000000001fff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000002000000 blocks: 8386560 deallocated
    descriptor LBA: 0x00000000027ff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000002800000 blocks: 8386560 deallocated
    descriptor LBA: 0x0000000002fff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000003000000 blocks: 8386560 deallocated
    descriptor LBA: 0x00000000037ff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000003800000 blocks: 8386560 deallocated
    descriptor LBA: 0x0000000003fff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000004000000 blocks: 8386560 deallocated
    descriptor LBA: 0x00000000047ff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000004800000 blocks: 8386560 deallocated
    descriptor LBA: 0x0000000004fff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000005000000 blocks: 8386560 deallocated
    descriptor LBA: 0x00000000057ff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000005800000 blocks: 8386560 deallocated
    descriptor LBA: 0x0000000005fff800 blocks: 2048 mapped (or unknown)
    descriptor LBA: 0x0000000006000000 blocks: 6291456 deallocated
    descriptor LBA: 0x0000000006600000 blocks: 0 deallocated

    Although the discard bio starts at LBA 0 and has 50<<30 bytes size,
    which are perfectly aligned to the discard granularity, the above
    list unexpectedly shows many still-mapped 1MB (2048 sectors)
    internal fragments.

    The problem is in __blkdev_issue_discard(): an improper split
    algorithm produces bio sizes that are not aligned to the discard
    granularity.

    [snipped]
    57 while (nr_sects) {
    58 sector_t req_sects = min_t(sector_t, nr_sects,
    59 bio_allowed_max_sectors(q));
    60
    61 WARN_ON_ONCE((req_sects << 9) > UINT_MAX);
    62
    63 bio = blk_next_bio(bio, 0, gfp_mask);
    64 bio->bi_iter.bi_sector = sector;
    65 bio_set_dev(bio, bdev);
    66 bio_set_op_attrs(bio, op, 0);
    67
    68 bio->bi_iter.bi_size = req_sects << 9;
    69 sector += req_sects;
    70 nr_sects -= req_sects;
    [snipped]
    79 }
    80
    81 *biop = bio;
    82 return 0;
    83 }
    84 EXPORT_SYMBOL(__blkdev_issue_discard);

    At lines 58-59, to discard a 50GB range, req_sects is set to the
    return value of bio_allowed_max_sectors(q), which is 8388607 sectors.
    In the above case the discard granularity is 2048 sectors; although
    the start LBA and discard length are aligned to the discard
    granularity, req_sects itself never gets a chance to be aligned to
    it. This is why still-mapped 2048-sector fragments remain in every
    4GB or 8GB range.

    If req_sects at line 58 is instead set to a value aligned to
    discard_granularity and close to UINT_MAX, then all subsequent split
    bios inside the device driver are (almost all) aligned to the
    discard_granularity of the device queue, and the 2048-sector
    still-mapped fragments disappear.

    This patch introduces bio_aligned_discard_max_sectors() to return
    the value which is aligned to q->limits.discard_granularity and
    closest to UINT_MAX. Then this patch replaces
    bio_allowed_max_sectors() with this new routine to choose a more
    suitable split bio length.

    But we still need to handle the case where the discard start LBA is
    not aligned to q->limits.discard_granularity; otherwise, even with an
    aligned length, the current code may still leave a 2048-sector
    fragment around every 4GB range. Therefore, when calculating
    req_sects, the start LBA of the discard range (including the
    partition offset) is checked first; if it is not aligned to the
    discard granularity, the first split is placed so that the following
    bio has bi_sector aligned to the discard granularity. Then no
    still-mapped fragments remain in the middle of the discard range.

    The above is how this patch improves discard bio alignment in
    __blkdev_issue_discard(). Now, after discarding with the same
    command line mentioned previously, sg_get_lba_status returns,
    descriptor LBA: 0x0000000000000000 blocks: 106954752 deallocated
    descriptor LBA: 0x0000000006600000 blocks: 0 deallocated

    We can see there is no 2048-sector segment anymore; everything is clean.

    Reported-and-tested-by: Acshai Manoj
    Signed-off-by: Coly Li
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Ming Lei
    Reviewed-by: Xiao Ni
    Cc: Bart Van Assche
    Cc: Christoph Hellwig
    Cc: Enzo Matsumiya
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Coly Li
     
  • Commit 7520872c0cf4 ("block: don't defer flushes on blk-mq + scheduling")
    tried to fix a deadlock caused by a cyclic wait between flush requests
    and the data requests on fq->flush_data_in_flight: the former held all
    driver tags while waiting for data request completion, but the latter
    could not complete because no driver tags were free.

    After commit 923218f6166a ("blk-mq: don't allocate driver tag upfront
    for flush rq"), flush requests will not get driver tag before queuing
    into flush queue.

    * With an elevator, a flush request only gets a scheduler tag before
    being inserted into the flush queue; it does not get a driver tag
    until it is issued to the driver. The data requests on
    fq->flush_data_in_flight will therefore complete eventually.

    * Without an elevator, each flush request gets a driver tag when the
    request is allocated, so the data requests on
    fq->flush_data_in_flight never lack driver tags.

    In both of these cases the cyclic wait cannot occur, so we may allow
    flush requests to be deferred.

    Signed-off-by: Yufen Yu
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Yufen Yu
     
  • The sparse tool complains as follows:

    block/blk-timeout.c:93:12: warning:
    symbol 'blk_timeout_init' was not declared. Should it be static?

    Function blk_timeout_init() is not used outside of blk-timeout.c, so
    mark it static.

    Fixes: 9054650fac24 ("block: relax jiffies rounding for timeouts")
    Reported-by: Hulk Robot
    Signed-off-by: Wei Yongjun
    Signed-off-by: Jens Axboe

    Wei Yongjun
     
  • Using uninitialized_var() is dangerous as it papers over real bugs[1]
    (or can in the future), and suppresses unrelated compiler warnings
    (e.g. "unused variable"). If the compiler thinks it is uninitialized,
    either simply initialize the variable or make compiler changes.

    In preparation for removing[2] the[3] macro[4], remove all remaining
    needless uses with the following script:

    git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
    xargs perl -pi -e \
    's/\buninitialized_var\(([^\)]+)\)/\1/g;
    s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

    drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
    pathological white-space.

    No outstanding warnings were found building allmodconfig with GCC 9.3.0
    for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
    alpha, and m68k.

    [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
    [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
    [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
    [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

    Reviewed-by: Leon Romanovsky # drivers/infiniband and mlx4/mlx5
    Acked-by: Jason Gunthorpe # IB
    Acked-by: Kalle Valo # wireless drivers
    Reviewed-by: Chao Yu # erofs
    Signed-off-by: Kees Cook

    Kees Cook
     
  • The reverse-order double lock dance in ioc_release_fn() is using a
    retry loop. This is a problem on PREEMPT_RT because it could preempt
    the task that would release q->queue_lock and thus live lock in the
    retry loop.

    RCU is already managing the freeing of the request queue and icq. If
    the trylock fails, use RCU to guarantee that the request queue and
    icq are not freed and re-acquire the locks in the correct order,
    allowing forward progress.

    Signed-off-by: John Ogness
    Reviewed-by: Daniel Wagner
    Signed-off-by: Jens Axboe

    John Ogness
     
  • The legacy CFQ IO scheduler could call put_io_context() in its exit_icq()
    elevator callback. This led to a lockdep warning, which was fixed in
    commit d8c66c5d5924 ("block: fix lockdep warning on io_context release
    put_io_context()") by using a nested subclass for the ioc spinlock.
    However, with commit f382fb0bcef4 ("block: remove legacy IO schedulers")
    the CFQ IO scheduler no longer exists.

    The BFQ IO scheduler also implements the exit_icq() elevator callback but
    does not call put_io_context().

    The nested subclass for the ioc spinlock is no longer needed. Since it
    existed as an exception and no longer applies, remove the nested subclass
    usage.

    Signed-off-by: John Ogness
    Reviewed-by: Daniel Wagner
    Signed-off-by: Jens Axboe

    John Ogness
     

16 Jul, 2020

2 commits

  • Add a new max_active_zones definition in the sysfs documentation.
    This definition will be common for all devices utilizing the zoned block
    device support in the kernel.

    Export max_active_zones according to this new definition for NVMe Zoned
    Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
    the kernel), and ZBC SCSI devices.

    Add the new max_active_zones member to struct request_queue, rather
    than as a queue limit, since this property cannot be split across stacking
    drivers.

    For SCSI devices, even though max active zones is not part of the ZBC/ZAC
    spec, export max_active_zones as 0, signifying "no limit".

    Signed-off-by: Niklas Cassel
    Reviewed-by: Javier González
    Reviewed-by: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Niklas Cassel
     
  • Add a new max_open_zones definition in the sysfs documentation.
    This definition will be common for all devices utilizing the zoned block
    device support in the kernel.

    Export max open zones according to this new definition for NVMe Zoned
    Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
    the kernel), and ZBC SCSI devices.

    Add the new max_open_zones member to struct request_queue, rather
    than as a queue limit, since this property cannot be split across stacking
    drivers.

    Signed-off-by: Niklas Cassel
    Reviewed-by: Javier González
    Reviewed-by: Damien Le Moal
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Niklas Cassel
     

15 Jul, 2020

3 commits

  • This reverts commit 826f2f48da8c331ac51e1381998d318012d66550.

    Qian Cai reports that this commit causes stalls with swap. Revert until
    the reason can be figured out.

    Reported-by: Qian Cai
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • In theory, when GENHD_FL_NO_PART_SCAN is set, no partitions can be
    created on a disk. However, ioctl(BLKPG, BLKPG_ADD_PARTITION) doesn't
    check GENHD_FL_NO_PART_SCAN, so partitions can still be added even
    though GENHD_FL_NO_PART_SCAN is set.

    So far, blk_drop_partitions() only removes partitions when
    disk_part_scan_enabled() returns true. This can leave ghost
    partitions on a loop device after changing/clearing the FD while
    PARTSCAN is disabled: partitions can still be added via 'parted' on
    the loop disk even though GENHD_FL_NO_PART_SCAN is set.

    Fix this issue by always removing partitions in blk_drop_partitions().
    This is correct because the current code assumes that no partitions
    can be added when GENHD_FL_NO_PART_SCAN is set.

    Signed-off-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • In doing high IOPS testing, blk-mq is generally pretty well optimized.
    There are a few things that stuck out as using more CPU than what is
    really warranted, and one thing is the round_jiffies_up() that we do
    twice for each request. That accounts for about 0.8% of the CPU in
    my testing.

    We can make this cheaper by avoiding an integer division, by just adding
    a rough HZ mask that we can AND with instead. The timeouts are only on a
    second granularity already, we don't have to be that accurate here and
    this patch barely changes that. All we care about is nice grouping.

    Signed-off-by: Jens Axboe

    Jens Axboe