27 Sep, 2019
1 commit
-
We have updated limits after calling wbt_set_min_lat(). No need to
update again.Reviewed-by: Bob Liu
Signed-off-by: Yufen Yu
Signed-off-by: Jens Axboe
26 Sep, 2019
1 commit
-
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release & acquire sysfs_lock before registering/un-registering elevator
queue during switching elevator for avoiding potential deadlock from
showing & storing 'queue/iosched' attributes and removing elevator's
kobject.Turns out there isn't such deadlock because 'q->sysfs_lock' isn't
required in .show & .store of queue/iosched's attributes, and just
elevator's sysfs lock is acquired in elv_iosched_store() and
elv_iosched_show(). So it is safe to hold queue's sysfs lock when
registering/un-registering elevator queue.The biggest issue is that commit cecf5d87ff20 assumes that concurrent
write on 'queue/scheduler' can't happen. However, this assumption isn't
true, because kernfs_fop_write() only guarantees that concurrent write
aren't called on the same open file, but the write could be from
different open on the file. So we can't release & re-acquire queue's
sysfs lock during switching elevator, otherwise use-after-free on
elevator could be triggered.Fixes the issue by not releasing queue's sysfs lock during switching
elevator.Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Greg KH
Cc: Mike Snitzer
Reviewed-by: Bart Van Assche
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
18 Sep, 2019
1 commit
-
Pull block updates from Jens Axboe:
- Two NVMe pull requests:
- ana log parse fix from Anton
- nvme quirks support for Apple devices from Ben
- fix missing bio completion tracing for multipath stack devices
from Hannes and Mikhail
- IP TOS settings for nvme rdma and tcp transports from Israel
- rq_dma_dir cleanups from Israel
- tracing for Get LBA Status command from Minwoo
- Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
- Some consolidation between the fabrics transports for handling
the CAP register
- reset race with ns scanning fix for fabrics (move fabrics
commands to a dedicated request queue with a different lifetime
from the admin request queue)."
- controller reset and namespace scan races fixes
- nvme discovery log change uevent support
- naming improvements from Keith
- multiple discovery controllers reject fix from James
- some regular cleanups from various people- Series fixing (and re-fixing) null_blk debug printing and nr_devices
checks (André)- A few pull requests from Song, with fixes from Andy, Guoqing,
Guilherme, Neil, Nigel, and Yufen.- REQ_OP_ZONE_RESET_ALL support (Chaitanya)
- Bio merge handling unification (Christoph)
- Pick default elevator correctly for devices with special needs
(Damien)- Block stats fixes (Hou)
- Timeout and support devices nbd fixes (Mike)
- Series fixing races around elevator switching and device add/remove
(Ming)- sed-opal cleanups (Revanth)
- Per device weight support for BFQ (Fam)
- Support for blk-iocost, a new model that can properly account cost of
IO workloads. (Tejun)- blk-cgroup writeback fixes (Tejun)
- paride queue init fixes (zhengbin)
- blk_set_runtime_active() cleanup (Stanley)
- Block segment mapping optimizations (Bart)
- lightnvm fixes (Hans/Minwoo/YueHaibing)
- Various little fixes and cleanups
* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
null_blk: format pr_* logs with pr_fmt
null_blk: match the type of parameter nr_devices
null_blk: do not fail the module load with zero devices
block: also check RQF_STATS in blk_mq_need_time_stamp()
block: make rq sector size accessible for block stats
bfq: Fix bfq linkage error
raid5: use bio_end_sector in r5_next_bio
raid5: remove STRIPE_OPS_REQ_PENDING
md: add feature flag MD_FEATURE_RAID0_LAYOUT
md/raid0: avoid RAID0 data corruption due to layout confusion.
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
raid5: don't increment read_errors on EILSEQ return
nvmet: fix a wrong error status returned in error log page
nvme: send discovery log page change events to userspace
nvme: add uevent variables for controller devices
nvme: enable aen regardless of the presence of I/O queues
nvme-fabrics: allow discovery subsystems accept a kato
nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
nvme: Remove redundant assignment of cq vector
nvme: Assign subsys instance from first ctrl
...
12 Sep, 2019
1 commit
-
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
release & actuire sysfs_lock again during switching elevator. So it
isn't enough to prevent switching elevator from happening by simply
clearing QUEUE_FLAG_REGISTERED with holding sysfs_lock, because
in-progress switch still can move on after re-acquiring the lock,
meantime the flag of QUEUE_FLAG_REGISTERED won't get checked.Fixes this issue by checking 'q->elevator' directly & locklessly after
q->kobj is removed in blk_unregister_queue(), this way is safe because
q->elevator can't be changed at that time.Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks")
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Greg KH
Cc: Mike Snitzer
Cc: Bart Van Assche
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
28 Aug, 2019
2 commits
-
The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
required.However, when mq & iosched kobjects are removed via
blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
too. This way causes AB-BA lock because the kernfs built-in lock of
'kn-count' is required inside kobject_del() too, see the lockdep warning[1].On the other hand, it isn't necessary to acquire q->sysfs_lock for
both blk_mq_unregister_dev() & elv_unregister_queue() because
clearing REGISTERED flag prevents storing to 'queue/scheduler'
from being happened. Also sysfs write(store) is exclusive, so no
necessary to hold the lock for elv_unregister_queue() when it is
called in switching elevator path.So split .sysfs_lock into two: one is still named as .sysfs_lock for
covering sync .store, the other one is named as .sysfs_dir_lock
for covering kobjects and related status change.sysfs itself can handle the race between add/remove kobjects and
showing/storing attributes under kobjects. For switching scheduler
via storing to 'queue/scheduler', we use the queue flag of
QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then
we can avoid to hold .sysfs_lock during removing/adding kobjects.[1] lockdep warning
======================================================
WARNING: possible circular locking dependency detected
5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
------------------------------------------------------
rmmod/777 is trying to acquire lock:
00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72but task is already holding lock:
00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10bwhich lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&q->sysfs_lock){+.+.}:
__lock_acquire+0x95f/0xa2f
lock_acquire+0x1b4/0x1e8
__mutex_lock+0x14a/0xa9b
blk_mq_hw_sysfs_show+0x63/0xb6
sysfs_kf_seq_show+0x11f/0x196
seq_read+0x2cd/0x5f2
vfs_read+0xc7/0x18c
ksys_read+0xc4/0x13e
do_syscall_64+0xa7/0x295
entry_SYSCALL_64_after_hwframe+0x49/0xbe-> #0 (kn->count#202){++++}:
check_prev_add+0x5d2/0xc45
validate_chain+0xed3/0xf94
__lock_acquire+0x95f/0xa2f
lock_acquire+0x1b4/0x1e8
__kernfs_remove+0x237/0x40b
kernfs_remove_by_name_ns+0x59/0x72
remove_files+0x61/0x96
sysfs_remove_group+0x81/0xa4
sysfs_remove_groups+0x3b/0x44
kobject_del+0x44/0x94
blk_mq_unregister_dev+0x83/0xdd
blk_unregister_queue+0xa0/0x10b
del_gendisk+0x259/0x3fa
null_del_dev+0x8b/0x1c3 [null_blk]
null_exit+0x5c/0x95 [null_blk]
__se_sys_delete_module+0x204/0x337
do_syscall_64+0xa7/0x295
entry_SYSCALL_64_after_hwframe+0x49/0xbeother info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&q->sysfs_lock);
lock(kn->count#202);
lock(&q->sysfs_lock);
lock(kn->count#202);*** DEADLOCK ***
2 locks held by rmmod/777:
#0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
#1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10bstack backtrace:
CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
Call Trace:
dump_stack+0x9a/0xe6
check_noncircular+0x207/0x251
? print_circular_bug+0x32a/0x32a
? find_usage_backwards+0x84/0xb0
check_prev_add+0x5d2/0xc45
validate_chain+0xed3/0xf94
? check_prev_add+0xc45/0xc45
? mark_lock+0x11b/0x804
? check_usage_forwards+0x1ca/0x1ca
__lock_acquire+0x95f/0xa2f
lock_acquire+0x1b4/0x1e8
? kernfs_remove_by_name_ns+0x59/0x72
__kernfs_remove+0x237/0x40b
? kernfs_remove_by_name_ns+0x59/0x72
? kernfs_next_descendant_post+0x7d/0x7d
? strlen+0x10/0x23
? strcmp+0x22/0x44
kernfs_remove_by_name_ns+0x59/0x72
remove_files+0x61/0x96
sysfs_remove_group+0x81/0xa4
sysfs_remove_groups+0x3b/0x44
kobject_del+0x44/0x94
blk_mq_unregister_dev+0x83/0xdd
blk_unregister_queue+0xa0/0x10b
del_gendisk+0x259/0x3fa
? disk_events_poll_msecs_store+0x12b/0x12b
? check_flags+0x1ea/0x204
? mark_held_locks+0x1f/0x7a
null_del_dev+0x8b/0x1c3 [null_blk]
null_exit+0x5c/0x95 [null_blk]
__se_sys_delete_module+0x204/0x337
? free_module+0x39f/0x39f
? blkcg_maybe_throttle_current+0x8a/0x718
? rwlock_bug+0x62/0x62
? __blkcg_punt_bio_submit+0xd0/0xd0
? trace_hardirqs_on_thunk+0x1a/0x20
? mark_held_locks+0x1f/0x7a
? do_syscall_64+0x4c/0x295
do_syscall_64+0xa7/0x295
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fb696cdbe6b
Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Greg KH
Cc: Mike Snitzer
Reviewed-by: Bart Van Assche
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe -
There are 4 users which check if queue is registered, so add one helper
to check it.Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Greg KH
Cc: Mike Snitzer
Cc: Bart Van Assche
Reviewed-by: Bart Van Assche
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
12 Aug, 2019
1 commit
-
blk_exit_queue will free elevator_data, while blk_mq_requeue_work
will access it. Move cancel of requeue_work to the front of
blk_exit_queue to avoid use-after-free.blk_exit_queue blk_mq_requeue_work
__elevator_exit blk_mq_run_hw_queues
blk_mq_exit_sched blk_mq_run_hw_queue
dd_exit_queue blk_mq_hctx_has_pending
kfree(elevator_data) blk_mq_sched_has_work
dd_has_workFixes: fbc2a15e3433 ("blk-mq: move cancel of requeue_work into blk_mq_release")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei
Signed-off-by: zhengbin
Signed-off-by: Jens Axboe
07 Jun, 2019
1 commit
-
In theory, IO scheduler belongs to request queue, and the request pool
of sched tags belongs to the request queue too.However, the current tags allocation interfaces are re-used for both
driver tags and sched tags, and driver tags is definitely host wide,
and doesn't belong to any request queue, same with its request pool.
So we need tagset instance for freeing request of sched tags.Meantime, blk_mq_free_tag_set() often follows blk_cleanup_queue() in case
of non-BLK_MQ_F_TAG_SHARED, this way requires that request pool of sched
tags to be freed before calling blk_mq_free_tag_set().Commit 47cdee29ef9d94e ("block: move blk_exit_queue into __blk_release_queue")
moves blk_exit_queue into __blk_release_queue for simplying the fast
path in generic_make_request(), then causes oops during freeing requests
of sched tags in __blk_release_queue().Fix the above issue by move freeing request pool of sched tags into
blk_cleanup_queue(), this way is safe becasue queue has been frozen and no any
in-queue requests at that time. Freeing sched tags has to be kept in queue's
release handler becasue there might be un-completed dispatch activity
which might refer to sched tags.Cc: Bart Van Assche
Cc: Christoph Hellwig
Fixes: 47cdee29ef9d94e485eb08f962c74943023a5271 ("block: move blk_exit_queue into __blk_release_queue")
Tested-by: Yi Zhang
Reported-by: kernel test robot
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
29 May, 2019
1 commit
-
Commit 498f6650aec8 ("block: Fix a race between the cgroup code and
request queue initialization") moves what blk_exit_queue does into
blk_cleanup_queue() for fixing issue caused by changing back
queue lock.However, after legacy request IO path is killed, driver queue lock
won't be used at all, and there isn't story for changing back
queue lock. Then the issue addressed by Commit 498f6650aec8 doesn't
exist any more.So move move blk_exit_queue into __blk_release_queue.
This patch basically reverts the following two commits:
498f6650aec8 block: Fix a race between the cgroup code and request queue initialization
24ecc3585348 block: Ensure that a request queue is dissociated from the cgroup controllerCc: Bart Van Assche
Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
22 Apr, 2019
1 commit
-
If the low level driver has no timeout handler, the
/sys/block//queue/io_timeout will not be displayed.Reviewed-by: Bart Van Assche
Signed-off-by: Weiping Zhang
Signed-off-by: Jens Axboe
21 Mar, 2019
1 commit
-
For q->poll_nsec == -1, means doing classic poll, not hybrid poll.
We introduce a new flag BLK_MQ_POLL_CLASSIC to replace -1, which
may make code much easier to read.Additionally, since val is an int obtained with kstrtoint(), val can be
a negative value other than -1, so return -EINVAL for that case.Thanks to Damien Le Moal for some good suggestion.
Reviewed-by: Damien Le Moal
Signed-off-by: Yufen Yu
Signed-off-by: Jens Axboe
11 Feb, 2019
2 commits
-
There's no reason to set wbt min lat and freeze request queue
if current value is the same.Reviewed-by: Johannes Thumshirn
Signed-off-by: Aleksei Zakharov
Signed-off-by: Jens Axboe -
The Notes section of the comment was removed, because now
blk_release_queue can only be executed from blk_cleanup_queue (being
called when the q->kobj reaches zero), and also blk_init_queue was removed
in a1ce35fa4985.Signed-off-by: Marcos Paulo de Souza
Signed-off-by: Jens Axboe
29 Dec, 2018
1 commit
-
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.Additionally, we have a pile of annotation, unused variable and minor
updates.The big API change is the updates for Christoph's DMA rework which
include removing the DISABLE_CLUSTERING flag.And finally there are a couple of target tree updates"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
scsi: isci: request: mark expected switch fall-through
scsi: isci: remote_node_context: mark expected switch fall-throughs
scsi: isci: remote_device: Mark expected switch fall-throughs
scsi: isci: phy: Mark expected switch fall-through
scsi: iscsi: Capture iscsi debug messages using tracepoints
scsi: myrb: Mark expected switch fall-throughs
scsi: megaraid: fix out-of-bound array accesses
scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
scsi: fcoe: remove set but not used variable 'port'
scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
scsi: smartpqi: fix build warnings
scsi: smartpqi: update driver version
scsi: smartpqi: add ofa support
scsi: smartpqi: increase fw status register read timeout
scsi: smartpqi: bump driver version
scsi: smartpqi: add smp_utils support
scsi: smartpqi: correct lun reset issues
scsi: smartpqi: correct volume status
scsi: smartpqi: do not offline disks for transient did no connect conditions
scsi: smartpqi: allow for larger raid maps
...
19 Dec, 2018
1 commit
-
Now that the the SCSI layer replaced the use of the cluster flag with
segment size limits and the DMA boundary we can remove the cluster flag
from the block layer.Signed-off-by: Christoph Hellwig
Reviewed-by: Jens Axboe
Signed-off-by: Martin K. Petersen
18 Dec, 2018
1 commit
-
The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues
is bigger than zero, so enhance the constraint by checking .nr_queues of type poll
before enabling IO poll.Otherwise IO race & timeout can be observed when running block/007.
Cc: Jeff Moyer
Cc: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
05 Dec, 2018
1 commit
-
This avoids having to have differnet mq_ops for different setups
with or without poll queues.Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
29 Nov, 2018
1 commit
-
Give a interface to adjust io timeout(ms) by device.
Signed-off-by: Weiping Zhang
Signed-off-by: Jens Axboe
16 Nov, 2018
3 commits
-
Various spots check for q->mq_ops being non-NULL, but provide
a helper to do this instead.Where the ->mq_ops != NULL check is redundant, remove it.
Since mq == rq-based now that legacy is gone, get rid of the
queue_is_rq_based() and just use queue_is_mq() everywhere.Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
With the legacy request path gone there is no good reason to keep
queue_lock as a pointer, we can always use the embedded lock now.Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph HellwigFixed floppy and blk-cgroup missing conversions and half done edits.
Signed-off-by: Jens Axboe
-
->queue_flags is generally not set or cleared in the fast path, and also
generally set or cleared one flag at a time. Make use of the normal
atomic bitops for it so that we don't need to take the queue_lock,
which is otherwise mostly unused in the core block layer now.Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
08 Nov, 2018
2 commits
-
This removes a bunch of core and elevator related code. On the core
front, we remove anything related to queue running, draining,
initialization, plugging, and congestions. We also kill anything
related to request allocation, merging, retrieval, and completion.Remove any checking for single queue IO schedulers, as they no
longer exist. This means we can also delete a bunch of code related
to request issue, adding, completion, etc - and all the SQ related
ops and helpers.Also kill the load_default_modules(), as all that did was provide
for a way to load the default single queue elevator.Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe -
It's now unused, kill it.
Reviewed-by: Hannes Reinecke
Tested-by: Ming Lei
Reviewed-by: Omar Sandoval
Signed-off-by: Jens Axboe
31 Oct, 2018
1 commit
-
rq_qos_exit() removes the current q->rq_qos, this action has to be
done after queue is frozen, otherwise the IO queue path may never
be waken up, then IO hang is caused.So fixes this issue by moving rq_qos_exit() after queue is frozen.
Cc: Josef Bacik
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe
26 Oct, 2018
2 commits
-
Drivers exposing zoned block devices have to initialize and maintain
correctness (i.e. revalidate) of the device zone bitmaps attached to
the device request queue (seq_zones_bitmap and seq_zones_wlock).To simplify coding this, introduce a generic helper function
blk_revalidate_disk_zones() suitable for most (and likely all) cases.
This new function always update the seq_zones_bitmap and seq_zones_wlock
bitmaps as well as the queue nr_zones field when called for a disk
using a request based queue. For a disk using a BIO based queue, only
the number of zones is updated since these queues do not have
schedulers and so do not need the zone bitmaps.With this change, the zone bitmap initialization code in sd_zbc.c can be
replaced with a call to this function in sd_zbc_read_zones(), which is
called from the disk revalidate block operation method.A call to blk_revalidate_disk_zones() is also added to the null_blk
driver for devices created with the zoned mode enabled.Finally, to ensure that zoned devices created with dm-linear or
dm-flakey expose the correct number of zones through sysfs, a call to
blk_revalidate_disk_zones() is added to dm_table_set_restrictions().The zone bitmaps allocated and initialized with
blk_revalidate_disk_zones() are freed automatically from
__blk_release_queue() using the block internal function
blk_queue_free_zone_bitmaps().Reviewed-by: Hannes Reinecke
Reviewed-by: Christoph Hellwig
Reviewed-by: Martin K. Petersen
Reviewed-by: Mike Snitzer
Signed-off-by: Damien Le Moal
Signed-off-by: Jens Axboe -
Expose through sysfs the nr_zones field of struct request_queue.
Exposing this value helps in debugging disk issues as well as
facilitating scripts based use of the disk (e.g. blktests).For zoned block devices, the nr_zones field indicates the total number
of zones of the device calculated using the known disk capacity and
zone size. This number of zones is always 0 for regular block devices.Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED,
introduce the blk_queue_nr_zones() function to return the correct value
for any device, regardless if CONFIG_BLK_DEV_ZONED is set.Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Damien Le Moal
Signed-off-by: Jens Axboe
23 Aug, 2018
1 commit
-
A previous commit removed the ability to have per-rq flags. We used
those flags to maintain inflight counts. Since we don't have those
anymore, we have to always maintain inflight counts, even if wbt is
disabled. This is clearly suboptimal.Add a queue quiesce around changing the wbt latency settings from sysfs
to work around this. With that, we can reliably put the enabled check in
our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be
consistent for the lifetime of the request.Fixes: c1c80384c8f ("block: remove external dependency on wbt_flags")
Reviewed-by: Josef Bacik
Signed-off-by: Jens Axboe
12 Aug, 2018
1 commit
-
For legacy queues the only call of blkg_root_lookup() happens after
bypass mode has been enabled. Since blkg_lookup() returns NULL for
queues in bypass mode, modify the blkg_root_lookup() such that it
no longer depends on bypass mode. Rename the function into
blk_queue_root_blkg() as suggested by Tejun.Suggested-by: Tejun Heo
Fixes: 6bad9b210a22 ("blkcg: Introduce blkg_root_lookup()")
Signed-off-by: Bart Van Assche
Cc: Tejun Heo
Signed-off-by: Jens Axboe
09 Aug, 2018
1 commit
-
Several block drivers call alloc_disk() followed by put_disk() if
something fails before device_add_disk() is called without calling
blk_cleanup_queue(). Make sure that also for this scenario a request
queue is dissociated from the cgroup controller. This patch avoids
that loading the parport_pc, paride and pf drivers triggers the
following kernel crash:BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
Read of size 4 at addr 0000000000000008 by task modprobe/744
Call Trace:
dump_stack+0x9a/0xeb
kasan_report+0x139/0x350
pi_init+0x42e/0x580 [paride]
pf_init+0x2bb/0x1000 [pf]
do_one_initcall+0x8e/0x405
do_init_module+0xd9/0x2f2
load_module+0x3ab4/0x4700
SYSC_finit_module+0x176/0x1a0
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7Reported-by: Alexandru Moise
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
Signed-off-by: Bart Van Assche
Tested-by: Alexandru Moise
Reviewed-by: Johannes Thumshirn
Cc: Tejun Heo
Cc: Christoph Hellwig
Cc: Ming Lei
Cc: Alexandru Moise
Cc: Joseph Qi
Cc:
Signed-off-by: Jens Axboe
09 Jul, 2018
1 commit
-
blkcg-qos is going to do essentially what wbt does, only on a cgroup
basis. Break out the common code that will be shared between blkcg-qos
and wbt into blk-rq-qos.* so they can both utilize the same
infrastructure.Signed-off-by: Josef Bacik
Signed-off-by: Jens Axboe
31 May, 2018
1 commit
-
Convert the core block functionality to embedded bio sets.
Reviewed-by: Christoph Hellwig
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe
25 May, 2018
1 commit
-
Convert the S_ symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.see: https://lkml.org/lkml/2016/8/2/1945
Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplaceMiscellanea:
o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesisSigned-off-by: Joe Perches
Signed-off-by: Jens Axboe
15 May, 2018
1 commit
-
Signed-off-by: Kent Overstreet
Signed-off-by: Jens Axboe
09 Mar, 2018
1 commit
-
Introduce functions that modify the queue flags and that protect
these modifications with the request queue lock. Except for moving
one wake_up_all() call from inside to outside a critical section,
this patch does not change any functionality.Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Ming Lei
Reviewed-by: Johannes Thumshirn
Reviewed-by: Martin K. Petersen
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe
01 Mar, 2018
1 commit
-
Avoid that the following race can occur:
blk_cleanup_queue() blkcg_print_blkgs()
spin_lock_irq(lock) (1) spin_lock_irq(blkg->q->queue_lock) (2,5)
q->queue_lock = &q->__queue_lock (3)
spin_unlock_irq(lock) (4)
spin_unlock_irq(blkg->q->queue_lock) (6)(1) take driver lock;
(2) busy loop for driver lock;
(3) override driver lock with internal lock;
(4) unlock driver lock;
(5) can take driver lock now;
(6) but unlock internal lock.This change is safe because only the SCSI core and the NVME core keep
a reference on a request queue after having called blk_cleanup_queue().
Neither driver accesses any of the removed data structures between its
blk_cleanup_queue() and blk_put_queue() calls.Reported-by: Joseph Qi
Signed-off-by: Bart Van Assche
Reviewed-by: Joseph Qi
Cc: Jan Kara
Signed-off-by: Jens Axboe
19 Jan, 2018
1 commit
-
The __blk_mq_register_dev(), blk_mq_unregister_dev(),
elv_register_queue() and elv_unregister_queue() calls need to be
protected with sysfs_lock but other code in these functions not.
Hence protect only this code with sysfs_lock. This patch fixes a
locking inversion issue in blk_unregister_queue() and also in an
error path of blk_register_queue(): it is not allowed to hold
sysfs_lock around the kobject_del(&q->kobj) call.Reviewed-by: Christoph Hellwig
Signed-off-by: Bart Van Assche
Signed-off-by: Jens Axboe
15 Jan, 2018
2 commits
-
Since I can remember DM has forced the block layer to allow the
allocation and initialization of the request_queue to be distinct
operations. Reason for this is block/genhd.c:add_disk() has requires
that the request_queue (and associated bdi) be tied to the gendisk
before add_disk() is called -- because add_disk() also deals with
exposing the request_queue via blk_register_queue().DM's dynamic creation of arbitrary device types (and associated
request_queue types) requires the DM device's gendisk be available so
that DM table loads can establish a master/slave relationship with
subordinate devices that are referenced by loaded DM tables -- using
bd_link_disk_holder(). But until these DM tables, and their associated
subordinate devices, are known DM cannot know what type of request_queue
it needs -- nor what its queue_limits should be.This chicken and egg scenario has created all manner of problems for DM
and, at times, the block layer.Summary of changes:
- Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant
that drivers may use to add a disk without also calling
blk_register_queue(). Driver must call blk_register_queue() once its
request_queue is fully initialized.- Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
is not set. It won't be set if driver used add_disk_no_queue_reg()
but driver encounters an error and must del_gendisk() before calling
blk_register_queue().- Export blk_register_queue().
These changes allow DM to use add_disk_no_queue_reg() to anchor its
gendisk as the "master" for master/slave relationships DM must establish
with subordinate devices referenced in DM tables that get loaded. Once
all "slave" devices for a DM device are known its request_queue can be
properly initialized and then advertised via sysfs -- important
improvement being that no request_queue resource initialization
performed by blk_register_queue() is missed for DM devices anymore.Signed-off-by: Mike Snitzer
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe -
The original commit e9a823fb34a8b (block: fix warning when I/O elevator
is changed as request_queue is being removed) is pretty conflated.
"conflated" because the resource being protected by q->sysfs_lock isn't
the queue_flags (it is the 'queue' kobj).q->sysfs_lock serializes __elevator_change() (via elv_iosched_store)
from racing with blk_unregister_queue():
1) By holding q->sysfs_lock first, __elevator_change() can complete
before a racing blk_unregister_queue().
2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED
in case elv_iosched_store() loses the race with blk_unregister_queue(),
it needs a way to know the 'queue' kobj isn't there.Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is
held until after the 'queue' kobj is removed.To do so blk_mq_unregister_dev() must not also take q->sysfs_lock. So
rename __blk_mq_unregister_dev() to blk_mq_unregister_dev().Also, blk_unregister_queue() should use q->queue_lock to protect against
any concurrent writes to q->queue_flags -- even though chances are the
queue is being cleaned up so no concurrent writes are likely.Fixes: e9a823fb34a8b ("block: fix warning when I/O elevator is changed as request_queue is being removed")
Signed-off-by: Mike Snitzer
Reviewed-by: Ming Lei
Signed-off-by: Jens Axboe
24 Nov, 2017
1 commit
-
wbt_init doesn't set q->rq_wb to NULL, if wbt_init return 0,
so check return value is enough, remove NULL checking.Signed-off-by: weiping zhang
Signed-off-by: Jens Axboe
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman