15 Oct, 2019

1 commit

  • A BIO based request queue does not have a tag_set, which prevents
    testing for the BLK_MQ_F_NO_SCHED flag indicating that the queue does
    not require an elevator. This leads to incorrect initialization of a
    default elevator in some cases, such as BIO based null_blk
    (queue_mode == BIO) with zoned mode enabled, where the default elevator
    ends up being mq-deadline instead of "none".

    Fix this by testing for a NULL queue mq_ops field, which indicates that
    the queue is BIO based and should not have an elevator (a sketch of the
    check follows below).

    Reported-by: Shinichiro Kawasaki
    Reviewed-by: Bob Liu
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
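
    A rough sketch of the resulting check, assuming it lives in the
    elevator.c helper elv_support_iosched() as in mainline (simplified, not
    the literal patch):

        static inline bool elv_support_iosched(struct request_queue *q)
        {
                /* BIO based queues have no mq_ops and never get an elevator. */
                if (!q->mq_ops ||
                    (q->tag_set && (q->tag_set->flags & BLK_MQ_F_NO_SCHED)))
                        return false;
                return true;
        }
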
     

26 Sep, 2019

1 commit

  • cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to
    release & acquire sysfs_lock before registering/un-registering elevator
    queue during switching elevator for avoiding potential deadlock from
    showing & storing 'queue/iosched' attributes and removing elevator's
    kobject.

    Turns out there isn't such deadlock because 'q->sysfs_lock' isn't
    required in .show & .store of queue/iosched's attributes, and just
    elevator's sysfs lock is acquired in elv_iosched_store() and
    elv_iosched_show(). So it is safe to hold queue's sysfs lock when
    registering/un-registering elevator queue.

    The biggest issue is that commit cecf5d87ff20 assumes that concurrent
    write on 'queue/scheduler' can't happen. However, this assumption isn't
    true, because kernfs_fop_write() only guarantees that concurrent write
    aren't called on the same open file, but the write could be from
    different open on the file. So we can't release & re-acquire queue's
    sysfs lock during switching elevator, otherwise use-after-free on
    elevator could be triggered.

    Fixes the issue by not releasing queue's sysfs lock during switching
    elevator.

    Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks")
    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Greg KH
    Cc: Mike Snitzer
    Reviewed-by: Bart Van Assche
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
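
    The resulting locking pattern, roughly (a sketch only; the wrapper name
    sketch_switch_elevator() is illustrative, the real code lives in the
    elv_iosched_store()/elevator_switch() path):

        /*
         * Keep q->sysfs_lock held across the whole switch so a concurrent
         * write to 'queue/scheduler' from a different open of the file
         * cannot observe or free a half-torn-down elevator.
         */
        static int sketch_switch_elevator(struct request_queue *q,
                                          struct elevator_type *new_e)
        {
                int ret;

                mutex_lock(&q->sysfs_lock);
                ret = elevator_switch(q, new_e);  /* kobjects unregistered and
                                                     registered without ever
                                                     dropping the lock */
                mutex_unlock(&q->sysfs_lock);
                return ret;
        }
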
     

06 Sep, 2019

6 commits

  • The lookup logic is broken - 'e' will never be NULL after the list
    iteration, even if the list is empty. Track the lookup hit in a
    separate variable instead (see the sketch below).

    Fixes: a0958ba7fcdc ("block: Improve default elevator selection")
    Reported-by: Julia Lawall
    Signed-off-by: Jens Axboe

    Jens Axboe
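
    The pitfall is the usual list_for_each_entry() one: after the loop, the
    cursor points at the list head container, never NULL. A rough
    before/after sketch (variable names illustrative, not the exact patch):

        /* Broken: 'e' is never NULL after the loop, even for an empty list. */
        list_for_each_entry(e, &elv_list, list) {
                if (elv_support_features(e->elevator_features, required))
                        break;
        }
        if (e && !try_module_get(e->elevator_owner))  /* 'e' is always "set" */
                e = NULL;

        /* Fixed: record the hit in a separate variable. */
        struct elevator_type *found = NULL;

        list_for_each_entry(e, &elv_list, list) {
                if (elv_support_features(e->elevator_features, required)) {
                        found = e;
                        break;
                }
        }
        if (found && !try_module_get(found->elevator_owner))
                found = NULL;
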
     
  • When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
    the only information known about the device is the number of hardware
    queues, as the block device scan by the device driver is not yet
    complete for most drivers. The device type and required elevator
    features are not set yet, preventing correct selection of the default
    elevator most suitable for the device.

    This currently affects all multi-queue zoned block devices which default
    to the "none" elevator instead of the required "mq-deadline" elevator.
    These drives currently include host-managed SMR disks connected to a
    smartpqi HBA and null_blk block devices with zoned mode enabled.
    Upcoming NVMe Zoned Namespace devices will also be affected.

    Fix this by adding the boolean elevator_init argument to
    blk_mq_init_allocated_queue() to control the execution of
    elevator_init_mq(). Two cases exist:

    1) elevator_init = false is used for calls to
       blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
       case, a call to elevator_init_mq() is added to __device_add_disk(),
       resulting in delayed initialization of the queue elevator after the
       device driver has finished probing the device information. This
       effectively gives elevator_init_mq() access to more information
       about the device.

    2) elevator_init = true preserves the current behavior of initializing
       the elevator directly from blk_mq_init_allocated_queue(). This case
       is used for the special request-based DM devices where the device
       gendisk is created before the queue initialization and device
       information (e.g. queue limits) is already known when the queue
       initialization is executed.

    Additionally, to make sure that the elevator initialization is never
    done while requests are in flight (there should be none when the device
    driver calls device_add_disk()), freeze and quiesce the device request
    queue before calling blk_mq_init_sched() in elevator_init_mq(), as
    sketched below.

    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
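
    A condensed sketch of the delayed initialization path described above
    (error handling trimmed; helper names follow the other commits in this
    series):

        /*
         * Called from __device_add_disk(), after the driver has probed the
         * device and set any required elevator features.
         */
        void elevator_init_mq(struct request_queue *q)
        {
                struct elevator_type *e;

                if (!elv_support_iosched(q))
                        return;
                if (unlikely(q->elevator))
                        return;

                e = elevator_get_default(q);  /* may consult required features */
                if (!e)
                        return;

                /* No requests should be in flight here, but make it explicit. */
                blk_mq_freeze_queue(q);
                blk_mq_quiesce_queue(q);

                if (blk_mq_init_sched(q, e))
                        elevator_put(e);

                blk_mq_unquiesce_queue(q);
                blk_mq_unfreeze_queue(q);
        }
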
     
  • For block devices that do not specify required features, preserve the
    current default elevator selection (mq-deadline for single queue
    devices, none for multi-queue devices). However, for devices that do
    specify required features (e.g. the ELEVATOR_F_ZBD_SEQ_WRITE feature of
    zoned block devices), select the first available elevator providing
    the required features (see the sketch below).

    In all cases, default to "none" if no elevator is available or if the
    initialization of the default elevator fails.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
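
    The selection policy can be read as a small helper plus a two-way
    branch, roughly (names follow mainline; simplified):

        /*
         * No features required: mq-deadline for single hw queue devices,
         * "none" otherwise.
         */
        static struct elevator_type *elevator_get_default(struct request_queue *q)
        {
                if (q->nr_hw_queues != 1)
                        return NULL;

                return elevator_get(q, "mq-deadline", false);
        }

        /* In elevator_init_mq(): */
        if (!q->required_elevator_features)
                e = elevator_get_default(q);
        else
                e = elevator_get_by_features(q);  /* first elevator providing
                                                     all required features */
        if (!e)
                return;  /* stay with "none" */
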
     
  • Introduce the definition of elevator features through the
    elevator_features flags in the elevator_type structure. Each flag can
    represent a feature supported by an elevator. The first feature defined
    by this patch is support for zoned block device sequential write
    constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
    by the mq-deadline elevator using zone write locking.

    Other possible features are IO priorities, write hints, latency targets
    or single-LUN dual-actuator disks (for which the elevator could maintain
    one LBA ordered list per actuator).

    The required_elevator_features field is also added to the request_queue
    structure to allow a device driver to specify elevator feature flags
    that an elevator must support for the correct operation of the device
    (e.g. device drivers for zoned block devices can have the
    ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
    The helper function blk_queue_required_elevator_features() is
    defined for setting this new field.

    With these two new fields in place, the elevator functions
    elevator_match() and elevator_find() are modified so that a user can
    only select an elevator whose feature set satisfies the device's
    required features. Elevators not matching the device requirements are
    not shown in the device's sysfs queue/scheduler file, to prevent their
    use.

    The "none" elevator can always be selected as before.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
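
    A sketch of the new fields and helpers introduced here (simplified; the
    real definitions live in elevator.h, blkdev.h and block code):

        /* Elevator provides sequential write ordering for zoned devices. */
        #define ELEVATOR_F_ZBD_SEQ_WRITE        (1U << 0)

        struct elevator_type {
                /* ... existing fields ... */
                unsigned int elevator_features;         /* what it supports */
        };

        /* Driver side: declare what the device needs from its elevator. */
        void blk_queue_required_elevator_features(struct request_queue *q,
                                                  unsigned int features)
        {
                q->required_elevator_features = features;
        }

        /* Elevator side: usable only if it provides every required bit. */
        static bool elv_support_features(unsigned int elv_features,
                                         unsigned int required_features)
        {
                return (required_features & elv_features) == required_features;
        }
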
     
  • If the default elevator chosen is mq-deadline, elevator_init_mq() may
    return an error if mq-deadline initialization fails, leading to
    blk_mq_init_allocated_queue() returning an error, which in turn will
    cause the block device initialization to fail and the device not to be
    exposed.

    Instead of taking such an extreme measure, handle mq-deadline
    initialization failures in the same manner as when mq-deadline is not
    available (no module to load), that is, default to the "none" scheduler
    (see the sketch below). With this change, the return type of
    elevator_init_mq() can be changed to void.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
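
    In practice the change boils down to swallowing the error and logging
    it, roughly:

        void elevator_init_mq(struct request_queue *q)
        {
                struct elevator_type *e;
                int err;

                /* ... pick the default elevator 'e', return if there is none ... */

                err = blk_mq_init_sched(q, e);
                if (err) {
                        pr_warn("\"%s\" elevator initialization failed, "
                                "falling back to \"none\"\n", e->elevator_name);
                        elevator_put(e);
                }
        }
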
     
  • Instead of checking a queue tag_set's BLK_MQ_F_NO_SCHED flag before
    calling elevator_init_mq() to make sure that the queue supports IO
    scheduling, use the elevator.c function elv_support_iosched() in
    elevator_init_mq(). This does not introduce any functional change but
    ensures that elevator_init_mq() does the right thing based on the queue
    settings.

    Reviewed-by: Ming Lei
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
     

28 Aug, 2019

3 commits

  • The kernfs built-in lock of 'kn->count' is held in the sysfs
    .show/.store path. Meanwhile, inside the block layer's .show/.store
    callbacks, q->sysfs_lock is required.

    However, when the mq & iosched kobjects are removed via
    blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
    too. This causes an AB-BA deadlock because the kernfs built-in lock of
    'kn->count' is also required inside kobject_del(), see the lockdep
    warning [1].

    On the other hand, it isn't necessary to acquire q->sysfs_lock for
    both blk_mq_unregister_dev() & elv_unregister_queue(), because
    clearing the REGISTERED flag prevents stores to 'queue/scheduler' from
    happening. Also, sysfs writes (stores) are exclusive, so it isn't
    necessary to hold the lock for elv_unregister_queue() when it is called
    in the elevator switching path.

    So split .sysfs_lock into two locks: one is still named .sysfs_lock and
    covers synchronizing .store, the other is named .sysfs_dir_lock and
    covers kobjects and related status changes (a sketch of the split
    follows after the lockdep report below).

    sysfs itself can handle the race between adding/removing kobjects and
    showing/storing attributes under those kobjects. For switching the
    scheduler via a store to 'queue/scheduler', we use the
    QUEUE_FLAG_REGISTERED queue flag together with .sysfs_lock to avoid the
    race, so we no longer need to hold .sysfs_lock while removing/adding
    kobjects.

    [1] lockdep warning
    ======================================================
    WARNING: possible circular locking dependency detected
    5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
    ------------------------------------------------------
    rmmod/777 is trying to acquire lock:
    00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72

    but task is already holding lock:
    00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&q->sysfs_lock){+.+.}:
    __lock_acquire+0x95f/0xa2f
    lock_acquire+0x1b4/0x1e8
    __mutex_lock+0x14a/0xa9b
    blk_mq_hw_sysfs_show+0x63/0xb6
    sysfs_kf_seq_show+0x11f/0x196
    seq_read+0x2cd/0x5f2
    vfs_read+0xc7/0x18c
    ksys_read+0xc4/0x13e
    do_syscall_64+0xa7/0x295
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #0 (kn->count#202){++++}:
    check_prev_add+0x5d2/0xc45
    validate_chain+0xed3/0xf94
    __lock_acquire+0x95f/0xa2f
    lock_acquire+0x1b4/0x1e8
    __kernfs_remove+0x237/0x40b
    kernfs_remove_by_name_ns+0x59/0x72
    remove_files+0x61/0x96
    sysfs_remove_group+0x81/0xa4
    sysfs_remove_groups+0x3b/0x44
    kobject_del+0x44/0x94
    blk_mq_unregister_dev+0x83/0xdd
    blk_unregister_queue+0xa0/0x10b
    del_gendisk+0x259/0x3fa
    null_del_dev+0x8b/0x1c3 [null_blk]
    null_exit+0x5c/0x95 [null_blk]
    __se_sys_delete_module+0x204/0x337
    do_syscall_64+0xa7/0x295
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Possible unsafe locking scenario:

     CPU0                          CPU1
     ----                          ----
     lock(&q->sysfs_lock);
                                   lock(kn->count#202);
                                   lock(&q->sysfs_lock);
     lock(kn->count#202);

    *** DEADLOCK ***

    2 locks held by rmmod/777:
    #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
    #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

    stack backtrace:
    CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
    Call Trace:
    dump_stack+0x9a/0xe6
    check_noncircular+0x207/0x251
    ? print_circular_bug+0x32a/0x32a
    ? find_usage_backwards+0x84/0xb0
    check_prev_add+0x5d2/0xc45
    validate_chain+0xed3/0xf94
    ? check_prev_add+0xc45/0xc45
    ? mark_lock+0x11b/0x804
    ? check_usage_forwards+0x1ca/0x1ca
    __lock_acquire+0x95f/0xa2f
    lock_acquire+0x1b4/0x1e8
    ? kernfs_remove_by_name_ns+0x59/0x72
    __kernfs_remove+0x237/0x40b
    ? kernfs_remove_by_name_ns+0x59/0x72
    ? kernfs_next_descendant_post+0x7d/0x7d
    ? strlen+0x10/0x23
    ? strcmp+0x22/0x44
    kernfs_remove_by_name_ns+0x59/0x72
    remove_files+0x61/0x96
    sysfs_remove_group+0x81/0xa4
    sysfs_remove_groups+0x3b/0x44
    kobject_del+0x44/0x94
    blk_mq_unregister_dev+0x83/0xdd
    blk_unregister_queue+0xa0/0x10b
    del_gendisk+0x259/0x3fa
    ? disk_events_poll_msecs_store+0x12b/0x12b
    ? check_flags+0x1ea/0x204
    ? mark_held_locks+0x1f/0x7a
    null_del_dev+0x8b/0x1c3 [null_blk]
    null_exit+0x5c/0x95 [null_blk]
    __se_sys_delete_module+0x204/0x337
    ? free_module+0x39f/0x39f
    ? blkcg_maybe_throttle_current+0x8a/0x718
    ? rwlock_bug+0x62/0x62
    ? __blkcg_punt_bio_submit+0xd0/0xd0
    ? trace_hardirqs_on_thunk+0x1a/0x20
    ? mark_held_locks+0x1f/0x7a
    ? do_syscall_64+0x4c/0x295
    do_syscall_64+0xa7/0x295
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7fb696cdbe6b
    Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
    RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
    RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
    RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
    RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
    R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
    R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0

    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Greg KH
    Cc: Mike Snitzer
    Reviewed-by: Bart Van Assche
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
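
    A sketch of the split itself (simplified; see the commit for the exact
    call sites that move from one lock to the other):

        struct request_queue {
                /* ... */
                /* serializes sysfs .store handlers and elevator switching */
                struct mutex            sysfs_lock;
                /* serializes add/remove of the mq and iosched kobjects */
                struct mutex            sysfs_dir_lock;
                /* ... */
        };

    Roughly, the kobject add/remove paths (blk_mq_unregister_dev(),
    elv_unregister_queue(), blk_register_queue()) now hold .sysfs_dir_lock,
    not .sysfs_lock, while kernfs' kn->count is taken inside kobject_del(),
    which breaks the AB-BA cycle with the attribute .show/.store paths that
    take .sysfs_lock under kn->count.
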
     
  • There are four users that check whether a queue is registered, so add a
    helper for this check (sketched below).

    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Greg KH
    Cc: Mike Snitzer
    Cc: Bart Van Assche
    Reviewed-by: Bart Van Assche
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
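
    The helper is a one-liner wrapping the existing flag test, roughly:

        static inline bool blk_queue_registered(struct request_queue *q)
        {
                return test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags);
        }
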
     
  • The original comment says:

    q->sysfs_lock must be held to provide mutual exclusion between
    elevator_switch() and here.

    That is simply wrong. elevator_init_mq() is only called from
    blk_mq_init_allocated_queue(), which is always called before the
    request queue is registered via blk_register_queue(), for both dm-rq
    and normal rq based drivers. However, the queue's kobject is only
    exposed and added to sysfs in blk_register_queue(), so there is no such
    race between elevator_switch() and elevator_init_mq().

    Therefore, avoid holding q->sysfs_lock in elevator_init_mq().

    Cc: Christoph Hellwig
    Cc: Hannes Reinecke
    Cc: Greg KH
    Cc: Mike Snitzer
    Cc: Bart Van Assche
    Cc: Damien Le Moal
    Reviewed-by: Bart Van Assche
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

07 Jun, 2019

1 commit

  • In theory, the IO scheduler belongs to the request queue, and the
    request pool of sched tags belongs to the request queue too.

    However, the current tag allocation interfaces are re-used for both
    driver tags and sched tags, and driver tags are definitely host wide
    and don't belong to any request queue, same with their request pool.
    So we need the tagset instance for freeing requests of sched tags.

    Meanwhile, blk_mq_free_tag_set() often follows blk_cleanup_queue() in
    the non-BLK_MQ_F_TAG_SHARED case, which requires the request pool of
    sched tags to be freed before blk_mq_free_tag_set() is called.

    Commit 47cdee29ef9d94e ("block: move blk_exit_queue into __blk_release_queue")
    moved blk_exit_queue into __blk_release_queue to simplify the fast path
    in generic_make_request(), which then causes an oops while freeing
    requests of sched tags in __blk_release_queue().

    Fix the above issue by moving the freeing of the sched tags request
    pool into blk_cleanup_queue(); this is safe because the queue has been
    frozen and there are no in-queue requests at that time (see the sketch
    below). Freeing the sched tags themselves has to be kept in the queue's
    release handler because there might be uncompleted dispatch activity
    that still refers to them.

    Cc: Bart Van Assche
    Cc: Christoph Hellwig
    Fixes: 47cdee29ef9d94e485eb08f962c74943023a5271 ("block: move blk_exit_queue into __blk_release_queue")
    Tested-by: Yi Zhang
    Reported-by: kernel test robot
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
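
    The shape of the fix, roughly (assuming the helper that frees the
    per-queue sched request pool is blk_mq_sched_free_requests(), as in
    mainline; the surrounding teardown is omitted):

        void blk_cleanup_queue(struct request_queue *q)
        {
                /* ... mark the queue dying, drain it ... */

                blk_freeze_queue(q);

                /*
                 * The queue is frozen: no in-queue requests remain, so the
                 * request pool of the sched tags can be freed here, before
                 * the driver calls blk_mq_free_tag_set().
                 */
                mutex_lock(&q->sysfs_lock);
                if (q->elevator)
                        blk_mq_sched_free_requests(q);
                mutex_unlock(&q->sysfs_lock);

                /* the sched tags themselves stay until __blk_release_queue() */

                /* ... */
        }
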
     

16 Nov, 2018

1 commit

  • Various spots check for q->mq_ops being non-NULL; provide a helper to
    do this instead.

    Where the ->mq_ops != NULL check is redundant, remove it.

    Since mq == rq-based now that the legacy path is gone, get rid of
    queue_is_rq_based() and just use queue_is_mq() everywhere (sketched
    below).

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
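
    The helper reduces to a NULL test on ->mq_ops, roughly:

        static inline bool queue_is_mq(struct request_queue *q)
        {
                return q->mq_ops;
        }
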
     

15 Nov, 2018

1 commit

  • The boolean next_sorted is set to false and never changed, hence the
    code that checks whether it is true is dead code and can now be
    removed. This dead code is a leftover from a previous commit that
    cleaned up the elevator and removed the setting of next_sorted to true.

    Detected by CoverityScan, CID#1475401 ("'Constant' variable guards
    dead code")

    Fixes: a1ce35fa4985 ("block: remove dead elevator code")
    Signed-off-by: Colin Ian King
    Signed-off-by: Jens Axboe

    Colin Ian King
     

08 Nov, 2018

3 commits

  • This is a remnant of when we had ops for both SQ and MQ
    schedulers. Now it's just MQ, so get rid of the union.

    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This removes a bunch of core and elevator related code. On the core
    front, we remove anything related to queue running, draining,
    initialization, plugging, and congestion. We also kill anything
    related to request allocation, merging, retrieval, and completion.

    Remove any checking for single queue IO schedulers, as they no
    longer exist. This means we can also delete a bunch of code related
    to request issue, adding, completion, etc - and all the SQ related
    ops and helpers.

    Also kill load_default_modules(), as all that did was provide a way
    to load the default single queue elevator.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Retain the deadline documentation, as that carries over to mq-deadline
    as well.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Oct, 2018

1 commit

  • Merge -rc6 in, for two reasons:

    1) Resolve a trivial conflict in the blk-mq-tag.c documentation
    2) A few important regression fixes went into upstream directly, so
    they aren't in the 4.20 branch.

    Signed-off-by: Jens Axboe

    * tag 'v4.19-rc6': (780 commits)
    Linux 4.19-rc6
    MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
    cpufreq: qcom-kryo: Fix section annotations
    perf/core: Add sanity check to deal with pinned event failure
    xen/blkfront: correct purging of persistent grants
    Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
    selftests/powerpc: Fix Makefiles for headers_install change
    blk-mq: I/O and timer unplugs are inverted in blktrace
    dax: Fix deadlock in dax_lock_mapping_entry()
    x86/boot: Fix kexec booting failure in the SEV bit detection code
    bcache: add separate workqueue for journal_write to avoid deadlock
    drm/amd/display: Fix Edid emulation for linux
    drm/amd/display: Fix Vega10 lightup on S3 resume
    drm/amdgpu: Fix vce work queue was not cancelled when suspend
    Revert "drm/panel: Add device_link from panel device to DRM device"
    xen/blkfront: When purging persistent grants, keep them in the buffer
    clocksource/drivers/timer-atmel-pit: Properly handle error cases
    block: fix deadline elevator drain for zoned block devices
    ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
    drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
    ...

    Signed-off-by: Jens Axboe

    Jens Axboe
     

27 Sep, 2018

4 commits

  • When the deadline scheduler is used with a zoned block device, writes
    to a zone will be dispatched one at a time. This causes the warning
    message:

    deadline: forced dispatching is broken (nr_sorted=X), please report this

    to be displayed when switching to another elevator with the legacy I/O
    path while write requests to a zone are being retained in the scheduler
    queue.

    Prevent this message from being displayed when executing
    elv_drain_elevator() for a zoned block device (see the sketch below).
    __blk_drain_queue() will loop until all writes are dispatched and
    completed, resulting in the desired elevator queue drain without
    extensive modifications to the deadline code itself to handle
    forced-dispatch calls.

    Signed-off-by: Damien Le Moal
    Fixes: 8dc8146f9c92 ("deadline-iosched: Introduce zone locking support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Damien Le Moal
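
    The change amounts to suppressing the message for zoned devices while
    still looping until the scheduler is empty, roughly (legacy elevator.c
    of that era, simplified):

        static void elv_drain_elevator(struct request_queue *q)
        {
                static int printed;

                while (q->elevator->type->ops.sq.elevator_dispatch_fn(q, 1))
                        ;
                /*
                 * Zoned devices legitimately hold back writes to a zone, so
                 * the warning does not apply; __blk_drain_queue() keeps
                 * calling us until nr_sorted drops to zero.
                 */
                if (q->nr_sorted && !blk_queue_is_zoned(q) && printed++ < 10)
                        printk(KERN_ERR "%s: forced dispatching is broken "
                               "(nr_sorted=%u), please report this\n",
                               q->elevator->type->elevator_name, q->nr_sorted);
        }
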
     
  • Instead of scheduling runtime resume of a request queue after a
    request has been queued, schedule asynchronous resume during request
    allocation. The new pm_request_resume() calls occur after
    blk_queue_enter() has increased the q_usage_counter request queue
    member. This change is needed for a later patch that will make request
    allocation block while the queue status is not RPM_ACTIVE.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Jianchao Wang
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Cc: Alan Stern
    Signed-off-by: Jens Axboe

    Bart Van Assche
     
  • Move the pm_request_resume() and pm_runtime_mark_last_busy() calls into
    two new functions and thereby separate legacy block layer code from code
    that works for both the legacy block layer and blk-mq. A later patch will
    add calls to the new functions in the blk-mq code.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Martin K. Petersen
    Cc: Jianchao Wang
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Cc: Alan Stern
    Signed-off-by: Jens Axboe

    Bart Van Assche
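
    The two wrappers look roughly like this (a sketch assuming the mainline
    names blk_pm_request_resume() and blk_pm_mark_last_busy() in
    block/blk-pm.h; both collapse to no-ops when the queue has no PM
    device):

        static inline void blk_pm_request_resume(struct request_queue *q)
        {
                if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
                               q->rpm_status == RPM_SUSPENDING))
                        pm_request_resume(q->dev);
        }

        static inline void blk_pm_mark_last_busy(struct request *rq)
        {
                if (rq->q->dev && !(rq->rq_flags & RQF_PM))
                        pm_runtime_mark_last_busy(rq->q->dev);
        }
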
     
  • Move the code for runtime power management from blk-core.c into the
    new source file blk-pm.c. Move the corresponding declarations from
    <linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
    the declarations of the functions that are not used in that mode.
    This patch not only reduces the number of #ifdefs in the block layer
    core code but also reduces the size of the <linux/blkdev.h> header
    file and hence should help to reduce the build time of the Linux
    kernel if CONFIG_PM is not defined.

    Signed-off-by: Bart Van Assche
    Reviewed-by: Ming Lei
    Reviewed-by: Christoph Hellwig
    Cc: Jianchao Wang
    Cc: Hannes Reinecke
    Cc: Johannes Thumshirn
    Cc: Alan Stern
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

21 Aug, 2018

1 commit

  • Currently, when updating nr_hw_queues, the IO scheduler's init_hctx
    will be invoked before the mapping between ctx and hctx is adapted
    correctly by blk_mq_map_swqueue. An IO scheduler's init_hctx (kyber)
    may depend on this mapping, get a wrong result and finally panic.
    A simple way to fix this is to switch the IO scheduler to 'none'
    before updating nr_hw_queues, and then switch it back afterwards
    (see the sketch below). blk_mq_sched_init_/exit_hctx are removed
    because nobody uses them any more.

    Signed-off-by: Jianchao Wang
    Signed-off-by: Jens Axboe

    Jianchao Wang
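
    The simple approach described above looks roughly like this in
    __blk_mq_update_nr_hw_queues() (condensed; the per-queue bookkeeping
    that remembers which elevator to restore is omitted):

        list_for_each_entry(q, &set->tag_list, tag_set_list)
                blk_mq_freeze_queue(q);

        /*
         * Detach the IO scheduler (switch to "none") before the ctx<->hctx
         * mapping changes, so init_hctx never sees a stale mapping.
         */
        list_for_each_entry(q, &set->tag_list, tag_set_list)
                blk_mq_elv_switch_none(q);

        set->nr_hw_queues = nr_hw_queues;
        blk_mq_update_queue_map(set);
        list_for_each_entry(q, &set->tag_list, tag_set_list) {
                blk_mq_realloc_hw_ctxs(set, q);
                blk_mq_map_swqueue(q);
        }

        /* Re-attach the previous scheduler now that the mapping is valid. */
        list_for_each_entry(q, &set->tag_list, tag_set_list)
                blk_mq_elv_switch_back(q);

        list_for_each_entry(q, &set->tag_list, tag_set_list)
                blk_mq_unfreeze_queue(q);
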
     

07 Jan, 2018

1 commit

  • Dispatch may still be in progress after the queue is frozen, so we have
    to quiesce the queue before switching the IO scheduler and updating
    nr_requests.

    Also, when switching IO schedulers, blk_mq_run_hw_queue() may still be
    called from somewhere (such as nvme_reset_work()), and the IO
    scheduler's per-hctx data may not be set up yet, causing an oops even
    inside blk_mq_hctx_has_pending(); for example, it can run just between:

    ret = e->ops.mq.init_sched(q, e);
    AND
    ret = e->ops.mq.init_hctx(hctx, i)

    inside blk_mq_init_sched().

    This basically reverts commit 7a148c2fcff8330 ("block: don't call
    blk_mq_quiesce_queue() after queue is frozen") and makes sure
    blk_mq_hctx_has_pending() won't be called if the queue is quiesced.

    Reviewed-by: Christoph Hellwig
    Fixes: 7a148c2fcff83309(block: don't call blk_mq_quiesce_queue() after queue is frozen)
    Reported-by: Yi Zhang
    Tested-by: Yi Zhang
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

26 Oct, 2017

2 commits

  • Since we now look up elevator types with the appropriate multiqueue
    capability, allow schedulers to register with an alias alongside the
    real name. This is in preparation for allowing 'mq-deadline' to
    register an alias of 'deadline' as well (see the sketch below).

    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
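
    A sketch of the alias mechanism (field and helper names as in mainline;
    the mq-deadline hunk is the intended follow-up user):

        struct elevator_type {
                /* ... */
                const char *elevator_name;
                const char *elevator_alias;     /* optional second lookup name */
                /* ... */
        };

        /* Match either the real name or the alias. */
        static bool elevator_match(const struct elevator_type *e, const char *name)
        {
                if (!strcmp(e->elevator_name, name))
                        return true;
                if (e->elevator_alias && !strcmp(e->elevator_alias, name))
                        return true;

                return false;
        }

        /* mq-deadline (follow-up): */
        static struct elevator_type mq_deadline = {
                /* ... */
                .elevator_name  = "mq-deadline",
                .elevator_alias = "deadline",
        };
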
     
  • If an IO scheduler is selected via elevator= and it doesn't match
    the driver in question wrt blk-mq support, then we fail to boot.

    The elevator= parameter is deprecated and only supported for
    non-mq devices. Augment the elevator lookup API so that we
    pass in whether we're looking for an mq capable scheduler or not,
    so that we only ever return a valid type for the queue in
    question.

    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196695
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

29 Aug, 2017

1 commit

  • There is a race between changing I/O elevator and request_queue removal
    which can trigger the warning in kobject_add_internal. A program can
    use sysfs to request a change of elevator at the same time another task
    is unregistering the request_queue the elevator would be attached to.
    The elevator's kobject will then attempt to be connected to the
    request_queue in the object tree when the request_queue has just been
    removed from sysfs. This triggers the warning in kobject_add_internal
    as the request_queue no longer has a sysfs directory:

    kobject_add_internal failed for iosched (error: -2 parent: queue)
    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0

    To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when
    changing the elevator and use the request_queue's sysfs_lock to
    serialize between clearing the flag and the elevator testing the flag
    (see the sketch below).

    Signed-off-by: David Jeffery
    Tested-by: Ming Lei
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    David Jeffery
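
    The check described above, roughly as it lands in the store path (a
    sketch; the function's pre-existing checks are omitted, and the caller,
    queue_attr_store(), holds q->sysfs_lock, which serializes this test
    against blk_unregister_queue() clearing the flag):

        ssize_t elv_iosched_store(struct request_queue *q, const char *name,
                                  size_t count)
        {
                int ret;

                /* Bail out once blk_unregister_queue() has cleared the flag. */
                if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
                        return count;

                ret = __elevator_change(q, name);
                if (!ret)
                        return count;

                return ret;
        }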