01 Feb, 2019

1 commit

  • The speed at which a bfq_queue receives I/O is one of the parameters by
    which bfq decides whether the queue is soft real-time (i.e., whether the
    queue contains the I/O of a soft real-time application). In particular,
    when a bfq_queue remains without outstanding I/O requests, bfq computes
    the minimum time instant, named soft_rt_next_start, at which the next
    request of the queue may arrive for the queue to be deemed as soft real
    time.

    Unfortunately this filtering may cause problems with a queue in
    interactive weight raising. In fact, such a queue may be conveying the
    I/O needed to load a soft real-time application. The latter will
    actually exhibit a soft real-time I/O pattern after it finally starts
    doing its job. But, if soft_rt_next_start is updated for an interactive
    bfq_queue, and the queue has received a lot of service before remaining
    with no outstanding request (likely to happen on a fast device), then
    soft_rt_next_start is assigned such a high value that, for a very long
    time, the queue is prevented from being possibly considered as soft real
    time.

    This commit removes the updating of soft_rt_next_start for bfq_queues in
    interactive weight raising.

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
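
The rule introduced by this commit can be illustrated with a minimal user-space sketch. It is not the bfq code: the struct, its fields other than soft_rt_next_start, and both function names are hypothetical, and time is a plain counter rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant bfq_queue state. */
struct sketch_queue {
    bool interactive_wr;         /* queue is in interactive weight raising */
    uint64_t soft_rt_next_start; /* earliest arrival time that still counts as soft real-time */
};

/*
 * Called when the queue is left with no outstanding requests.
 * 'candidate' is the freshly computed lower bound for the next arrival.
 */
static void sketch_update_soft_rt_next_start(struct sketch_queue *q,
                                             uint64_t candidate)
{
    assert(q != NULL);
    if (q->interactive_wr)
        return; /* skip the update, so a large candidate cannot lock the queue out */
    q->soft_rt_next_start = candidate;
}

/* A request arriving at 'now' keeps the queue eligible only if it is not too early. */
static bool sketch_arrival_is_soft_rt(const struct sketch_queue *q, uint64_t now)
{
    return now >= q->soft_rt_next_start;
}
```

With interactive_wr set, a huge candidate value is ignored and the old bound stays in place; once the flag is cleared, the update takes effect again.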
     

27 Jan, 2019

1 commit


25 Jan, 2019

1 commit

  • This patch prevents sparse from reporting the following warnings:

    CHECK block/blk-wbt.c
    block/blk-wbt.c:600:6: warning: symbol 'wbt_issue' was not declared. Should it be static?
    block/blk-wbt.c:620:6: warning: symbol 'wbt_requeue' was not declared. Should it be static?
    CC block/blk-wbt.o
    block/blk-wbt.c:600:6: warning: no previous prototype for wbt_issue [-Wmissing-prototypes]
    void wbt_issue(struct rq_qos *rqos, struct request *rq)
    ^~~~~~~~~
    block/blk-wbt.c:620:6: warning: no previous prototype for wbt_requeue [-Wmissing-prototypes]
    void wbt_requeue(struct rq_qos *rqos, struct request *rq)
    ^~~~~~~~~~~

    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Bart Van Assche
    Signed-off-by: Jens Axboe

    Bart Van Assche
     

24 Jan, 2019

1 commit


23 Jan, 2019

1 commit

  • Besides blk_queue_split(), bio_split() is also used for splitting bios, and
    the remaining bio is often resubmitted to the queue via
    generic_make_request(). So the same queue-enter recursion exists in this
    case too. Unfortunately, commit cd4a4ae4683dc2 doesn't help this case.

    This patch covers the above case by setting BIO_QUEUE_ENTERED before calling
    q->make_request_fn.

    In theory, the per-bio flag is used to simulate a stack variable, so it is
    fine to clear it after q->make_request_fn returns, especially since the
    same bio can't be submitted from another context.

    Fixes: cd4a4ae4683dc2 ("block: don't use blocking queue entered for recursive bio submits")
    Cc: Tetsuo Handa
    Cc: NeilBrown
    Reviewed-by: Mike Snitzer
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
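
The idea of using a per-bio flag as a simulated stack variable can be sketched in user space as follows. This is a simplified model, not the kernel code: all sk_ names are hypothetical, the flag stands in for BIO_QUEUE_ENTERED, and resubmission is modeled as direct recursion on the same object purely for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_QUEUE_ENTERED (1u << 0) /* hypothetical stand-in for BIO_QUEUE_ENTERED */

struct sk_bio {
    unsigned int flags;
    int splits_left;        /* how many times the driver splits and resubmits */
    int blocking_enters;    /* submissions that took the blocking queue-enter path */
    int nonblocking_enters; /* submissions recognized as recursive */
};

static void sk_submit(struct sk_bio *bio);

/* Stand-in for q->make_request_fn: split, then resubmit the remainder. */
static void sk_make_request(struct sk_bio *bio)
{
    if (bio->splits_left > 0) {
        bio->splits_left--;
        sk_submit(bio);
    }
}

static void sk_submit(struct sk_bio *bio)
{
    bool recursive;

    assert(bio != NULL);
    recursive = (bio->flags & SK_QUEUE_ENTERED) != 0;
    if (recursive)
        bio->nonblocking_enters++;
    else
        bio->blocking_enters++;

    /* Only the outermost submission sets the flag, and it clears it on return. */
    if (!recursive)
        bio->flags |= SK_QUEUE_ENTERED;
    sk_make_request(bio);
    if (!recursive)
        bio->flags &= ~SK_QUEUE_ENTERED;
}
```

Because the flag is set before the callback and cleared right after it returns, it behaves like a local variable of the outermost submission: any resubmission from inside the callback sees it, and nothing leaks once the call completes.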
     

18 Jan, 2019

1 commit

  • Remove the imprecise and sloppy:

    "This files is licensed under the GPL."

    license notice in the top level comment.

    1) The file already contains a SPDX license identifier which clearly
    states that the license of the file is GPL V2 only

    2) The notice resolves to GPL v1 or later for scanners which is just
    contrary to the intent of SPDX identifiers to provide clear and non
    ambiguous license information. Aside from that, the value add of this
    notice is below zero.

    Cc: Damien Le Moal
    Cc: Matias Bjorling
    Cc: Christoph Hellwig
    Cc: Jens Axboe
    Cc: linux-block@vger.kernel.org
    Fixes: 6a5ac9846508 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n")
    Reviewed-by: Bart Van Assche
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Jens Axboe

    Thomas Gleixner
     

16 Jan, 2019

1 commit

  • We need to pass bio->bi_opf after bio integrity preparation; otherwise
    the REQ_INTEGRITY flag may not be set on the allocated request, which
    breaks block integrity.

    Fixes: f9afca4d367b ("blk-mq: pass in request/bio flags to queue mapping")
    Cc: Hannes Reinecke
    Cc: Keith Busch
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

14 Jan, 2019

1 commit

  • The comments on function __bfq_deactivate_entity contain two imprecise or
    wrong statements:
    1) The function performs the deactivation of the entity.
    2) The function must be invoked only if the entity is on a service tree.

    This commit replaces both statements with the correct ones:
    1) The function updates sched_data and service trees for the entity,
    so as to represent the entity as inactive (which is only part of the steps
    needed for the deactivation of the entity).
    2) The function must be invoked on every entity being deactivated.

    Signed-off-by: Paolo Valente
    Signed-off-by: Jens Axboe

    Paolo Valente
     

10 Jan, 2019

1 commit

  • Commit 5f0ed774ed29 ("block: sum requests in the plug structure") removed
    the request_count parameter from blk_attempt_plug_merge(), but did not
    remove the associated kerneldoc comment, introducing this warning to the
    docs build:

    ./block/blk-core.c:685: warning: Excess function parameter 'request_count' description in 'blk_attempt_plug_merge'

    Remove the obsolete description and make things a little quieter.

    Signed-off-by: Jonathan Corbet
    Signed-off-by: Jens Axboe

    Jonathan Corbet
     

09 Jan, 2019

1 commit

  • There was some confusion about what these functions did. Make it clear
    that this is a hint for upper layers to pass to the block layer, and
    that it does not guarantee that I/O will not be submitted between a
    start and finish plug.

    Reported-by: "Darrick J. Wong"
    Reviewed-by: Darrick J. Wong
    Reviewed-by: Ming Lei
    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

03 Jan, 2019

1 commit

  • Pull more block updates from Jens Axboe:

    - Dead code removal for loop/sunvdc (Chengguang)

    - Mark BIDI support for bsg as deprecated, logging a single dmesg
    warning if anyone is actually using it (Christoph)

    - blkcg cleanup, killing a dead function and making the tryget_closest
    variant easier to read (Dennis)

    - Floppy fixes, one fixing a regression in swim3 (Finn)

    - lightnvm use-after-free fix (Gustavo)

    - gdrom leak fix (Wenwen)

    - a set of drbd updates (Lars, Luc, Nathan, Roland)

    * tag 'for-4.21/block-20190102' of git://git.kernel.dk/linux-block: (28 commits)
    block/swim3: Fix regression on PowerBook G3
    block/swim3: Fix -EBUSY error when re-opening device after unmount
    block/swim3: Remove dead return statement
    block/amiflop: Don't log error message on invalid ioctl
    gdrom: fix a memory leak bug
    lightnvm: pblk: fix use-after-free bug
    block: sunvdc: remove redundant code
    block: loop: remove redundant code
    bsg: deprecate BIDI support in bsg
    blkcg: remove unused __blkg_release_rcu()
    blkcg: clean up blkg_tryget_closest()
    drbd: Change drbd_request_detach_interruptible's return type to int
    drbd: Avoid Clang warning about pointless switch statment
    drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")
    drbd: skip spurious timeout (ping-timeo) when failing promote
    drbd: don't retry connection if peers do not agree on "authentication" settings
    drbd: fix print_st_err()'s prototype to match the definition
    drbd: avoid spurious self-outdating with concurrent disconnect / down
    drbd: do not block when adjusting "disk-options" while IO is frozen
    drbd: fix comment typos
    ...

    Linus Torvalds
     

30 Dec, 2018

1 commit

  • Pull Kconfig updates from Masahiro Yamada:

    - support -y option for merge_config.sh to avoid downgrading =y to =m

    - remove S_OTHER symbol type, and touch include/config/*.h files correctly

    - fix file name and line number in lexer warnings

    - fix memory leak when EOF is encountered in quotation

    - resolve all shift/reduce conflicts of the parser

    - warn no new line at end of file

    - make 'source' statement more strict to take only string literal

    - rewrite the lexer and remove the keyword lookup table

    - convert to SPDX License Identifier

    - compile C files independently instead of including them from zconf.y

    - fix various warnings of gconfig

    - misc cleanups

    * tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits)
    kconfig: surround dbg_sym_flags with #ifdef DEBUG to fix gconf warning
    kconfig: split images.c out of qconf.cc/gconf.c to fix gconf warnings
    kconfig: add static qualifiers to fix gconf warnings
    kconfig: split the lexer out of zconf.y
    kconfig: split some C files out of zconf.y
    kconfig: convert to SPDX License Identifier
    kconfig: remove keyword lookup table entirely
    kconfig: update current_pos in the second lexer
    kconfig: switch to ASSIGN_VAL state in the second lexer
    kconfig: stop associating kconf_id with yylval
    kconfig: refactor end token rules
    kconfig: stop supporting '.' and '/' in unquoted words
    treewide: surround Kconfig file paths with double quotes
    microblaze: surround string default in Kconfig with double quotes
    kconfig: use T_WORD instead of T_VARIABLE for variables
    kconfig: use specific tokens instead of T_ASSIGN for assignments
    kconfig: refactor scanning and parsing "option" properties
    kconfig: use distinct tokens for type and default properties
    kconfig: remove redundant token defines
    kconfig: rename depends_list to comment_option_list
    ...

    Linus Torvalds
     

29 Dec, 2018

2 commits

  • Pull SCSI updates from James Bottomley:
    "This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
    megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.

    Additionally, we have a pile of annotation, unused variable and minor
    updates.

    The big API change is the updates for Christoph's DMA rework which
    include removing the DISABLE_CLUSTERING flag.

    And finally there are a couple of target tree updates"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
    scsi: isci: request: mark expected switch fall-through
    scsi: isci: remote_node_context: mark expected switch fall-throughs
    scsi: isci: remote_device: Mark expected switch fall-throughs
    scsi: isci: phy: Mark expected switch fall-through
    scsi: iscsi: Capture iscsi debug messages using tracepoints
    scsi: myrb: Mark expected switch fall-throughs
    scsi: megaraid: fix out-of-bound array accesses
    scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
    scsi: fcoe: remove set but not used variable 'port'
    scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
    scsi: smartpqi: fix build warnings
    scsi: smartpqi: update driver version
    scsi: smartpqi: add ofa support
    scsi: smartpqi: increase fw status register read timeout
    scsi: smartpqi: bump driver version
    scsi: smartpqi: add smp_utils support
    scsi: smartpqi: correct lun reset issues
    scsi: smartpqi: correct volume status
    scsi: smartpqi: do not offline disks for transient did no connect conditions
    scsi: smartpqi: allow for larger raid maps
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "This is the main pull request for block/storage for 4.21.

    Larger than usual, it was a busy round with lots of goodies queued up.
    Most notable is the removal of the old IO stack, which has been a long
    time coming. No new features for a while, everything coming in this
    week has all been fixes for things that were previously merged.

    This contains:

    - Use atomic counters instead of semaphores for mtip32xx (Arnd)

    - Cleanup of the mtip32xx request setup (Christoph)

    - Fix for circular locking dependency in loop (Jan, Tetsuo)

    - bcache (Coly, Guoju, Shenghui)
    * Optimizations for writeback caching
    * Various fixes and improvements

    - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
    * host and target support for NVMe over TCP
    * Error log page support
    * Support for separate read/write/poll queues
    * Much improved polling
    * discard OOM fallback
    * Tracepoint improvements

    - lightnvm (Hans, Hua, Igor, Matias, Javier)
    * Igor added packed metadata to pblk. Now drives without metadata
    per LBA can be used as well.
    * Fix from Geert on uninitialized value on chunk metadata reads.
    * Fixes from Hans and Javier to pblk recovery and write path.
    * Fix from Hua Su to fix a race condition in the pblk recovery
    code.
    * Scan optimization added to pblk recovery from Zhoujie.
    * Small geometry cleanup from me.

    - Conversion of the last few drivers that used the legacy path to
    blk-mq (me)

    - Removal of legacy IO path in SCSI (me, Christoph)

    - Removal of legacy IO stack and schedulers (me)

    - Support for much better polling, now without interrupts at all.
    blk-mq adds support for multiple queue maps, which enables us to
    have a map per type. This in turn enables nvme to have separate
    completion queues for polling, which can then be interrupt-less.
    Also means we're ready for async polled IO, which is hopefully
    coming in the next release.

    - Killing of (now) unused block exports (Christoph)

    - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

    - Support for zoned testing with null_blk (Masato)

    - sx8 conversion to per-host tag sets (Christoph)

    - IO priority improvements (Damien)

    - mq-deadline zoned fix (Damien)

    - Ref count blkcg series (Dennis)

    - Lots of blk-mq improvements and speedups (me)

    - sbitmap scalability improvements (me)

    - Make core inflight IO accounting per-cpu (Mikulas)

    - Export timeout setting in sysfs (Weiping)

    - Cleanup the direct issue path (Jianchao)

    - Export blk-wbt internals in block debugfs for easier debugging
    (Ming)

    - Lots of other fixes and improvements"

    * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
    kyber: use sbitmap add_wait_queue/list_del wait helpers
    sbitmap: add helpers for add/del wait queue handling
    block: save irq state in blkg_lookup_create()
    dm: don't reuse bio for flushes
    nvme-pci: trace SQ status on completions
    nvme-rdma: implement polling queue map
    nvme-fabrics: allow user to pass in nr_poll_queues
    nvme-fabrics: allow nvmf_connect_io_queue to poll
    nvme-core: optionally poll sync commands
    block: make request_to_qc_t public
    nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
    nvme-tcp: fix endianess annotations
    nvmet-tcp: fix endianess annotations
    nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
    nvme-pci: only set nr_maps to 2 if poll queues are supported
    nvmet: use a macro for default error location
    nvmet: fix comparison of a u16 with -1
    blk-mq: enable IO poll if .nr_queues of type poll > 0
    blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
    blk-mq: skip zero-queue maps in blk_mq_map_swqueue
    ...

    Linus Torvalds
     

21 Dec, 2018

5 commits

  • Besides the OSD command set that never got traction, the only SCSI
    command using bidirectional buffers is XDWRITEREAD in the 10 and 32 byte
    variants, which is extremely esoteric and has been removed from the spec
    again as of SBC4r15. It probably doesn't make sense to keep the support
    code around just for that, so start deprecating the support.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • An earlier commit 7fcf2b033b84 ("blkcg: change blkg reference counting
    to use percpu_ref") moved around the release call from blkg_put() to be
    a part of the percpu_ref cleanup. Remove the additional unused code
    which should have been removed earlier.

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • The implementation of blkg_tryget_closest() wasn't super obvious and
    became a point of suspicion when debugging [1]. So let's clean it up so
    it's obviously not the problem.

    Also add missing RCU read locking to bio_clone_blkg_association(), which
    got exposed by adding the RCU read lock held check in
    blkg_tryget_closest().

    [1] https://lore.kernel.org/linux-block/a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk/

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • The Kconfig lexer supports special characters such as '.' and '/' in
    the parameter context. In my understanding, the reason is just to
    support bare file paths in the source statement.

    I do not see a good reason to complicate Kconfig and leave room for
    ambiguity.

    The majority of code already surrounds file paths with double quotes,
    and it makes sense since file paths are constant string literals.

    Make it treewide consistent now.

    Signed-off-by: Masahiro Yamada
    Acked-by: Wolfram Sang
    Acked-by: Geert Uytterhoeven
    Acked-by: Ingo Molnar

    Masahiro Yamada
     
  • sbq_wake_ptr() checks sbq->ws_active to know if it needs to loop
    the wait indexes or not. This requires the use of the sbitmap
    waitqueue wrappers, but kyber doesn't use those for its domain
    token waitqueue handling.

    Convert kyber to use the helpers. This fixes a hang with waiting
    for domain tokens.

    Fixes: 5d2ee7122c73 ("sbitmap: optimize wakeup check")
    Tested-by: Ming Lei
    Reported-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

20 Dec, 2018

1 commit

  • blkg_lookup_create() may be called from pool_map(), in which the irq
    state is saved, so blkg_lookup_create() has to save the irq state too.

    Otherwise, the following lockdep warning can be triggered:

    [ 104.258537] ================================
    [ 104.259129] WARNING: inconsistent lock state
    [ 104.259725] 4.20.0-rc6+ #545 Not tainted
    [ 104.260268] --------------------------------
    [ 104.260865] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    [ 104.261727] swapper/49/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
    [ 104.262444] 00000000db365b5d (&(&pool->lock)->rlock#3){+.?.}, at: thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.263747] {SOFTIRQ-ON-W} state was registered at:
    [ 104.264417] _raw_spin_unlock_irq+0x29/0x4c
    [ 104.265014] blkg_lookup_create+0xdc/0xe6
    [ 104.265609] bio_associate_blkg_from_css+0xd3/0x13f
    [ 104.266312] bio_associate_blkg+0x15a/0x1bb
    [ 104.266913] pool_map+0xe8/0x103 [dm_thin_pool]
    [ 104.267572] __map_bio+0x98/0x29c [dm_mod]
    [ 104.268162] __split_and_process_non_flush+0x29e/0x306 [dm_mod]
    [ 104.269003] __split_and_process_bio+0x16a/0x25b [dm_mod]
    [ 104.269971] __dm_make_request.isra.14+0xdc/0x124 [dm_mod]
    [ 104.270973] generic_make_request+0x3f5/0x68b
    [ 104.271676] process_prepared_mapping+0x166/0x1ef [dm_thin_pool]
    [ 104.272531] schedule_zero+0x239/0x273 [dm_thin_pool]
    [ 104.273245] process_cell+0x60c/0x6f1 [dm_thin_pool]
    [ 104.273967] do_worker+0x60c/0xca8 [dm_thin_pool]
    [ 104.274635] process_one_work+0x4eb/0x834
    [ 104.275203] worker_thread+0x318/0x484
    [ 104.275740] kthread+0x1d1/0x1e1
    [ 104.276203] ret_from_fork+0x3a/0x50
    [ 104.276714] irq event stamp: 170003
    [ 104.277201] hardirqs last enabled at (170002): [] _raw_spin_unlock_irqrestore+0x44/0x6b
    [ 104.278535] hardirqs last disabled at (170003): [] _raw_spin_lock_irqsave+0x20/0x55
    [ 104.280273] softirqs last enabled at (169978): [] irq_enter+0x4c/0x73
    [ 104.281617] softirqs last disabled at (169979): [] irq_exit+0x7e/0x11d
    [ 104.282744]
    [ 104.282744] other info that might help us debug this:
    [ 104.283640] Possible unsafe locking scenario:
    [ 104.283640]
    [ 104.284452] CPU0
    [ 104.284803] ----
    [ 104.285150] lock(&(&pool->lock)->rlock#3);
    [ 104.285762]
    [ 104.286130] lock(&(&pool->lock)->rlock#3);
    [ 104.286750]
    [ 104.286750] *** DEADLOCK ***
    [ 104.286750]
    [ 104.287564] no locks held by swapper/49/0.
    [ 104.288129]
    [ 104.288129] stack backtrace:
    [ 104.288738] CPU: 49 PID: 0 Comm: swapper/49 Not tainted 4.20.0-rc6+ #545
    [ 104.289700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
    [ 104.290858] Call Trace:
    [ 104.291204]
    [ 104.291502] dump_stack+0x9a/0xe6
    [ 104.291968] mark_lock+0x56c/0x7a6
    [ 104.292442] ? check_usage_backwards+0x209/0x209
    [ 104.293086] __lock_acquire+0x400/0x15bf
    [ 104.293662] ? check_chain_key+0x150/0x1aa
    [ 104.294236] lock_acquire+0x1a6/0x1e3
    [ 104.294768] ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.295444] ? _raw_spin_unlock_irqrestore+0x44/0x6b
    [ 104.296143] ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
    [ 104.297031] _raw_spin_lock_irqsave+0x46/0x55
    [ 104.297659] ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.298335] thin_endio+0xcf/0x2a3 [dm_thin_pool]
    [ 104.298997] ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
    [ 104.299886] ? check_flags+0x20a/0x20a
    [ 104.300408] ? lock_acquire+0x1a6/0x1e3
    [ 104.300954] ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
    [ 104.301865] clone_endio+0x1bb/0x22d [dm_mod]
    [ 104.302491] ? disable_write_zeroes+0x20/0x20 [dm_mod]
    [ 104.303200] ? bio_disassociate_blkg+0xc6/0x15f
    [ 104.303836] ? bio_endio+0x2b2/0x2da
    [ 104.304349] clone_endio+0x1f3/0x22d [dm_mod]
    [ 104.304978] ? disable_write_zeroes+0x20/0x20 [dm_mod]
    [ 104.305709] ? bio_disassociate_blkg+0xc6/0x15f
    [ 104.306333] ? bio_endio+0x2b2/0x2da
    [ 104.306853] clone_endio+0x1f3/0x22d [dm_mod]
    [ 104.307476] ? disable_write_zeroes+0x20/0x20 [dm_mod]
    [ 104.308185] ? bio_disassociate_blkg+0xc6/0x15f
    [ 104.308817] ? bio_endio+0x2b2/0x2da
    [ 104.309319] blk_update_request+0x2de/0x4cc
    [ 104.309927] blk_mq_end_request+0x2a/0x183
    [ 104.310498] blk_done_softirq+0x16a/0x1a6
    [ 104.311051] ? blk_softirq_cpu_dead+0xe2/0xe2
    [ 104.311653] ? __lock_is_held+0x2a/0x87
    [ 104.312186] __do_softirq+0x250/0x4e8
    [ 104.312705] irq_exit+0x7e/0x11d
    [ 104.313157] call_function_single_interrupt+0xf/0x20
    [ 104.313860]
    [ 104.314163] RIP: 0010:native_safe_halt+0x2/0x3
    [ 104.314792] Code: 63 02 df f0 83 44 24 fc 00 48 89 df e8 cc 3f 7a ff 48 8b 03 a8 08 74 0b 65 81 25 9d 31 45 7e ff ff ff 7f 5b 5d 41 5c c3 fb f4 f4 c3 0f 1f 44 00 00 41 56 41 55 41 54 55 53 e8 a2 0d 5c ff e8
    [ 104.317339] RSP: 0018:ffff888106c9fdc0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
    [ 104.318390] RAX: 1ffff11020d92100 RBX: 0000000000000000 RCX: ffffffff81159ac7
    [ 104.319366] RDX: 1ffffffff05d5e69 RSI: 0000000000000007 RDI: ffff888106c90d1c
    [ 104.320339] RBP: 0000000000000000 R08: dffffc0000000000 R09: 0000000000000001
    [ 104.321313] R10: ffffed1025d57ba0 R11: ffffed1025d57b9f R12: 1ffff11020d93fbf
    [ 104.322328] R13: 0000000000000031 R14: ffff888106c90040 R15: 0000000000000000
    [ 104.323307] ? lockdep_hardirqs_on+0x26b/0x278
    [ 104.323927] default_idle+0xd9/0x1a8
    [ 104.324427] do_idle+0x162/0x2b2
    [ 104.324891] ? arch_cpu_idle_exit+0x28/0x28
    [ 104.325467] ? mark_held_locks+0x28/0x7f
    [ 104.326031] ? _raw_spin_unlock_irqrestore+0x44/0x6b
    [ 104.326719] cpu_startup_entry+0x1d/0x1f
    [ 104.327261] start_secondary+0x2cb/0x308
    [ 104.327806] ? set_cpu_sibling_map+0x8a3/0x8a3
    [ 104.328421] secondary_startup_64+0xa4/0xb0

    Fixes: b978962ad4f7f9 ("blkcg: update blkg_lookup_create() to do locking")
    Cc: Mike Snitzer
    Cc: Dennis Zhou
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
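
Why the callee has to save the irq state can be shown with a small user-space simulation. Nothing here touches real interrupts: the global flag and all sim_ and callee_ names are hypothetical, and they only mimic the difference between the spin_lock_irq()/spin_unlock_irq() pair and the spin_lock_irqsave()/spin_unlock_irqrestore() pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool sim_irqs_enabled = true; /* simulated interrupt-enable state */

/* Mimics spin_lock_irq()/spin_unlock_irq(): unlock always re-enables. */
static void sim_lock_irq(void)   { sim_irqs_enabled = false; }
static void sim_unlock_irq(void) { sim_irqs_enabled = true; }

/* Mimics the irqsave/irqrestore pair: remember the prior state and restore it. */
static bool sim_lock_irqsave(void)
{
    bool was_enabled = sim_irqs_enabled;

    sim_irqs_enabled = false;
    return was_enabled;
}

static void sim_unlock_irqrestore(bool was_enabled)
{
    sim_irqs_enabled = was_enabled;
}

/* Two variants of a callee that takes its own lock. */
static void callee_plain(void) { sim_lock_irq(); sim_unlock_irq(); }
static void callee_save(void)  { bool f = sim_lock_irqsave(); sim_unlock_irqrestore(f); }

/*
 * A caller that already runs with interrupts disabled. Returns true if
 * interrupts are still disabled after the callee returns.
 */
static bool caller_stays_protected(void (*callee)(void))
{
    bool flags, still_disabled;

    assert(callee != NULL);
    flags = sim_lock_irqsave();
    callee();
    still_disabled = !sim_irqs_enabled;
    sim_unlock_irqrestore(flags);
    return still_disabled;
}
```

The plain variant re-enables interrupts inside the caller's protected section, which is the kind of state inconsistency lockdep reports above; the saving variant leaves the caller's state untouched.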
     

19 Dec, 2018

2 commits

  • Now that the SCSI layer replaced the use of the cluster flag with
    segment size limits and the DMA boundary we can remove the cluster flag
    from the block layer.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jens Axboe
    Signed-off-by: Martin K. Petersen

    Christoph Hellwig
     
  • Block consumers will need request_to_qc_t for polling requests that
    are sent with blk_execute_rq_nowait. Also, get rid of
    blk_tag_to_qc_t and open-code it instead.

    Reviewed-by: Jens Axboe
    Signed-off-by: Sagi Grimberg
    Signed-off-by: Christoph Hellwig

    Sagi Grimberg
     

18 Dec, 2018

6 commits

  • The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues
    is bigger than zero, so enhance the constraint by checking .nr_queues of type poll
    before enabling IO poll.

    Otherwise IO race & timeout can be observed when running block/007.

    Cc: Jeff Moyer
    Cc: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
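
The tightened check from the commit above might look roughly like the following sketch. The pl_ types are hypothetical stand-ins for the tag set and its maps, and the nr_maps field is an assumption of this sketch; only the .nr_queues test on the poll map is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical map types; PL_TYPE_POLL plays the role of HCTX_TYPE_POLL. */
enum pl_type { PL_TYPE_DEFAULT, PL_TYPE_READ, PL_TYPE_POLL, PL_MAX_TYPES };

struct pl_queue_map {
    unsigned int nr_queues;
};

struct pl_tag_set {
    unsigned int nr_maps; /* assumed: number of maps the driver set up */
    struct pl_queue_map map[PL_MAX_TYPES];
};

/* IO poll is enabled only if a poll map exists and actually has queues. */
static bool pl_can_enable_poll(const struct pl_tag_set *set)
{
    assert(set != NULL);
    return set->nr_maps > (unsigned int)PL_TYPE_POLL &&
           set->map[PL_TYPE_POLL].nr_queues > 0;
}
```

A set that declares three maps but leaves the poll map with zero queues is rejected, which is the case the commit message describes.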
     
  • There's a single user of this function, dm, and dm just wants
    to check if IO is inflight, not that it's just allocated.

    This fixes a hang with srp/002 in blktests with dm, where it tries
    to suspend but waits for inflight IO to finish first. As it checks
    for just allocated requests, this fails.

    Tested-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Since 7e849dd9cf37 ("nvme-pci: don't share queue maps"), the mapping
    table is not actually initialized if map->nr_queues is zero, so
    we can't use blk_mq_map_queue_type() to retrieve the hctx any more.

    Doing so may still cause a broken mapping; fix it by skipping zero-queue
    maps in blk_mq_map_swqueue().

    Cc: Jeff Moyer
    Cc: Mike Snitzer
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • The blk-iolatency controller measures the time from rq_qos_throttle() to
    rq_qos_done_bio() and attributes this time to the first bio that needs
    to create the request. This means if a bio is plug-mergeable or
    bio-mergeable, it gets to bypass the blk-iolatency controller.

    The recent series [1] to tag all bios with blkgs undermined how iolatency
    was determining which bios it was charging and should process in
    rq_qos_done_bio(). Because all bios are being tagged, this caused the
    atomic_t for the struct rq_wait inflight count to underflow and result
    in a stall.

    This patch adds a new flag BIO_TRACKED to let controllers know that a
    bio is going through the rq_qos path. blk-iolatency now checks if this
    flag is set to see if it should process the bio in rq_qos_done_bio().

    Overloading BIO_QUEUE_ENTERED works, but makes the flag rules confusing.
    BIO_THROTTLED was another candidate, but the flag is set for all bios
    that have gone through blk-throttle code. Overloading a flag comes with
    the burden of making sure that when either implementation changes, a
    change in setting rules for one doesn't cause a bug in the other. So
    here, we unfortunately opt for adding a new flag.

    [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/

    Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device")
    Signed-off-by: Dennis Zhou
    Cc: Josef Bacik
    Signed-off-by: Jens Axboe

    Dennis Zhou
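
The underflow and its fix can be modeled with a minimal sketch. It is a user-space illustration under assumptions: the tr_ names are hypothetical, TR_BIO_TRACKED stands in for BIO_TRACKED, and a plain int replaces the atomic_t inflight count.

```c
#include <assert.h>
#include <stddef.h>

#define TR_BIO_TRACKED (1u << 0) /* hypothetical stand-in for BIO_TRACKED */

struct tr_rq_wait {
    int inflight; /* plain int here; an atomic_t in the text above */
};

struct tr_bio {
    unsigned int flags;
};

/* Throttle path: only bios that go through here are counted and marked. */
static void tr_throttle(struct tr_rq_wait *w, struct tr_bio *bio)
{
    assert(w != NULL && bio != NULL);
    w->inflight++;
    bio->flags |= TR_BIO_TRACKED;
}

/* Completion path: runs for every bio, so it must ignore untracked ones. */
static void tr_done_bio(struct tr_rq_wait *w, struct tr_bio *bio)
{
    assert(w != NULL && bio != NULL);
    if (!(bio->flags & TR_BIO_TRACKED))
        return; /* e.g. a merged bio that bypassed the throttle path */
    assert(w->inflight > 0);
    w->inflight--;
}
```

Without the flag test, the completion of a merged bio would decrement a counter it never incremented, which is the underflow described above.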
     
  • When a request is added to the rq list of a sw queue (ctx), the rq may be
    from a different type of hctx, especially after multi queue mapping is
    introduced.

    So when dispatching requests from the sw queue via blk_mq_flush_busy_ctxs()
    or blk_mq_dequeue_from_ctx(), a request belonging to another queue type of
    hctx can be dispatched to the current hctx if the read queue or poll
    queue is enabled.

    This patch fixes this issue by introducing per-queue-type list.

    Cc: Christoph Hellwig
    Signed-off-by: Ming Lei

    Changed by me to not use separately cacheline aligned lists, just
    place them all in the same cacheline where we had just the one list
    and lock before.

    Signed-off-by: Jens Axboe

    Ming Lei
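
A per-queue-type list in the software queue can be sketched as below. This is an illustration only: the ql_ types and functions are hypothetical, locking is omitted, and the lists are simple LIFO singly linked lists rather than the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue types, mirroring the default/read/poll split. */
enum ql_type { QL_TYPE_DEFAULT, QL_TYPE_READ, QL_TYPE_POLL, QL_MAX_TYPES };

struct ql_request {
    enum ql_type type; /* hctx type this request was mapped to */
    struct ql_request *next;
};

/* Software queue with one list per hctx type instead of a single list. */
struct ql_ctx {
    struct ql_request *rq_lists[QL_MAX_TYPES];
};

static void ql_ctx_insert(struct ql_ctx *ctx, struct ql_request *rq)
{
    assert(ctx != NULL && rq != NULL);
    assert(rq->type < QL_MAX_TYPES);
    rq->next = ctx->rq_lists[rq->type];
    ctx->rq_lists[rq->type] = rq;
}

/* A hctx of a given type only ever sees requests from its own list. */
static struct ql_request *ql_ctx_dequeue(struct ql_ctx *ctx, enum ql_type type)
{
    struct ql_request *rq;

    assert(ctx != NULL && type < QL_MAX_TYPES);
    rq = ctx->rq_lists[type];
    if (rq)
        ctx->rq_lists[type] = rq->next;
    return rq;
}
```

Since dequeueing is keyed by type, a poll request queued on the same ctx can no longer be handed to a default or read hctx.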
     
  • For a zoned block device using mq-deadline, if a write request for a
    zone is received while another write was already dispatched for the same
    zone, dd_dispatch_request() will return NULL and the newly inserted
    write request is kept in the scheduler queue waiting for the ongoing
    zone write to complete. With this behavior, when no other request has
    been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
    and blk_mq_sched_mark_restart_hctx() is not called. This in turn causes
    the blk_mq_sched_restart() call in __blk_mq_free_request() to not run the
    queue when the already dispatched write request completes. The newly
    inserted request stays stuck in the scheduler queue until eventually
    another request is submitted.

    This problem does not affect SCSI disks, as the SCSI stack handles queue
    restart on request completion. However, this problem can be triggered
    with the nullblk driver with zoned mode enabled.

    Fix this by always requesting a queue restart in dd_dispatch_request()
    if no request was dispatched while WRITE requests are queued.

    Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support")
    Cc:
    Signed-off-by: Damien Le Moal

    Add missing export of blk_mq_sched_restart()

    Signed-off-by: Jens Axboe

    Damien Le Moal
     

17 Dec, 2018

6 commits

  • We should check if a given queue map actually has queues enabled before
    dispatching to it. This allows drivers to not initialize optional but
    not used map types, which subsequently will allow fixing problems with
    queue map rebuilds for that case.

    Reviewed-by: Ming Lei
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently we only export hctx->type via sysfs, and there is no such info
    in the hctx entry under debugfs. We often use debugfs alone to diagnose
    queue mapping issues, so add the support in debugfs.

    Queue mapping has become a bit more complicated now that multiple queue
    mappings are supported, and we may write a blktests test to verify that
    queue mapping is valid based on blk-mq-debugfs.

    Since it is not necessary to export hctx->type twice, remove the export
    from sysfs.

    Cc: Jeff Moyer
    Cc: Mike Snitzer
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • The type of each element in the queue mapping table is 'unsigned int',
    instead of 'struct blk_mq_queue_map', so fix it.

    Cc: Jeff Moyer
    Cc: Mike Snitzer
    Cc: Christoph Hellwig
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This information is helpful to either investigate issues, or understand
    wbt's internal behaviour.

    Cc: Bart Van Assche
    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Cc: Josef Bacik
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • blk-mq-debugfs has proved very helpful for debugging some
    tough issues, such as IO hangs.

    We have seen blk-wbt related IO hangs several times, and there is even
    such a report in Red Hat BZ that has not been solved yet, so this patch
    adds debugfs support for rq_qos.

    Cc: Bart Van Assche
    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Cc: Josef Bacik
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This prevents a HIPRI bio from being submitted through a stacking
    driver that does not support polling and thus won't poll for I/O
    completion.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

16 Dec, 2018

5 commits


14 Dec, 2018

1 commit