20 Jan, 2017

1 commit

  • commit c02ebfdddbafa9a6a0f52fbd715e6bfa229af9d3 upstream.

    Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the
    wrong CPU") attempts to avoid triggering the WARN_ON in
    __blk_mq_run_hw_queue when the expected CPU is dead. Problem is, in the
    last batch execution before round robin, blk_mq_hctx_next_cpu can
    schedule a dead CPU and also update next_cpu to the next alive CPU in
    the mask, which will trigger the WARN_ON despite the previous
    workaround.

    The following patch fixes this scenario by always scheduling the value
    in hctx->next_cpu. This changes the moment when we round-robin the CPU
    running the hctx, but it really doesn't matter, since it still executes
    BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.

    Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Gabriel Krisman Bertazi
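
    A rough userspace model of the batched round-robin this commit fixes;
    the struct, the dense CPU numbering, and the modulo walk are simplified
    stand-ins for the real hctx and cpumask handling:

    #include <stdio.h>

    #define BLK_MQ_CPU_WORK_BATCH 8
    #define NR_CPUS 4

    struct hctx_model {
            int next_cpu;           /* CPU the next runs are scheduled on */
            int next_cpu_batch;     /* runs left before we round-robin */
    };

    /* Post-fix behaviour: always return hctx->next_cpu, advancing it only
     * once the batch is exhausted.  The pre-fix code returned the old CPU
     * from the expiry branch, which could schedule a dead CPU even though
     * next_cpu already pointed at a live one. */
    static int next_cpu(struct hctx_model *hctx)
    {
            if (--hctx->next_cpu_batch <= 0) {
                    hctx->next_cpu = (hctx->next_cpu + 1) % NR_CPUS;
                    hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
            }
            return hctx->next_cpu;
    }

    int main(void)
    {
            struct hctx_model h = { 0, BLK_MQ_CPU_WORK_BATCH };

            /* each CPU still gets BLK_MQ_CPU_WORK_BATCH runs in a row */
            for (int i = 0; i < 3 * BLK_MQ_CPU_WORK_BATCH; i++)
                    printf("run %2d -> cpu %d\n", i, next_cpu(&h));
            return 0;
    }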
     

06 Jan, 2017

1 commit

  • commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream.

    The meaning of the BLK_MQ_S_STOPPED flag is "do not call
    .queue_rq()". Hence modify blk_mq_make_request() such that requests
    are queued instead of issued if a queue has been stopped.

    Reported-by: Ming Lei
    Signed-off-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
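
    A minimal model of the rule above; the flag name is the kernel's, the
    rest is a stand-in for the real submission path:

    #include <stdio.h>

    #define BLK_MQ_S_STOPPED (1UL << 0)

    struct hctx_model {
            unsigned long state;
            int parked;             /* requests queued for a later run */
    };

    /* A stopped queue must never see ->queue_rq(): park the request on
     * the software queue instead of issuing it directly. */
    static void submit(struct hctx_model *hctx, int rq)
    {
            if (hctx->state & BLK_MQ_S_STOPPED) {
                    hctx->parked++;
                    printf("rq %d queued (hctx stopped)\n", rq);
                    return;
            }
            printf("queue_rq(rq=%d)\n", rq);    /* direct issue */
    }

    int main(void)
    {
            struct hctx_model h = { 0, 0 };

            submit(&h, 1);                  /* issued */
            h.state |= BLK_MQ_S_STOPPED;    /* e.g. driver stopped the queue */
            submit(&h, 2);                  /* parked, not issued */
            return 0;
    }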
     

27 Oct, 2016

1 commit

  • If we end up sleeping due to running out of requests, we should
    update the hardware and software queues in the map ctx structure.
    Otherwise we could end up having rq->mq_ctx point to the pre-sleep
    context, and risk corrupting ctx->rq_list since we'll be
    grabbing the wrong lock when inserting the request.

    Reported-by: Dave Jones
    Reported-by: Chris Mason
    Tested-by: Chris Mason
    Fixes: 63581af3f31e ("blk-mq: remove non-blocking pass in blk_mq_map_request")
    Signed-off-by: Jens Axboe

    Jens Axboe
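
    A sketch of the stale-context hazard, with plain structs standing in
    for the per-cpu software queues (the CPU numbers and the helper name
    are hypothetical):

    #include <stdio.h>

    struct ctx_model { int cpu; };      /* per-cpu software queue */

    struct map_data {
            struct ctx_model *ctx;      /* cached sw queue for this alloc */
    };

    /* After a blocking wait for a free request the task may have been
     * migrated, so the pre-sleep ctx can belong to the wrong CPU.  The
     * fix re-derives ctx (and hctx) from the current CPU before the
     * request is stamped with rq->mq_ctx. */
    static void refresh_mapping(struct map_data *data,
                                struct ctx_model *percpu, int current_cpu)
    {
            data->ctx = &percpu[current_cpu];
    }

    int main(void)
    {
            struct ctx_model percpu[2] = { { 0 }, { 1 } };
            struct map_data data = { &percpu[0] };  /* mapped pre-sleep */

            /* ...sleep for a request; task wakes up on CPU 1... */
            refresh_mapping(&data, percpu, 1);
            printf("rq->mq_ctx now belongs to cpu %d\n", data.ctx->cpu);
            return 0;
    }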
     

10 Oct, 2016

2 commits

  • Pull blk-mq CPU hotplug update from Jens Axboe:
    "This is the conversion of blk-mq to the new hotplug state machine"

    * 'for-4.9/block-smp' of git://git.kernel.dk/linux-block:
    blk-mq: fixup "Convert to new hotplug state machine"
    blk-mq: Convert to new hotplug state machine
    blk-mq/cpu-notif: Convert to new hotplug state machine

    Linus Torvalds
     
  • Pull blk-mq irq/cpu mapping updates from Jens Axboe:
    "This is the block-irq topic branch for 4.9-rc. It's mostly from
    Christoph, and it allows drivers to specify their own mappings, and
    more importantly, to share the blk-mq mappings with the IRQ affinity
    mappings. It's a good step towards making this work better out of the
    box"

    * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block:
    blk_mq: linux/blk-mq.h does not include all the headers it depends on
    blk-mq: kill unused blk_mq_create_mq_map()
    blk-mq: get rid of the cpumask in struct blk_mq_tags
    nvme: remove the post_scan callout
    nvme: switch to use pci_alloc_irq_vectors
    blk-mq: provide a default queue mapping for PCI device
    blk-mq: allow the driver to pass in a queue mapping
    blk-mq: remove ->map_queue
    blk-mq: only allocate a single mq_map per tag_set
    blk-mq: don't redistribute hardware queues on a CPU hotplug event

    Linus Torvalds
     

08 Oct, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main pull request for block layer changes in 4.9.

    As mentioned at the last merge window, I've changed things up and now
    do just one branch for core block layer changes, and driver changes.
    This avoids dependencies between the two branches. Outside of this
    main pull request, there are two topical branches coming as well.

    This pull request contains:

    - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

    - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
    Followup dependency fix from Geert.

    - General fixes from Bart, Baoyou, Guoqing, and Linus W.

    - CFQ async write starvation fix from Glauber.

    - Add support for delayed kick of the requeue list, from Mike.

    - Pull out the scalable bitmap code from blk-mq-tag.c and make it
    generally available under the name of sbitmap. Only blk-mq-tag uses
    it for now, but the blk-mq scheduling bits will use it as well.
    From Omar.

    - bdev thaw error propagation from Pierre.

    - Improve the blk polling statistics, and allow the user to clear
    them. From Stephen.

    - Set of minor cleanups from Christoph in block/blk-mq.

    - Set of cleanups and optimizations from me for block/blk-mq.

    - Various nvme/nvmet/nvmeof fixes from the various folks"

    * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
    fs/block_dev.c: return the right error in thaw_bdev()
    nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
    nvme/scsi: Remove power management support
    nvmet: Make dsm number of ranges zero based
    nvmet: Use direct IO for writes
    admin-cmd: Added smart-log command support.
    nvme-fabrics: Add host_traddr options field to host infrastructure
    nvme-fabrics: revise host transport option descriptions
    nvme-fabrics: rework nvmf_get_address() for variable options
    nbd: use BLK_MQ_F_BLOCKING
    blkcg: Annotate blkg_hint correctly
    cfq: fix starvation of asynchronous writes
    blk-mq: add flag for drivers wanting blocking ->queue_rq()
    blk-mq: remove non-blocking pass in blk_mq_map_request
    blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
    block: export bio_free_pages to other modules
    lightnvm: propagate device_add() error code
    lightnvm: expose device geometry through sysfs
    lightnvm: control life of nvm_dev in driver
    blk-mq: register device instead of disk
    ...

    Linus Torvalds
     

17 Sep, 2016

3 commits

  • Allocating your own per-cpu allocation hint separately makes for an
    awkward API. Instead, allocate the per-cpu hint as part of the struct
    sbitmap_queue. There's no point in a struct sbitmap_queue without the
    cache, but you can still use a bare struct sbitmap.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
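
    A sketch of the resulting API shape; these are simplified stand-ins
    for the real types in lib/sbitmap, with a flat array modelling the
    per-cpu hint:

    #include <stdlib.h>

    struct sbitmap {
            unsigned int depth;
            unsigned long *map;
    };

    /* The allocation hint is part of the queue itself, so one init call
     * sets up both and callers can't forget the cache. */
    struct sbitmap_queue {
            struct sbitmap sb;
            unsigned int *alloc_hint;   /* per-cpu in the kernel */
    };

    static int sbitmap_queue_init(struct sbitmap_queue *sbq,
                                  unsigned int depth, int nr_cpus)
    {
            sbq->sb.depth = depth;
            sbq->sb.map = calloc((depth + 63) / 64, sizeof(*sbq->sb.map));
            sbq->alloc_hint = calloc(nr_cpus, sizeof(*sbq->alloc_hint));
            return (sbq->sb.map && sbq->alloc_hint) ? 0 : -1;
    }

    int main(void)
    {
            struct sbitmap_queue sbq;

            return sbitmap_queue_init(&sbq, 128, 4);
    }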
     
  • This is a generally useful data structure, so make it available to
    anyone else who might want to use it. It's also a nice cleanup
    separating the allocation logic from the rest of the tag handling logic.

    The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
    selected by CONFIG_BLOCK for now.

    This should be a complete noop functionality-wise.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • We currently account a '0' dispatch, and anything above that still falls
    below the range set by BLK_MQ_MAX_DISPATCH_ORDER. If we dispatch more,
    we don't account it.

    Change the last bucket to be inclusive of anything above the range we
    track, and have the sysfs file reflect that by including a '+' in the
    output:

    $ cat /sys/block/nvme0n1/mq/0/dispatched
    0 1006
    1 20229
    2 1
    4 0
    8 0
    16 0
    32+ 0

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe
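
    A runnable sketch of the bucketing, clamped so oversized batches land
    in the last ("32+") bucket; ilog2 is open-coded here and the helper
    name mirrors the patch:

    #include <stdio.h>

    #define BLK_MQ_MAX_DISPATCH_ORDER 7

    static unsigned int ilog2u(unsigned int x)
    {
            unsigned int log = 0;

            while (x >>= 1)
                    log++;
            return log;
    }

    /* Clamp to the last bucket so dispatch batches beyond the tracked
     * range are still accounted; sysfs prints that bucket as "32+". */
    static unsigned int queued_to_index(unsigned int queued)
    {
            unsigned int idx;

            if (!queued)
                    return 0;
            idx = ilog2u(queued) + 1;
            return idx < BLK_MQ_MAX_DISPATCH_ORDER - 1 ?
                   idx : BLK_MQ_MAX_DISPATCH_ORDER - 1;
    }

    int main(void)
    {
            unsigned int samples[] = { 0, 1, 2, 5, 16, 32, 1000 };

            for (unsigned int i = 0; i < 7; i++)
                    printf("queued=%4u -> bucket %u\n",
                           samples[i], queued_to_index(samples[i]));
            return 0;
    }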
     

15 Sep, 2016

7 commits

  • Fixes: 1b157939f92a ("blk-mq: get rid of the cpumask in struct blk_mq_tags")
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Unused now that NVMe sets up irq affinity before calling into blk-mq.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This allows drivers specify their own queue mapping by overriding the
    setup-time function that builds the mq_map. This can be used for
    example to build the map based on the MSI-X vector mapping provided
    by the core interrupt layer for PCI devices.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
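
    A userspace model of the override point; the names and the modulo
    default are stand-ins, not the real blk-mq setup code:

    #include <stdio.h>

    struct tag_set_model {
            unsigned int nr_queues;
            /* driver override; NULL selects the default spread */
            void (*map_queues)(struct tag_set_model *set,
                               unsigned int *mq_map, unsigned int nr_cpus);
    };

    static void default_map(struct tag_set_model *set, unsigned int *mq_map,
                            unsigned int nr_cpus)
    {
            for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
                    mq_map[cpu] = cpu % set->nr_queues;
    }

    /* Setup-time hook: a PCI driver could fill mq_map from the MSI-X
     * vector affinity instead of taking the default. */
    static void build_mq_map(struct tag_set_model *set, unsigned int *mq_map,
                             unsigned int nr_cpus)
    {
            if (set->map_queues)
                    set->map_queues(set, mq_map, nr_cpus);
            else
                    default_map(set, mq_map, nr_cpus);
    }

    int main(void)
    {
            unsigned int mq_map[8];
            struct tag_set_model set = { 2, NULL };

            build_mq_map(&set, mq_map, 8);
            for (int cpu = 0; cpu < 8; cpu++)
                    printf("cpu %d -> hw queue %u\n", cpu, mq_map[cpu]);
            return 0;
    }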
     
  • All drivers use the default, so provide an inline version of it. If we
    ever need another queue mapping we can add an optional method back,
    although supporting it will also require major changes to the queue
    setup code.

    This provides better code generation, and better debuggability as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The mapping is identical for all queues in a tag_set, so stop wasting
    memory for building multiple. Note that for now I've kept the mq_map
    pointer in the request_queue, but we'll need to investigate if we can
    remove it without suffering too much from the additional pointer chasing.
    The same would apply to the mq_ops pointer as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently blk-mq will totally remap hardware contexts when a CPU hotplug
    event happens, which causes major havoc for drivers, as they are never
    told about this remapping. E.g. any carefully sorted out CPU affinity
    will just be completely messed up.

    The rebuild also doesn't really help for the common case of cpu
    hotplug, which is soft onlining / offlining of cpus - in this case we
    should just leave the queue and irq mapping as is. If it actually
    worked it would have helped in the case of physical cpu hotplug,
    although for that we'd need a way to actually notify the driver.
    Note that drivers may already be able to accommodate such a topology
    change on their own, e.g. using the reset_controller sysfs file in NVMe
    will cause the driver to get things right for this case.

    With the rebuild removed we will simply retain the queue mapping for
    a soft offlined CPU that will work when it comes back online, and will
    map any newly onlined CPU to queue 0 until the driver initiates
    a rebuild of the queue map.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • blk_mq_delay_kick_requeue_list() provides the ability to kick the
    q->requeue_list after a specified time. To do this the request_queue's
    'requeue_work' member was changed to a delayed_work.

    blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
    requests while it doesn't make sense to immediately requeue them
    (e.g. when all paths in a DM multipath have failed).

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
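
    A minimal model of the resulting pair of helpers; the scheduling call
    is a printf stand-in for kblockd's delayed work:

    #include <stdio.h>

    struct queue_model {
            unsigned long requeue_delay_ms;  /* stand-in for delayed_work */
    };

    /* With requeue_work converted to a delayed_work, the delayed kick is
     * the general case... */
    static void delay_kick_requeue_list(struct queue_model *q,
                                        unsigned long msecs)
    {
            q->requeue_delay_ms = msecs;
            printf("requeue work scheduled in %lu ms\n", msecs);
    }

    /* ...and the immediate kick is just the zero-delay special case. */
    static void kick_requeue_list(struct queue_model *q)
    {
            delay_kick_requeue_list(q, 0);
    }

    int main(void)
    {
            struct queue_model q;

            kick_requeue_list(&q);            /* run requeues now */
            delay_kick_requeue_list(&q, 100); /* e.g. all paths down in DM */
            return 0;
    }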
     

25 Aug, 2016

2 commits

  • __blk_mq_run_hw_queue() currently warns if we are running the queue on a
    CPU that isn't set in its mask. However, this can happen if a CPU is
    being offlined, and the workqueue handling will place the work on CPU0
    instead. Improve the warning so that it only triggers if the batch cpu
    in the hardware queue is currently online. If it triggers for that
    case, then it's indicative of a flow problem in blk-mq, so we want to
    retain it for that case.

    Signed-off-by: Jens Axboe

    Jens Axboe
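
    The refined check boils down to one extra condition; a sketch with
    stand-in helper names:

    #include <stdbool.h>
    #include <stdio.h>

    /* Warn only when the batch CPU is still online: if it went offline,
     * the workqueue legitimately fell back to another CPU, and that is
     * not a blk-mq flow problem. */
    static bool should_warn(bool cur_cpu_in_hctx_mask, bool next_cpu_online)
    {
            return !cur_cpu_in_hctx_mask && next_cpu_online;
    }

    int main(void)
    {
            printf("wrong cpu, batch cpu online : %d\n",
                   should_warn(false, true));
            printf("wrong cpu, batch cpu offline: %d\n",
                   should_warn(false, false));
            return 0;
    }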
     
  • We do this in a few places if the CPU is offline. This isn't allowed,
    though, since on multi queue hardware, we can't just move a request
    from one software queue to another if they map to different hardware
    queues. The request and tag aren't valid on another hardware queue.

    This can happen if plugging races with CPU offlining. But it does
    no harm, since it can only happen in the window where we are
    currently busy freezing the queue and flushing IO, in preparation
    for redoing the software hardware queue mappings.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
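
    A sketch of the layout the rename protects; the shift width is
    illustrative rather than the exact kernel constant:

    #include <stdio.h>

    #define REQ_OP_BITS  3
    #define REQ_OP_SHIFT (32 - REQ_OP_BITS)  /* op lives in the top bits */

    enum { MODEL_OP_READ, MODEL_OP_WRITE };

    /* Code that manually assigned the old bi_rw would silently write
     * only the flag bits; packing and unpacking through accessors keeps
     * op and flags in their own fields of the word. */
    static unsigned int pack_opf(unsigned int op, unsigned int flags)
    {
            return (op << REQ_OP_SHIFT) | flags;
    }

    static unsigned int bio_op_of(unsigned int bi_opf)
    {
            return bi_opf >> REQ_OP_SHIFT;
    }

    int main(void)
    {
            unsigned int opf = pack_opf(MODEL_OP_WRITE, 0x1 /* a flag */);

            printf("op=%u flags=%#x\n", bio_op_of(opf),
                   opf & ((1U << REQ_OP_SHIFT) - 1));
            return 0;
    }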
     

05 Aug, 2016

1 commit

  • In case a submitted request gets stuck for some reason, the block layer
    can prevent request starvation by starting the scheduled timeout work.
    If this stuck request occurs at the same time another thread has started
    a queue freeze, the blk_mq_timeout_work will not be able to acquire the
    queue reference and will return silently, thus not issuing the timeout.
    But since the request is already holding a q_usage_counter reference and
    is unable to complete, it will never release its reference, preventing
    the queue from completing the freeze started by first thread. This puts
    the request_queue in a hung state, forever waiting for the freeze
    completion.

    This was observed while running IO to a NVMe device at the same time we
    toggled the CPU hotplug code. Eventually, once a request got stuck
    requiring a timeout during a queue freeze, we saw the CPU Hotplug
    notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
    the trace below.

    [c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
    [c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
    [c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
    [c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
    [c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
    [c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
    [c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
    [c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
    [c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
    [c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
    [c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
    [c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
    [c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
    [c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
    [c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
    [c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
    [c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
    [c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
    [c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
    [c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4

    The fix is to allow the timeout work to execute in the window between
    dropping the initial refcount reference and the release of the last
    reference, which actually marks the freeze completion. This can be
    achieved with percpu_ref_tryget, which does not require the counter
    to be alive. This way the timeout work can do its job and terminate a
    stuck request even during a freeze, returning its reference and avoiding
    the deadlock.

    Allowing the timeout to run is just a part of the fix, since for some
    devices, we might get stuck again inside the device driver's timeout
    handler, should it attempt to allocate a new request in that path -
    which is a quite common action for Abort commands, which need to be sent
    after a timeout. In NVMe, for instance, we call blk_mq_alloc_request
    from inside the timeout handler, which will fail during a freeze, since
    it also tries to acquire a queue reference.

    I considered a similar change to blk_mq_alloc_request as a generic
    solution for further device driver hangs, but we can't do that, since it
    would allow new requests to disturb the freeze process. I thought about
    creating a new function in the block layer to support unfreezable
    requests for these occasions, but after working on it for a while, I
    feel like this should be handled on a per-driver basis. I'm now
    experimenting with changes to the NVMe timeout path, but I'm open to
    suggestions of ways to make this generic.

    Signed-off-by: Gabriel Krisman Bertazi
    Cc: Brian King
    Cc: Keith Busch
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-block@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Gabriel Krisman Bertazi
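
    A toy model of why the tryget variant unblocks the timeout work;
    single-threaded counters stand in for the real percpu_ref:

    #include <stdbool.h>
    #include <stdio.h>

    struct ref_model {
            long count;
            bool dying;     /* a freeze has started (ref was killed) */
    };

    /* What new I/O uses: refuses once a freeze has begun. */
    static bool tryget_live(struct ref_model *r)
    {
            if (r->dying)
                    return false;
            r->count++;
            return true;
    }

    /* What the timeout work can use: only needs the counter to still be
     * non-zero, so it can run during the freeze, terminate the stuck
     * request, and let the final reference drop finish the freeze. */
    static bool tryget(struct ref_model *r)
    {
            if (!r->count)
                    return false;
            r->count++;
            return true;
    }

    int main(void)
    {
            struct ref_model q_usage = { 1 /* stuck request's ref */, true };

            printf("new request during freeze : %d\n", tryget_live(&q_usage));
            printf("timeout work during freeze: %d\n", tryget(&q_usage));
            return 0;
    }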
     

27 Jul, 2016

2 commits

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a followup up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix for idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

21 Jul, 2016

1 commit

  • blk_get_request is used for BLOCK_PC and similar passthrough requests.
    Currently we always need to call blk_rq_set_block_pc or an open coded
    version of it to allow appending bios using the request mapping helpers
    later on, which is a somewhat awkward API. Instead move the
    initialization part of blk_rq_set_block_pc into blk_get_request, so that
    we always have a safe-to-use request.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 Jul, 2016

1 commit

  • For some protocols like NVMe over Fabrics we need to be able to send
    initialization commands to a specific queue.

    Based on an earlier patch from Christoph Hellwig.

    Signed-off-by: Ming Lin
    [hch: disallow sleeping allocation, req_op fixes]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Ming Lin
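
    A sketch of the idea: allocation pinned to a caller-chosen hardware
    queue index instead of the submitting CPU's mapping (all names here
    are illustrative):

    #include <stdio.h>

    #define NR_HW_QUEUES 4

    struct hctx_model {
            int tags_used;
    };

    /* Normally the request comes from the queue mapped to the current
     * CPU; for things like a Fabrics Connect command the caller must be
     * able to pick the hardware queue explicitly. */
    static int alloc_request_on_hctx(struct hctx_model *hctxs,
                                     unsigned int idx)
    {
            if (idx >= NR_HW_QUEUES)
                    return -1;
            hctxs[idx].tags_used++;
            return (int)idx;    /* model: "request" tied to queue idx */
    }

    int main(void)
    {
            struct hctx_model hctxs[NR_HW_QUEUES] = { { 0 } };

            /* send one init command down every I/O queue */
            for (unsigned int i = 0; i < NR_HW_QUEUES; i++)
                    printf("connect on hw queue %d\n",
                           alloc_request_on_hctx(hctxs, i));
            return 0;
    }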
     

09 Jun, 2016

1 commit

  • If ->queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip
    over the rest of the loop body. However, dptr is assigned later in the
    loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd
    want it for.

    NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other
    in-tree driver, but if the code's going to be there, it might as well
    work.

    Fixes: 74c450521dd8 ("blk-mq: add a 'list' parameter to ->queue_rq()")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
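
    The bug pattern in miniature, as a runnable sketch (names simplified
    from the dispatch loop):

    #include <stdio.h>

    /* The success arm used 'continue', so the bookkeeping at the bottom
     * of the loop body (assigning dptr after the first request) never
     * ran on the path that actually needed it. */
    static int *run_loop(int use_continue)
    {
            static int driver_list;
            int *dptr = 0;

            for (int rq = 0; rq < 3; rq++) {
                    int ret_ok = 1;     /* queue_rq() returned QUEUE_OK */

                    if (ret_ok && use_continue)
                            continue;   /* buggy: skips the tail below */

                    if (!dptr)
                            dptr = &driver_list;  /* defer issue from now on */
            }
            return dptr;
    }

    int main(void)
    {
            printf("with continue: dptr %s\n", run_loop(1) ? "set" : "NULL");
            printf("fixed flow   : dptr %s\n", run_loop(0) ? "set" : "NULL");
            return 0;
    }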
     

08 Jun, 2016

3 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This patch converts the is_sync helpers to use separate variables
    for the operation and flags.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
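
    A sketch of the converted helper's shape under the stated split; the
    exact kernel flag values are not reproduced here:

    #include <stdbool.h>
    #include <stdio.h>

    enum { MODEL_OP_READ, MODEL_OP_WRITE };
    #define MODEL_REQ_SYNC (1U << 0)

    /* op and flags now arrive separately: reads are always synchronous,
     * writes only when the sync flag is set. */
    static bool rw_is_sync(int op, unsigned int flags)
    {
            return op == MODEL_OP_READ || (flags & MODEL_REQ_SYNC);
    }

    int main(void)
    {
            printf("read       : %d\n", rw_is_sync(MODEL_OP_READ, 0));
            printf("async write: %d\n", rw_is_sync(MODEL_OP_WRITE, 0));
            printf("sync write : %d\n",
                   rw_is_sync(MODEL_OP_WRITE, MODEL_REQ_SYNC));
            return 0;
    }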
     
  • This patch modifies the blk mq request creation code to use
    separate variables for the operation and flags, because in the
    next patches the struct request users will be converted like
    was done for bios.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

03 Jun, 2016

1 commit

  • Commit 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
    updated blk_mq_make_request() to set request_count even when
    blk_queue_nomerges() returns true. However, blk_mq_make_request() only
    does limited plugging and doesn't use request_count;
    blk_sq_make_request() is the one that should have been fixed. Do that
    and get rid of the unnecessary work in the mq version.

    Fixes: 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
    Signed-off-by: Omar Sandoval
    Reviewed-by: Ming Lei
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

26 May, 2016

1 commit

  • blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
    was not cleared when blk_mq_init_allocated_queue() fails.
    Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because:
    - q->all_q_node is not added to all_q_list yet
    - q->tag_set is NULL
    - hctx was not setup yet or already freed

    Fix it by clearing q->mq_ops on the error path.

    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lin
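
    The shape of the fix, sketched as a userspace model (error handling
    condensed; not the verbatim patch):

    #include <stdio.h>

    struct queue_model {
            void *mq_ops;   /* non-NULL means "this is a blk-mq queue" */
    };

    static int init_allocated_queue(struct queue_model *q, int fail)
    {
            q->mq_ops = (void *)1;  /* set early in initialization */
            if (fail) {
                    /* the fix: drop mq_ops before unwinding, so cleanup
                     * won't take the blk-mq teardown path on a queue
                     * whose hctxs/tag_set were never set up */
                    q->mq_ops = 0;
                    return -1;
            }
            return 0;
    }

    static void cleanup_queue(struct queue_model *q)
    {
            if (q->mq_ops)
                    printf("mq teardown (would crash on half-built queue)\n");
            else
                    printf("legacy teardown, safe\n");
    }

    int main(void)
    {
            struct queue_model q;

            if (init_allocated_queue(&q, 1))
                    cleanup_queue(&q);
            return 0;
    }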
     

16 May, 2016

1 commit

  • When the this_order variable in blk_mq_init_rq_map() becomes zero,
    the code incorrectly decrements it and passes the result to the
    order_to_size() helper, causing undefined behaviour:

    UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
    shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22

    Fix the code by first checking that this_order is not zero.

    Reported-by: Meelis Roos
    Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jens Axboe

    Bartlomiej Zolnierkiewicz
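
    A runnable sketch of the guarded descent; the page size and the loop
    are simplified from blk_mq_init_rq_map():

    #include <stdio.h>

    #define MODEL_PAGE_SIZE 4096UL

    static unsigned long order_to_size(unsigned int order)
    {
            /* with order == (unsigned int)-1 this shift is undefined
             * behaviour, which is exactly what UBSAN caught */
            return MODEL_PAGE_SIZE << order;
    }

    int main(void)
    {
            unsigned long left = 100;   /* bytes still needed */
            unsigned int this_order = 4;

            /* fixed: test this_order before looking at this_order - 1 */
            while (this_order && left < order_to_size(this_order - 1))
                    this_order--;

            printf("chosen order: %u (%lu bytes)\n",
                   this_order, order_to_size(this_order));
            return 0;
    }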