20 Jan, 2017

1 commit

  • commit c02ebfdddbafa9a6a0f52fbd715e6bfa229af9d3 upstream.

    Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the
    wrong CPU") attempts to avoid triggering the WARN_ON in
    __blk_mq_run_hw_queue when the expected CPU is dead. Problem is, in the
    last batch execution before round robin, blk_mq_hctx_next_cpu can
    schedule a dead CPU and also update next_cpu to the next alive CPU in
    the mask, which will trigger the WARN_ON despite the previous
    workaround.

    The following patch fixes this scenario by always scheduling the value
    in hctx->next_cpu. This changes the moment when we round-robin the CPU
    running the hctx, but it really doesn't matter, since it still executes
    BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.

    Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
    Signed-off-by: Gabriel Krisman Bertazi
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Gabriel Krisman Bertazi
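
    A rough userspace model of the batched round-robin this commit fixes;
    the struct, the dense CPU numbering, and the modulo walk are simplified
    stand-ins for the real hctx and cpumask handling:

    #include <stdio.h>

    #define BLK_MQ_CPU_WORK_BATCH 8
    #define NR_CPUS 4

    struct hctx_model {
            int next_cpu;           /* CPU the next runs are scheduled on */
            int next_cpu_batch;     /* runs left before we round-robin */
    };

    /* Post-fix behaviour: always return hctx->next_cpu, advancing it only
     * once the batch is exhausted.  The pre-fix code returned the old CPU
     * from the expiry branch, which could schedule a dead CPU even though
     * next_cpu already pointed at a live one. */
    static int next_cpu(struct hctx_model *hctx)
    {
            if (--hctx->next_cpu_batch <= 0) {
                    hctx->next_cpu = (hctx->next_cpu + 1) % NR_CPUS;
                    hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
            }
            return hctx->next_cpu;
    }

    int main(void)
    {
            struct hctx_model h = { 0, BLK_MQ_CPU_WORK_BATCH };

            /* each CPU still gets BLK_MQ_CPU_WORK_BATCH runs in a row */
            for (int i = 0; i < 3 * BLK_MQ_CPU_WORK_BATCH; i++)
                    printf("run %2d -> cpu %d\n", i, next_cpu(&h));
            return 0;
    }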
     

06 Jan, 2017

1 commit

  • commit bc27c01b5c46d3bfec42c96537c7a3fae0bb2cc4 upstream.

    The meaning of the BLK_MQ_S_STOPPED flag is "do not call
    .queue_rq()". Hence modify blk_mq_make_request() such that requests
    are queued instead of issued if a queue has been stopped.

    Reported-by: Ming Lei
    Signed-off-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Sagi Grimberg
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Bart Van Assche
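
    A minimal model of the rule above; the flag name is the kernel's, the
    rest is a stand-in for the real submission path:

    #include <stdio.h>

    #define BLK_MQ_S_STOPPED (1UL << 0)

    struct hctx_model {
            unsigned long state;
            int parked;             /* requests queued for a later run */
    };

    /* A stopped queue must never see ->queue_rq(): park the request on
     * the software queue instead of issuing it directly. */
    static void submit(struct hctx_model *hctx, int rq)
    {
            if (hctx->state & BLK_MQ_S_STOPPED) {
                    hctx->parked++;
                    printf("rq %d queued (hctx stopped)\n", rq);
                    return;
            }
            printf("queue_rq(rq=%d)\n", rq);    /* direct issue */
    }

    int main(void)
    {
            struct hctx_model h = { 0, 0 };

            submit(&h, 1);                  /* issued */
            h.state |= BLK_MQ_S_STOPPED;    /* e.g. driver stopped the queue */
            submit(&h, 2);                  /* parked, not issued */
            return 0;
    }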
     

27 Oct, 2016

1 commit

  • If we end up sleeping due to running out of requests, we should
    update the hardware and software queues in the map ctx structure.
    Otherwise we could end up having rq->mq_ctx point to the pre-sleep
    context, and risk corrupting ctx->rq_list since we'll be
    grabbing the wrong lock when inserting the request.

    Reported-by: Dave Jones
    Reported-by: Chris Mason
    Tested-by: Chris Mason
    Fixes: 63581af3f31e ("blk-mq: remove non-blocking pass in blk_mq_map_request")
    Signed-off-by: Jens Axboe

    Jens Axboe
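
    A sketch of the stale-context hazard, with plain structs standing in
    for the per-cpu software queues (the CPU numbers and the helper name
    are hypothetical):

    #include <stdio.h>

    struct ctx_model { int cpu; };      /* per-cpu software queue */

    struct map_data {
            struct ctx_model *ctx;      /* cached sw queue for this alloc */
    };

    /* After a blocking wait for a free request the task may have been
     * migrated, so the pre-sleep ctx can belong to the wrong CPU.  The
     * fix re-derives ctx (and hctx) from the current CPU before the
     * request is stamped with rq->mq_ctx. */
    static void refresh_mapping(struct map_data *data,
                                struct ctx_model *percpu, int current_cpu)
    {
            data->ctx = &percpu[current_cpu];
    }

    int main(void)
    {
            struct ctx_model percpu[2] = { { 0 }, { 1 } };
            struct map_data data = { &percpu[0] };  /* mapped pre-sleep */

            /* ...sleep for a request; task wakes up on CPU 1... */
            refresh_mapping(&data, percpu, 1);
            printf("rq->mq_ctx now belongs to cpu %d\n", data.ctx->cpu);
            return 0;
    }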
     

10 Oct, 2016

2 commits

  • Pull blk-mq CPU hotplug update from Jens Axboe:
    "This is the conversion of blk-mq to the new hotplug state machine"

    * 'for-4.9/block-smp' of git://git.kernel.dk/linux-block:
    blk-mq: fixup "Convert to new hotplug state machine"
    blk-mq: Convert to new hotplug state machine
    blk-mq/cpu-notif: Convert to new hotplug state machine

    Linus Torvalds
     
  • Pull blk-mq irq/cpu mapping updates from Jens Axboe:
    "This is the block-irq topic branch for 4.9-rc. It's mostly from
    Christoph, and it allows drivers to specify their own mappings, and
    more importantly, to share the blk-mq mappings with the IRQ affinity
    mappings. It's a good step towards making this work better out of the
    box"

    * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block:
    blk_mq: linux/blk-mq.h does not include all the headers it depends on
    blk-mq: kill unused blk_mq_create_mq_map()
    blk-mq: get rid of the cpumask in struct blk_mq_tags
    nvme: remove the post_scan callout
    nvme: switch to use pci_alloc_irq_vectors
    blk-mq: provide a default queue mapping for PCI device
    blk-mq: allow the driver to pass in a queue mapping
    blk-mq: remove ->map_queue
    blk-mq: only allocate a single mq_map per tag_set
    blk-mq: don't redistribute hardware queues on a CPU hotplug event

    Linus Torvalds
     

08 Oct, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main pull request for block layer changes in 4.9.

    As mentioned at the last merge window, I've changed things up and now
    do just one branch for core block layer changes, and driver changes.
    This avoids dependencies between the two branches. Outside of this
    main pull request, there are two topical branches coming as well.

    This pull request contains:

    - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

    - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
    Followup dependency fix from Geert.

    - General fixes from Bart, Baoyou, Guoqing, and Linus W.

    - CFQ async write starvation fix from Glauber.

    - Add support for delayed kick of the requeue list, from Mike.

    - Pull out the scalable bitmap code from blk-mq-tag.c and make it
    generally available under the name of sbitmap. Only blk-mq-tag uses
    it for now, but the blk-mq scheduling bits will use it as well.
    From Omar.

    - bdev thaw error propagation from Pierre.

    - Improve the blk polling statistics, and allow the user to clear
    them. From Stephen.

    - Set of minor cleanups from Christoph in block/blk-mq.

    - Set of cleanups and optimizations from me for block/blk-mq.

    - Various nvme/nvmet/nvmeof fixes from the various folks"

    * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
    fs/block_dev.c: return the right error in thaw_bdev()
    nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
    nvme/scsi: Remove power management support
    nvmet: Make dsm number of ranges zero based
    nvmet: Use direct IO for writes
    admin-cmd: Added smart-log command support.
    nvme-fabrics: Add host_traddr options field to host infrastructure
    nvme-fabrics: revise host transport option descriptions
    nvme-fabrics: rework nvmf_get_address() for variable options
    nbd: use BLK_MQ_F_BLOCKING
    blkcg: Annotate blkg_hint correctly
    cfq: fix starvation of asynchronous writes
    blk-mq: add flag for drivers wanting blocking ->queue_rq()
    blk-mq: remove non-blocking pass in blk_mq_map_request
    blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
    block: export bio_free_pages to other modules
    lightnvm: propagate device_add() error code
    lightnvm: expose device geometry through sysfs
    lightnvm: control life of nvm_dev in driver
    blk-mq: register device instead of disk
    ...

    Linus Torvalds
     

17 Sep, 2016

3 commits

  • Allocating your own per-cpu allocation hint separately makes for an
    awkward API. Instead, allocate the per-cpu hint as part of the struct
    sbitmap_queue. There's no point in a struct sbitmap_queue without the
    cache, but you can still use a bare struct sbitmap.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
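
    A sketch of the resulting API shape; these are simplified stand-ins
    for the real types in lib/sbitmap, with a flat array modelling the
    per-cpu hint:

    #include <stdlib.h>

    struct sbitmap {
            unsigned int depth;
            unsigned long *map;
    };

    /* The allocation hint is part of the queue itself, so one init call
     * sets up both and callers can't forget the cache. */
    struct sbitmap_queue {
            struct sbitmap sb;
            unsigned int *alloc_hint;   /* per-cpu in the kernel */
    };

    static int sbitmap_queue_init(struct sbitmap_queue *sbq,
                                  unsigned int depth, int nr_cpus)
    {
            sbq->sb.depth = depth;
            sbq->sb.map = calloc((depth + 63) / 64, sizeof(*sbq->sb.map));
            sbq->alloc_hint = calloc(nr_cpus, sizeof(*sbq->alloc_hint));
            return (sbq->sb.map && sbq->alloc_hint) ? 0 : -1;
    }

    int main(void)
    {
            struct sbitmap_queue sbq;

            return sbitmap_queue_init(&sbq, 128, 4);
    }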
     
  • This is a generally useful data structure, so make it available to
    anyone else who might want to use it. It's also a nice cleanup
    separating the allocation logic from the rest of the tag handling logic.

    The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
    selected by CONFIG_BLOCK for now.

    This should be a complete noop functionality-wise.

    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
     
  • We currently account a '0' dispatch, and anything above that still falls
    below the range set by BLK_MQ_MAX_DISPATCH_ORDER. If we dispatch more,
    we don't account it.

    Change the last bucket to be inclusive of anything above the range we
    track, and have the sysfs file reflect that by including a '+' in the
    output:

    $ cat /sys/block/nvme0n1/mq/0/dispatched
    0 1006
    1 20229
    2 1
    4 0
    8 0
    16 0
    32+ 0

    Signed-off-by: Jens Axboe
    Reviewed-by: Omar Sandoval

    Jens Axboe
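
    A runnable sketch of the bucketing, clamped so oversized batches land
    in the last ("32+") bucket; ilog2 is open-coded here and the helper
    name mirrors the patch:

    #include <stdio.h>

    #define BLK_MQ_MAX_DISPATCH_ORDER 7

    static unsigned int ilog2u(unsigned int x)
    {
            unsigned int log = 0;

            while (x >>= 1)
                    log++;
            return log;
    }

    /* Clamp to the last bucket so dispatch batches beyond the tracked
     * range are still accounted; sysfs prints that bucket as "32+". */
    static unsigned int queued_to_index(unsigned int queued)
    {
            unsigned int idx;

            if (!queued)
                    return 0;
            idx = ilog2u(queued) + 1;
            return idx < BLK_MQ_MAX_DISPATCH_ORDER - 1 ?
                   idx : BLK_MQ_MAX_DISPATCH_ORDER - 1;
    }

    int main(void)
    {
            unsigned int samples[] = { 0, 1, 2, 5, 16, 32, 1000 };

            for (unsigned int i = 0; i < 7; i++)
                    printf("queued=%4u -> bucket %u\n",
                           samples[i], queued_to_index(samples[i]));
            return 0;
    }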
     

15 Sep, 2016

7 commits

  • Fixes: 1b157939f92a ("blk-mq: get rid of the cpumask in struct blk_mq_tags")
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Unused now that NVMe sets up irq affinity before calling into blk-mq.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • This allows drivers specify their own queue mapping by overriding the
    setup-time function that builds the mq_map. This can be used for
    example to build the map based on the MSI-X vector mapping provided
    by the core interrupt layer for PCI devices.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
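
    A userspace model of the override point; the names and the modulo
    default are stand-ins, not the real blk-mq setup code:

    #include <stdio.h>

    struct tag_set_model {
            unsigned int nr_queues;
            /* driver override; NULL selects the default spread */
            void (*map_queues)(struct tag_set_model *set,
                               unsigned int *mq_map, unsigned int nr_cpus);
    };

    static void default_map(struct tag_set_model *set, unsigned int *mq_map,
                            unsigned int nr_cpus)
    {
            for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
                    mq_map[cpu] = cpu % set->nr_queues;
    }

    /* Setup-time hook: a PCI driver could fill mq_map from the MSI-X
     * vector affinity instead of taking the default. */
    static void build_mq_map(struct tag_set_model *set, unsigned int *mq_map,
                             unsigned int nr_cpus)
    {
            if (set->map_queues)
                    set->map_queues(set, mq_map, nr_cpus);
            else
                    default_map(set, mq_map, nr_cpus);
    }

    int main(void)
    {
            unsigned int mq_map[8];
            struct tag_set_model set = { 2, NULL };

            build_mq_map(&set, mq_map, 8);
            for (int cpu = 0; cpu < 8; cpu++)
                    printf("cpu %d -> hw queue %u\n", cpu, mq_map[cpu]);
            return 0;
    }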
     
  • All drivers use the default, so provide an inline version of it. If we
    ever need another queue mapping we can add an optional method back,
    although supporting it will also require major changes to the queue
    setup code.

    This provides better code generation, and better debuggability as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • The mapping is identical for all queues in a tag_set, so stop wasting
    memory for building multiple. Note that for now I've kept the mq_map
    pointer in the request_queue, but we'll need to investigate if we can
    remove it without suffering too much from the additional pointer chasing.
    The same would apply to the mq_ops pointer as well.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Currently blk-mq will totally remap hardware contexts when a CPU hotplug
    event happens, which causes major havoc for drivers, as they are never
    told about this remapping. E.g. any carefully sorted out CPU affinity
    will just be completely messed up.

    The rebuild also doesn't really help for the common case of cpu
    hotplug, which is soft onlining / offlining of cpus - in this case we
    should just leave the queue and irq mapping as is. If it actually
    worked it would have helped in the case of physical cpu hotplug,
    although for that we'd need a way to actually notify the driver.
    Note that drivers may already be able to accommodate such a topology
    change on their own, e.g. using the reset_controller sysfs file in NVMe
    will cause the driver to get things right for this case.

    With the rebuild removed we will simply retain the queue mapping for
    a soft offlined CPU that will work when it comes back online, and will
    map any newly onlined CPU to queue 0 until the driver initiates
    a rebuild of the queue map.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • blk_mq_delay_kick_requeue_list() provides the ability to kick the
    q->requeue_list after a specified time. To do this the request_queue's
    'requeue_work' member was changed to a delayed_work.

    blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
    requests while it doesn't make sense to immediately requeue them
    (e.g. when all paths in a DM multipath have failed).

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
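
    A minimal model of the resulting pair of helpers; the scheduling call
    is a printf stand-in for kblockd's delayed work:

    #include <stdio.h>

    struct queue_model {
            unsigned long requeue_delay_ms;  /* stand-in for delayed_work */
    };

    /* With requeue_work converted to a delayed_work, the delayed kick is
     * the general case... */
    static void delay_kick_requeue_list(struct queue_model *q,
                                        unsigned long msecs)
    {
            q->requeue_delay_ms = msecs;
            printf("requeue work scheduled in %lu ms\n", msecs);
    }

    /* ...and the immediate kick is just the zero-delay special case. */
    static void kick_requeue_list(struct queue_model *q)
    {
            delay_kick_requeue_list(q, 0);
    }

    int main(void)
    {
            struct queue_model q;

            kick_requeue_list(&q);            /* run requeues now */
            delay_kick_requeue_list(&q, 100); /* e.g. all paths down in DM */
            return 0;
    }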
     

25 Aug, 2016

2 commits

  • __blk_mq_run_hw_queue() currently warns if we are running the queue on a
    CPU that isn't set in its mask. However, this can happen if a CPU is
    being offlined, and the workqueue handling will place the work on CPU0
    instead. Improve the warning so that it only triggers if the batch cpu
    in the hardware queue is currently online. If it triggers for that
    case, then it's indicative of a flow problem in blk-mq, so we want to
    retain it for that case.

    Signed-off-by: Jens Axboe

    Jens Axboe
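
    The refined check boils down to one extra condition; a sketch with
    stand-in helper names:

    #include <stdbool.h>
    #include <stdio.h>

    /* Warn only when the batch CPU is still online: if it went offline,
     * the workqueue legitimately fell back to another CPU, and that is
     * not a blk-mq flow problem. */
    static bool should_warn(bool cur_cpu_in_hctx_mask, bool next_cpu_online)
    {
            return !cur_cpu_in_hctx_mask && next_cpu_online;
    }

    int main(void)
    {
            printf("wrong cpu, batch cpu online : %d\n",
                   should_warn(false, true));
            printf("wrong cpu, batch cpu offline: %d\n",
                   should_warn(false, false));
            return 0;
    }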
     
  • We do this in a few places if the CPU is offline. This isn't allowed,
    though, since on multi queue hardware, we can't just move a request
    from one software queue to another if they map to different hardware
    queues. The request and tag aren't valid on another hardware queue.

    This can happen if plugging races with CPU offlining. But it does
    no harm, since it can only happen in the window where we are
    currently busy freezing the queue and flushing IO, in preparation
    for redoing the software hardware queue mappings.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
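
    A sketch of the layout the rename protects; the shift width is
    illustrative rather than the exact kernel constant:

    #include <stdio.h>

    #define REQ_OP_BITS  3
    #define REQ_OP_SHIFT (32 - REQ_OP_BITS)  /* op lives in the top bits */

    enum { MODEL_OP_READ, MODEL_OP_WRITE };

    /* Code that manually assigned the old bi_rw would silently write
     * only the flag bits; packing and unpacking through accessors keeps
     * op and flags in their own fields of the word. */
    static unsigned int pack_opf(unsigned int op, unsigned int flags)
    {
            return (op << REQ_OP_SHIFT) | flags;
    }

    static unsigned int bio_op_of(unsigned int bi_opf)
    {
            return bi_opf >> REQ_OP_SHIFT;
    }

    int main(void)
    {
            unsigned int opf = pack_opf(MODEL_OP_WRITE, 0x1 /* a flag */);

            printf("op=%u flags=%#x\n", bio_op_of(opf),
                   opf & ((1U << REQ_OP_SHIFT) - 1));
            return 0;
    }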
     

05 Aug, 2016

1 commit

  • In case a submitted request gets stuck for some reason, the block layer
    can prevent request starvation by starting the scheduled timeout work.
    If this stuck request occurs at the same time another thread has started
    a queue freeze, the blk_mq_timeout_work will not be able to acquire the
    queue reference and will return silently, thus not issuing the timeout.
    But since the request is already holding a q_usage_counter reference and
    is unable to complete, it will never release its reference, preventing
    the queue from completing the freeze started by first thread. This puts
    the request_queue in a hung state, forever waiting for the freeze
    completion.

    This was observed while running IO to a NVMe device at the same time we
    toggled the CPU hotplug code. Eventually, once a request got stuck
    requiring a timeout during a queue freeze, we saw the CPU Hotplug
    notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
    the trace below.

    [c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
    [c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
    [c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
    [c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
    [c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
    [c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
    [c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
    [c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
    [c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
    [c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
    [c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
    [c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
    [c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
    [c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
    [c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
    [c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
    [c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
    [c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
    [c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
    [c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4

    The fix is to allow the timeout work to execute in the window between
    dropping the initial refcount reference and the release of the last
    reference, which actually marks the freeze completion. This can be
    achieved with percpu_ref_tryget, which does not require the counter
    to be alive. This way the timeout work can do its job and terminate a
    stuck request even during a freeze, returning its reference and avoiding
    the deadlock.

    Allowing the timeout to run is just a part of the fix, since for some
    devices, we might get stuck again inside the device driver's timeout
    handler, should it attempt to allocate a new request in that path -
    which is a quite common action for Abort commands, which need to be sent
    after a timeout. In NVMe, for instance, we call blk_mq_alloc_request
    from inside the timeout handler, which will fail during a freeze, since
    it also tries to acquire a queue reference.

    I considered a similar change to blk_mq_alloc_request as a generic
    solution for further device driver hangs, but we can't do that, since it
    would allow new requests to disturb the freeze process. I thought about
    creating a new function in the block layer to support unfreezable
    requests for these occasions, but after working on it for a while, I
    feel like this should be handled on a per-driver basis. I'm now
    experimenting with changes to the NVMe timeout path, but I'm open to
    suggestions of ways to make this generic.

    Signed-off-by: Gabriel Krisman Bertazi
    Cc: Brian King
    Cc: Keith Busch
    Cc: linux-nvme@lists.infradead.org
    Cc: linux-block@vger.kernel.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Gabriel Krisman Bertazi
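
    A toy model of why the tryget variant unblocks the timeout work;
    single-threaded counters stand in for the real percpu_ref:

    #include <stdbool.h>
    #include <stdio.h>

    struct ref_model {
            long count;
            bool dying;     /* a freeze has started (ref was killed) */
    };

    /* What new I/O uses: refuses once a freeze has begun. */
    static bool tryget_live(struct ref_model *r)
    {
            if (r->dying)
                    return false;
            r->count++;
            return true;
    }

    /* What the timeout work can use: only needs the counter to still be
     * non-zero, so it can run during the freeze, terminate the stuck
     * request, and let the final reference drop finish the freeze. */
    static bool tryget(struct ref_model *r)
    {
            if (!r->count)
                    return false;
            r->count++;
            return true;
    }

    int main(void)
    {
            struct ref_model q_usage = { 1 /* stuck request's ref */, true };

            printf("new request during freeze : %d\n", tryget_live(&q_usage));
            printf("timeout work during freeze: %d\n", tryget(&q_usage));
            return 0;
    }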
     

27 Jul, 2016

2 commits

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a followup up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     
  • Pull core block updates from Jens Axboe:

    - the big change is the cleanup from Mike Christie, cleaning up our
    uses of command types and modified flags. This is what will throw
    some merge conflicts

    - regression fix for the above for btrfs, from Vincent

    - following up to the above, better packing of struct request from
    Christoph

    - a 2038 fix for blktrace from Arnd

    - a few trivial/spelling fixes from Bart Van Assche

    - a front merge check fix from Damien, which could cause issues on
    SMR drives

    - Atari partition fix from Gabriel

    - convert cfq to highres timers, since jiffies isn't granular enough
    for some devices these days. From Jan and Jeff

    - CFQ priority boost fix for idle classes, from me

    - cleanup series from Ming, improving our bio/bvec iteration

    - a direct issue fix for blk-mq from Omar

    - fix for plug merging not involving the IO scheduler, like we do for
    other types of merges. From Tahsin

    - expose DAX type internally and through sysfs. From Toshi and Yigal

    * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
    block: Fix front merge check
    block: do not merge requests without consulting with io scheduler
    block: Fix spelling in a source code comment
    block: expose QUEUE_FLAG_DAX in sysfs
    block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
    Btrfs: fix comparison in __btrfs_map_block()
    block: atari: Return early for unsupported sector size
    Doc: block: Fix a typo in queue-sysfs.txt
    cfq-iosched: Charge at least 1 jiffie instead of 1 ns
    cfq-iosched: Fix regression in bonnie++ rewrite performance
    cfq-iosched: Convert slice_resid from u64 to s64
    block: Convert fifo_time from ulong to u64
    blktrace: avoid using timespec
    block/blk-cgroup.c: Declare local symbols static
    block/bio-integrity.c: Add #include "blk.h"
    block/partition-generic.c: Remove a set-but-not-used variable
    block: bio: kill BIO_MAX_SIZE
    cfq-iosched: temporarily boost queue priority for idle classes
    block: drbd: avoid to use BIO_MAX_SIZE
    block: bio: remove BIO_MAX_SECTORS
    ...

    Linus Torvalds
     

21 Jul, 2016

1 commit

  • blk_get_request is used for BLOCK_PC and similar passthrough requests.
    Currently we always need to call blk_rq_set_block_pc or an open coded
    version of it to allow appending bios using the request mapping helpers
    later on, which is a somewhat awkward API. Instead move the
    initialization part of blk_rq_set_block_pc into blk_get_request, so that
    we always have a safe-to-use request.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

06 Jul, 2016

1 commit

  • For some protocols like NVMe over Fabrics we need to be able to send
    initialization commands to a specific queue.

    Based on an earlier patch from Christoph Hellwig.

    Signed-off-by: Ming Lin
    [hch: disallow sleeping allocation, req_op fixes]
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Keith Busch
    Signed-off-by: Jens Axboe

    Ming Lin
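
    A sketch of the idea: allocation pinned to a caller-chosen hardware
    queue index instead of the submitting CPU's mapping (all names here
    are illustrative):

    #include <stdio.h>

    #define NR_HW_QUEUES 4

    struct hctx_model {
            int tags_used;
    };

    /* Normally the request comes from the queue mapped to the current
     * CPU; for things like a Fabrics Connect command the caller must be
     * able to pick the hardware queue explicitly. */
    static int alloc_request_on_hctx(struct hctx_model *hctxs,
                                     unsigned int idx)
    {
            if (idx >= NR_HW_QUEUES)
                    return -1;
            hctxs[idx].tags_used++;
            return (int)idx;    /* model: "request" tied to queue idx */
    }

    int main(void)
    {
            struct hctx_model hctxs[NR_HW_QUEUES] = { { 0 } };

            /* send one init command down every I/O queue */
            for (unsigned int i = 0; i < NR_HW_QUEUES; i++)
                    printf("connect on hw queue %d\n",
                           alloc_request_on_hctx(hctxs, i));
            return 0;
    }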
     

09 Jun, 2016

1 commit

  • If ->queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip
    over the rest of the loop body. However, dptr is assigned later in the
    loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd
    want it for.

    NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other
    in-tree driver, but if the code's going to be there, it might as well
    work.

    Fixes: 74c450521dd8 ("blk-mq: add a 'list' parameter to ->queue_rq()")
    Signed-off-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Omar Sandoval
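
    The bug pattern in miniature, as a runnable sketch (names simplified
    from the dispatch loop):

    #include <stdio.h>

    /* The success arm used 'continue', so the bookkeeping at the bottom
     * of the loop body (assigning dptr after the first request) never
     * ran on the path that actually needed it. */
    static int *run_loop(int use_continue)
    {
            static int driver_list;
            int *dptr = 0;

            for (int rq = 0; rq < 3; rq++) {
                    int ret_ok = 1;     /* queue_rq() returned QUEUE_OK */

                    if (ret_ok && use_continue)
                            continue;   /* buggy: skips the tail below */

                    if (!dptr)
                            dptr = &driver_list;  /* defer issue from now on */
            }
            return dptr;
    }

    int main(void)
    {
            printf("with continue: dptr %s\n", run_loop(1) ? "set" : "NULL");
            printf("fixed flow   : dptr %s\n", run_loop(0) ? "set" : "NULL");
            return 0;
    }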
     

08 Jun, 2016

3 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This patch converts the is_sync helpers to use separate variables
    for the operation and flags.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
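
    A sketch of the converted helper's shape under the stated split; the
    exact kernel flag values are not reproduced here:

    #include <stdbool.h>
    #include <stdio.h>

    enum { MODEL_OP_READ, MODEL_OP_WRITE };
    #define MODEL_REQ_SYNC (1U << 0)

    /* op and flags now arrive separately: reads are always synchronous,
     * writes only when the sync flag is set. */
    static bool rw_is_sync(int op, unsigned int flags)
    {
            return op == MODEL_OP_READ || (flags & MODEL_REQ_SYNC);
    }

    int main(void)
    {
            printf("read       : %d\n", rw_is_sync(MODEL_OP_READ, 0));
            printf("async write: %d\n", rw_is_sync(MODEL_OP_WRITE, 0));
            printf("sync write : %d\n",
                   rw_is_sync(MODEL_OP_WRITE, MODEL_REQ_SYNC));
            return 0;
    }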
     
  • This patch modifies the blk mq request creation code to use
    separate variables for the operation and flags, because in the
    next patches the struct request users will be converted like
    was done for bios.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

03 Jun, 2016

1 commit

  • Commit 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
    updated blk_mq_make_request() to set request_count even when
    blk_queue_nomerges() returns true. However, blk_mq_make_request() only
    does limited plugging and doesn't use request_count;
    blk_sq_make_request() is the one that should have been fixed. Do that
    and get rid of the unnecessary work in the mq version.

    Fixes: 0809e3ac6231 ("block: fix plug list flushing for nomerge queues")
    Signed-off-by: Omar Sandoval
    Reviewed-by: Ming Lei
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Omar Sandoval
     

26 May, 2016

1 commit

  • blk_mq_init_queue() calls blk_mq_init_allocated_queue(), but q->mq_ops
    was not cleared when blk_mq_init_allocated_queue() fails.
    Then blk_cleanup_queue() calls blk_mq_free_queue() which will crash because:
    - q->all_q_node is not added to all_q_list yet
    - q->tag_set is NULL
    - hctx was not setup yet or already freed

    Fix it by clearing q->mq_ops on the error path.

    Signed-off-by: Ming Lin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Ming Lin
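
    The shape of the fix, sketched as a userspace model (error handling
    condensed; not the verbatim patch):

    #include <stdio.h>

    struct queue_model {
            void *mq_ops;   /* non-NULL means "this is a blk-mq queue" */
    };

    static int init_allocated_queue(struct queue_model *q, int fail)
    {
            q->mq_ops = (void *)1;  /* set early in initialization */
            if (fail) {
                    /* the fix: drop mq_ops before unwinding, so cleanup
                     * won't take the blk-mq teardown path on a queue
                     * whose hctxs/tag_set were never set up */
                    q->mq_ops = 0;
                    return -1;
            }
            return 0;
    }

    static void cleanup_queue(struct queue_model *q)
    {
            if (q->mq_ops)
                    printf("mq teardown (would crash on half-built queue)\n");
            else
                    printf("legacy teardown, safe\n");
    }

    int main(void)
    {
            struct queue_model q;

            if (init_allocated_queue(&q, 1))
                    cleanup_queue(&q);
            return 0;
    }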
     

16 May, 2016

1 commit

  • When the this_order variable in blk_mq_init_rq_map() becomes zero,
    the code incorrectly decrements it and passes the result to the
    order_to_size() helper, causing undefined behaviour:

    UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
    shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22

    Fix the code by first checking that this_order is not zero.

    Reported-by: Meelis Roos
    Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
    Signed-off-by: Bartlomiej Zolnierkiewicz
    Signed-off-by: Jens Axboe

    Bartlomiej Zolnierkiewicz
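
    A runnable sketch of the guarded descent; the page size and the loop
    are simplified from blk_mq_init_rq_map():

    #include <stdio.h>

    #define MODEL_PAGE_SIZE 4096UL

    static unsigned long order_to_size(unsigned int order)
    {
            /* with order == (unsigned int)-1 this shift is undefined
             * behaviour, which is exactly what UBSAN caught */
            return MODEL_PAGE_SIZE << order;
    }

    int main(void)
    {
            unsigned long left = 100;   /* bytes still needed */
            unsigned int this_order = 4;

            /* fixed: test this_order before looking at this_order - 1 */
            while (this_order && left < order_to_size(this_order - 1))
                    this_order--;

            printf("chosen order: %u (%lu bytes)\n",
                   this_order, order_to_size(this_order));
            return 0;
    }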