Eric Lee / smarc-fsl-linux-kernel

14 Sep, 2016

1 commit

6e219353a block: add poll_considered statistic ... Browse Code »

In order to help determine the effectiveness of polling in a running
system it is usful to determine the ratio of how often the poll
function is called vs how often the completion is checked. For this
reason we add a poll_considered variable and add it to the sysfs entry
for io_poll.

Signed-off-by: Stephen Bates
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Stephen Bates
2016-09-14 22:41:21 +0800

29 Aug, 2016

2 commits

27489a3c8 blk-mq: turn hctx->run_work into a regular work struct ... Browse Code »

We don't need the larger delayed work struct, since we always run it
immediately.

Signed-off-by: Jens Axboe

Jens Axboe
2016-08-29 22:13:21 +0800
ee63cfa7f block: add kblockd_schedule_work_on() ... Browse Code »

Add a helper to schedule a regular struct work on a particular CPU.

Signed-off-by: Jens Axboe

Jens Axboe
2016-08-29 22:13:21 +0800

17 Aug, 2016

1 commit

1b8560868 block: Fix race triggered by blk_set_queue_dying() ... Browse Code »

blk_set_queue_dying() can be called while another thread is
submitting I/O or changing queue flags, e.g. through dm_stop_queue().
Hence protect the QUEUE_FLAG_DYING flag change with locking.

Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Mike Snitzer
Cc: stable
Signed-off-by: Jens Axboe

Bart Van Assche
2016-08-17 09:36:14 +0800

08 Aug, 2016

1 commit

1eff9d322 block: rename bio bi_rw to bi_opf ... Browse Code »

Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe

Jens Axboe
2016-08-08 04:41:02 +0800

27 Jul, 2016

1 commit

3fc9d6909 Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block driver updates from Jens Axboe:
"This branch also contains core changes. I've come to the conclusion
that from 4.9 and forward, I'll be doing just a single branch. We
often have dependencies between core and drivers, and it's hard to
always split them up appropriately without pulling core into drivers
when that happens.

That said, this contains:

- separate secure erase type for the core block layer, from
Christoph.

- set of discard fixes, from Christoph.

- bio shrinking fixes from Christoph, as a followup up to the
op/flags change in the core branch.

- map and append request fixes from Christoph.

- NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
exciting!

- nvme-loop fixes from Arnd.

- removal of ->driverfs_dev from Dan, after providing a
device_add_disk() helper.

- bcache fixes from Bhaktipriya and Yijing.

- cdrom subchannel read fix from Vchannaiah.

- set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

- set of drbd updates and fixes from Fabian, Lars, and Philipp.

- mg_disk error path fix from Bart.

- user notification for failed device add for loop, from Minfei.

- NVMe in general:
+ NVMe delay quirk from Guilherme.
+ SR-IOV support and command retry limits from Keith.
+ fix for memory-less NUMA node from Masayoshi.
+ use UINT_MAX for discard sectors, from Minfei.
+ cancel IO fixes from Ming.
+ don't allocate unused major, from Neil.
+ error code fixup from Dan.
+ use constants for PSDT/FUSE from James.
+ variable init fix from Jay.
+ fabrics fixes from Ming, Sagi, and Wei.
+ various fixes"

* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
nvme/pci: Provide SR-IOV support
nvme: initialize variable before logical OR'ing it
block: unexport various bio mapping helpers
scsi/osd: open code blk_make_request
target: stop using blk_make_request
block: simplify and export blk_rq_append_bio
block: ensure bios return from blk_get_request are properly initialized
virtio_blk: use blk_rq_map_kern
memstick: don't allow REQ_TYPE_BLOCK_PC requests
block: shrink bio size again
block: simplify and cleanup bvec pool handling
block: get rid of bio_rw and READA
block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
NVMe: don't allocate unused nvme_major
nvme: avoid crashes when node 0 is memoryless node.
nvme: Limit command retries
loop: Make user notify for adding loop device failed
nvme-loop: fix nvme-loop Kconfig dependencies
nvmet: fix return value check in nvmet_subsys_alloc()
...

Linus Torvalds
2016-07-27 06:37:51 +0800

21 Jul, 2016

3 commits

4613c5f1d scsi/osd: open code blk_make_request ... Browse Code »

I wish the OSD code could simply use blk_rq_map_* helpers like
everyone else, but the complex nature of deciding if we have
DATA IN and/or DATA OUT buffers might make this impossible
(at least for a mere human like me).

But using blk_rq_append_bio at least allows sharing the setup code
between request with or without dat a buffers, and given that this
is the last user of blk_make_request it allows getting rid of that
somewhat awkward interface.

Signed-off-by: Christoph Hellwig
Acked-by: Boaz Harrosh
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-21 07:38:35 +0800
98d61d5b1 block: simplify and export blk_rq_append_bio ... Browse Code »

The target SCSI passthrough backend is much better served with the low-level
blk_rq_append_bio construct then the helpers built on top of it, so export it.

Also use the opportunity to remove the pointless request_queue argument and
make the code flow a little more readable.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-21 07:38:32 +0800
0c4de0f33 block: ensure bios return from blk_get_request are properly initialized ... Browse Code »

blk_get_request is used for BLOCK_PC and similar passthrough requests.
Currently we always need to call blk_rq_set_block_pc or an open coded
version of it to allow appending bios using the request mapping helpers
later on, which is a somewhat awkward API. Instead move the
initialization part of blk_rq_set_block_pc into blk_get_request, so that
we always have a safe to use request.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-07-21 07:38:30 +0800

06 Jul, 2016

1 commit

9645c1a23 block: Export blk_poll ... Browse Code »

The new NVMe over fabrics target will make use of this outside from a
module.

Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Reviewed-by: Steve Wise
Signed-off-by: Jens Axboe

Sagi Grimberg
2016-07-06 01:30:31 +0800

10 Jun, 2016

1 commit

b8269db45 cfq-iosched: temporarily boost queue priority for idle classes ... Browse Code »

If we're queuing REQ_PRIO IO and the task is running at an idle IO
class, then temporarily boost the priority. This prevents livelocks
due to priority inversion, when a low priority task is holding file
system resources while attempting to do IO.

An example of that is shown below. An ioniced idle task is holding
the directory mutex, while a normal priority task is trying to do
a directory lookup.

[478381.198925] ------------[ cut here ]------------
[478381.200315] INFO: task ionice:1168369 blocked for more than 120 seconds.
[478381.201324] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.202278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.203462] ionice D ffff8803692736a8 0 1168369 1 0x00000080
[478381.203466] ffff8803692736a8 ffff880399c21300 ffff880276adcc00 ffff880369273698
[478381.204589] ffff880369273fd8 0000000000000000 7fffffffffffffff 0000000000000002
[478381.205752] ffffffff8177d5e0 ffff8803692736c8 ffffffff8177cea7 0000000000000000
[478381.206874] Call Trace:
[478381.207253] [] ? bit_wait_io_timeout+0x80/0x80
[478381.208175] [] schedule+0x37/0x90
[478381.208932] [] schedule_timeout+0x1dc/0x250
[478381.209805] [] ? __blk_run_queue+0x37/0x50
[478381.210706] [] ? ktime_get+0x45/0xb0
[478381.211489] [] io_schedule_timeout+0xa7/0x110
[478381.212402] [] ? prepare_to_wait+0x5b/0x90
[478381.213280] [] bit_wait_io+0x36/0x50
[478381.214063] [] __wait_on_bit+0x65/0x90
[478381.214961] [] ? bit_wait_io_timeout+0x80/0x80
[478381.215872] [] out_of_line_wait_on_bit+0x7c/0x90
[478381.216806] [] ? wake_atomic_t_function+0x40/0x40
[478381.217773] [] __wait_on_buffer+0x2a/0x30
[478381.218641] [] ext4_bread+0x57/0x70
[478381.219425] [] __ext4_read_dirblock+0x3c/0x380
[478381.220467] [] ext4_dx_find_entry+0x7d/0x170
[478381.221357] [] ? find_get_entry+0x1e/0xa0
[478381.222208] [] ext4_find_entry+0x484/0x510
[478381.223090] [] ext4_lookup+0x52/0x160
[478381.223882] [] lookup_real+0x1d/0x60
[478381.224675] [] __lookup_hash+0x38/0x50
[478381.225697] [] lookup_slow+0x45/0xab
[478381.226941] [] link_path_walk+0x7ae/0x820
[478381.227880] [] path_init+0xc2/0x430
[478381.228677] [] ? security_file_alloc+0x16/0x20
[478381.229776] [] path_openat+0x77/0x620
[478381.230767] [] ? page_add_file_rmap+0x2e/0x70
[478381.232019] [] do_filp_open+0x43/0xa0
[478381.233016] [] ? creds_are_invalid+0x29/0x70
[478381.234072] [] do_open_execat+0x70/0x170
[478381.235039] [] do_execveat_common.isra.36+0x1b8/0x6e0
[478381.236051] [] do_execve+0x2c/0x30
[478381.236809] [] ? getname+0x12/0x20
[478381.237564] [] SyS_execve+0x2e/0x40
[478381.238338] [] stub_execve+0x6d/0xa0
[478381.239126] ------------[ cut here ]------------
[478381.239915] ------------[ cut here ]------------
[478381.240606] INFO: task python2.7:1168375 blocked for more than 120 seconds.
[478381.242673] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.243653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.244902] python2.7 D ffff88005cf8fb98 0 1168375 1168248 0x00000080
[478381.244904] ffff88005cf8fb98 ffff88016c1f0980 ffffffff81c134c0 ffff88016c1f11a0
[478381.246023] ffff88005cf8ffd8 ffff880466cd0cbc ffff88016c1f0980 00000000ffffffff
[478381.247138] ffff880466cd0cc0 ffff88005cf8fbb8 ffffffff8177cea7 ffff88005cf8fcc8
[478381.248252] Call Trace:
[478381.248630] [] schedule+0x37/0x90
[478381.249382] [] schedule_preempt_disabled+0xe/0x10
[478381.250465] [] __mutex_lock_slowpath+0x92/0x100
[478381.251409] [] mutex_lock+0x1b/0x2f
[478381.252199] [] lookup_slow+0x36/0xab
[478381.253023] [] link_path_walk+0x7ae/0x820
[478381.253877] [] ? try_charge+0xc1/0x700
[478381.254690] [] path_init+0xc2/0x430
[478381.255525] [] ? security_file_alloc+0x16/0x20
[478381.256450] [] path_openat+0x77/0x620
[478381.257256] [] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
[478381.258390] [] ? handle_mm_fault+0x13f3/0x1720
[478381.259309] [] do_filp_open+0x43/0xa0
[478381.260139] [] ? __alloc_fd+0x42/0x120
[478381.260962] [] do_sys_open+0x13c/0x230
[478381.261779] [] ? syscall_trace_enter_phase1+0x113/0x170
[478381.262851] [] SyS_open+0x22/0x30
[478381.263598] [] system_call_fastpath+0x12/0x17
[478381.264551] ------------[ cut here ]------------
[478381.265377] ------------[ cut here ]------------

Signed-off-by: Jens Axboe
Reviewed-by: Jeff Moyer

Jens Axboe
2016-06-10 06:15:01 +0800

09 Jun, 2016

1 commit

288dab8a3 block: add a separate operation type for secure erase ... Browse Code »

Instead of overloading the discard support with the REQ_SECURE flag.
Use the opportunity to rename the queue flag as well, and remove the
dead checks for this flag in the RAID 1 and RAID 10 drivers that don't
claim support for secure erase.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2016-06-09 23:52:25 +0800

08 Jun, 2016

10 commits

28a8f0d31 block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH ... Browse Code »

To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
6296b9604 block, drivers, fs: shrink bi_rw from long to int ... Browse Code »

We don't need bi_rw to be so large on 64 bit archs, so
reduce it to unsigned int.

Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
d9d8c5c48 block: convert is_sync helpers to use REQ_OPs. ... Browse Code »

This patch converts the is_sync helpers to use separate variables
for the operation and flags.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
8fe0d473f block: convert merge/insert code to check for REQ_OPs. ... Browse Code »

This patch converts the block layer merging code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
ba568ea0a block: prepare elevator to use REQ_OPs. ... Browse Code »

This patch converts the elevator code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
e6a40b096 block: prepare request creation/destruction code to use REQ_OPs ... Browse Code »

This patch prepares *_get_request/*_put_request and freed_request,
to use separate variables for the operation and flags. In the
next patches the struct request users will be converted like
was done for bios where the op and flags are set separately.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
4993b77d3 block: copy bio op to request op ... Browse Code »

The bio users should now always be setting up the bio op. This patch
has the block layer copy that to the request.

Signed-off-by: Mike Christie
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
95fe6c1a2 block, fs, mm, drivers: use bio set/get op accessors ... Browse Code »

This patch converts the simple bi_rw use cases in the block,
drivers, mm and fs code to set/get the bio operation using
bio_set_op_attrs/bio_op

These should be simple one or two liner cases, so I just did them
in one patch. The next patches handle the more complicated
cases in a module per patch.

Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
a8ebb056a block, drivers, cgroup: use op_is_write helper instead of checking for REQ_WRITE ... Browse Code »

We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect a operation direction like writesame by testing if REQ_WRITE is
set.

This patch converts the drivers and cgroup to use the
op_is_write helper. This should just cover the simple
cases. I did dm, md and bcache in their own patches
because they were more involved.

Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800
4e49ea4a3 block/fs/drivers: remove rw argument from submit_bio ... Browse Code »

This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.

Signed-off-by: Mike Christie

Fixed up fs/ext4/crypto.c

Signed-off-by: Jens Axboe

Mike Christie
2016-06-08 03:41:38 +0800

14 Apr, 2016

1 commit

c888a8f95 block: kill off q->flush_flags ... Browse Code »

Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.

Signed-off-by: Jens Axboe

Jens Axboe
2016-04-14 03:33:19 +0800

13 Apr, 2016

1 commit

37e58237a block: add offset in blk_add_request_payload() ... Browse Code »

We could kmalloc() the payload, so need the offset in page.

Signed-off-by: Ming Lin
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Ming Lin
2016-04-13 03:13:23 +0800

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

19 Mar, 2016

1 commit

fcab86add Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata ... Browse Code »

Pull libata updates from Tejun Heo:

- ahci grew runtime power management support so that the controller can
be turned off if no devices are attached.

- sata_via isn't dead yet. It got hotplug support and more refined
workaround for certain WD drives.

- Misc cleanups. There's a merge from for-4.5-fixes to avoid confusing
conflicts in ahci PCI ID table.

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci_xgene: dereferencing uninitialized pointer in probe
AHCI: Remove obsolete Intel Lewisburg SATA RAID device IDs
ata: sata_rcar: Use ARCH_RENESAS
sata_via: Implement hotplug for VT6421
sata_via: Apply WD workaround only when needed on VT6421
ahci: Add runtime PM support for the host controller
ahci: Add functions to manage runtime PM of AHCI ports
ahci: Convert driver to use modern PM hooks
ahci: Cache host controller version
scsi: Drop runtime PM usage count after host is added
scsi: Set request queue runtime PM status back to active on resume
block: Add blk_set_runtime_active()
ata: ahci_mvebu: add support for Armada 3700 variant
libata: fix unbalanced spin_lock_irqsave/spin_unlock_irq() in ata_scsi_park_show()
libata: support AHCI on OCTEON platform

Linus Torvalds
2016-03-19 11:06:46 +0800

23 Feb, 2016

1 commit

6acfe68ba dm: fix excessive dm-mq context switching ... Browse Code »

Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly. One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true. This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request. The ftrace
function_graph tracer showed:

kworker-2013 => fio-12190
fio-12190 => kworker-2013
...
kworker-2013 => fio-12190
fio-12190 => kworker-2013
...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1) don't blk_mq_run_hw_queues in blk-mq request completion

Motivated by desire to reduce overhead of dm-mq, punting to kblockd
just increases context switches.

In my testing against a really fast null_blk device there was no benefit
to running blk_mq_run_hw_queues() on completion (and no other blk-mq
driver does this). So hopefully this change doesn't induce the need for
yet another revert like commit 621739b00e16ca2d !

2) use blk_mq_complete_request() in dm_complete_request()

blk_complete_request() doesn't offer the traditional q->mq_ops vs
.request_fn branching pattern that other historic block interfaces
do (e.g. blk_get_request). Using blk_mq_complete_request() for
blk-mq requests is important for performance. It should be noted
that, like blk_complete_request(), blk_mq_complete_request() doesn't
natively handle partial completions -- but the request-based
DM-multipath target does provide the required partial completion
support by dm.c:end_clone_bio() triggering requeueing of the request
via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After: cpu : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case. The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Sagi Grimberg
Signed-off-by: Mike Snitzer
Acked-by: Jens Axboe

Mike Snitzer
2016-02-23 00:04:40 +0800

19 Feb, 2016

1 commit

d07ab6d11 block: Add blk_set_runtime_active() ... Browse Code »

If block device is left runtime suspended during system suspend, resume
hook of the driver typically corrects runtime PM status of the device back
to "active" after it is resumed. However, this is not enough as queue's
runtime PM status is still "suspended". As long as it is in this state
blk_pm_peek_request() returns NULL and thus prevents new requests to be
processed.

Add new function blk_set_runtime_active() that can be used to force the
queue status back to "active" as needed.

Signed-off-by: Mika Westerberg
Acked-by: Jens Axboe
Signed-off-by: Tejun Heo

Mika Westerberg
2016-02-19 23:52:45 +0800

05 Feb, 2016

2 commits

12ffbbe94 Merge remote-tracking branch 'mkp-scsi/4.5/scsi-fixes' into fixes Browse Code »

James Bottomley
2016-02-05 13:37:52 +0800
0fb5b1fb3 block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled ... Browse Code »

When a storage device rejects a WRITE SAME command we will disable write
same functionality for the device and return -EREMOTEIO to the block
layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
failing the path.

Yiwen Jiang discovered a small race where WRITE SAME requests issued
simultaneously would cause -EIO to be returned. This happened because
any requests being prepared after WRITE SAME had been disabled for the
device caused us to return BLKPREP_KILL. The latter caused the block
layer to return -EIO upon completion.

To overcome this we introduce BLKPREP_INVALID which indicates that this
is an invalid request for the device. blk_peek_request() is modified to
return -EREMOTEIO in that case.

Reported-by: Yiwen Jiang
Suggested-by: Mike Snitzer
Reviewed-by: Hannes Reinicke
Reviewed-by: Ewan Milne
Reviewed-by: Yiwen Jiang
Signed-off-by: Martin K. Petersen

Martin K. Petersen
2016-02-05 11:42:58 +0800

22 Jan, 2016

1 commit

3e1e21c7b Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-block ... Browse Code »

Pull NVMe updates from Jens Axboe:
"Last branch for this series is the nvme changes. It's in a separate
branch to avoid splitting too much between core and NVMe changes,
since NVMe is still helping drive some blk-mq changes. That said, not
a huge amount of core changes in here. The grunt of the work is the
continued split of the code"

* 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
uapi: update install list after nvme.h rename
NVMe: Export NVMe attributes to sysfs group
NVMe: Shutdown controller only for power-off
NVMe: IO queue deletion re-write
NVMe: Remove queue freezing on resets
NVMe: Use a retryable error code on reset
NVMe: Fix admin queue ring wrap
nvme: make SG_IO support optional
nvme: fixes for NVME_IOCTL_IO_CMD on the char device
nvme: synchronize access to ctrl->namespaces
nvme: Move nvme_freeze/unfreeze_queues to nvme core
PCI/AER: include header file
NVMe: Export namespace attributes to sysfs
NVMe: Add pci error handlers
block: remove REQ_NO_TIMEOUT flag
nvme: merge iod and cmd_info
nvme: meta_sg doesn't have to be an array
nvme: properly free resources for cancelled command
nvme: simplify completion handling
nvme: special case AEN requests
...

Linus Torvalds
2016-01-22 11:58:02 +0800

20 Jan, 2016

1 commit

7c24d9f3b Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block updates from Jens Axboe:
"We don't have a lot of core changes this time around, it's mostly in
drivers, which will come in a subsequent pull.

The cores changes include:

- blk-mq
- Prep patch from Christoph, changing blk_mq_alloc_request() to
take flags instead of just using gfp_t for sleep/nosleep.
- Doc patch from me, clarifying the difference between legacy
and blk-mq for timer usage.
- Fixes from Raghavendra for memory-less numa nodes, and a reuse
of CPU masks.

- Cleanup from Geliang Tang, using offset_in_page() instead of open
coding it.

- From Ilya, rename request_queue slab to it reflects what it holds,
and a fix for proper use of bdgrab/put.

- A real fix for the split across stripe boundaries from Keith. We
yanked a broken version of this from 4.4-rc final, this one works.

- From Mike Krinkin, emit a trace message when we split.

- From Wei Tang, two small cleanups, not explicitly clearing memory
that is already cleared"

* 'for-4.5/core' of git://git.kernel.dk/linux-block:
block: use bd{grab,put}() instead of open-coding
block: split bios to max possible length
block: add call to split trace point
blk-mq: Avoid memoryless numa node encoded in hctx numa_node
blk-mq: Reuse hardware context cpumask for tags
blk-mq: add a flags parameter to blk_mq_alloc_request
Revert "blk-flush: Queue through IO scheduler when flush not required"
block: clarify blk_add_timer() use case for blk-mq
bio: use offset_in_page macro
block: do not initialise statics to 0 or NULL
block: do not initialise globals to 0 or NULL
block: rename request_queue slab cache

Linus Torvalds
2016-01-20 07:03:34 +0800

29 Dec, 2015

1 commit

21491412f block: add blk_start_queue_async() ... Browse Code »

We currently only have an inline/sync helper to restart a stopped
queue. If drivers need an async version, they have to roll their
own. Add a generic helper instead.

Signed-off-by: Jens Axboe

Jens Axboe
2015-12-29 04:07:07 +0800

23 Dec, 2015

2 commits

23688bf4f block: ensure to split after potentially bouncing a bio ... Browse Code »

blk_queue_bio() does split then bounce, which makes the segment
counting based on pages before bouncing and could go wrong. Move
the split to after bouncing, like we do for blk-mq, and the we
fix the issue of having the bio count for segments be wrong.

Fixes: 54efd50bfd87 ("block: make generic_make_request handle arbitrarily sized bios")
Cc: stable@vger.kernel.org
Tested-by: Artem S. Tashkinov
Signed-off-by: Jens Axboe

Junichi Nomura
2015-12-23 01:26:53 +0800
287922eb0 block: defer timeouts to a workqueue ... Browse Code »

Timer context is not very useful for drivers to perform any meaningful abort
action from. So instead of calling the driver from this useless context
defer it to a workqueue as soon as possible.

Note that while a delayed_work item would seem the right thing here I didn't
dare to use it due to the magic in blk_add_timer that pokes deep into timer
internals. But maybe this encourages Tejun to add a sensible API for that to
the workqueue API and we'll all be fine in the end :)

Contains a major update from Keith Bush:

"This patch removes synchronizing the timeout work so that the timer can
start a freeze on its own queue. The timer enters the queue, so timer
context can only start a freeze, but not wait for frozen."

Signed-off-by: Christoph Hellwig
Acked-by: Keith Busch
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-12-23 00:38:16 +0800

04 Dec, 2015

1 commit

4fd41a855 SCSI: Fix NULL pointer dereference in runtime PM ... Browse Code »

The routines in scsi_pm.c assume that if a runtime-PM callback is
invoked for a SCSI device, it can only mean that the device's driver
has asked the block layer to handle the runtime power management (by
calling blk_pm_runtime_init(), which among other things sets q->dev).

However, this assumption turns out to be wrong for things like the ses
driver. Normally ses devices are not allowed to do runtime PM, but
userspace can override this setting. If this happens, the kernel gets
a NULL pointer dereference when blk_post_runtime_resume() tries to use
the uninitialized q->dev pointer.

This patch fixes the problem by checking q->dev in block layer before
handle runtime PM. Since ses doesn't define any PM callbacks and call
blk_pm_runtime_init(), the crash won't occur.

This fixes Bugzilla #101371.
https://bugzilla.kernel.org/show_bug.cgi?id=101371

More discussion can be found from below link.
http://marc.info/?l=linux-scsi&m=144163730531875&w=2

Signed-off-by: Ken Xue
Acked-by: Alan Stern
Cc: Xiangliang Yu
Cc: James E.J. Bottomley
Cc: Jens Axboe
Cc: Michael Terry
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe

Ken Xue
2015-12-04 11:35:02 +0800

02 Dec, 2015

1 commit

6f3b0e8bc blk-mq: add a flags parameter to blk_mq_alloc_request ... Browse Code »

We already have the reserved flag, and a nowait flag awkwardly encoded as
a gfp_t. Add a real flags argument to make the scheme more extensible and
allow for a nicer calling convention.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2015-12-02 01:53:59 +0800

30 Nov, 2015

1 commit

bf4e6b4e7 block: Always check queue limits for cloned requests ... Browse Code »

When a cloned request is retried on other queues it always needs
to be checked against the queue limits of that queue.
Otherwise the calculations for nr_phys_segments might be wrong,
leading to a crash in scsi_init_sgtable().

To clarify this the patch renames blk_rq_check_limits()
to blk_cloned_rq_check_limits() and removes the symbol
export, as the new function should only be used for
cloned requests and never exported.

Cc: Mike Snitzer
Cc: Ewan Milne
Cc: Jeff Moyer
Signed-off-by: Hannes Reinecke
Fixes: e2a60da74 ("block: Clean up special command handling logic")
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe

Hannes Reinecke
2015-11-30 05:37:27 +0800

25 Nov, 2015

2 commits

d674d4145 block: do not initialise globals to 0 or NULL ... Browse Code »

This patch fixes the checkpatch.pl error to blk-exec.c:

ERROR: do not initialise globals to 0 or NULL

Signed-off-by: Wei Tang
Signed-off-by: Jens Axboe

Wei Tang
2015-11-25 06:24:25 +0800
c2789bd40 block: rename request_queue slab cache ... Browse Code »

Name the cache after the actual name of the struct.

Signed-off-by: Ilya Dryomov
Signed-off-by: Jens Axboe

Ilya Dryomov
2015-11-25 06:24:25 +0800