Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

16 Jan, 2015

1 commit

b041392d4 blk-mq: Fix a use-after-free ... Browse Code »

commit 45a9c9d909b24c6ad0e28a7946e7486e73010319 upstream.

blk-mq users are allowed to free the memory request_queue.tag_set
points at after blk_cleanup_queue() has finished but before
blk_release_queue() has started. This can happen e.g. in the SCSI
core. The SCSI core namely embeds the tag_set structure in a SCSI
host structure. The SCSI host structure is freed by
scsi_host_dev_release(). This function is called after
blk_cleanup_queue() finished but can be called before
blk_release_queue().

This means that it is not safe to access request_queue.tag_set from
inside blk_release_queue(). Hence remove the blk_sync_queue() call
from blk_release_queue(). This call is not necessary - outstanding
requests must have finished before blk_release_queue() is
called. Additionally, move the blk_mq_free_queue() call from
blk_release_queue() to blk_cleanup_queue() to avoid that struct
request_queue.tag_set gets accessed after it has been freed.

This patch avoids that the following kernel oops can be triggered
when deleting a SCSI host for which scsi-mq was enabled:

Call Trace:
[] lock_acquire+0xc4/0x270
[] mutex_lock_nested+0x61/0x380
[] blk_mq_free_queue+0x30/0x180
[] blk_release_queue+0x84/0xd0
[] kobject_cleanup+0x7b/0x1a0
[] kobject_put+0x30/0x70
[] blk_put_queue+0x15/0x20
[] disk_release+0x99/0xd0
[] device_release+0x36/0xb0
[] kobject_cleanup+0x7b/0x1a0
[] kobject_put+0x30/0x70
[] put_disk+0x1a/0x20
[] __blkdev_put+0x135/0x1b0
[] blkdev_put+0x50/0x160
[] kill_block_super+0x44/0x70
[] deactivate_locked_super+0x44/0x60
[] deactivate_super+0x4e/0x70
[] cleanup_mnt+0x43/0x90
[] __cleanup_mnt+0x12/0x20
[] task_work_run+0xac/0xe0
[] do_notify_resume+0x61/0xa0
[] int_signal+0x12/0x17

Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Robert Elliott
Cc: Ming Lei
Cc: Alexander Gordeev
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman

Bart Van Assche
2015-01-16 22:59:48 +0800

19 Oct, 2014

1 commit

d3dc366bb Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull core block layer changes from Jens Axboe:
"This is the core block IO pull request for 3.18. Apart from the new
and improved flush machinery for blk-mq, this is all mostly bug fixes
and cleanups.

- blk-mq timeout updates and fixes from Christoph.

- Removal of REQ_END, also from Christoph. We pass it through the
->queue_rq() hook for blk-mq instead, freeing up one of the request
bits. The space was overly tight on 32-bit, so Martin also killed
REQ_KERNEL since it's no longer used.

- blk integrity updates and fixes from Martin and Gu Zheng.

- Update to the flush machinery for blk-mq from Ming Lei. Now we
have a per hardware context flush request, which both cleans up the
code should scale better for flush intensive workloads on blk-mq.

- Improve the error printing, from Rob Elliott.

- Backing device improvements and cleanups from Tejun.

- Fixup of a misplaced rq_complete() tracepoint from Hannes.

- Make blk_get_request() return error pointers, fixing up issues
where we NULL deref when a device goes bad or missing. From Joe
Lawrence.

- Prep work for drastically reducing the memory consumption of dm
devices from Junichi Nomura. This allows creating clone bio sets
without preallocating a lot of memory.

- Fix a blk-mq hang on certain combinations of queue depths and
hardware queues from me.

- Limit memory consumption for blk-mq devices for crash dump
scenarios and drivers that use crazy high depths (certain SCSI
shared tag setups). We now just use a single queue and limited
depth for that"

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
block: Remove REQ_KERNEL
blk-mq: allocate cpumask on the home node
bio-integrity: remove the needless fail handle of bip_slab creating
block: include func name in __get_request prints
block: make blk_update_request print prefix match ratelimited prefix
blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
block: fix alignment_offset math that assumes io_min is a power-of-2
blk-mq: Make bt_clear_tag() easier to read
blk-mq: fix potential hang if rolling wakeup depth is too high
block: add bioset_create_nobvec()
block: use bio_clone_fast() in blk_rq_prep_clone()
block: misplaced rq_complete tracepoint
sd: Honor block layer integrity handling flags
block: Replace strnicmp with strncasecmp
block: Add T10 Protection Information functions
block: Don't merge requests if integrity flags differ
block: Integrity checksum flag
block: Relocate bio integrity flags
block: Add a disk flag to block integrity profile
block: Add prefix to block integrity profile flags
...

Linus Torvalds
2014-10-19 02:53:51 +0800

13 Oct, 2014

2 commits

7b2b10e0e block: include func name in __get_request prints ... Browse Code »

In __get_request calls to printk_ratelimited, include the function name so
the callbacks suppressed message matches the messages that are printed,
and add "dev" before the device name so it matches other block layer
messages.

Signed-off-by: Robert Elliott
Reviewed-by: Webb Scales
Signed-off-by: Jens Axboe

Robert Elliott
2014-10-13 22:34:23 +0800
ef3ecb66b block: make blk_update_request print prefix match ratelimited prefix ... Browse Code »

In blk_update_request, change the printk_ratelimited
prefix from end_request to blk_update_request so it
matches the name printed if rate limiting occurs.

Old:
[10234.933106] blk_update_request: 174 callbacks suppressed
[10234.934940] end_request: critical target error, dev sdr, sector 16
[10234.949788] end_request: critical target error, dev sdr, sector 16

New:
[16863.445173] blk_update_request: 398 callbacks suppressed
[16863.447029] blk_update_request: critical target error, dev sdr, sector
1442066176
[16863.449383] blk_update_request: critical target error, dev sdr, sector
802802888
[16863.451680] blk_update_request: critical target error, dev sdr, sector
1609535456

Signed-off-by: Robert Elliott
Reviewed-by: Webb Scales
Signed-off-by: Jens Axboe

Robert Elliott
2014-10-13 22:34:21 +0800

08 Oct, 2014

1 commit

28596c972 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull "trivial tree" updates from Jiri Kosina:
"Usual pile from trivial tree everyone is so eagerly waiting for"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
Remove MN10300_PROC_MN2WS0038
mei: fix comments
treewide: Fix typos in Kconfig
kprobes: update jprobe_example.c for do_fork() change
Documentation: change "&" to "and" in Documentation/applying-patches.txt
Documentation: remove obsolete pcmcia-cs from Changes
Documentation: update links in Changes
Documentation: Docbook: Fix generated DocBook/kernel-api.xml
score: Remove GENERIC_HAS_IOMAP
gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
tty: doc: Fix grammar in serial/tty
dma-debug: modify check_for_stack output
treewide: fix errors in printk
genirq: fix reference in devm_request_threaded_irq comment
treewide: fix synchronize_rcu() in comments
checkstack.pl: port to AArch64
doc: queue-sysfs: minor fixes
init/do_mounts: better syntax description
MIPS: fix comment spelling
powerpc/simpleboot: fix comment
...

Linus Torvalds
2014-10-08 09:16:26 +0800

04 Oct, 2014

1 commit

11dfce509 block: use bio_clone_fast() in blk_rq_prep_clone() ... Browse Code »

Request cloning clones bios in the request to track the completion
of each bio.
For that purpose, we can use bio_clone_fast() instead of bio_clone()
to avoid unnecessary allocation and copy of bvecs.

This patch reduces memory footprint of request-based device-mapper
(about 1-4KB for each request) and is a preparation for further
reduction of memory usage by removing unused bvec mempool.

Signed-off-by: Jun'ichi Nomura
Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe

Junichi Nomura
2014-10-04 05:28:16 +0800

01 Oct, 2014

1 commit

4a0efdc93 block: misplaced rq_complete tracepoint ... Browse Code »

The rq_complete tracepoint was never issued for empty requests,
causing the resulting blktrace information to never show any
completion for those request.

Signed-off-by: Hannes Reinecke
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe

Hannes Reinecke
2014-10-01 22:17:42 +0800

26 Sep, 2014

6 commits

f70ced091 blk-mq: support per-distpatch_queue flush machinery ... Browse Code »
5

This patch supports to run one single flush machinery for
each blk-mq dispatch queue, so that:

- current init_request and exit_request callbacks can
cover flush request too, then the buggy copying way of
initializing flush request's pdu can be fixed

- flushing performance gets improved in case of multi hw-queue

In fio sync write test over virtio-blk(4 hw queues, ioengine=sync,
iodepth=64, numjobs=4, bs=4K), it is observed that througput gets
increased a lot over my test environment:
- throughput: +70% in case of virtio-blk over null_blk
- throughput: +30% in case of virtio-blk over SSD image

The multi virtqueue feature isn't merged to QEMU yet, and patches for
the feature can be found in below tree:

git://kernel.ubuntu.com/ming/qemu.git v2.1.0-mq.4

And simply passing 'num_queues=4 vectors=5' should be enough to
enable multi queue(quad queue) feature for QEMU virtio-blk.

Suggested-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-26 05:22:45 +0800
e97c293cd block: introduce 'blk_mq_ctx' parameter to blk_get_flush_queue ... Browse Code »

This patch adds 'blk_mq_ctx' parameter to blk_get_flush_queue(),
so that this function can find the corresponding blk_flush_queue
bound with current mq context since the flush queue will become
per hw-queue.

For legacy queue, the parameter can be simply 'NULL'.

For multiqueue case, the parameter should be set as the context
from which the related request is originated. With this context
info, the hw queue and related flush queue can be found easily.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-26 05:22:44 +0800
ba483388e block: remove blk_init_flush() and its pair ... Browse Code »

Now mission of the two helpers is over, and just call
blk_alloc_flush_queue() and blk_free_flush_queue() directly.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-26 05:22:41 +0800
7c94e1c15 block: introduce blk_flush_queue to drive flush machinery ... Browse Code »

This patch introduces 'struct blk_flush_queue' and puts all
flush machinery related fields into this structure, so that

- flush implementation details aren't exposed to driver
- it is easy to convert to per dispatch-queue flush machinery

This patch is basically a mechanical replacement.

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-26 05:22:40 +0800
3c09676c1 block: move flush initialization to blk_flush_init ... Browse Code »

These fields are always used with the flush request, so
initialize them together.

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-26 05:22:37 +0800
f35526557 block: introduce blk_init_flush and its pair ... Browse Code »

These two temporary functions are introduced for holding flush
initialization and de-initialization, so that we can
introduce 'flush queue' easier in the following patch. And
once 'flush queue' and its allocation/free functions are ready,
they will be removed for sake of code readability.

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-09-26 05:22:35 +0800

11 Sep, 2014

1 commit

b207892b0 Merge branch 'for-linus' into for-3.18/core ... Browse Code »

A bit of churn on the for-linus side that would be nice to have
in the core bits for 3.18, so pull it in to catch us up and make
forward progress easier.

Signed-off-by: Jens Axboe

Conflicts:
block/scsi_ioctl.c

Jens Axboe
2014-09-11 23:31:18 +0800

09 Sep, 2014

2 commits

da3dae54e Documentation: Docbook: Fix generated DocBook/kernel-api.xml ... Browse Code »

This patch fix spelling typo found in DocBook/kernel-api.xml.
It is because the file is generated from the source comments,
I have to fix the comments in source codes.

Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Masanari Iida
2014-09-09 16:34:56 +0800
ff9ea3238 block, bdi: an active gendisk always has a request_queue associated with it ... Browse Code »

bdev_get_queue() returns the request_queue associated with the
specified block_device. blk_get_backing_dev_info() makes use of
bdev_get_queue() to determine the associated bdi given a block_device.

All the callers of bdev_get_queue() including
blk_get_backing_dev_info() assume that bdev_get_queue() may return
NULL and implement NULL handling; however, bdev_get_queue() requires
the passed in block_device is opened and attached to its gendisk.
Because an active gendisk always has a valid request_queue associated
with it, bdev_get_queue() can never return NULL and neither can
blk_get_backing_dev_info().

Make it clear that neither of the two functions can return NULL and
remove NULL handling from all the callers.

Signed-off-by: Tejun Heo
Cc: Chris Mason
Cc: Dave Chinner
Signed-off-by: Jens Axboe

Tejun Heo
2014-09-09 00:00:35 +0800

29 Aug, 2014

1 commit

a492f0754 block,scsi: fixup blk_get_request dead queue scenarios ... Browse Code »
13

The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.

For consistency, make a similar change to the blk_mq_alloc_request leg
of blk_get_request. It may fail if the queue is dead, or the caller was
unwilling to wait.

Signed-off-by: Joe Lawrence
Acked-by: Jiri Kosina [for pktdvd]
Acked-by: Boaz Harrosh [for osd]
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe

Joe Lawrence
2014-08-29 00:03:46 +0800

23 Aug, 2014

1 commit

6f4a16266 scsi-mq: fix requests that use a separate CDB buffer ... Browse Code »

This patch fixes code such as the following with scsi-mq enabled:

rq = blk_get_request(...);
blk_rq_set_block_pc(rq);

rq->cmd = my_cmd_buffer; /* separate CDB buffer */

blk_execute_rq_nowait(...);

Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for
large CDBs only). Without this patch, scsi_mq_prep_fn() will set
rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device.

Signed-off-by: Tony Battersby
Signed-off-by: Jens Axboe

Tony Battersby
2014-08-23 04:04:31 +0800

02 Jul, 2014

2 commits

780db2071 blk-mq: decouble blk-mq freezing from generic bypassing ... Browse Code »

blk_mq freezing is entangled with generic bypassing which bypasses
blkcg and io scheduler and lets IO requests fall through the block
layer to the drivers in FIFO order. This allows forward progress on
IOs with the advanced features disabled so that those features can be
configured or altered without worrying about stalling IO which may
lead to deadlock through memory allocation.

However, generic bypassing doesn't quite fit blk-mq. blk-mq currently
doesn't make use of blkcg or ioscheds and it maps bypssing to
freezing, which blocks request processing and drains all the in-flight
ones. This causes problems as bypassing assumes that request
processing is online. blk-mq works around this by conditionally
allowing request processing for the problem case - during queue
initialization.

Another weirdity is that except for during queue cleanup, bypassing
started on the generic side prevents blk-mq from processing new
requests but doesn't drain the in-flight ones. This shouldn't break
anything but again highlights that something isn't quite right here.

The root cause is conflating blk-mq freezing and generic bypassing
which are two different mechanisms. The only intersecting purpose
that they serve is during queue cleanup. Let's properly separate
blk-mq freezing from generic bypassing and simply use it where
necessary.

* request_queue->mq_freeze_depth is added and
blk_mq_[un]freeze_queue() now operate on this counter instead of
->bypass_depth. The replacement for QUEUE_FLAG_BYPASS isn't added
but the counter is tested directly. This will be further updated by
later changes.

* blk_mq_drain_queue() is dropped and "__" prefix is dropped from
blk_mq_freeze_queue(). Queue cleanup path now calls
blk_mq_freeze_queue() directly.

* blk_queue_enter()'s fast path condition is simplified to simply
check @q->mq_freeze_depth. Previously, the condition was

!blk_queue_dying(q) &&
(!blk_queue_bypass(q) || !blk_queue_init_done(q))

mq_freeze_depth is incremented right after dying is set and
blk_queue_init_done() exception isn't necessary as blk-mq doesn't
start frozen, which only leaves the blk_queue_bypass() test which
can be replaced by @q->mq_freeze_depth test.

This change simplifies the code and reduces confusion in the area.

Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Nicholas A. Bellinger
Signed-off-by: Jens Axboe

Tejun Heo
2014-07-02 00:31:13 +0800
776687bce block, blk-mq: draining can't be skipped even if bypass_depth was non-zero ... Browse Code »
32

Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue()
skip queue draining if bypass_depth was already above zero. The
assumption is that the one which bumped the bypass_depth should have
performed draining already; however, there's nothing which prevents a
new instance of bypassing/freezing from starting before the previous
one finishes draining. The current code may allow the later
bypassing/freezing instances to complete while there still are
in-flight requests which haven't finished draining.

Fix it by draining regardless of bypass_depth. We still skip draining
from blk_queue_bypass_start() while the queue is initializing to avoid
introducing excessive delays during boot. INIT_DONE setting is moved
above the initial blk_queue_bypass_end() so that bypassing attempts
can't slip inbetween.

Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Nicholas A. Bellinger
Signed-off-by: Jens Axboe

Tejun Heo
2014-07-02 00:29:17 +0800

20 Jun, 2014

1 commit

f1d702487 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block fixes from Jens Axboe:
"A smaller collection of fixes for the block core that would be nice to
have in -rc2. This pull request contains:

- Fixes for races in the wait/wakeup logic used in blk-mq from
Alexander. No issues have been observed, but it is definitely a
bit flakey currently. Alternatively, we may drop the cyclic
wakeups going forward, but that needs more testing.

- Some cleanups from Christoph.

- Fix for an oops in null_blk if queue_mode=1 and softirq completions
are used. From me.

- A fix for a regression caused by the chunk size setting. It
inadvertently used max_hw_sectors instead of max_sectors, which is
incorrect, and causes hangs on btrfs multi-disk setups (where hw
sectors apparently isn't set). From me.

- Removal of WQ_POWER_EFFICIENT in the kblockd creation. This was a
recent addition as well, but it actually breaks blk-mq which relies
on strict scheduling. If the workqueue power_efficient mode is
turned on, this breaks blk-mq. From Matias.

- null_blk module parameter description fix from Mike"

* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: bitmap tag: fix races in bt_get() function
blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt
blk-mq: bitmap tag: fix races on shared ::wake_index fields
block: blk_max_size_offset() should check ->max_sectors
null_blk: fix softirq completions for queue_mode == 1
blk-mq: merge blk_mq_drain_queue and __blk_mq_drain_queue
blk-mq: properly drain stopped queues
block: remove WQ_POWER_EFFICIENT from kblockd
null_blk: fix name and description of 'queue_mode' module parameter
block: remove elv_abort_queue and blk_abort_flushes

Linus Torvalds
2014-06-20 11:56:43 +0800

16 Jun, 2014

1 commit

b55b39020 Merge git://git.infradead.org/users/willy/linux-nvme ... Browse Code »

Pull NVMe update from Matthew Wilcox:
"Mostly bugfixes again for the NVMe driver. I'd like to call out the
exported tracepoint in the block layer; I believe Keith has cleared
this with Jens.

We've had a few reports from people who're really pounding on NVMe
devices at scale, hence the timeout changes (and new module
parameters), hotplug cpu deadlock, tracepoints, and minor performance
tweaks"

[ Jens hadn't seen that tracepoint thing, but is ok with it - it will
end up going away when mq conversion happens ]

* git://git.infradead.org/users/willy/linux-nvme: (22 commits)
NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.
NVMe: Use Log Page constants in SCSI emulation
NVMe: Define Log Page constants
NVMe: Fix hot cpu notification dead lock
NVMe: Rename io_timeout to nvme_io_timeout
NVMe: Use last bytes of f/w rev SCSI Inquiry
NVMe: Adhere to request queue block accounting enable/disable
NVMe: Fix nvme get/put queue semantics
NVMe: Delete NVME_GET_FEAT_TEMP_THRESH
NVMe: Make admin timeout a module parameter
NVMe: Make iod bio timeout a parameter
NVMe: Prevent possible NULL pointer dereference
NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD)
NVMe: Update data structures for NVMe 1.2
NVMe: Enable BUILD_BUG_ON checks
NVMe: Update namespace and controller identify structures to the 1.1a spec
NVMe: Flush with data support
NVMe: Configure support for block flush
NVMe: Add tracepoints
NVMe: Protect against badly formatted CQEs
...

Linus Torvalds
2014-06-16 09:58:03 +0800

12 Jun, 2014

1 commit

28747fcd2 block: remove WQ_POWER_EFFICIENT from kblockd ... Browse Code »

blk-mq issues async requests through kblockd. To issue a work request on
a specific CPU, kblockd_schedule_delayed_work_on is used. However, the
specific CPU choice may not be honored, if the power_efficient option
for workqueues is set. blk-mq requires that we have strict per-cpu
scheduling, so it wont work properly if kblockd is marked
POWER_EFFICIENT and power_efficient is set.

Remove the kblockd WQ_POWER_EFFICIENT flag to prevent this behavior.
This essentially reverts part of commit 695588f9454b, which added
the WQ_POWER_EFFICIENT marker to kblockd.

Signed-off-by: Matias Bjørling
Signed-off-by: Jens Axboe

Matias Bjørling
2014-06-12 05:53:39 +0800

06 Jun, 2014

1 commit

f27b087b8 block: add blk_rq_set_block_pc() ... Browse Code »
13

With the optimizations around not clearing the full request at alloc
time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
up to the user allocating the request.

Add a blk_rq_set_block_pc() that sets the command type to
REQ_TYPE_BLOCK_PC, and properly initializes the members associated
with this type of request. Update callers to use this function instead
of manipulating rq->cmd_type directly.

Includes fixes from Christoph Hellwig for my half-assed
attempt.

Signed-off-by: Jens Axboe

Jens Axboe
2014-06-06 21:57:37 +0800

29 May, 2014

1 commit

4d92a9beb block: remove 'magic' from struct blk_plug ... Browse Code »

I don't think we've ever caught any bugs with this, and there's the
list poisoning for the plug lists to catch uninitialized cases.
So remove the magic member and save 8 bytes in the struct.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-29 22:09:00 +0800

28 May, 2014

1 commit

4ce01dd1a blk-mq: merge blk_mq_alloc_reserved_request into blk_mq_alloc_request ... Browse Code »

Instead of having two almost identical copies of the same code just let
the callers pass in the reserved flag directly.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-05-28 23:49:19 +0800

27 May, 2014

1 commit

3d2936f45 block: only allocate/free mq_usage_counter in blk-mq ... Browse Code »

The percpu counter is only used for blk-mq, so move
its allocation and free inside blk-mq, and don't
allocate it for legacy queue device.

Signed-off-by: Ming Lei
Signed-off-by: Jens Axboe

Ming Lei
2014-05-27 23:37:08 +0800

21 May, 2014

2 commits

da41a589f blk-mq: Micro-optimize blk_queue_nomerges() check ... Browse Code »

In blk_mq_make_request(), do the blk_queue_nomerges() check
outside the call to blk_attempt_plug_merge() to eliminate
function call overhead when nomerges=2 (disabled)

Signed-off-by: Robert Elliott
Signed-off-by: Jens Axboe

Robert Elliott
2014-05-21 05:49:03 +0800
e3a2b3f93 blk-mq: allow changing of queue depth through sysfs ... Browse Code »

For request_fn based devices, the block layer exports a 'nr_requests'
file through sysfs to allow adjusting of queue depth on the fly.
Currently this returns -EINVAL for blk-mq, since it's not wired up.
Wire this up for blk-mq, so that it now also always dynamic
adjustments of the allowed queue depth for any given block device
managed by blk-mq.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-21 01:49:02 +0800

10 May, 2014

1 commit

7276d02e2 block: only calculate part_in_flight() once ... Browse Code »

We first check if we have inflight IO, then retrieve that
same number again. Usually this isn't that costly since the
chance of having the data dirtied in between is small, but
there's no reason for calling part_in_flight() twice.

Signed-off-by: Jens Axboe

Jens Axboe
2014-05-10 05:48:23 +0800

05 May, 2014

1 commit

3291fa57c NVMe: Add tracepoints ... Browse Code »

Adding tracepoints for bio_complete and block_split into nvme to help
with gathering IO info using blktrace and blkparse.

Signed-off-by: Keith Busch
Signed-off-by: Matthew Wilcox

Keith Busch
2014-05-05 22:41:26 +0800

17 Apr, 2014

2 commits

12120077b block: export blk_finish_request ... Browse Code »

This allows to mirror the blk-mq code flow for more a more readable I/O
completion handler in SCSI.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2014-04-17 04:15:25 +0800
70f4db639 blk-mq: add blk_mq_delay_queue ... Browse Code »

Add a blk-mq equivalent to blk_delay_queue so that the scsi layer can ask
to be kicked again after a delay.

Signed-off-by: Christoph Hellwig

Modified by me to kill the unnecessary preempt disable/enable
in the delayed workqueue handler.

Signed-off-by: Jens Axboe

Christoph Hellwig
2014-04-17 04:15:25 +0800

16 Apr, 2014

2 commits

b4f42e283 block: remove struct request buffer member ... Browse Code »

This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.

Convert old style drivers to just use bio_data().

For the discard payload use case, just reference the page
in the bio.

Signed-off-by: Jens Axboe

Jens Axboe
2014-04-16 04:03:02 +0800
f89e0dd9d Merge tag 'v3.15-rc1' into for-3.16/core ... Browse Code »

We don't like this, but things have diverged with the blk-mq fixes
in 3.15-rc1. So merge it in.

Jens Axboe
2014-04-16 04:02:24 +0800

11 Apr, 2014

1 commit

21f9fcd81 block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ... Browse Code »

This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.

Signed-off-by: Duan Jiong
Signed-off-by: Jens Axboe

Duan Jiong
2014-04-11 22:09:47 +0800

10 Apr, 2014

3 commits

360f92c24 block: fix regression with block enabled tagging ... Browse Code »

Martin reported that his test system would not boot with
current git, it oopsed with this:

BUG: unable to handle kernel paging request at ffff88046c6c9e80
IP: [] blk_queue_start_tag+0x90/0x150
PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm
rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon
CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246
Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
Workqueue: events_unbound async_run_entry_fn
task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000
RIP: 0010:[] []
blk_queue_start_tag+0x90/0x150
RSP: 0018:ffff880273d03a58 EFLAGS: 00010092
RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6
RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d
RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410
R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0
R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e
FS: 0000000000000000(0000) GS:ffff880277b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0
Stack:
ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000
ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62
ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8
Call Trace:
[] scsi_request_fn+0xf2/0x510
[] __blk_run_queue+0x37/0x50
[] blk_execute_rq_nowait+0xb3/0x130
[] blk_execute_rq+0x64/0xf0
[] ? bit_waitqueue+0xd0/0xd0
[] scsi_execute+0xe5/0x180
[] scsi_execute_req_flags+0x9a/0x110
[] sd_spinup_disk+0x94/0x460 [sd_mod]
[] ? __unmap_hugepage_range+0x200/0x2f0
[] sd_revalidate_disk+0xaa/0x3f0 [sd_mod]
[] sd_probe_async+0xd8/0x200 [sd_mod]
[] async_run_entry_fn+0x3f/0x140
[] process_one_work+0x175/0x410
[] worker_thread+0x123/0x400
[] ? manage_workers+0x160/0x160
[] kthread+0xce/0xf0
[] ? kthread_freezable_should_stop+0x70/0x70
[] ret_from_fork+0x7c/0xb0
[] ? kthread_freezable_should_stop+0x70/0x70
Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89
df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 89
58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d
RIP [] blk_queue_start_tag+0x90/0x150
RSP
CR2: ffff88046c6c9e80

Martin bisected and found this to be the problem patch;

commit 6d113398dcf4dfcd9787a4ead738b186f7b7ff0f
Author: Jan Kara
Date: Mon Feb 24 16:39:54 2014 +0100

block: Stop abusing rq->csd.list in blk-softirq

and the problem was immediately apparent. The patch states that
it is safe to reuse queuelist at completion time, since it is
no longer used. However, that is not true if a device is using
block enabled tagging. If that is the case, then the queuelist
is reused to keep track of busy tags. If a device also ended
up using softirq completions, we'd reuse ->queuelist for the
IPI handling while block tagging was still using it. Boom.

Fix this by adding a new ipi_list list head, and share the
memory used with the request hash table. The hash table is
never used after the request is moved to the dispatch list,
which happens long before any potential completion of the
request. Add a new request bit for this, so we don't have
cases that check rq->hash while it could potentially have
been reused for the IPI completion.

Reported-by: Martin K. Petersen
Tested-by: Benjamin Herrenschmidt
Signed-off-by: Jens Axboe

Jens Axboe
2014-04-10 11:54:06 +0800
8ab14595b block: add kblockd_schedule_delayed_work_on() ... Browse Code »

Same function as kblockd_schedule_delayed_work(), but allow the
caller to pass in a CPU that the work should be executed on. This
just directly extends and maps into the workqueue API, and will
be used to make the blk-mq mappings more strict.

Signed-off-by: Jens Axboe

Jens Axboe
2014-04-10 00:17:03 +0800
59c3d45e4 block: remove 'q' parameter from kblockd_schedule_*_work() ... Browse Code »

The queue parameter is never used, just get rid of it.

Signed-off-by: Jens Axboe

Jens Axboe
2014-04-10 00:17:00 +0800

03 Apr, 2014

1 commit

159d8133d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull trivial tree updates from Jiri Kosina:
"Usual rocket science -- mostly documentation and comment updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
sparse: fix comment
doc: fix double words
isdn: capi: fix "CAPI_VERSION" comment
doc: DocBook: Fix typos in xml and template file
Bluetooth: add module name for btwilink
driver core: unexport static function create_syslog_header
mmc: core: typo fix in printk specifier
ARM: spear: clean up editing mistake
net-sysfs: fix comment typo 'CONFIG_SYFS'
doc: Insert MODULE_ in module-signing macros
Documentation: update URL to hfsplus Technote 1150
gpio: update path to documentation
ixgbe: Fix format string in ixgbe_fcoe.
Kconfig: Remove useless "default N" lines
user_namespace.c: Remove duplicated word in comment
CREDITS: fix formatting
treewide: Fix typo in Documentation/DocBook
mm: Fix warning on make htmldocs caused by slab.c
ata: ata-samsung_cf: cleanup in header file
idr: remove unused prototype of idr_free()

Linus Torvalds
2014-04-03 07:23:38 +0800