14 Sep, 2016
1 commit
-
In order to help determine the effectiveness of polling in a running
system it is useful to determine the ratio of how often the poll
function is called vs how often the completion is checked. For this
reason we add a poll_considered variable and add it to the sysfs entry
for io_poll.

Signed-off-by: Stephen Bates
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe
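A minimal userspace sketch of the bookkeeping this commit describes. Only the `poll_considered` name comes from the commit text; the `poll_invoked` counter, the struct, and the helper are illustrative, not the kernel's actual definitions:

```c
#include <assert.h>

/* Illustrative stand-in for per-queue poll statistics; only
 * poll_considered is named in the commit, the rest is hypothetical. */
struct poll_stat {
	unsigned long poll_considered;	/* completion checks considered */
	unsigned long poll_invoked;	/* actual calls into the poll function */
};

/* Model of the decision point: every completion check bumps
 * poll_considered, and only checks that really call the driver's
 * poll function bump poll_invoked. The ratio of the two is what the
 * sysfs entry lets an admin inspect. */
static void account_poll(struct poll_stat *st, int invoked)
{
	st->poll_considered++;
	if (invoked)
		st->poll_invoked++;
}
```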
29 Aug, 2016
2 commits
-
We don't need the larger delayed work struct, since we always run it
immediately.

Signed-off-by: Jens Axboe
-
Add a helper to schedule a regular struct work on a particular CPU.
Signed-off-by: Jens Axboe
17 Aug, 2016
1 commit
-
blk_set_queue_dying() can be called while another thread is
submitting I/O or changing queue flags, e.g. through dm_stop_queue().
Hence protect the QUEUE_FLAG_DYING flag change with locking.

Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Mike Snitzer
Cc: stable
Signed-off-by: Jens Axboe
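A toy model of the race the patch closes: queue flags live in one word, so an unlocked read-modify-write of one flag can race a concurrent update of another. `QUEUE_FLAG_DYING` is the kernel's flag name; the struct, the lock, and the bit values here are illustrative, not the kernel's `queue_lock`:

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_FLAG_DYING  (1u << 0)  /* kernel flag name; value illustrative */
#define QUEUE_FLAG_OTHER  (1u << 1)  /* hypothetical second flag in the word */

struct toy_queue {
	unsigned int flags;
	pthread_mutex_t lock;
};

/* Serialize the DYING flag change against other flag updates and
 * I/O submission, as the patch does with the queue lock. */
static void set_queue_dying(struct toy_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->flags |= QUEUE_FLAG_DYING;
	pthread_mutex_unlock(&q->lock);
}
```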
08 Aug, 2016
1 commit
-
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.
Signed-off-by: Jens Axboe
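A sketch of the layout the commit describes for the renamed member: request flags in the low bits, the op code in the high bits. This is also why old code that assigned `bi_rw` directly broke: it wrote flag bits where the op now lives. The shift value and helper names below are illustrative, not the kernel's exact definitions:

```c
#include <assert.h>

#define TOY_OP_SHIFT  29u	/* hypothetical: top bits hold the op */

enum toy_op { TOY_OP_READ = 0, TOY_OP_WRITE = 1, TOY_OP_DISCARD = 3 };

/* Pack an op and its flags into one word, as the renamed field does. */
static unsigned int toy_pack(enum toy_op op, unsigned int flags)
{
	return ((unsigned int)op << TOY_OP_SHIFT) | flags;
}

static enum toy_op toy_op(unsigned int opf)
{
	return (enum toy_op)(opf >> TOY_OP_SHIFT);
}

static unsigned int toy_flags(unsigned int opf)
{
	return opf & ((1u << TOY_OP_SHIFT) - 1);
}
```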
27 Jul, 2016
1 commit
-
Pull block driver updates from Jens Axboe:
"This branch also contains core changes. I've come to the conclusion
that from 4.9 and forward, I'll be doing just a single branch. We
often have dependencies between core and drivers, and it's hard to
always split them up appropriately without pulling core into drivers
when that happens.

That said, this contains:

- separate secure erase type for the core block layer, from Christoph.
- set of discard fixes, from Christoph.
- bio shrinking fixes from Christoph, as a followup to the op/flags
  change in the core branch.
- map and append request fixes from Christoph.
- NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
  exciting!
- nvme-loop fixes from Arnd.
- removal of ->driverfs_dev from Dan, after providing a
  device_add_disk() helper.
- bcache fixes from Bhaktipriya and Yijing.
- cdrom subchannel read fix from Vchannaiah.
- set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.
- set of drbd updates and fixes from Fabian, Lars, and Philipp.
- mg_disk error path fix from Bart.
- user notification for failed device add for loop, from Minfei.
- NVMe in general:
+ NVMe delay quirk from Guilherme.
+ SR-IOV support and command retry limits from Keith.
+ fix for memory-less NUMA node from Masayoshi.
+ use UINT_MAX for discard sectors, from Minfei.
+ cancel IO fixes from Ming.
+ don't allocate unused major, from Neil.
+ error code fixup from Dan.
+ use constants for PSDT/FUSE from James.
+ variable init fix from Jay.
+ fabrics fixes from Ming, Sagi, and Wei.
+ various fixes"

* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
nvme/pci: Provide SR-IOV support
nvme: initialize variable before logical OR'ing it
block: unexport various bio mapping helpers
scsi/osd: open code blk_make_request
target: stop using blk_make_request
block: simplify and export blk_rq_append_bio
block: ensure bios return from blk_get_request are properly initialized
virtio_blk: use blk_rq_map_kern
memstick: don't allow REQ_TYPE_BLOCK_PC requests
block: shrink bio size again
block: simplify and cleanup bvec pool handling
block: get rid of bio_rw and READA
block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
NVMe: don't allocate unused nvme_major
nvme: avoid crashes when node 0 is memoryless node.
nvme: Limit command retries
loop: Make user notify for adding loop device failed
nvme-loop: fix nvme-loop Kconfig dependencies
nvmet: fix return value check in nvmet_subsys_alloc()
...
21 Jul, 2016
3 commits
-
I wish the OSD code could simply use blk_rq_map_* helpers like
everyone else, but the complex nature of deciding if we have
DATA IN and/or DATA OUT buffers might make this impossible
(at least for a mere human like me).

But using blk_rq_append_bio at least allows sharing the setup code
between requests with or without data buffers, and given that this
is the last user of blk_make_request it allows getting rid of that
somewhat awkward interface.Signed-off-by: Christoph Hellwig
Acked-by: Boaz Harrosh
Signed-off-by: Jens Axboe
-
The target SCSI passthrough backend is much better served with the low-level
blk_rq_append_bio construct than the helpers built on top of it, so export it.

Also use the opportunity to remove the pointless request_queue argument and
make the code flow a little more readable.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
-
blk_get_request is used for BLOCK_PC and similar passthrough requests.
Currently we always need to call blk_rq_set_block_pc or an open coded
version of it to allow appending bios using the request mapping helpers
later on, which is a somewhat awkward API. Instead move the
initialization part of blk_rq_set_block_pc into blk_get_request, so that
we always have a safe-to-use request.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
06 Jul, 2016
1 commit
-
The new NVMe over fabrics target will make use of this outside from a
module.

Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
Reviewed-by: Steve Wise
Signed-off-by: Jens Axboe
10 Jun, 2016
1 commit
-
If we're queuing REQ_PRIO IO and the task is running at an idle IO
class, then temporarily boost the priority. This prevents livelocks
due to priority inversion, when a low priority task is holding file
system resources while attempting to do IO.

An example of that is shown below. An ioniced idle task is holding
the directory mutex, while a normal priority task is trying to do
a directory lookup.

[478381.198925] ------------[ cut here ]------------
[478381.200315] INFO: task ionice:1168369 blocked for more than 120 seconds.
[478381.201324] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.202278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.203462] ionice D ffff8803692736a8 0 1168369 1 0x00000080
[478381.203466] ffff8803692736a8 ffff880399c21300 ffff880276adcc00 ffff880369273698
[478381.204589] ffff880369273fd8 0000000000000000 7fffffffffffffff 0000000000000002
[478381.205752] ffffffff8177d5e0 ffff8803692736c8 ffffffff8177cea7 0000000000000000
[478381.206874] Call Trace:
[478381.207253] [] ? bit_wait_io_timeout+0x80/0x80
[478381.208175] [] schedule+0x37/0x90
[478381.208932] [] schedule_timeout+0x1dc/0x250
[478381.209805] [] ? __blk_run_queue+0x37/0x50
[478381.210706] [] ? ktime_get+0x45/0xb0
[478381.211489] [] io_schedule_timeout+0xa7/0x110
[478381.212402] [] ? prepare_to_wait+0x5b/0x90
[478381.213280] [] bit_wait_io+0x36/0x50
[478381.214063] [] __wait_on_bit+0x65/0x90
[478381.214961] [] ? bit_wait_io_timeout+0x80/0x80
[478381.215872] [] out_of_line_wait_on_bit+0x7c/0x90
[478381.216806] [] ? wake_atomic_t_function+0x40/0x40
[478381.217773] [] __wait_on_buffer+0x2a/0x30
[478381.218641] [] ext4_bread+0x57/0x70
[478381.219425] [] __ext4_read_dirblock+0x3c/0x380
[478381.220467] [] ext4_dx_find_entry+0x7d/0x170
[478381.221357] [] ? find_get_entry+0x1e/0xa0
[478381.222208] [] ext4_find_entry+0x484/0x510
[478381.223090] [] ext4_lookup+0x52/0x160
[478381.223882] [] lookup_real+0x1d/0x60
[478381.224675] [] __lookup_hash+0x38/0x50
[478381.225697] [] lookup_slow+0x45/0xab
[478381.226941] [] link_path_walk+0x7ae/0x820
[478381.227880] [] path_init+0xc2/0x430
[478381.228677] [] ? security_file_alloc+0x16/0x20
[478381.229776] [] path_openat+0x77/0x620
[478381.230767] [] ? page_add_file_rmap+0x2e/0x70
[478381.232019] [] do_filp_open+0x43/0xa0
[478381.233016] [] ? creds_are_invalid+0x29/0x70
[478381.234072] [] do_open_execat+0x70/0x170
[478381.235039] [] do_execveat_common.isra.36+0x1b8/0x6e0
[478381.236051] [] do_execve+0x2c/0x30
[478381.236809] [] ? getname+0x12/0x20
[478381.237564] [] SyS_execve+0x2e/0x40
[478381.238338] [] stub_execve+0x6d/0xa0
[478381.239126] ------------[ cut here ]------------
[478381.239915] ------------[ cut here ]------------
[478381.240606] INFO: task python2.7:1168375 blocked for more than 120 seconds.
[478381.242673] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.243653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.244902] python2.7 D ffff88005cf8fb98 0 1168375 1168248 0x00000080
[478381.244904] ffff88005cf8fb98 ffff88016c1f0980 ffffffff81c134c0 ffff88016c1f11a0
[478381.246023] ffff88005cf8ffd8 ffff880466cd0cbc ffff88016c1f0980 00000000ffffffff
[478381.247138] ffff880466cd0cc0 ffff88005cf8fbb8 ffffffff8177cea7 ffff88005cf8fcc8
[478381.248252] Call Trace:
[478381.248630] [] schedule+0x37/0x90
[478381.249382] [] schedule_preempt_disabled+0xe/0x10
[478381.250465] [] __mutex_lock_slowpath+0x92/0x100
[478381.251409] [] mutex_lock+0x1b/0x2f
[478381.252199] [] lookup_slow+0x36/0xab
[478381.253023] [] link_path_walk+0x7ae/0x820
[478381.253877] [] ? try_charge+0xc1/0x700
[478381.254690] [] path_init+0xc2/0x430
[478381.255525] [] ? security_file_alloc+0x16/0x20
[478381.256450] [] path_openat+0x77/0x620
[478381.257256] [] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
[478381.258390] [] ? handle_mm_fault+0x13f3/0x1720
[478381.259309] [] do_filp_open+0x43/0xa0
[478381.260139] [] ? __alloc_fd+0x42/0x120
[478381.260962] [] do_sys_open+0x13c/0x230
[478381.261779] [] ? syscall_trace_enter_phase1+0x113/0x170
[478381.262851] [] SyS_open+0x22/0x30
[478381.263598] [] system_call_fastpath+0x12/0x17
[478381.264551] ------------[ cut here ]------------
[478381.265377] ------------[ cut here ]------------

Signed-off-by: Jens Axboe
Reviewed-by: Jeff Moyer
09 Jun, 2016
1 commit
-
Instead of overloading the discard support with the REQ_SECURE flag.
Use the opportunity to rename the queue flag as well, and remove the
dead checks for this flag in the RAID 1 and RAID 10 drivers that don't
claim support for secure erase.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
08 Jun, 2016
10 commits
-
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
We don't need bi_rw to be so large on 64 bit archs, so
reduce it to unsigned int.

Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
This patch converts the is_sync helpers to use separate variables
for the operation and flags.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
This patch converts the block layer merging code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
This patch converts the elevator code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
This patch prepares *_get_request/*_put_request and freed_request
to use separate variables for the operation and flags. In the
next patches the struct request users will be converted like
was done for bios where the op and flags are set separately.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
The bio users should now always be setting up the bio op. This patch
has the block layer copy that to the request.

Signed-off-by: Mike Christie
Signed-off-by: Jens Axboe
-
This patch converts the simple bi_rw use cases in the block,
drivers, mm and fs code to set/get the bio operation using
bio_set_op_attrs/bio_op.

These should be simple one or two liner cases, so I just did them
in one patch. The next patches handle the more complicated
cases in a module per patch.

Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect an operation direction like writesame by testing if REQ_WRITE is
set.

This patch converts the drivers and cgroup to use the
op_is_write helper. This should just cover the simple
cases. I did dm, md and bcache in their own patches
because they were more involved.

Signed-off-by: Mike Christie
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
-
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw
instead of passing it in. This makes that use the same as
generic_make_request and how we set the other bio fields.

Signed-off-by: Mike Christie
Fixed up fs/ext4/crypto.c
Signed-off-by: Jens Axboe
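The series above repeatedly leans on deriving the data direction from the op code rather than from a REQ_WRITE bit. A simplified model of an `op_is_write()`-style helper, assuming that shape; the enum values and the exact set of write-like ops here are illustrative, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical op codes modeling the separated-op scheme. */
enum toy_op {
	TOY_OP_READ,
	TOY_OP_WRITE,
	TOY_OP_DISCARD,
	TOY_OP_WRITE_SAME,
	TOY_OP_FLUSH,
};

/* Once the op is no longer a bit in a flag bitmap, direction must be
 * derived from the op code itself: discard and writesame are writes
 * even though they never carried REQ_WRITE-style semantics cleanly. */
static bool toy_op_is_write(enum toy_op op)
{
	switch (op) {
	case TOY_OP_WRITE:
	case TOY_OP_DISCARD:
	case TOY_OP_WRITE_SAME:
		return true;
	default:
		return false;
	}
}
```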
14 Apr, 2016
1 commit
-
Now that we converted everything to the newer block write cache
interface, kill off the queue flush_flags and queueable flush
entries.

Signed-off-by: Jens Axboe
13 Apr, 2016
1 commit
-
We could kmalloc() the payload, so we need the offset in the page.
Signed-off-by: Ming Lin
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
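A kmalloc()'d payload can start anywhere inside a page, so the byte offset within the page has to travel alongside the page pointer. A minimal model of that computation, assuming 4K pages; the macro and function names are illustrative:

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096ul	/* assumption: 4K pages */

/* Offset of a pointer within its containing page: mask off everything
 * above the page-size bits. */
static unsigned long toy_offset_in_page(const void *p)
{
	return (unsigned long)p & (TOY_PAGE_SIZE - 1);
}
```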
05 Apr, 2016
1 commit
-
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in the page cache are special. They are
not.

The changes are pretty straightforward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using the
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with a separate patch.

virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
19 Mar, 2016
1 commit
-
Pull libata updates from Tejun Heo:
- ahci grew runtime power management support so that the controller can
  be turned off if no devices are attached.
- sata_via isn't dead yet. It got hotplug support and a more refined
  workaround for certain WD drives.
- Misc cleanups. There's a merge from for-4.5-fixes to avoid confusing
  conflicts in the ahci PCI ID table.

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: ahci_xgene: dereferencing uninitialized pointer in probe
AHCI: Remove obsolete Intel Lewisburg SATA RAID device IDs
ata: sata_rcar: Use ARCH_RENESAS
sata_via: Implement hotplug for VT6421
sata_via: Apply WD workaround only when needed on VT6421
ahci: Add runtime PM support for the host controller
ahci: Add functions to manage runtime PM of AHCI ports
ahci: Convert driver to use modern PM hooks
ahci: Cache host controller version
scsi: Drop runtime PM usage count after host is added
scsi: Set request queue runtime PM status back to active on resume
block: Add blk_set_runtime_active()
ata: ahci_mvebu: add support for Armada 3700 variant
libata: fix unbalanced spin_lock_irqsave/spin_unlock_irq() in ata_scsi_park_show()
libata: support AHCI on OCTEON platform
23 Feb, 2016
1 commit
-
Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly. One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true. This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request. The ftrace
function_graph tracer showed:

kworker-2013 => fio-12190
fio-12190 => kworker-2013
...
kworker-2013 => fio-12190
fio-12190 => kworker-2013
...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1) don't blk_mq_run_hw_queues in blk-mq request completion

Motivated by the desire to reduce the overhead of dm-mq, punting to
kblockd just increases context switches.

In my testing against a really fast null_blk device there was no benefit
to running blk_mq_run_hw_queues() on completion (and no other blk-mq
driver does this). So hopefully this change doesn't induce the need for
yet another revert like commit 621739b00e16ca2d!

2) use blk_mq_complete_request() in dm_complete_request()
blk_complete_request() doesn't offer the traditional q->mq_ops vs
.request_fn branching pattern that other historic block interfaces
do (e.g. blk_get_request). Using blk_mq_complete_request() for
blk-mq requests is important for performance. It should be noted
that, like blk_complete_request(), blk_mq_complete_request() doesn't
natively handle partial completions -- but the request-based
DM-multipath target does provide the required partial completion
support by dm.c:end_clone_bio() triggering requeueing of the request
via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After: cpu : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case. The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Sagi Grimberg
Signed-off-by: Mike Snitzer
Acked-by: Jens Axboe
19 Feb, 2016
1 commit
-
If block device is left runtime suspended during system suspend, resume
hook of the driver typically corrects runtime PM status of the device back
to "active" after it is resumed. However, this is not enough as queue's
runtime PM status is still "suspended". As long as it is in this state
blk_pm_peek_request() returns NULL and thus prevents new requests from
being processed.

Add a new function, blk_set_runtime_active(), that can be used to force the
queue status back to "active" as needed.

Signed-off-by: Mika Westerberg
Acked-by: Jens Axboe
Signed-off-by: Tejun Heo
05 Feb, 2016
2 commits
-
When a storage device rejects a WRITE SAME command we will disable write
same functionality for the device and return -EREMOTEIO to the block
layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or
failing the path.

Yiwen Jiang discovered a small race where WRITE SAME requests issued
simultaneously would cause -EIO to be returned. This happened because
any requests being prepared after WRITE SAME had been disabled for the
device caused us to return BLKPREP_KILL. The latter caused the block
layer to return -EIO upon completion.

To overcome this we introduce BLKPREP_INVALID, which indicates that this
is an invalid request for the device. blk_peek_request() is modified to
return -EREMOTEIO in that case.

Reported-by: Yiwen Jiang
Suggested-by: Mike Snitzer
Reviewed-by: Hannes Reinecke
Reviewed-by: Ewan Milne
Reviewed-by: Yiwen Jiang
Signed-off-by: Martin K. Petersen
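A sketch of the prep-status to errno mapping the commit describes: BLKPREP_KILL keeps meaning -EIO, while the new BLKPREP_INVALID lets the peek path surface -EREMOTEIO for requests that are merely invalid for the device. The constant names match the commit; the enum values, errno defines, and mapping function are illustrative stand-ins for the kernel's code:

```c
#include <assert.h>

#define TOY_EIO        5	/* Linux errno values, redefined for the sketch */
#define TOY_EREMOTEIO  121

enum toy_blkprep { BLKPREP_OK, BLKPREP_KILL, BLKPREP_DEFER, BLKPREP_INVALID };

/* Model of the completion status blk_peek_request() would hand back:
 * KILL stays a hard -EIO; INVALID becomes -EREMOTEIO, which DM treats
 * as "don't retry, don't fail the path". */
static int toy_prep_to_errno(enum toy_blkprep ret)
{
	switch (ret) {
	case BLKPREP_KILL:
		return -TOY_EIO;
	case BLKPREP_INVALID:
		return -TOY_EREMOTEIO;
	default:
		return 0;
	}
}
```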
22 Jan, 2016
1 commit
-
Pull NVMe updates from Jens Axboe:
"Last branch for this series is the nvme changes. It's in a separate
branch to avoid splitting too much between core and NVMe changes,
since NVMe is still helping drive some blk-mq changes. That said, not
a huge amount of core changes in here. The grunt of the work is the
continued split of the code"

* 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
uapi: update install list after nvme.h rename
NVMe: Export NVMe attributes to sysfs group
NVMe: Shutdown controller only for power-off
NVMe: IO queue deletion re-write
NVMe: Remove queue freezing on resets
NVMe: Use a retryable error code on reset
NVMe: Fix admin queue ring wrap
nvme: make SG_IO support optional
nvme: fixes for NVME_IOCTL_IO_CMD on the char device
nvme: synchronize access to ctrl->namespaces
nvme: Move nvme_freeze/unfreeze_queues to nvme core
PCI/AER: include header file
NVMe: Export namespace attributes to sysfs
NVMe: Add pci error handlers
block: remove REQ_NO_TIMEOUT flag
nvme: merge iod and cmd_info
nvme: meta_sg doesn't have to be an array
nvme: properly free resources for cancelled command
nvme: simplify completion handling
nvme: special case AEN requests
...
20 Jan, 2016
1 commit
-
Pull core block updates from Jens Axboe:
"We don't have a lot of core changes this time around, it's mostly in
drivers, which will come in a subsequent pull.

The core changes include:

- blk-mq
  - Prep patch from Christoph, changing blk_mq_alloc_request() to
    take flags instead of just using gfp_t for sleep/nosleep.
  - Doc patch from me, clarifying the difference between legacy
    and blk-mq for timer usage.
  - Fixes from Raghavendra for memory-less numa nodes, and a reuse
    of CPU masks.
- Cleanup from Geliang Tang, using offset_in_page() instead of open
  coding it.
- From Ilya, rename the request_queue slab so it reflects what it holds,
  and a fix for proper use of bdgrab/put.
- A real fix for the split across stripe boundaries from Keith. We
  yanked a broken version of this from 4.4-rc final, this one works.
- From Mike Krinkin, emit a trace message when we split.
- From Wei Tang, two small cleanups, not explicitly clearing memory
  that is already cleared"

* 'for-4.5/core' of git://git.kernel.dk/linux-block:
block: use bd{grab,put}() instead of open-coding
block: split bios to max possible length
block: add call to split trace point
blk-mq: Avoid memoryless numa node encoded in hctx numa_node
blk-mq: Reuse hardware context cpumask for tags
blk-mq: add a flags parameter to blk_mq_alloc_request
Revert "blk-flush: Queue through IO scheduler when flush not required"
block: clarify blk_add_timer() use case for blk-mq
bio: use offset_in_page macro
block: do not initialise statics to 0 or NULL
block: do not initialise globals to 0 or NULL
block: rename request_queue slab cache
29 Dec, 2015
1 commit
-
We currently only have an inline/sync helper to restart a stopped
queue. If drivers need an async version, they have to roll their
own. Add a generic helper instead.

Signed-off-by: Jens Axboe
23 Dec, 2015
2 commits
-
blk_queue_bio() does split then bounce, which makes the segment
counting based on pages before bouncing and could go wrong. Move
the split to after bouncing, like we do for blk-mq, and then we
fix the issue of having the bio count for segments be wrong.

Fixes: 54efd50bfd87 ("block: make generic_make_request handle arbitrarily sized bios")
Cc: stable@vger.kernel.org
Tested-by: Artem S. Tashkinov
Signed-off-by: Jens Axboe
-
Timer context is not very useful for drivers to perform any meaningful abort
action from. So instead of calling the driver from this useless context
defer it to a workqueue as soon as possible.

Note that while a delayed_work item would seem the right thing here I didn't
dare to use it due to the magic in blk_add_timer that pokes deep into timer
internals. But maybe this encourages Tejun to add a sensible API for that to
the workqueue API and we'll all be fine in the end :)

Contains a major update from Keith Busch:
"This patch removes synchronizing the timeout work so that the timer can
start a freeze on its own queue. The timer enters the queue, so timer
context can only start a freeze, but not wait for frozen."

Signed-off-by: Christoph Hellwig
Acked-by: Keith Busch
Signed-off-by: Jens Axboe
04 Dec, 2015
1 commit
-
The routines in scsi_pm.c assume that if a runtime-PM callback is
invoked for a SCSI device, it can only mean that the device's driver
has asked the block layer to handle the runtime power management (by
calling blk_pm_runtime_init(), which among other things sets q->dev).

However, this assumption turns out to be wrong for things like the ses
driver. Normally ses devices are not allowed to do runtime PM, but
userspace can override this setting. If this happens, the kernel gets
a NULL pointer dereference when blk_post_runtime_resume() tries to use
the uninitialized q->dev pointer.

This patch fixes the problem by checking q->dev in the block layer before
handling runtime PM. Since ses doesn't define any PM callbacks or call
blk_pm_runtime_init(), the crash won't occur.

This fixes Bugzilla #101371:
https://bugzilla.kernel.org/show_bug.cgi?id=101371

More discussion can be found at:
http://marc.info/?l=linux-scsi&m=144163730531875&w=2

Signed-off-by: Ken Xue
Acked-by: Alan Stern
Cc: Xiangliang Yu
Cc: James E.J. Bottomley
Cc: Jens Axboe
Cc: Michael Terry
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
02 Dec, 2015
1 commit
-
We already have the reserved flag, and a nowait flag awkwardly encoded as
a gfp_t. Add a real flags argument to make the scheme more extensible and
allow for a nicer calling convention.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
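A model of the change: distinct flag bits replace a bool-plus-gfp_t encoding, so new behaviors compose without widening the API. `BLK_MQ_REQ_NOWAIT` and `BLK_MQ_REQ_RESERVED` match the two existing behaviors the commit mentions; the bit values and helper below are illustrative, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>

#define BLK_MQ_REQ_NOWAIT    (1u << 0)	/* fail instead of sleeping for a tag */
#define BLK_MQ_REQ_RESERVED  (1u << 1)	/* allocate from the reserved pool */

/* With a real flags word, each behavior is one bit; callers no longer
 * smuggle "nowait" through a gfp_t. */
static bool toy_may_sleep(unsigned int flags)
{
	return !(flags & BLK_MQ_REQ_NOWAIT);
}
```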
30 Nov, 2015
1 commit
-
When a cloned request is retried on other queues it always needs
to be checked against the queue limits of that queue.
Otherwise the calculations for nr_phys_segments might be wrong,
leading to a crash in scsi_init_sgtable().

To clarify this the patch renames blk_rq_check_limits()
to blk_cloned_rq_check_limits() and removes the symbol
export, as the new function should only be used for
cloned requests and never exported.

Cc: Mike Snitzer
Cc: Ewan Milne
Cc: Jeff Moyer
Signed-off-by: Hannes Reinecke
Fixes: e2a60da74 ("block: Clean up special command handling logic")
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe
25 Nov, 2015
2 commits
-
This patch fixes the checkpatch.pl error in blk-exec.c:
ERROR: do not initialise globals to 0 or NULL
Signed-off-by: Wei Tang
Signed-off-by: Jens Axboe
-
Name the cache after the actual name of the struct.
Signed-off-by: Ilya Dryomov
Signed-off-by: Jens Axboe