01 Aug, 2009
3 commits
-
When stacking block devices ensure that optimal I/O size is scaled
accordingly.
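One plausible way to do the scaling is to combine the top and bottom optimal I/O sizes into their least common multiple, so the stacked value stays a multiple of both. The sketch below is illustrative only, not necessarily the exact in-tree code:

#include <linux/blkdev.h>

/* Illustrative helper: least common multiple, with 0 treated as "unset". */
static unsigned int lcm(unsigned int a, unsigned int b)
{
	unsigned int x = a, y = b, r;

	if (!a || !b)
		return a | b;	/* whichever one is set, or 0 */

	while (y) {		/* Euclid's algorithm: x becomes gcd(a, b) */
		r = x % y;
		x = y;
		y = r;
	}
	return (a / x) * b;
}

/* Conceptually what the stacking code does for the optimal I/O size. */
static void stack_io_opt(struct queue_limits *t, struct queue_limits *b)
{
	t->io_opt = lcm(t->io_opt, b->io_opt);
}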
Signed-off-by: Martin K. Petersen
Reviewed-by: Mike Snitzer
Signed-off-by: Jens Axboe
-
Introduce blk_limits_io_min() and make blk_queue_io_min() call it.
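A minimal sketch of how the two relate (paraphrased, not necessarily the verbatim in-tree code): blk_limits_io_min() works on a bare queue_limits so stacking drivers can use it, and blk_queue_io_min() simply forwards the queue's limits to it.

#include <linux/blkdev.h>

/* Operates on queue_limits directly, usable before a request_queue exists. */
void blk_limits_io_min(struct queue_limits *limits, unsigned int min)
{
	limits->io_min = min;

	/* io_min should never be smaller than the physical block size */
	if (limits->io_min < limits->physical_block_size)
		limits->io_min = limits->physical_block_size;
}

/* The request_queue variant becomes a thin wrapper. */
void blk_queue_io_min(struct request_queue *q, unsigned int min)
{
	blk_limits_io_min(&q->limits, min);
}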
Signed-off-by: Mike Snitzer
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
-
blk_queue_stack_limits() has been superseded by blk_stack_limits() and
disk_stack_limits(). Wrap the function call for now; we'll deprecate it
later.
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
29 Jul, 2009
1 commit
-
Prior to the change for more sane end_io functions, we exported
the helpers with the normal EXPORT_SYMBOL(). That got changed
to _GPL() for the new interface. Revert that particular change,
on the basis that this is basic functionality and doesn't dip
into internal structures. If these exports can't be non-GPL,
then we may as well make EXPORT_SYMBOL() imply GPL for
everything.
Signed-off-by: Jens Axboe
28 Jul, 2009
2 commits
-
blk_integrity_unregister should use kobject_put to release the kobject,
otherwise after bi is freed, memory of bi->kobj->name is leaked.
Signed-off-by: Xiaotian Feng
Signed-off-by: Jens Axboe
-
Move the assignment of a default lock below blk_init_queue() to
blk_queue_make_request(), so we also get to set the default lock
for ->make_request_fn() based drivers. This is important since the
queue flag locking requires a lock to be in place.
Signed-off-by: Jens Axboe
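A simplified sketch of the change described above (the real function also (re)sets various queue defaults): if the driver has not supplied a queue lock when it registers its make_request_fn, fall back to the queue's embedded lock so queue-flag locking has something to grab.

#include <linux/blkdev.h>

void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
{
	q->make_request_fn = mfn;

	/* ->make_request_fn based drivers get a default lock too, since
	 * the queue flag locking requires a lock to be in place. */
	if (!q->queue_lock)
		q->queue_lock = &q->__queue_lock;

	/* (default limits and other settings omitted in this sketch) */
}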
17 Jul, 2009
2 commits
-
In blk-sysfs.c, queue_var_store uses unsigned long to store data,
but queue_var_show uses unsigned int to show data. This causes:
# echo 70000000000 > /sys/block//queue/read_ahead_kb
# cat /sys/block//queue/read_ahead_kb => get wrong value
Fix it by using unsigned long.
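The mismatch and its fix can be sketched roughly like this (close to, but not necessarily identical to, the blk-sysfs.c helpers): both sides of the sysfs attribute now agree on unsigned long.

/* After the fix, show and store use the same width. */
static ssize_t queue_var_show(unsigned long var, char *page)
{
	return sprintf(page, "%lu\n", var);	/* was: unsigned int and "%u" */
}

static ssize_t queue_var_store(unsigned long *var, const char *page,
			       size_t count)
{
	char *p = (char *) page;

	*var = simple_strtoul(p, &p, 10);
	return count;
}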
While at it, convert queue_rq_affinity_show() such that it uses a bool
variable instead of explicit != 0 testing.
Signed-off-by: Xiaotian Feng
Signed-off-by: Tejun Heo
-
Commit ab0fd1debe730ec9998678a0c53caefbd121ed10 tries to prevent merging
of requests with different failfast settings. In elv_rq_merge_ok(),
it compares the new bio's failfast flags against the merge target
request's. However, the flag testing accessors for bio and blk don't
return booleans but the tested bit value directly, and FAILFAST on bio
and blk don't match, so directly comparing them with == results in
false negatives, unnecessarily preventing merging of readahead requests.
This patch converts the results to booleans by negating them before
comparison.
Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Boaz Harrosh
Cc: FUJITA Tomonori
Cc: James Bottomley
Cc: Jeff Garzik
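The pitfall the patch above describes can be illustrated with a small standalone example. The flag names and bit positions below are invented for illustration; the point is the pattern: normalize both sides to booleans with ! before comparing.

#include <stdio.h>

/* Invented bit positions: the "same" failfast flag lives at different
 * bits in bio flags and in request flags. */
#define BIO_FF_DEV	(1u << 3)
#define REQ_FF_DEV	(1u << 7)

static int failfast_matches(unsigned int bio_flags, unsigned int rq_flags)
{
	/* Wrong: raw bit values can never compare equal when both are set,
	 * because (1 << 3) != (1 << 7):
	 *	return (bio_flags & BIO_FF_DEV) == (rq_flags & REQ_FF_DEV);
	 */

	/* Right: negation collapses each side to 0 or 1 first. */
	return !(bio_flags & BIO_FF_DEV) == !(rq_flags & REQ_FF_DEV);
}

int main(void)
{
	printf("%d\n", failfast_matches(BIO_FF_DEV, REQ_FF_DEV));	/* 1 */
	printf("%d\n", failfast_matches(0, REQ_FF_DEV));		/* 0 */
	return 0;
}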
11 Jul, 2009
2 commits
-
In case memory is scarce, we now default to oom_cfqq. Once memory is
available again, we should allocate a new cfqq and stop using oom_cfqq for
a particular io context.
Once a new request comes in, check if we are using oom_cfqq, and if yes,
try to allocate a new cfqq.
Tested the patch by forcing the use of oom_cfqq; upon the next request the
thread realized that it was using oom_cfqq and allocated a new cfqq.
Signed-off-by: Vivek Goyal
Signed-off-by: Jens Axboe
-
Currently, blk_scsi_ioctl_init() is not called since it lacks
an initcall marking. This causes the command table to be
uninitialized, hence some commands are blocked when they should
not have been.
This fixes a regression introduced by commit
018e0446890661504783f92388ecce7138c1566d
Signed-off-by: FUJITA Tomonori
Signed-off-by: Jens Axboe
04 Jul, 2009
1 commit
-
Block layer used to merge requests and bios with different failfast
settings. This caused regular IOs to fail prematurely when they were
merged into failfast requests for readahead.
Niel Lambrechts could trigger the problem semi-reliably on ext4 when
resuming from STR. ext4 uses readahead when reading inodes and
combined with the deterministic extra SATA PHY exception cycle during
resume on the specific configuration, non-readahead inode read would
fail causing ext4 errors. Please read the following thread for
details.
http://lkml.org/lkml/2009/5/23/21
This patch makes the block layer reject merging if the failfast settings
don't match. This is correct but likely to lower IO performance by
preventing regular IOs from mingling into surrounding readahead
requests. Changes to allow such mixed merges and handle errors
correctly will be added later.
Signed-off-by: Tejun Heo
Reported-by: Niel Lambrechts
Cc: Theodore Tso
Signed-off-by: Jens Axboe
01 Jul, 2009
6 commits
-
With the changes for falling back to an oom_cfqq, we never fail
to find/allocate a queue in cfq_get_queue(). So remove the check.
Signed-off-by: Shan Wei
Signed-off-by: Jens Axboe
-
The next_ordered flag is only meaningful for devices that use __make_request.
So move the test against next_ordered out of generic code and into
__make_request.
Since this test was added, barriers have not worked on md or any
devices that don't use __make_request and so don't bother to set
next_ordered. (dm explicitly sets something other than
QUEUE_ORDERED_NONE since
commit 99360b4c18f7675b50d283301d46d755affe75fd
but notes in the comments that it is otherwise meaningless).
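A hedged sketch of the relocated check (accessor spellings are approximate for this kernel version): __make_request() rejects a barrier bio when the queue never set next_ordered, so the generic code no longer has to know about it.

#include <linux/blkdev.h>
#include <linux/bio.h>

/* Approximation of the test that now lives in __make_request() rather
 * than in generic_make_request(). */
static bool barrier_unsupported(struct request_queue *q, struct bio *bio)
{
	return bio_barrier(bio) && q->next_ordered == QUEUE_ORDERED_NONE;
}

/* ...used in __make_request() roughly as:
 *	if (unlikely(barrier_unsupported(q, bio))) {
 *		bio_endio(bio, -EOPNOTSUPP);
 *		return 0;
 *	}
 */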
Cc: Ken Milmore
Cc: stable@kernel.org
Signed-off-by: NeilBrown
Signed-off-by: Jens Axboe
-
The initial patches to support this through sysfs export were broken
and have been if 0'ed out in any release. So let's just kill the code
and reclaim some space in struct request_queue, if anyone would later
like to fixup the sysfs bits, the git history can easily restore
the removed bits.
Signed-off-by: Jens Axboe
-
This patch restores stacking ability to the block layer integrity
infrastructure by creating a set of dedicated bip slabs. Each bip slab
has an embedded bio_vec array at the end. This cuts down on memory
allocations and also simplifies the code compared to the original bvec
version. Only the largest bip slab is backed by a mempool. The pool is
contained in the bio_set so stacking drivers can ensure forward
progress.
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
-
Setup an emergency fallback cfqq that we allocate at IO scheduler init
time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just
punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue()
never fails without having to ensure free memory.
On cfqq lookup, always try to allocate a new cfqq if the given cfq io
context has the oom_cfqq assigned. This ensures that we only temporarily
punt to this shared queue.
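A condensed sketch of the fallback (oom_cfqq and cfq_pool as described here; the wrapper and the surrounding allocation details are simplified): instead of failing, cfq_find_alloc_queue() punts to the pre-allocated queue.

#include <linux/slab.h>

/* Simplified wrapper around the allocation done in cfq_find_alloc_queue(). */
static struct cfq_queue *cfqq_alloc_or_oom(struct cfq_data *cfqd,
					   gfp_t gfp_mask)
{
	struct cfq_queue *cfqq;

	cfqq = kmem_cache_alloc_node(cfq_pool, gfp_mask | __GFP_ZERO,
				     cfqd->queue->node);
	if (!cfqq)
		cfqq = &cfqd->oom_cfqq;	/* never fail: punt to the fallback */

	return cfqq;
}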
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe
-
We're going to be needing that init code outside of that function
to get rid of the __GFP_NOFAIL in cfqq allocation.
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe
21 Jun, 2009
1 commit
-
The SMP handler (sas_smp_request) was fixed to use the block API
properly, so we don't need this workaround to avoid blk_put_request()
warning.
Signed-off-by: FUJITA Tomonori
Signed-off-by: James Bottomley
19 Jun, 2009
2 commits
-
Warning(block/blk-settings.c:108): No description found for parameter 'lim'
Warning(block/blk-settings.c:108): Excess function parameter 'limits' description in 'blk_set_default_limits'
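The warnings simply mean the kernel-doc comment documented a parameter name ('limits') that the function signature does not have ('lim'); the fix is to rename it in the comment. A hedged reconstruction of the corrected form:

/* After the fix: the kernel-doc parameter matches the signature. */
/**
 * blk_set_default_limits - reset limits to default values
 * @lim:  the queue_limits structure to reset
 */
void blk_set_default_limits(struct queue_limits *lim);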
Signed-off-by: Randy Dunlap
Signed-off-by: Jens Axboe
-
Follow-up to "block: enable by default support for large devices
and files on 32-bit archs".
Rename CONFIG_LBD to CONFIG_LBDAF to:
- allow update of existing [def]configs for "default y" change
- reflect that it is used also for large files support nowadays
Signed-off-by: Bartlomiej Zolnierkiewicz
Signed-off-by: Jens Axboe
18 Jun, 2009
1 commit
-
Correct stacking bounce_pfn limit setting and prevent warnings on
32-bit.
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
17 Jun, 2009
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (64 commits)
debugfs: use specified mode to possibly mark files read/write only
debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem.
xen: remove driver_data direct access of struct device from more drivers
usb: gadget: at91_udc: remove driver_data direct access of struct device
uml: remove driver_data direct access of struct device
block/ps3: remove driver_data direct access of struct device
s390: remove driver_data direct access of struct device
parport: remove driver_data direct access of struct device
parisc: remove driver_data direct access of struct device
of_serial: remove driver_data direct access of struct device
mips: remove driver_data direct access of struct device
ipmi: remove driver_data direct access of struct device
infiniband: ehca: remove driver_data direct access of struct device
ibmvscsi: gadget: at91_udc: remove driver_data direct access of struct device
hvcs: remove driver_data direct access of struct device
xen block: remove driver_data direct access of struct device
thermal: remove driver_data direct access of struct device
scsi: remove driver_data direct access of struct device
pcmcia: remove driver_data direct access of struct device
PCIE: remove driver_data direct access of struct device
...
Manually fix up trivial conflicts due to different direct driver_data
direct access fixups in drivers/block/{ps3disk.c,ps3vram.c}
16 Jun, 2009
7 commits
-
When porting blktrace to tracepoints, we changed to trace/block.h
for trace prober declarations.
Signed-off-by: Li Zefan
Signed-off-by: Jens Axboe
-
DM reuses the request queue when swapping in a new device table.
Introduce blk_set_default_limits() which can be used to reset the
queue_limits prior to stacking devices.
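A rough usage sketch from a stacking driver's point of view (restack_one() is a hypothetical helper; blk_set_default_limits() and blk_stack_limits() are the block-layer calls): reset the queue's limits first, then fold the limits of each underlying device back in.

#include <linux/blkdev.h>

/* Hypothetical helper a stacking driver (e.g. dm) might call when a new
 * table is swapped in. */
static void restack_one(struct request_queue *q, struct block_device *bdev,
			sector_t offset)
{
	/* Start from pristine defaults instead of inheriting limits left
	 * over from the previous table... */
	blk_set_default_limits(&q->limits);

	/* ...then combine in the limits of the underlying device. */
	blk_stack_limits(&q->limits, &bdev_get_queue(bdev)->limits, offset);
}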
Signed-off-by: Martin K. Petersen
Acked-by: Alasdair G Kergon
Acked-by: Mike Snitzer
Signed-off-by: Jens Axboe
-
I noticed a blank line in blktrace output. This patch fixes that.
Signed-off-by: Jeff Moyer
Signed-off-by: Jens Axboe
-
Move the defaults to where we do the init of the backing_dev_info.
Signed-off-by: Jens Axboe
-
Actually, last_end_request in cfq_data isn't used now. So let's
just remove it.
Signed-off-by: Gui Jianfeng
Signed-off-by: Jens Axboe
-
This adds support to the BSG driver to report the proper device name to
userspace for the bsg devices.
Signed-off-by: Kay Sievers
Signed-off-by: Jan Blunck
Signed-off-by: Greg Kroah-Hartman
-
This adds support for block drivers to report their requested nodename
to userspace. It also updates a number of block drivers to provide the
needed subdirectory and device name to be used for them.
Signed-off-by: Kay Sievers
Signed-off-by: Jan Blunck
Signed-off-by: Greg Kroah-Hartman
12 Jun, 2009
3 commits
-
Fix kernel-doc warnings in recently changed block/ source code.
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds
-
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
block: add request clone interface (v2)
floppy: fix hibernation
ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
fs/bio.c: add missing __user annotation
block: prevent possible io_context->refcount overflow
Add serial number support for virtio_blk, V4a
block: Add missing bounce_pfn stacking and fix comments
Revert "block: Fix bounce limit setting in DM"
cciss: decode unit attention in SCSI error handling code
cciss: Remove no longer needed sendcmd reject processing code
cciss: change SCSI error handling routines to work with interrupts enabled.
cciss: separate error processing and command retrying code in sendcmd_withirq_core()
cciss: factor out fix target status processing code from sendcmd functions
cciss: simplify interface of sendcmd() and sendcmd_withirq()
cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
block: needs to set the residual length of a bidi request
Revert "block: implement blkdev_readpages"
block: Fix bounce limit setting in DM
Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
...
Manually fix conflicts with tracing updates in:
block/blk-sysfs.c
drivers/ide/ide-atapi.c
drivers/ide/ide-cd.c
drivers/ide/ide-floppy.c
drivers/ide/ide-tape.c
include/trace/events/block.h
kernel/trace/blktrace.c
-
* 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (28 commits)
ide-tape: fix debug call
alim15x3: Remove historical hacks, re-enable init_hwif for PowerPC
ide-dma: don't reset request fields on dma_timeout_retry()
ide: drop rq->data handling from ide_map_sg()
ide-atapi: kill unused fields and callbacks
ide-tape: simplify read/write functions
ide-tape: use byte size instead of sectors on rw issue functions
ide-tape: unify r/w init paths
ide-tape: kill idetape_bh
ide-tape: use standard data transfer mechanism
ide-tape: use single continuous buffer
ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len
ide-tape,floppy: fix failed command completion after request sense
ide-pm: don't abuse rq->data
ide-cd,atapi: use bio for internal commands
ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer
ide-cd: convert to using generic sense request
ide: add helpers for preparing sense requests
ide-cd: don't abuse rq->buffer
ide-atapi: don't abuse rq->buffer
...
11 Jun, 2009
3 commits
-
This patch adds the following 2 interfaces for request-stacking drivers:
- blk_rq_prep_clone(struct request *clone, struct request *orig,
                    struct bio_set *bs, gfp_t gfp_mask,
                    int (*bio_ctr)(struct bio *, struct bio *, void *),
                    void *data)
  * Clones bios in the original request to the clone request
    (bio_ctr is called for each cloned bio.)
  * Copies attributes of the original request to the clone request.
    The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not
    copied.
- blk_rq_unprep_clone(struct request *clone)
  * Frees cloned bios from the clone request.
Request stacking drivers (e.g. request-based dm) need to make a clone
request for a submitted request and dispatch it to other devices.
To allocate a request for the clone, request stacking drivers may not
be able to use blk_get_request() because the allocation may be done
in an irq-disabled context.
So blk_rq_prep_clone() takes a request allocated by the caller
as an argument.
For each clone bio in the clone request, request stacking drivers
should be able to set up their own completion handler.
So blk_rq_prep_clone() takes a callback function which is called
for each clone bio, and a pointer for private data which is passed
to the callback.
NOTE:
blk_rq_prep_clone() doesn't copy any actual data of the original
request. Pages are shared between original bios and cloned bios.
So the caller must not complete the original request before the clone
request.
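A rough usage sketch from a request-stacking driver's point of view follows; alloc_clone_request(), free_clone_request() and clone_endio() are hypothetical driver-private helpers, and only blk_rq_prep_clone()/blk_rq_unprep_clone() are the interfaces this patch adds.

#include <linux/blkdev.h>
#include <linux/bio.h>

/* Hypothetical driver-private allocation helpers. */
extern struct request *alloc_clone_request(void);
extern void free_clone_request(struct request *rq);

/* Hypothetical per-clone-bio completion handler installed by bio_ctr. */
static void clone_endio(struct bio *bio, int error)
{
	/* driver-specific per-bio completion handling would go here */
}

static int my_bio_ctr(struct bio *clone, struct bio *orig, void *data)
{
	clone->bi_end_io = clone_endio;	/* own completion handler per bio */
	clone->bi_private = data;	/* driver-private context */
	return 0;
}

static struct request *my_make_clone(struct request *orig, struct bio_set *bs,
				     void *ctx)
{
	/* The clone is allocated by the driver itself (e.g. from a mempool),
	 * since this may run in an irq-disabled context where
	 * blk_get_request() cannot be used. */
	struct request *clone = alloc_clone_request();

	if (!clone)
		return NULL;

	if (blk_rq_prep_clone(clone, orig, bs, GFP_ATOMIC, my_bio_ctr, ctx)) {
		free_clone_request(clone);
		return NULL;
	}

	/* When the clone completes, call blk_rq_unprep_clone(clone) to free
	 * the cloned bios, then complete the original request. */
	return clone;
}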
Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: Boaz Harrosh
Signed-off-by: Jens Axboe
-
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
Revert "x86, bts: reenable ptrace branch trace support"
tracing: do not translate event helper macros in print format
ftrace/documentation: fix typo in function grapher name
tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
tracing: add protection around module events unload
tracing: add trace_seq_vprint interface
tracing: fix the block trace points print size
tracing/events: convert block trace points to TRACE_EVENT()
ring-buffer: fix ret in rb_add_time_stamp
ring-buffer: pass in lockdep class key for reader_lock
tracing: add annotation to what type of stack trace is recorded
tracing: fix multiple use of __print_flags and __print_symbolic
tracing/events: fix output format of user stack
tracing/events: fix output format of kernel stack
tracing/trace_stack: fix the number of entries in the header
ring-buffer: discard timestamps that are at the start of the buffer
ring-buffer: try to discard unneeded timestamps
ring-buffer: fix bug in ring_buffer_discard_commit
ftrace: do not profile functions when disabled
tracing: make trace pipe recognize latency format flag
...
-
Currently io_context has an atomic_t (32-bit) as refcount. In the case of
cfq, for each device against which a task does I/O, a reference to the
io_context would be taken. And when there are multiple processes sharing
io_contexts (CLONE_IO), they would also have a reference to the same io_context.
Theoretically the possible maximum number of processes sharing the same
io_context + the number of disks/cfq_data referring to the same io_context
can overflow the 32-bit counter on a very high-end machine.
Even though it is an improbable case, let us make it atomic_long_t.
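The change itself is mechanical; a hedged sketch of the pattern (the accessor below is hypothetical, only the field type and the matching atomic_long_* operations matter):

#include <asm/atomic.h>

struct io_context {
	atomic_long_t refcount;		/* was: atomic_t refcount; */
	/* ... remaining members unchanged ... */
};

/* Hypothetical accessor showing the matching operation change. */
static inline void ioc_ref_get(struct io_context *ioc)
{
	atomic_long_inc(&ioc->refcount);	/* was: atomic_inc() */
}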
Signed-off-by: Nikanth Karthikesan
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe
10 Jun, 2009
1 commit
-
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:
- zero-copy and per-cpu splice() tracing
- binary tracing without printf overhead
- structured logging records exposed under /debug/tracing/events
- trace events embedded in function tracer output and other plugins
- user-defined, per tracepoint filter expressions
...
Cons:
- no dev_t info for the output of plug, unplug_timer and unplug_io events.
  no dev_t info for getrq and sleeprq events if bio == NULL.
  no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
  This is mainly because we can't get the device from a request queue.
  But this may change in the future.
- A packet command is converted to a string in TP_assign, not TP_print.
  While blktrace does the conversion just before output.
  Since pc requests should be rather rare, this is not a big issue.
- In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
  has a unique format, which means we have some unused data in a trace entry.
  The overhead is minimized by using __dynamic_array() instead of __array().
I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
So the overhead of tracing is very small, and no regression when using
those trace events vs blktrace.
And the binary output of TRACE_EVENT is much smaller than blktrace:
# ls -l -h
-rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
-rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
-rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
Following are some comparisons between TRACE_EVENT and blktrace:
plug:
  kjournald-480   [000]   303.084981: block_plug: [kjournald]
  kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
unplug_io:
  kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
  kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
remap:
  kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8
v3:
- use the newly introduced __dynamic_array().
Changelog from v1 -> v2:
- use __string() instead of __array() to minimize the memory required
  to store hex dump of rq->cmd().
- support large pc requests.
- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
- some cleanups.
Signed-off-by: Li Zefan
LKML-Reference:
Signed-off-by: Steven Rostedt
09 Jun, 2009
4 commits
-
Due to commit 1cd96c242a829d52f7a5ae98f554ca9775429685 ("block: WARN
in __blk_put_request() for potential bio leak"), BSG SMP requests get
the false warnings:
WARNING: at block/blk-core.c:1068 __blk_put_request+0x52/0xc0()
This sets rq->bio to NULL to avoid the false warnings.
Signed-off-by: FUJITA Tomonori
Signed-off-by: Jens Axboe
-
DM no longer needs to set limits explicitly when calling blk_stack_limits.
Let the latter automatically deal with bounce_pfn scaling.
Fix kerneldoc variable names.
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
-
This reverts commit a05c0205ba031c01bba33a21bf0a35920eb64833.
DM doesn't need to access the bounce_pfn directly.
Signed-off-by: Jens Axboe
-
Tejun's "block: set rq->resid_len to blk_rq_bytes() on issue" patch
seems to be incomplete; It doesn't set rq->resid_len to blk_rq_bytes()
for a bidi request (req->next_rq). As a result, all bidi users are
broken.
Signed-off-by: FUJITA Tomonori
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe